It’s impossible. I set up this instance just to browse Lemmy from my own instance, but no, it was slow as hell the whole week. I added new pods, put Postgres on a different pod, pictrs on another, etc.
But it was still slow as hell. I didn’t know what the cause was until a few hours ago: 500 GETs in a MINUTE from ClaudeBot and GPTBot. What the hell is this? Why? I blocked their user agents with a blocking extension on NGINX and now it works.
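For anyone wanting to do the same without a dedicated extension, plain nginx can match the user agents directly. A rough sketch of the idea; the agent list, hostname and backend port below are example values, adjust to your own setup:

    # goes in the http {} context, e.g. a conf.d file
    map $http_user_agent $block_ai_bot {
        default      0;
        ~*GPTBot     1;   # OpenAI's crawler
        ~*ClaudeBot  1;   # Anthropic's crawler
    }

    server {
        listen 80;
        server_name lemmy.example.org;           # hypothetical hostname

        location / {
            # refuse flagged user agents before they reach the backend
            if ($block_ai_bot) {
                return 403;
            }
            proxy_pass http://127.0.0.1:8536;    # assumed Lemmy backend port
        }
    }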
WHY? So Google can say that you should eat glass?
Life is hell now. Before, at least anyone could put up a website; now even that is painful.
Sorry for the rant.
You can enable Private Instance in your admin settings. This means only logged-in users can see content, which will stop AI scrapers from slowing down your instance: all they’ll see is an empty homepage, so no DB calls. As long as you’re on 0.19.11, federation will still work.

Enabled, thanks for the tip!
At some point they’re going to try to evade detection to continue scraping the web. The cat-and-mouse game continues, except now the “pirates” are big tech.
They already do. (“They” meaning AI generally, I don’t know about Claude or ChatGPT’s bots specifically). There are a number of tools server admins can use to help deal with this.
See also:
These solutions have the side effect of making the bots stay on your site longer and generate more traffic, so they’re not for everyone.
Patience, the AI bubble will burst soon.
🤞
An article for whoever was unaware, like me.
Use Anubis. It’s pretty much the only countermeasure the bots have no way of circumventing.
So I just had a look at your robots.txt:
User-Agent: *
Disallow: /login
Disallow: /login_reset
Disallow: /settings
Disallow: /create_community
Disallow: /create_post
Disallow: /create_private_message
Disallow: /inbox
Disallow: /setup
Disallow: /admin
Disallow: /password_change
Disallow: /search/
Disallow: /modlog
Crawl-delay: 60
You explicitly allow bots to crawl your content… That’s likely one of the reasons you’re getting bot traffic.
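If you do want to keep relying on robots.txt, the narrowest addition would be explicit disallow rules for the two crawlers you named. GPTBot and ClaudeBot are the user-agent tokens OpenAI and Anthropic publish for their crawlers; compliant bots honor rules like these, but compliance is entirely voluntary:

    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /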
AI crawlers ignore robots.txt. The only way to get them to stop is with active countermeasures.
I highly recommend putting Anubis in front of your entire instance as a proxy. It’s a little complicated to get going, but it stops AI scrapers outright by denying them access. A robots.txt works, but only up to a point, because some of these bots simply don’t respect it. And honestly, with the way Sam Altman talks about the people he’s stolen from and scraped, I don’t think anyone should be surprised.
But I have Anubis running on my personal website, and I’ve tested whether ChatGPT can see it: it cannot. Good enough for me.
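In case the setup is the confusing part: the usual pattern is nginx in front, handing every request to Anubis, which serves its proof-of-work challenge to new clients and only forwards verified ones to the real backend. Roughly, on the nginx side (the hostname and the port Anubis listens on are assumptions here; Anubis’s own listen/target configuration is described in its docs):

    server {
        listen 80;                              # in practice, 443 with your TLS certs
        server_name lemmy.example.org;          # hypothetical hostname

        location / {
            # everything goes through Anubis; it proxies verified clients
            # on to whatever backend it is configured to target
            proxy_pass http://127.0.0.1:8923;   # assumed Anubis listen port
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }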
You can use either Cloudflare (proprietary) or Anubis (FOSS).
Don’t do this
Why?
Because it harms marginalized folks’ ability to access content, while also letting an evil corp (and its fascist government) view (and modify) all encrypted communication between your site and its users.
It’s bad.
For clarity, you are referring to Cloudflare and not Anubis?
I am referring to Cloudflare, but I would expect Anubis to be the same if it provides DoS fronting.
Anubis works in a very different way than Cloudflare: it’s a self-hosted proof-of-work challenge running in front of your own server, so no third party sits in the middle of your TLS traffic.