Now that we know AI bots will ignore robots.txt and churn residential IP addresses to scrape websites, does anyone know of a method to block them that doesn’t entail handing over your website to Cloudflare?
I am currently watching several malicious crawlers be stuck in a 404 hole I created. Check it out yourself at https://drkt.eu/asdfasd
I respond to all 404s with a 200 and serve them that page full of juicy bot targets. A lot of bots can't get out of it, and I'm hoping that the drive-by bots looking for login pages flag it as a hit (because it responded with 200 instead of 404), so a real human has to go check and waste their time.
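A minimal nginx sketch of the 404-as-200 trap described above. The paths and the decoy page are placeholders, not the commenter's actual setup; the idea is just that anything which would normally 404 falls through to a static decoy instead:

```nginx
# Serve a decoy page with HTTP 200 for anything that would otherwise 404.
# /var/www/trap/index.html is a hypothetical static page stuffed with fake
# login forms and links back into itself, so crawlers loop on it.
server {
    listen 80;
    server_name example.com;

    root /var/www/html;

    location / {
        # Try the real file first; on a miss, fall back to the trap
        # instead of returning 404.
        try_files $uri $uri/ @trap;
    }

    location @trap {
        # Rewrite every miss to the decoy page and return it with 200,
        # so scanners probing for e.g. /wp-login.php record a "hit".
        rewrite ^ /trap/index.html break;
        root /var/www;
    }
}
```

Since the trap is a static file, each bot request costs nginx almost nothing to serve.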
This is pretty slick, but doesn't this just mean the bots hammer your server, looping forever? How much processing do you do on those forms, for example?
Best is to redirect them to a 1 TB file served by Hetzner's cache. There are some nginx configs that do this.
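A sketch of that redirect in nginx. The user-agent substrings and the file URL are illustrative placeholders (Hetzner hosts public speed-test files; substitute whatever large externally hosted file you prefer):

```nginx
# Send suspected AI crawlers a redirect to a huge file hosted elsewhere,
# so they burn their own bandwidth rather than yours.
# Example UA substrings only; tune the list to the bots you actually see.
map $http_user_agent $is_bad_bot {
    default                      0;
    ~*(GPTBot|CCBot|Bytespider)  1;
}

server {
    listen 80;
    server_name example.com;

    location / {
        if ($is_bad_bot) {
            # Placeholder URL; point this at a real large test file.
            return 302 https://speed.example.net/10GB.bin;
        }
        # ... normal site config ...
    }
}
```

Note this only catches bots that send an honest User-Agent; crawlers rotating residential IPs and spoofing browser UAs will sail past it, which is why the 200-trap approach above targets behavior rather than identity.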
Yes
None
It costs me nothing to have bots spending bandwidth on me because I'm not on a metered connection, and electricity is cheap enough that the tiny overhead of processing their requests might amount to a dollar or two per year.
That’s pretty neat. Thanks!