Anyone got a link to a robots.txt that “blocks” all the “AI” stuff?
… or maybe I should do this based on allowlisting rather than blocklisting. 🤔 Only allow a couple of bots that I think are fine …
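An allowlist-style robots.txt could be sketched like this, assuming Googlebot and bingbot are the bots considered “fine” (swap in whichever crawlers you actually trust). An empty Disallow permits everything for that named group, while the catch-all * group denies everyone else:

```
# Allowed bots: an empty Disallow means "nothing is disallowed"
User-Agent: Googlebot
Disallow:

User-Agent: bingbot
Disallow:

# Everyone else is denied
User-Agent: *
Disallow: /
```

Well-behaved crawlers pick the most specific group matching their user agent, so the named bots never see the deny-all rule.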
@movq@www.uninformativ.de Only found 3 results for “robots.txt” and OpenAI 😢 I seem to recall an effort (which I cannot find now) to build a standard for AI crawlers similar to robots.txt
@prologic@twtxt.net Ahhh, right, now I remember. That ai.txt
boils down to this, I guess:
User-Agent: *
Disallow: /
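A less drastic variant blocks only known AI crawlers by name and leaves everyone else alone. A sketch, assuming these are the user-agent tokens the crawlers currently advertise (the lists change over time, so worth re-checking each vendor's docs):

```
# Known AI crawlers (assumed tokens -- verify before deploying)
User-Agent: GPTBot
User-Agent: ChatGPT-User
User-Agent: CCBot
User-Agent: Google-Extended
User-Agent: ClaudeBot
User-Agent: PerplexityBot
User-Agent: Bytespider
Disallow: /
```

Stacking several User-Agent lines over one Disallow is valid: they all join the same group.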
@aelaraji@aelaraji.com Yeah, there is no guarantee with any of these things, it can all be faked or ignored. 🫤 I’m still going to do it in the hopes that some of those bots respect it.
@aelaraji@aelaraji.com Hmmm, looks like the core idea is to intercept requests, inspect the User-Agent header, and respond accordingly.
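That intercept-and-inspect idea could be sketched like this in Python, using only the standard library. The blocklist tokens are assumptions (same caveat as above: verify what the crawlers actually send), and matching is a simple case-insensitive substring check:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical blocklist: substrings matched case-insensitively against
# the User-Agent header. The real tokens should be verified upstream.
BLOCKED_AGENTS = ("gptbot", "ccbot", "claudebot", "bytespider")

def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent looks like a blocked AI crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENTS)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Intercept the request and inspect the User-Agent header.
        if is_blocked(self.headers.get("User-Agent", "")):
            self.send_response(403)
            self.end_headers()
            self.wfile.write(b"Forbidden\n")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello, human (probably)\n")

# To run it: HTTPServer(("", 8080), Handler).serve_forever()
```

Of course this only filters bots that tell the truth about who they are, which is exactly the weakness discussed below.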
Can we trust the bots not to fake their identity? 🤔
@aelaraji@aelaraji.com @prologic@twtxt.net Hmm, yeah, looks a bit better than ai.txt / robots.txt, but I wouldn’t trust that they don’t spoof their user agent. 🤔
@movq@www.uninformativ.de me neither 🤦‍♂️