I'm having trouble with a web crawler coming in over the Tor network. It's abusing the gopher proxy on my page, and I don't want to disable or block Tor outright (that would be the easy way out). The crawler constantly changes its user agent, ignores robots.txt, and ignores HTTP status codes. I'm currently serving it 4 MB of binary garbage in the form of a link. It has sucked in about 40 GB of data so far, but it doesn't explode and just keeps crawling. Any other ideas for what to do with it?
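For context, here's roughly what my garbage endpoint does. This is a minimal sketch, not my exact setup: a WSGI app that answers every request with 4 MB of random bytes (the size and chunking are illustrative).

```python
import os

CHUNK = 64 * 1024
GARBAGE_SIZE = 4 * 1024 * 1024  # 4 MB per request

def garbage_chunks(total):
    """Yield random binary chunks totalling exactly `total` bytes."""
    remaining = total
    while remaining > 0:
        chunk = os.urandom(min(CHUNK, remaining))
        remaining -= len(chunk)
        yield chunk

def app(environ, start_response):
    # Answer every request with binary garbage, regardless of path --
    # the crawler ignores status codes and robots.txt anyway, so there's
    # no point being selective.
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("Content-Length", str(GARBAGE_SIZE)),
    ])
    return garbage_chunks(GARBAGE_SIZE)
```

You can hang this behind any WSGI server (e.g. `wsgiref.simple_server.make_server("", 8080, app)`) on the path the crawler is hammering.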