@movq@www.uninformativ.de Right now I’m basically just blocking entire ASN(s) and large blocks of IP(s) from Anthropic, OpenAI, Microsoft and others.
@movq@www.uninformativ.de Haha 🤣 Figures 🤦♂️ Also no need to be concerned with that here, I’ve personally blocked the ASN(s) of Microsoft, OpenAI, Claude and Google 😂
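For anyone wanting to do the same kind of blocking at the application level, here is a minimal sketch in Python, assuming you already have the CIDR prefixes for the ASN(s) you want to drop. The ranges in `BLOCKED_NETWORKS` are hypothetical placeholders (RFC 5737 documentation ranges), not the real prefixes of any company, and `is_blocked` is an illustrative name.

```python
import ipaddress

# Hypothetical blocklist: these are RFC 5737 documentation ranges, not the
# real prefixes announced by any company's ASN. Swap in your own list.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED_NETWORKS)
```

In practice this is probably better done in the firewall than in application code, but the membership check is the same idea.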
It would appear that Google’s web crawlers are ignoring the robots.txt
that I have on https://git.mills.io/robots.txt with content:
User-agent: *
Disallow: /
Evidence attached (see screenshots):
– I think it’s time the Small Web community banded together and filed class action suit(s) against Microsoft.com, Google.com and any other assholes out there (OpenAI?) that violate our rights and ignore requests to be “polite” on the web. Thoughts? 💭
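As a sanity check on what that robots.txt asks for, here is a small sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` string is the content quoted above; the `allowed` helper is just an illustrative name. It shows how a well-behaved crawler would read the rules, and says nothing about what any real crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these two lines, a compliant parser should report every path as off limits for every user agent.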
reviewing logs this morning and found i have been spammed hard by bots not respecting the robots.txt
file. only noticed it because the OpenAI bot was hitting me with a lot of nonsensical requests. here is the list from last month:
- (810) bingbot
- (641) Googlebot
- (624) http://www.google.com/bot.html
- (545) DotBot
- (290) GPTBot
- (106) SemrushBot
- (84) AhrefsBot
- (62) MJ12bot
- (60) BLEXBot
- (55) wpbot
- (37) Amazonbot
- (28) YandexBot
- (22) ClaudeBot
- (19) AwarioBot
- (14) https://domainsbot.com/pandalytics
- (9) https://serpstatbot.com
- (6) t3versionsBot
- (6) archive.org_bot
- (6) Applebot
- (5) http://search.msn.com/msnbot.htm
- (4) http://www.googlebot.com/bot.html
- (4) Googlebot-Mobile
- (4) DuckDuckGo-Favicons-Bot
- (3) https://turnitin.com/robot/crawlerinfo.html
- (3) YandexNews
- (3) ImagesiftBot
- (2) Qwantify-prod
- (1) http://www.google.com/adsbot.html
- (1) http://gais.cs.ccu.edu.tw/robot.php
- (1) YaK
- (1) WBSearchBot
- (1) DataForSeoBot
i have placed some middleware to reject these for now but it is not a foolproof solution.
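The actual middleware is not shown, so the following is only a rough sketch of the general idea as Python WSGI middleware, assuming the rejection is done by matching the User-Agent header. `BotBlocker`, `hello_app` and the `BLOCKED_AGENTS` subset are hypothetical names picked for illustration.

```python
# Hypothetical subset of the user-agent tokens from the list above.
BLOCKED_AGENTS = ("bingbot", "googlebot", "gptbot", "claudebot", "semrushbot")


class BotBlocker:
    """WSGI middleware that answers 403 to blocked user agents."""

    def __init__(self, app, blocked=BLOCKED_AGENTS):
        self.app = app
        self.blocked = tuple(token.lower() for token in blocked)

    def __call__(self, environ, start_response):
        # Case-insensitive substring match against the User-Agent header.
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in self.blocked):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Placeholder application standing in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]
```

Wrapping is just `app = BotBlocker(hello_app)`. It is not foolproof for the obvious reason: a bot only has to send a different User-Agent string to get through.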
OpenAI Says It Has Evidence DeepSeek Used Its Model To Train Competitor
OpenAI says it has evidence suggesting Chinese AI startup DeepSeek used its proprietary models to train a competing open-source system through “distillation,” a technique where smaller models learn from larger ones’ outputs.
The San Francisco-based company, along with partner Microsoft, blocked suspected DeepSeek accounts from accessing … ⌘ Read more
Fuck me OpenAI sucks ass. ChatGPT has to be the most stupidest fucking thing ever invented. It is so bad it’s not even funny.
@prologic@twtxt.net the new product was GPTs. A way to create tailored bots for specific use cases. https://openai.com/blog/introducing-gpts (fun fact: I did an internal hackathon where we made something like this for $work onboarding. And I won a prize!)
The competing project is Poe https://quorablog.quora.com/Introducing-creator-monetization-for-Poe which is basically the same idea. Make an AI bot tailored to a specific domain of knowledge. And monetize it.
The timing fits very well as OpenAI announced it just a few weeks ago.
@prologic@twtxt.net the going theory is that OpenAI announced a new product that pretty much blew up the project of one of the board members. So that board member got 3 others to vote to fire Sam.
wtf is going on with Microsoft and OpenAI of late?! Like Microsoft bought into OpenAI for some shocking $10bn USD, then Sam Altman got fired, now he’s been hired by Microsoft to run a new “AI” division. wtf?! seriously?! 🤔 #Microsoft #OpenAI #Scandal
@prologic@twtxt.net The hackathon project that I did recently used OpenAI and embedded the search results into the prompt. So basically I would search for the top 3 most relevant search results to feed into the prompt and the AI would summarize them to answer the user’s question.
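A toy sketch of that pattern in Python, not the actual hackathon code: rank some documents against the question, keep the top 3, and paste them into the prompt. The word-overlap `score` function and the prompt wording are assumptions made for illustration (a real setup would likely use a proper search index or embeddings), and the call to the model itself is left out.

```python
def score(query: str, document: str) -> int:
    """Count the distinct query words that also appear in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


def build_prompt(query: str, documents: list[str], top_k: int = 3) -> str:
    """Embed the top_k best-scoring documents into a summarization prompt."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    context = "\n".join(f"- {doc}" for doc in ranked[:top_k])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
    )
```

The returned string is what would be sent to the model, so the answer is grounded in the retrieved results rather than only in whatever the model was trained on.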
Most of the ones that can run locally have such a small training set they aren’t worth it. They are more like the Markov chains from the subreddit simulator days.
There is one called orca that seems promising that will be released as OSS soon. It’s running at comparable numbers to OpenAI 3.5.
ChatGPT is good, but it’s not that good 🤣 I asked it to write a program in Go that performs double ratcheting and well the code is total garbage 😅 – It’s only as good as the inputs it was trained on 🤣 #OpenAI #GPT3
The sample they chose to highlight here resembles the kind of paper a 14-year-old would try & fail to bullshit after staying up all night partying right before the due date: https://blog.openai.com/better-language-models/#sample1
OpenAI Trains Language Model, Mass Hysteria Ensues – Approximately Correct http://approximatelycorrect.com/2019/02/17/openai-trains-language-model-mass-hysteria-ensues/
It looks like OpenAI has announced their marginal progress on the coherence problem in narrative prose generation in the most clickbaity possible way again: https://www.bbc.com/news/technology-47249163 https://www.theguardian.com/technology/2019/feb/14/elon-musk-backed-ai-writes-convincing-news-fiction
Discovering Types for Entity Disambiguation https://blog.openai.com/discovering-types-for-entity-disambiguation/