@movq@www.uninformativ.de I’d guess the same goes for all twtxt.social feeds… I can’t see bender’s archived twts either, didn’t check for the others.
@prologic@twtxt.net Yeah, but I reckon we can kill both birds with one stone. If we change it to support edits, it should be fairly easy to also tweak it to support feed URL changes. Like outlined in my first reply: https://twtxt.net/twt/n4omfvq The URL part sounds way easier to me. :-)
This is how my original message shows up on jenny:
From: quark <quark>
Subject: (#o) @prologic this was your first twtxt. Cool! :-P
Date: Mon, 16 Sep 2024 12:42:27 -0400
Message-Id: <k7imvia@twtxt>
X-twtxt-feed-url: https://ferengi.one/twtxt.txt
(#o) @<prologic https://twtxt.net/user/prologic/twtxt.txt> this was your first twtxt. Cool! :-P
@sorenpeter@darch.dk There was or maybe still is a competing proposal for multiline twts that combines all twts with the same timestamp to one logical multiline twt. Not sure what happened to that, if it is used in the wild and whether anyone “here” follows a feed with that convention. “Our” solution for multiline twts is to use U+2028 Unicode LINE SEPARATOR as a newline: https://dev.twtxt.net/doc/multilineextension.html.
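For illustration, here is a minimal sketch of the U+2028 convention. The helper names `encode_multiline` and `decode_multiline` are made up for this example and are not taken from any client; the timestamp is just a sample value.

```python
LINE_SEPARATOR = "\u2028"  # U+2028 LINE SEPARATOR


def encode_multiline(text: str) -> str:
    """Collapse a multiline message into one physical feed line."""
    return text.replace("\r\n", "\n").replace("\n", LINE_SEPARATOR)


def decode_multiline(twt_text: str) -> str:
    """Turn U+2028 back into real newlines for display."""
    return twt_text.replace(LINE_SEPARATOR, "\n")


message = "first line\nsecond line"
encoded = encode_multiline(message)

# A feed line is "<timestamp>TAB<text>"; the text must not contain "\n".
feed_line = f"2024-09-15T12:06:27Z\t{encoded}"
```

The twt still occupies exactly one physical line in the feed, so clients that only split on `\n` keep working.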
Hmm… I replied to this message:
From: prologic <prologic>
Subject: Hello World! 😊
Date: Sat, 18 Jul 2020 08:39:52 -0400
Message-Id: <o6dsrga>
X-twtxt-feed-url: https://twtxt.net/user/prologic/twtxt.txt
Hello World! 😊
And see how the hash shows… Is it because that hash is no longer used?
@prologic@twtxt.net this was your first twtxt. Cool! :-P
The bug in jenny that @aelaraji@aelaraji.com found:
Jenny has to look for the metadata fields, it must find the # prev = ... line. To do so, I naively wrote something along these lines:
    for line in content.splitlines():
        if line.startswith('# prev = '):
            ...
Problem is, we use \u2028 a lot in twtxt feeds and Python interprets those as line separators as well. That’s not what we want here. Jenny must only split at a \n.
Now @prologic@twtxt.net had a quote/copy of some of his metadata fields in a twt. Like so:
# prev = foo bar
Perfectly legitimate, but now jenny found the # prev = twice (once in the actual header, once in a twt), didn’t know what to do, and thus did not fetch the archived feeds. 🤦
Should be fixed in this commit: https://www.uninformativ.de/git/jenny/commit/6e8ce5afdabd5eac22eae4275407b3bd2a167daf.html
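To make the difference concrete, here is a small self-contained sketch. It is not the code from the commit above; the feed content and the `prev` values are invented for the example.

```python
# Hypothetical feed: one real header line, then a twt that quotes a
# header-like line after a U+2028 "newline".
feed = (
    "# prev = abcdefg twtxt.txt/1\n"
    "2024-09-16T12:00:00Z\tQuoting a header:\u2028# prev = foo bar\n"
)

# Naive: str.splitlines() also breaks at U+2028, so the quote inside
# the twt looks like a second metadata line.
naive = [l for l in feed.splitlines() if l.startswith("# prev = ")]

# Strict: only "\n" ends a physical feed line.
strict = [l for l in feed.split("\n") if l.startswith("# prev = ")]

print(len(naive), len(strict))
```

With this input the naive version finds two `# prev = ` lines and the strict one finds only the real header.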
@movq@www.uninformativ.de What’s your definition of “complete thread”? ;-) There might be feeds participating in the conversation that you have no idea of.
But yes, this has a nice discoverability bonus. And even simpler than a hash, that’s right.
@movq@www.uninformativ.de use @xuu@txt.sour.is’s pod as the default instead, as he keeps the cache as long as I used to keep mine when I ran Yarn. @prologic@twtxt.net’s pod expires them way too soon.
@movq@www.uninformativ.de Yeah, I think so.
This is a bug in jenny. 🤦
@prologic@twtxt.net Oh so that’s how it works? The front page only shows the latest twt of each feed? 🤔
No, something is fishy. It didn’t fetch @prologic@twtxt.net’s archived feeds and now only 969 of his twts are in my maildir. 🤔
@aelaraji@aelaraji.com Yep, I just tried. It’s not that easy to verify, though. 😅 It looks fine to me. The number of twts in the maildir has gone down from 61759 to 34787 – but that’s probably because I unfollowed lots of (presumably dead) feeds in the last few weeks. 🥴
@movq@www.uninformativ.de I wiped both ~/.cache/jenny and my maildir_target when I tried to reset things. Still got wrecked 😅
If it’s not too much to ask, could you back up and/or change your maildir_target and give it a try with an empty directory?
@aelaraji@aelaraji.com What was going on here? 🥴 Wiping the maildir and ~/.cache/jenny should reset everything, it doesn’t store any other state. 🤔
PS: I still can’t get your and bender’s archived twts (at least the ones I’ve noticed), nor can I --fetch-context on replies to them. Your oldest is the one from 2024-06-14 18:22… I can see lyse’s, though! I doubt this is related to the edit issue, but maybe it helps with something.
@prologic@twtxt.net I can’t pinpoint the exact cause but here are a couple of symptoms I observed:
- It all started with a LOT of his old twts, going back to 2020, showing up in a weird way: some were empty, others were duplicates, and a lot more got marked for deletion by neomutt with the D tag.
- After trying to restart things with a fresh Maildir, I couldn’t fetch a lot of twts, even mine, which was a reply to one of his. But then I was able to after temporarily deleting his link from my follow file.
Then @quark@ferengi.one and @bender@twtxt.net pointed out the inconsistent from: + feed url and the twt edit.
@movq@www.uninformativ.de we can shorten it by six characters, with (r:https://...). 😅
(replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)
I think I like this a lot. 🤔
The problem with using hashes always was that they’re “one-directional”: You can construct a hash from URL + timestamp + twt, but you cannot do the inverse. When I see #weadxga, I have no idea what that could possibly refer to.
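A rough sketch of that one-directional construction, assuming the usual description of Yarn’s twt hash (BLAKE2b-256 over URL, timestamp and text joined by newlines, base32, lowercase, last seven characters). Treat those details as an assumption and check the twt hash spec before relying on them; the function name `twt_hash` and the sample text are made up here.

```python
import base64
import hashlib


def twt_hash(feed_url: str, timestamp: str, text: str) -> str:
    """Sketch of a twt hash; the exact recipe is an assumption."""
    payload = f"{feed_url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    encoded = base64.b32encode(digest).decode("ascii").lower().rstrip("=")
    return encoded[-7:]


url = "http://darch.dk/twtxt.txt"
ts = "2024-09-15T12:06:27Z"

original = twt_hash(url, ts, "Hello world")
edited = twt_hash(url, ts, "Hello, world!")  # same twt after an edit
```

Because the text is part of the input, an edit yields a different hash, and nothing in the seven characters leads back to the feed URL or timestamp.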
But of course something like (replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z) has all the information you need. This could simplify twt/feed discovery quite a bit, couldn’t it? 🤔 That thing that I just implemented – jenny asking some Yarn pod for some twt hash – would not be necessary anymore. Clients could easily and automatically fetch complete threads instead of requiring the user to follow all relevant feeds.
Only using the timestamp to identify a twt also solves the edit problem.
It even is better for non-Yarn clients, because you now don’t have to read, understand, and implement a “twt hash specification” before you can reply to someone.
The only problem, really, is that (replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z) is so long. Clients would have to try harder to hide this. 😅
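As a sketch of how little a client would need for this: the parser below pulls the feed URL and timestamp out of a (replyto:…) subject. The syntax is only the proposal from this thread, not a spec, and `ReplyTo`, `REPLYTO_RE` and `parse_replyto` are names invented for the example.

```python
import re
from typing import NamedTuple, Optional


class ReplyTo(NamedTuple):
    feed_url: str
    timestamp: str


# Assumes the URL itself contains no comma and no whitespace.
REPLYTO_RE = re.compile(r"\(replyto:(?P<url>[^,\s]+),(?P<ts>[^)\s]+)\)")


def parse_replyto(twt_text: str) -> Optional[ReplyTo]:
    """Return the referenced feed URL and timestamp, if present."""
    match = REPLYTO_RE.search(twt_text)
    if match is None:
        return None
    return ReplyTo(match.group("url"), match.group("ts"))


ref = parse_replyto(
    "(replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z) I like this"
)
```

With `ref.feed_url` in hand, a client could fetch that feed directly and look for the twt with `ref.timestamp`, no pod lookup required.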
@quark@ferengi.one Meh I lost the plot ages ago 🤣
@prologic@twtxt.net I am going to light some candles this weekend to “La Virgen de Macarena” to make it happen! :-D
@prologic@twtxt.net you need to catch up with my twtxts, mate. :-P
--fetch-context thingy: It can now ask Yarn pods for twt hashes.
@movq@www.uninformativ.de Bah you’re right, that’s a mistake and easily fixed 😅
@movq@www.uninformativ.de I tend to agree too, I think the focus should be on fixing and supporting Edits first 👌
@quark@ferengi.one We will fix this soon™ 🔜
@aelaraji@aelaraji.com So what is it about @sorenpeter@darch.dk’s feed that’s screwed with your client? (Jenny?) 🤔 Kind of curious now 🤣
@aelaraji@aelaraji.com Yes, according to the spec we wrote for Archived Extension:
The second value of prev is a name relative to the base directory of the feed’s URL in url (more specifically, in the URL that the client used to retrieve the feed). In the example above, prev would evaluate to the full URL https://example.com/twtxt-2021-10-18.txt for HTTPS and gopher://example.com/0/twtxt-2021-10-18.txt for Gopher.
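A minimal sketch of that resolution rule using only the standard library. `resolve_prev` is a made-up helper and the hash `abcdefg` is a placeholder; only the example.com URLs come from the quoted spec text.

```python
from urllib.parse import urljoin


def resolve_prev(feed_url: str, prev_value: str) -> str:
    """Resolve the name in 'prev = <hash> <name>' against the feed URL."""
    _hash, name = prev_value.split(maxsplit=1)
    # urljoin replaces the last path segment of feed_url with name,
    # i.e. it resolves relative to the feed's base directory.
    return urljoin(feed_url, name)


archived = resolve_prev(
    "https://example.com/twtxt.txt",
    "abcdefg twtxt-2021-10-18.txt",
)
print(archived)  # https://example.com/twtxt-2021-10-18.txt
```

Under the same rule, a value like `twtxt.txt/1` on a feed at `https://twtxt.net/user/prologic/twtxt.txt` would resolve to `https://twtxt.net/user/prologic/twtxt.txt/1`.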
@prologic@twtxt.net by the way and just in case… is the metadata in your twtxt.txt file that points at your rotated feed files formatted as prev = hash twtxt.txt/n instead of a link by design? I couldn’t fetch any, nor can I do a --fetch-context on replies to your old twts.
@lyse@lyse.isobeef.org I think I’m with you on this. 🤔 I mean, it’s a cool and interesting topic, but it also adds lots of overhead. (And I’m not yet convinced that we actually need it. People don’t change URLs on a daily basis (but they do edit twts all the time).)
@quark@ferengi.one Yep, it’s a list, you can define several pods.
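A sketch of how a client might walk such a list. How jenny really queries pods is not shown in this thread; the /twt/<hash> path is only inferred from links like https://twtxt.net/twt/n4omfvq, and `candidate_urls` is a hypothetical helper.

```python
# Example config fragment with two pods.
config = {
    "yarn_pods_for_discovery": ["https://twtxt.net", "https://txt.sour.is"],
}


def candidate_urls(pods: list[str], twt_hash: str) -> list[str]:
    """Build one lookup URL per pod, in the configured order."""
    return [f"{pod.rstrip('/')}/twt/{twt_hash}" for pod in pods]


urls = candidate_urls(config["yarn_pods_for_discovery"], "n4omfvq")
for url in urls:
    print(url)
```

A client would try these in order and stop at the first pod that still has the twt cached.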
@prologic@twtxt.net Oh, interesting. It doesn’t serve JSON, though, does it? curl -s -H 'Accept: application/json' https://search.twtxt.net/twt/j7f652q gets me an HTML page. 🤔
@aelaraji@aelaraji.com grats! See how much trouble an edited twtxt can cause? Wish there was a simpler solution. Alas, I don’t have much hope.
Done and done! everything is back to normal! 🥳
@aelaraji@aelaraji.com LOL 😂
FIX: Temporarily removed sorenpeter’s twtxt link from my follow list, wiped my twtxt Maildir and jenny cache. Only then was I able to fetch everything as usual (I think). Now I’ll back things up and see what happens if I pull sorenpeter’s feed.
No keyboards were harmed during this experiment… yet.
@quark@ferengi.one It would also be possible to use the search engine here too I think 🤔 i.e: https://search.twtxt.net
@quark@ferengi.one Looks like that would work according to the patch I just read 👌
These then become useful in filters like what you see here:
It’s useful to know however that such feeds are actually marked as type=rss (e.g: https://feeds.twtxt.net/slashdot/twtxt.txt), just as feeds like @tiktok@feeds.twtxt.net are marked as type=bot
@aelaraji@aelaraji.com Ahh that’s interesting! 🧐 One of my original goals when I started out building Yarn.social was to also be a source of news, blogs, and whatever else that could be roughly/easily translated into a Twtxt feed. I’m not sure if others do something similar, but that’s why I built feeds.twtxt.net, which consumes RSS/Atom and produces Twtxt feeds.
My only desire one day is to build a “Feed Builder” of sorts that allows one to say, for example, construct a Slashdot feed but without AI hype, or as another example, a BBC/ABC feed that’s a digest once or twice per day.
@prologic@twtxt.net Nah! I don’t do news feeds 🤣 I gave some a try back then but it was just way too much noise. I have a separate app for RSS feeds I want to follow. None of them mention AI except for one article about the author’s fight back against the crawlers, I believe I’ve mentioned it before.
@bender@twtxt.net Ack 👌
@aelaraji@aelaraji.com Good man 🤣 I keep getting this bloody AI hype from various news feeds I subscribe to via Twtxt like Slashdot cough 🤦♂️
The weird thing is Twtxt fetches everything just fine (I think) except for not having the convenience of having replies grouped into threads.
@movq@www.uninformativ.de I can have more than one Yarn, correct? Like:
"yarn_pods_for_discovery": ["https://twtxt.net", "https://txt.sour.is"],
Tangential, @prologic@twtxt.net, mentioning is still broken in Yarn. See parent, @aelaraji@aelaraji.com is not linked (probably the mention on this twtxt will also be not linked).
Namely, the numbered list was wrong on the original twtxt, and the closing back ticks on the numbered list items were also wrong.