about:compat
in Firefox.
@movq@www.uninformativ.de Wow, I use Firefox and didnāt realize this existed! Thanks for pointing it out. I noticed at least one bug cited a webcompat.com report; I wonder if someone at Mozilla monitors those. https://webcompat.com/issues?page=1&per_page=50&state=open&stage=all&sort=created&direction=desc
@javivf@adn.org.es Welcome to twtxt!
@lyse@lyse.isobeef.org Thanks for taking a look, and for pointing out the mixture of tabs and spaces.
I think Iāll leave reachability.c alone, since my intention there was to use an indent level of one tab, and the spaces are just there to line up a few extra things. I fixed reachability_with_stack.cc though.
@prologic@twtxt.net Those arenāt actually serving anything public-facing. Iāve thought about it, but for now Iām sticking with VPSs, partly because I donāt relish the risk of weeks of downtime if something goes wrong while Iām travelling.
Okay, I wonāt park there.
@lyse@lyse.isobeef.org I donāt remember exactly. They might have been growing all winter. The trick is to have a badly insulated extension to the house.
@lyse@lyse.isobeef.org I am a big fan of āobviousā math facts that turn out to be wrong. If you want to understand how reusing space actually works, you are mostly stuck reading complexity theory papers right now. Ian wrote a good survey: https://iuuk.mff.cuni.cz/~iwmertz/papers/m23.reusing_space.pdf . Itās written for complexity theorists, but some of will make sense to programmers comfortable with math. Alternatively, I wrote an essay a few years ago explaining one technique, with (math-loving) programmers as the intended audience: https://www.falsifian.org/blog/2021/06/04/catalytic/ .
@lyse@lyse.isobeef.org Still melting!
Iām in an article in Quanta Magazine! Itās about the bizarre world of algorithms that re-use memory thatās already full. https://www.quantamagazine.org/catalytic-computing-taps-the-full-power-of-a-full-hard-drive-20250218/ Iām the one with all the snow in the background.
@sorenpeter@darch.dk Sorry, I realized that shortly after posting. Hereās another attempt to post the images:
Some satisfying icicle-breaking in our backyard: photos.falsifian.org/video/sM7G3vfS6yuc/VID_20250217_203250.mp4
I couldnāt resist taking home a prize:
Itās been snowy here in #Toronto.
(I tried formatting the images in markdown for the benefit of yarn and any other clients that understand it.)
robots.txt
that I have on https://git.mills.io/robots.txt with content:
@prologic@twtxt.net Have you tried Googleās robots.txt report? https://support.google.com/webmasters/answer/6062598?hl=en . I would expect Google to be pretty good about this sort of thing. If you have the energy to dig into it and, for example, post on support.google.com, Iād be curious to hear what you find out.
@eapl.me@eapl.me I like this idea. Another option would be to show a limited number of posts, with an option to see the omitted ones by user. Either way, I wonder how well that works with threading.
I have a paper deadline coming up, so will everyone please stop writing twts for the next 48 hours, thanks.
@lyse@lyse.isobeef.org Thanks for sharing. I really enjoyed it. The beginning part about the history of life on Earth was fun to watch having just read Dawkinās old book The Selfish Geene, and now I want to read more about archaea. The end of the talk about what might be going on on Mars made me a bit hopeful someone will find some good evidence.
@jost@jost.sdfeu.org Hello jost!
It turns out my ISP supports ipv6. After 4-5 months with only ipv4, I thought to ask customer support, and they told me how to turn it on. (Iām pretty happy with ebox so far. Low-priced fibre with no issues so far. Though all my traffic goes through Montreal, 500km away from me in Toronto, which adds a few ms to network latency.)
@andros@twtxt.andros.dev Sorry I missed your messages to #twtxt on IRC. There are people there, but it can take several hours to get a response. E.g. I check it every day or two. I recommend using an IRC bouncer. To answer your question about registries, I used a couple of registries when I first started out, to try to find feeds to follow, but havenāt since then. I donāt remember which ones, but they were easy to find with web searches.
@movq@www.uninformativ.de Looks fun. Also kind of looks like APL and Forth had a baby on Jupyter.
@andros@twtxt.andros.dev Hello!
@kh1b@kh1b.org Welcome to twtxt!
@wbknl@twtxt.net Welcome to the twtxt-iverse!
@lyse@lyse.isobeef.org Beautiful pictures, and beautiful HTML for a photo album!
@prologic@twtxt.net Iām grateful for this accident. I find browsing twtxt.net useful even though I donāt have an account there. I do it when I canāt use Jenny because I only have my phone, or if I want to see messages I might have missed. I know itās not guaranteed to catch everything, but itās pretty good, even if itās not intentional.
@Codebuzz@www.codebuzz.nl I use Jenny to add to a local copy of my twtxt.txt file, and then manually push it to my web servers. I prefer timestamps to end with āZā rather than ā+00:00ā so I modified Jenny to use that format. I mostly follow conversations using Jenny, but sometimes I check twtxt.net, which could catch twts I missed.
1/4
to mean "first out of four".
@bender@twtxt.net I try to avoid editing. I guess I would write 5/4, 6/4, etc, and hopefully my audience would be sympathetic to my failing.
Anyway, I donāt think my eccentric decision to number my twts in the style of other social media platforms is the only context where someone might write Ā¼ not meaning a quarter. E.g. January 4, to Americans.
Iām happy to keep overthinking this for as long as you are :-P
@bender@twtxt.net @prologic@twtxt.net Iām not exactly asking yarnd to change. If you are okay with the way it displayed my twts, then by all means, leave it as is. I hope you wonāt mind if I continue to write things like 1/4
to mean āfirst out of fourā.
What has text/markdown
got to do with this? I donāt think Markdown says anything about replacing 1/4
with Ā¼, or other similar transformations. Itās not needed, because Ā¼ is already a unicode character that can simply be directly inserted into the text file.
Whatās wrong with my original suggestion of doing the transformation before the text hits the twtxt.txt file? @prologic@twtxt.net, I think it would achieve what you are trying to achieve with this content-type thing: if someone writes 1/4
on a yarnd instance or any other client that wants to do this, it would get transformed, and other clients simply wouldnāt do the transformation. Every client that supports displaying unicode characters, including Jenny, would then display Ā¼ as Ā¼.
Alternatively, if you prefer yarnd to pretty-print all twts nicely, even ones from simpler clients, thatās fine too and you donāt need to change anything. My 1/4
-> Ā¼ thing is nothing more than a minor irritation which probably isnāt worth overthinking.
I installed GrapheneOS for the first time on Wednesday last week on a used Pixel 7a, and Iām impressed. Installation was almost seamless, and I was able to do it from another Android phone. Iāve run into very few wrinkles, even using Googleās proprietary apps with GrapheneOSās āsandboxedā version of Google Play Services. The main problems Iāve noticed: I canāt cast, and Google Timeline doesnāt seem to work (though I imagine the intersection between people keen to use GrapheneOS and keen to have Google log their location history is pretty small).
@prologic@twtxt.net Iām not a yarnd user, so it doesnāt matter a whole lot to me, but FWIW Iām not especially keen on changing how I format my twts to work around yarndās quirks.
I wonder if this kind of postprocessing would fit better between composing (via yarndās UI) and publishing. So, if a yarnd user types Ā¼, it could get changed to Ā¼ in the twtxt.txt file for everyone to see, not just people reading through yarnd. But when I type Ā¼, meaning first out of four, as a non-yarnd user, the meaning wouldnāt get corrupted. I can always type Ā¼ directly if thatās what I really intend.
(This twt might be easier to understand if you read it without any transformations :-P)
Anyway, again, Iām not a yarnd user, so do what you will, just know you might not be seeing exactly what I meant.
@prologic@twtxt.net I wrote Ā¼ (one slash four) by which I meant āthe first out of fourā. twtxt.net is showing it as Ā¼, a single character that IMO doesnāt have that same meaning (it means 0.25). Similarly, Ā¾ got replaced with Ā¾ in another twt. Itās not a big deal. It just looks a little wrong, especially beside the 2/4 and 4/4 in my other two twts.
@prologic@twtxt.net One could argue twtxt.netās display formatting is a little over-eager here.
Inversion by Aric McBay was another random library pick. Like The Fall of Io, itās the most recent in a series, though I think this series is pretty loosely connected. In contrast, the villain in this book is simple and cartoonishly evil. The book presents a design for utopia which was interesting but a little cloying. Iām not sure if Iām supposed to want to live there, but I donāt think I do. I enjoyed the book as easy reading, and might try the others in the series some time. (4/4)
I read Starter Villain by John Scalzi. Enjoyable, like his other books that Iāve read. Somewhat sillier. (Ā¾)
Iām enjoying Wesley Chuās Tao and Io series. Spies, action, ancient aliens. Some funny parts, some interesting world-building parts, some action-filled parts. I picked up The Fall of Io at random from a library a few weeks ago, and it turned out to be the last in a series of six (technically two series), so after finishing that I read the first and am partway through the second. Usually I try to read series in order, but this way is interesting. One thing I liked about The Fall of Io was that it it followed many points of view with somewhat conflicting interests, some more evil than others, and I felt sympathy for most of them. (I was kind of hoping it would be about Jupiterās moon Io, but it wasnāt, but Iām satisfied with what I ended up with.) (2/4)
@prologic@twtxt.net I think printf is a more portable option than echo -e for interpreting \t as tab. E.g. printf ā%s\t%s\t%sā ā$urlā ā$timeā ā$textā. In general I always prefer printf over echo for anything non-trivial in unix shell scripts. See last paragraph of https://en.wikipedia.org/wiki/Echo_(command)#History
f
:
@aelaraji@aelaraji.com You could just remove the {getuser()} part because you added ~.
Diving into mblaze, I think Iāve nearly* reached peek email geek.
Just a bunch of shell commands I can pipe together to search, list, view and reply to email (after syncing it to a local Maildir).
EXAMPLES at https://git.vuxu.org/mblaze/tree/README
So far Iām using most of the tools directly from the command line, but I might take inspiration from https://sr.ht/~rakoo/omail/ to make my workflow a bit more efficient.
*To get any closer, I think Iād have to hand-craft my own SMTP client or something.
@bender@twtxt.net Itās the experience of an ordinary person in a strange place where memories are disappearing with the help of the Memory Police. The setting feels contemporary (to the bookās 1994 publication date) rather than futuristic, except for some unexplained stuff about memories.
Yes, that is exactly what I meant. I like that collection and ātwtxt v2ā feels like a departure.
Maybe thereās an advantage to grouping it into one spec, but IMO that shouldnāt be done at the same time as introducing new untested ideas.
See https://yarn.social (especially this section: https://yarn.social/#self-host) ā It really doesnāt get much simpler than this š¤£
Again, I like this existing simplicity. (I would even argue you donāt need the metadata.)
That page says āFor the best experience your client should also support some of the Twtxt Extensionsā¦ā but it is clear you donāt need to. I would like it to stay that way, and publishing a big long spec and calling it ātwtxt v2ā feels like a departure from that. (I think the content of the document is valuable; Iām just carping about how itās being presented.)
Recent #fiction #scifi #reading:
The Memory Police by YÅko Ogawa. Lovely writing. Very understated; reminded me of Kazuo Ishiguro. Sort of like Nineteen Eighty-Four but not. (I first heard it recommended in comparison to that work.)
Subcutanean by Aaron Reed; https://subcutanean.textories.com/ . Every copy of the book is different, which is a cool idea. I read two of them (one from the library, actually not different from the other printed copies, and one personalized e-book). I donāt read much horror so managed to be a little creeped out by it, which was fun.
The Wind from Nowhere, a 1962 novel by J. G. Ballard. A random pick from the sci-fi section; I think I picked it up because it made me imagine some weird 4-dimensional effect (āfrom nowhereā meaning not in a normal direction) but actually (spoiler) it was just about a lot of wind for no reason. The book was moderately entertaining but there was nothing special about it.
Currently reading Scale by Greg Egan and Inversion by Aric McBay.
More thoughts about changes to twtxt (as if we havenāt had enough thoughts):
- There are lots of great ideas here! Is there a benefit to putting them all into one document? Seems to me this could more easily be a bunch of separate efforts that can progress at their own pace:
1a. Better and longer hashes.
1b. New possibly-controversial ideas like edit: and delete: and location-based references as an alternative to hashes.
1c. Best practices, e.g. Content-Type: text/plain; charset=utf-8
1d. Stuff already described at dev.twtxt.net that doesnāt need any changes.
We wonāt know what will and wonāt work until we try them. So Iām inclined to think of this as a bunch of draft ideas. Maybe later when weāve seen it play out it could make sense to define a group of recommended twtxt extensions and give them a name.
Another reason for 1 (above) is: I like the current situation where all you need to get started is these two short and simple documents:
https://twtxt.readthedocs.io/en/latest/user/twtxtfile.html
https://twtxt.readthedocs.io/en/latest/user/discoverability.html
and everything else is an extension for anyone interested. (Deprecating non-UTC times seems reasonable to me, though.) Having a big long ātwtxt v2ā document seems less inviting to people looking for something simple. (@prologic@twtxt.net you mentioned an anonymous comment āyouāve ruined twtxtā and while I donāt completely agree with that commenterās sentiment, I would feel like twtxt had lost something if it moved away from having a super-simple core.)All that being said, these are just my opinions, and Iām not doing the work of writing software or drafting proposals. Maybe I will at some point, but until then, if youāre actually implementing things, youāre in charge of what you decide to make, and Iām grateful for the work.
@prologic@twtxt.net Done. Also, I went ahead and made two changes: changed hexadecimal to base64 for hashes (wasnāt sure if anyone objected), and changed āMUST follow the chainā to āSHOULD follow the chain.
@prologic@twtxt.net Thanks for pointing out it lasts four hours. Thatās a big window! I wonder when most people will be on. I might aim for halfway through unless I hear otherwise. (12:00Z is a bit early for me.)
@movq@www.uninformativ.de Yes, the tools are surprisingly fast. Still, magrep takes about 20 seconds to search through my archive of 140K emails, so to speed things up I would probably combine it with an indexer like mu, mairix or notmuch.
#fzf is the new emacs: a tool with a simple purpose that has evolved to include an #email client. https://sr.ht/~rakoo/omail/
Iām being a little silly, of course. fzf doesnāt actually check your email, but it appears to be basically the whole user interface for that mail program, with #mblaze wrangling the emails.
Iāve been thinking about how I handle my email, and am tempted to make something similar. (When I originally saw this linked the author was presenting it as an example tweaked to their own needs, encouraging people to make their own.)
This approach could surely also be combined with #jenny, taking the place of (neo)mutt. For example mblazeās mthread tool presents a threaded discussion with indentation.
@bender@twtxt.net Ha! Maybe I should get on the Markdown train. Youāre taking away my excuses.
Sorry, youāre right, I should have used numbers!
Iām donāt understand what āpreserve the original hashā could mean other than āmake sure thereās still a twt in the feed with that hashā. Maybe the text could be clarified somehow.
Iām also not sure what you mean by markdown already being part of it. Of course people can already use Markdown, just like presumably nothing stopped people from using (twt subjects) before they were formally described. But itās not universal; e.g. as a jenny user I just see the plain text.
@prologic@twtxt.net Do you feel the same about published vs. privately stored data?
For me thereās a distinction. I feel very strongly that I should be able to retain whatever private information I like. On the other hand, I do have some sympathy for requests not to publish or propagate (though I personally feel itās still morally acceptable to ignore such requests).
@lyse@lyse.isobeef.org Iād suggest making the whole content-type thing a SHOULD, to accommodate people just using some hosting service they donāt have much control over. (The same situation could make detecting followers hard, but IMO āplease email me if you follow meā is still legit twtxt, even if inconvenient.)
@prologic@twtxt.net Thanks for writing that up!
I hope it can remain a living document (or sequence of draft revisions) for a good long time while we figure out how this stuff works in practice.
I am not sure how I feel about all this being done at once, vs. letting conventions arise.
For example, even today I could reply to twt abc1234 with ā(#abc1234) Edit: ā¦ā and I think all you humans would understand it as an edit to (#abc1234). Maybe eventually it would become a common enough convention that clients would start to support it explicitly.
Similarly we could just start using 11-digit hashes. We should iron out whether itās sha256 or whatever but thereās no need get all the other stuff right at the same time.
I have similar thoughts about how some users could try out location-based replies in a backward-compatible way (append the replyto: stuff after the legacy (#hash) style).
However I recognize that Iām not the one implementing this stuff, and itās less work to just have everything determined up front.
Misc comments (I havenāt read the whole thing):
Did you mean to make hashes hexadecimal? You lose 11 bits that way compared to base32. Iād suggest gaining 11 bits with base64 instead.
āClients MUST preserve the original hashā ā do you mean they MUST preserve the original twt?
Thanks for phrasing the bit about deletions so neutrally.
I donāt like the MUST in āClients MUST follow the chain of reply-to referencesā¦ā. If someone writes a client as a 40-line shell script that requires the user to piece together the threading themselves, IMO we shouldnāt declare the client non-conforming just because they didnāt get to all the bells and whistles.
Similarly I donāt like the MUST for user agents. For one thing, you might want to fetch a feed without revealing your identty. Also, it raises the bar for a minimal implementation (Iām again thinking again of the 40-line shell script).
For āwho followsā lists: why must the long, random tokens be only valid for a limited time? Do you have a scenario in mind where they could leak?
Why canāt feeds be served over HTTP/1.0? Again, thinking about simple software. I recently tried implementing HTTP/1.1 and it wasnāt too bad, but 1.0 would have been slightly simpler.
Why get into the nitty-gritty about caching headers? This seems like generic advice for HTTP servers and clients.
Iām a little sad about other protocols being not recommended.
I donāt know how I feel about including markdown. I donāt mind too much that yarn users emit twts full of markdown, but Iām more of a plain text kind of person. Also it adds to the length. I wonder if putting a separate document would make more sense; that would also help with the length.
@prologic@twtxt.net I have no specifics, only hopes. (I have seen some articles explaining the GDPR doesnāt apply to a āpurely personal or household activityā but I donāt really know what that means.)
I donāt know if itās worth giving much thought to the issue unless either you expect to get big enough for the GDPR to matter a lot (I imagine making money is a prerequisite) or someone specifically brings it up. Unless you enjoy thinking through this sort of thing, of course.
@david@collantes.us Thanks, thatās good feedback to have. I wonder to what extent this already exists in registry servers and yarn pods. I havenāt really tried digging into the past in either one.
How interested would you be in changes in metadata and other comments in the feeds? Iām thinking of just permanently saving every version of each twtxt file that gets pulled, not just the twts. It wouldnāt be hard to do (though presenting the information in a sensible way is another matter). Compression should make storage a non-issue unless someone does something weird with their feed like shuffle the comments around every time I fetch it.
(replyto:ā¦)
over (edit:#)
: (replyto:ā¦)
relies on clients always processing the entire feed ā otherwise they wouldnāt even notice when a twt gets updated. a) This is more expensive, b) you cannot edit twts once they get rotated into an archived feed, because there is nothing signalling clients that they have to re-fetch that archived feed.
@movq@www.uninformativ.de I donāt think it has to be like that. Just make sure the new version of the twt is always appended to your current feed, and have some convention for indicating itās an edit and which twt it supersedes. Keep the original twt as-is (or delete it if you donāt want new followers to see it); doesnāt matter if itās archived because you arenāt changing that copy.
@prologic@twtxt.net Do you have a link to some past discussion?
Would the GDPR would apply to a one-person client like jenny? I seriously hope not. If someone asks me to delete an email they sent me, I donāt think I have to honour that request, no matter how European they are.
I am really bothered by the idea that someone could force me to delete my private, personal record of my interactions with them. Would I have to delete my journal entries about them too if they asked?
Maybe a public-facing client like yarnd needs to consider this, but that also bothers me. I was actually thinking about making an Internet Archive style twtxt archiver, letting you explore past twts, including long-dead feeds, see edit histories, deleted twts, etc.
@david@collantes.us Well, I wouldnāt recommend using my code for your main jenny use anyway. If you want to try it out, set XDG_CONFIG_HOME and XDG_CACHE_HOME to some sandbox directories and only run my code there. If @movq@www.uninformativ.de is interested in any of this getting upstreamed, Iād be happy to try rebasing the changes, but otherwise itās a proof of concept and fun exercise.
I forgot to git add a new test file. Added to the patch now at https://www.falsifian.org/a/oDtr/patch0.txt
@david@collantes.us Hello!
BTW this code doesnāt incorporate existing twts into jennyās database. Itās best used starting from scratch. Iāve been testing it using a custom XDG_CACHE_HOME and XDG_CONFIG_HOME to avoid messing with my ārealā jenny data.
I wrote some code to try out non-hash reply subjects formatted as (replyto ), while keeping the ability to use the existing hash style.
I donāt think we need to decide all at once. If clients add support for a new method then people can use it if they like. The downside of course is that this costs developer time, so I decided to invest a few hours of my own time into a proof of concept.
With apologies to @movq@www.uninformativ.de for corrupting jennyās beautiful code. I donāt write this expecting you to incorporate the patch, because it does complicate things and might not be a direction you want to go in. But if you like any part of this approach feel free to use bits of it; I release the patch under jennyās current LICENCE.
Supporting both kinds of reply in jenny was complicated because each email can only have one Message-Id, and because itās possible the target twt will not be seen until after the twt referencing it. The following patch uses an sqlite database to keep track of known (url, timestamp) pairs, as well as a separate table of (url, timestamp) pairs that havenāt been seen yet but are wanted. When one of those āwantedā twts is finally seen, the mail file gets rewritten to include the appropriate In-Reply-To header.
Patch based on jenny commit 73a5ea81.
https://www.falsifian.org/a/oDtr/patch0.txt
Not implemented:
- Composing twts using the (replyto ā¦) format.
- Probably other important things Iām forgetting.
@prologic@twtxt.net Wikipedia claims sha1 is vulnerable to a āchosen-prefix attackā, which I gather means I can write any two twts I like, and then cause them to have the exact same sha1 hash by appending something. I guess a twt ending in random junk might look suspcious, but perhaps the junk could be worked into an image URL like
. If thatās not possible now maybe it will be later.git only uses sha1 because theyāre stuck with it: migrating is very hard. There was an effort to move git to sha256 but I donāt know its status. I think there is progress being made with Game Of Trees, a git clone that uses the same on-disk format.
I canāt imagine any benefit to using sha1, except that maybe some very old software might support sha1 but not sha256.
@movq@www.uninformativ.de Agreed that hashes have a benefit. I came up with a similar example where when I twted about an 11-character hash collision. Perhaps hashes could be made optional somehow. Like, you could use the āreplytoā idea and then additionally put a hash somewhere if you want to lock in which version of the twt you are replying to.
@quark@ferengi.one Oh, sure, it would be nice if edits didnāt break threads. I was just pondering the circumstances under which I get annoyed about data being irrecoverably deleted or otherwise lost.
@quark@ferengi.one I donāt really mind if the twt gets edited before I even fetch it. I think itās the idea of my computer discarding old versions itās fetched, especially if itās shown them to me, that bugs me.
But I do like @movq@www.uninformativ.deās suggestion on this thread that feeds could contain both the original and the edited twt. I guess it would be up to the author.
@quark@ferengi.one None. I like being able to see edit history for the same reason.
@prologic@twtxt.net Why sha1 in particular? There are known attacks on it. sha256 seems pretty widely supported if youāre worried about support.
@prologic@twtxt.net I wouldnāt want my client to honour delete requests. I like my computerās memory to be better than mine, not worse, so it would bug me if I remember seeing something and my computer canāt find it.
Thereās a simple reason all the current hashes end in a or q: the hash is 256 bits, the base32 encoding chops that into groups of 5 bits, and 256 isnāt divisible by 5. The last character of the base32 encoding just has that left-over single bit (256 mod 5 = 1).
So I agree with #3 below, but do you have a source for #1, #2 or #4? I would expect any lack of variability in any part of a hash functionās output would make it more vulnerable to attacks, so designers of hash functions would want to make the whole output vary as much as possible.
Other than the divisible-by-5 thing, my current intuition is it doesnāt matter what part you take.
Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.
Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.
Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.
Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.
@quark@ferengi.one It looks like the part about traditional topics has been removed from that page. Here is an old version that mentions it: https://web.archive.org/web/20221211165458/https://dev.twtxt.net/doc/twtsubjectextension.html . Still, I donāt see any description of what is actually allowed between the parentheses. May be worth noting that twtxt.net is displaying the twts with the subject stripped, so some piece of code is recognizing it as a subject (or, at least, something to be removed).
Hmm, but yarnd also isnāt showing these twts as being part of a thread. @prologic@twtxt.net you said yarnd respects customs subjects. Shouldnāt these twts count as having a custom subject, and get threaded together?
yarnd just doesnāt render the subject. Fair enough. Itās (replyto http://darch.dk/twtxt.txt 2024-09-15T12:50:17Z), and if you donāt want to go on a hunt, the twt hash is weadxga: https://twtxt.net/twt/weadxga
@sorenpeter@darch.dk I like this idea. Just for fun, Iām using a variant in this twt. (Also because Iām curious how it non-hash subjects appear in jenny and yarn.)
URLs can contain commas so I suggest a different character to separate the url from the date. Is this twt Iāve used space (also after āreplytoā, for symmetry).
I think this solves:
- Changing feed identities: although @mckinley@twtxt.net points out URLs can change, I think this syntax should be okay as long as the feed at that URL can be fetched, and as long as the current canonical URL for the feed lists this one as an alternate.
- editing, if you donāt care about message integrity
- finding the root of a thread, if youāre not following the author
An optional hash could be added if message integrity is desired. (E.g. if you donāt trust the feed author not to make a misleading edit.) Other recent suggestions about how to deal with edits and hashes might be applicable then.
People publishing multiple twts per second should include sub-second precision in their timestamps. As you suggested, the timestamp could just be copied verbatim.
It should be fixed now. Just needed some unusual quoting in my httpd.conf: https://mail-archive.com/misc@openbsd.org/msg169795.html
@lyse@lyse.isobeef.org Sorry, I donāt think I ever had charset=utf8. I just noticed that a few days ago. OpenBSDās httpd might not support including a parameter with the mime type, unfortunately. Iām going to look into it.
(#hash;#originalHash)
would also work.
Maybe Iām being a bit too purist/minimalistic here. As I said before (in one of the 1372739 posts on this topic ā or maybe I didnāt even send that twt, I donāt remember š ), I never really liked hashes to begin with. They arenāt super hard to implement but they are kind of against the beauty of the original twtxt ā because you need special client support for them. Itās not something that you could write manually in your
twtxt.txt
file. With @sorenpeter@darch.dkās proposal, though, that would be possible.
Tangentially related, I was a bit disappointed to learn that the twt subject extension is now never used except with hashes. Manually-written subjects sounded so beautifully ad-hoc and organic as a way to disambiguate replies. Maybe Iāll try it some time just for fun.
(#w4chkna) @falsifian@www.falsifian.org You mean the idea of being able to inline
# url =
changes in your feed?
Yes, that one. But @lyse@lyse.isobeef.org pointed out suffers a compatibility issue, since currently the first listed url is used for hashing, not the last. Unless your feed is in reverse chronological order. Heh, I guess another metadata field could indicate which version to use.
Or maybe url changes could somehow be combined with the archive feeds extension? Could the url metadata field be local to each archive file, so that to switch to a new url all you need to do is archive everything youāve got and start a new file at the new url?
I donāt think itās that likely my feed url will change.
@mckinley@twtxt.net Yes, changing domains is be a problem if you tie your identity to an https url. But I also worry about being stuck with a key I canāt rotate. Whatever gets used, it would be nice to be able to rotate identities. I like @lyse@lyse.isobeef.orgās idea for that.
@prologic@twtxt.net Brute force. I just hashed a bunch of versions of both tweets until I found a collision.
I mostly just wanted an excuse to write the program. I donāt know how I feel about actually using super-long hashes; could make the twts annoying to read if you prefer to view them untransformed.
@prologic@twtxt.net earlier you suggested extending hashes to 11 characters, but hereās an argument that they should be even longer than that.
Imagine I found this twt one day at https://example.com/twtxt.txt :
2024-09-14T22:00Z Useful backup command: rsync -a ā$HOMEā /mnt/backup
and I responded with ā(#5dgoirqemeq) Thanks for the tip!ā. Then Iāve endorsed the twt, but it could latter get changed to
2024-09-14T22:00Z Useful backup command: rm -rf /some_important_directory
which also has an 11-character base32 hash of 5dgoirqemeq. (Iām using the existing hashing method with https://example.com/twtxt.txt as the feed url, but Iām taking 11 characters instead of 7 from the end of the base32 encoding.)
Thatās what I meant by āspoofingā in an earlier twt.
I donāt know if preventing this sort of attack should be a goal, but if it is, the number of bits in the hash should be at least two times log2(number of attempts we want to defend against), where the ātwo timesā is because of the birthday paradox.
Side note: current hashes always end with āaā or āqā, which is a bit wasteful. Maybe we should take the first N characters of the base32 encoding instead of the last N.
Code I used for the above example: https://fossil.falsifian.org/misc/file?name=src/twt_collision/find_collision.c
I only needed to compute 43394987 hashes to find it.
@prx@si3t.ch I havenāt messed with rdomains, but still it might help if you included the command that produced that error (and whether you ran it as root).
Theyāre in Section 6:
Receiver should adopt UDP GRO. (Something about saving CPU processing UDP packets; Iām a but fuzzy about it.) And they have suggestions for making GRO more useful for QUIC.
Some other receiver-side suggestions: āsending delayed QUICK ACKsā; āusing recvmsg to read multiple UDF packets in a single system callā.
Use multiple threads when receiving large files.
HTTPS is supposed to do [verification] anyway.
TLS provides verification that nobody is tampering with or snooping on your connection to a server. It doesnāt, for example, verify that a file downloaded from server A is from the same entity as the one from server B.
I was confused by this response for a while, but now I think I understand what youāre getting at. You are pointing out that with signed feeds, I can verify the authenticity of a feed without accessing the original server, whereas with HTTPS I canāt verify a feed unless I download it myself from the origin server. Is that right?
I.e. if the HTTPS origin server is online and I donāt mind taking the time and bandwidth to contact it, then perhaps signed feeds offer no advantage, but if the origin server might not be online, or I want to download a big archive of lots of feeds at once without contacting each server individually, then I need signed feeds.
feed locations [being] URLs gives some flexibility
It does give flexibility, but perhaps we should have made them URIs instead for even more flexibility. Then, you could use a tag URI,
urn:uuid:*
, or a regular old URL if you wanted to. The spec seems to indicate that theurl
tag should be a working URL that clients can use to find a copy of the feed, optionally at multiple locations. Iām not very familiar with IP{F,N}S but if it ensures you own an identifier forever and that identifier points to a current copy of your feed, it could be a great way to fix it on an individual basis without breaking any specs :)
Iām also not very familiar with IPFS or IPNS.
I havenāt been following the other twts about signatures carefully. I just hope whatever you smart people come up with will be backwards-compatible so it still works if Iām too lazy to change how I publish my feed :-)
@xuu Thanks for the link. I found a pdf on one of the authorsā home pages: https://ahmadhassandebugs.github.io/assets/pdf/quic_www24.pdf . I wonder how the protocol was evaluated closer to the time it became a standard, and whether anything has changed. I wonder if network speeds have grown faster than CPU speeds since then. The paper says the performance is around the same below around 600 Mbps.
To be fair, I donāt think QUIC was ever expected to be faster for transferring a single stream of data. I think QUIC is supposed to reduce the impact of a dropped packet by making sure it only affects the stream itās part of. I imagine QUIC still has that advantage, and this paper is showing the other side of a tradeoff.
@lyse@lyse.isobeef.org This looks like a nice way to do it.
Another thought: if clients canāt agree on the url (for example, if we switch to this new way, but some old clients still do it the old way), that could be mitigated by computing many hashes for each twt: one for every url in the feed. So, if a feed has three URLs, every twt is associated with three hashes when it comes time to put threads together.
A client stills need to choose one url to use for the hash when composing a reply, but this might add some breathing room if thereās a period when clients are doing different things.
(From what I understand of jenny, this would be difficult to implement there since each pseudo-email can only have one msgid to match to the in-reply-to headers. I donāt know about other clients.)
@movq@www.uninformativ.de Another idea: just hash the feed url and time, without the message content. And donāt twt more than once per second.
Maybe you could even just use the time, and rely on @-mentions to disambiguate. Not sure how that would work out.
Though I kind of like the idea of twts being immutable. At least, itās clear which version of a twt youāre replying to (assuming nobody is engineering hash collisions).
In fact, maybe your public key idea is compatible with my last point. Just come up with a url scheme that means āthis feedās primary URL is actually a public keyā, and then feed authors can optionally switch to that.
@prologic@twtxt.net Some criticisms and a possible alternative direction:
Key rotation. Iām not a security person, but my understanding is that itās good to be able to give keys an expiry date and replace them with new ones periodically.
It makes maintaining a feed more complicated. Now instead of just needing to put a file on a web server (and scan the logs for user agents) I also need to do this. What brought me to twtxt was its radical simplicity.
Instead, maybe we should think about a way to allow old urls to be rotated out? Like, my metadata could somehow say that X used to be my primary URL, but going forward from date D onward my primary url is Y. (Or, if you really want to use public key cryptography, maybe something similar could be used for key rotation there.)
Itās nice that your scheme would add a way to verify the twts you download, but https is supposed to do that anyway. If you donāt trust https to do that (maybe you donāt like relying on root CAs?) then maybe your preferred solution should be reflected by your primary feed url. E.g. if you prefer the security offered by IPFS, then maybe an IPNS url would do the trick. The fact that feed locations are URLs gives some flexibility. (But then rotation is still an issue, if I understand ipns right.)
@movq@www.uninformativ.de @prologic@twtxt.net Another option would be: when you edit a twt, prefix the new one with (#[old hash]) and some indication that itās an edited version of the original tweet with that hash. E.g. if the hash used to be abcd123, the new version should start ā(#abcd123) (redit)ā.
What I like about this is that clients that donāt know this convention will still stick it in the same thread. And I feel itās in the spirit of the old pre-hash (subject) convention, though thatās before my time.
I guess it may not work when the edited twt itself is a reply, and there are replies to it. Maybe that could be solved by letting twts have more than one (subject) prefix.
But the great thing about the current system is that nobody can spoof message IDs.
I donāt think twtxt hashes are long enough to prevent spoofing.
@lyse@lyse.isobeef.org Thanks
@prologic@twtxt.net Perfect, thanks. For my own future reference: curl -H āAccept: application/jsonā https://twtxt.net/twt/st3wsda
@bender@twtxt.net So far Iāve been following feeds fairly liberally. Iāll check to see if we have anything in common and lean toward following, just because this is new to me and it feels like a small community. But Iām still figuring out what I want. Later Iāll probably either trim my follower list or come up with some way to prioritize the feeds Iām more interested in.
@prologic@twtxt.net Specifically, I could view yarndās copy here, but only as rendered for a human to view: https://twtxt.net/twt/st3wsda
@movq@www.uninformativ.de thanks for getting to the bottom of it. @prologic@twtxt.net is there a way to view yarndās copy of the raw twt? The edit didnāt result in a visible change; being able to see what yarnd originally downloaded would have helped me debug.
The actual end-user problem is that I canāt see the thread properly when using neomutt+jenny.
@prologic@twtxt.net One of your twts begins with (#st3wsda): https://twtxt.net/twt/bot5z4q
Based on the twtxt.net web UI, it seems to be in reply to a twt by @cuaxolotl@sunshinegardens.org which begins āIāve been sketching outā¦ā.
But jenny thinks the hash of that twt is 6mdqxrq. At least, thereās a very twt in their feed with that hash that has the same text as appears on yarn.social (except with ā instead of ā).
Based on this, it appears jenny and yarnd disagree about the hash of the twt, or perhaps the twt was edited (though I canāt see any difference, assuming ā vs ā is just a rendering choice).
@prologic@twtxt.net I believe you when you say registries as designed today do not crawl. But when I first read the spec, it conjured in my mind a search engine. Now I donāt know how things work out in practice, but just based on reading, I donāt see why it canāt be an API for a crawling search engine. (In fact I donāt see anything in the spec indicating registry servers shouldnāt crawl.)
(I also noticed that https://twtxt.readthedocs.io/en/latest/user/registry.html recommends āThe registries should sync each others user list by using the users endpointā. If I understood that right, registering with one should be enough to appear on others, even if they donāt crawl.)
Does yarnd provide an API for finding twts? Is it similar?
@prologic@twtxt.net I guess I thought they were search engines. Anyway, the registry API looks like a decent one for searching for tweets. Could/should yarn.social pods implement the same API?
I just manually followed the steps at https://dev.twtxt.net/doc/twthashextension.html and got 6mdqxrq. I wonder what happened. Did @cuaxolo@sunshinegardens.org edit the twt in some subtle way after twtxt.net downloaded it? I couldnāt spot a diff, other than ā appearing as ā on yarn.social, which I assume is a transformation done by twtxt.net.
@prologic@twtxt.net Whatās the difference between search.twtxt.net and the /api/plain/tweets endpoint of a registry? In my mind, a registry is a twtxt search engine. Or are registries not supposed to do their own crawling to discover new feeds?