Another interesting side effect of changing from content-based addressing to location-based addressing is that switching from 7-byte keys to 2025-character keys for 3.5 million entries would expand the database size from 24.5 MB to about 7.09 GB—an increase of roughly 7.06 GB!
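A quick sanity check of those figures, as a minimal Python sketch. It assumes one byte per key character and decimal units (1 MB = 10^6 bytes); the constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the key-size figures quoted above.
# Assumption: every key character occupies exactly one byte.
ENTRIES = 3_500_000
HASH_KEY_BYTES = 7          # content-based addressing: 7-byte hash key
LOCATION_KEY_BYTES = 2025   # location-based addressing: 2025-character key


def total_bytes(entries: int, key_bytes: int) -> int:
    """Total space taken by the keys alone, in bytes."""
    return entries * key_bytes


hash_total = total_bytes(ENTRIES, HASH_KEY_BYTES)
location_total = total_bytes(ENTRIES, LOCATION_KEY_BYTES)
increase = location_total - hash_total

print(f"content-based keys:  {hash_total:,} bytes")
print(f"location-based keys: {location_total:,} bytes")
print(f"increase:            {increase:,} bytes")
```

This only counts the keys themselves; any per-entry overhead of a real store would come on top.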
@bender@twtxt.net Ha! Maybe I should get on the Markdown train. You’re taking away my excuses.
@falsifian@www.falsifian.org you can colorise things in Mutt/Neomutt. I have colours for bold, italics, code, and blockquotes. In a way, I can “see” markdown! 😊
@falsifian@www.falsifian.org No worries! Feel free to contribute to the doc directly if you wish 👌
@falsifian@www.falsifian.org Hmmm not sure sorry 🤔
Sorry, you’re right, I should have used numbers!
I don’t understand what “preserve the original hash” could mean other than “make sure there’s still a twt in the feed with that hash”. Maybe the text could be clarified somehow.
I’m also not sure what you mean by markdown already being part of it. Of course people can already use Markdown, just like presumably nothing stopped people from using (twt subjects) before they were formally described. But it’s not universal; e.g. as a jenny user I just see the plain text.
@xuu Good to know! 👌 So as long as we remain decentralized and non-commercial (I assume non-profit works too?) we’re good?
@falsifian@www.falsifian.org The GDPR does not apply to the processing of data for a purely personal or household activity that is not connected to a professional or commercial activity.
@falsifian@www.falsifian.org it would be easier if instead of a bulleted list you would have used a numbered one. That way it would be easier to refer to the specific miscellaneous comment.
I have little to contribute on this reply. On bullet two, he meant the original hash. On the last bullet, markdown is already part of it (after all, it is plain text). Yarn, being a web client/server, simply renders it.
@prologic@twtxt.net Do you feel the same about published vs. privately stored data?
For me there’s a distinction. I feel very strongly that I should be able to retain whatever private information I like. On the other hand, I do have some sympathy for requests not to publish or propagate (though I personally feel it’s still morally acceptable to ignore such requests).
@lyse@lyse.isobeef.org I’d suggest making the whole content-type thing a SHOULD, to accommodate people just using some hosting service they don’t have much control over. (The same situation could make detecting followers hard, but IMO “please email me if you follow me” is still legit twtxt, even if inconvenient.)
@prologic@twtxt.net Thanks for writing that up!
I hope it can remain a living document (or sequence of draft revisions) for a good long time while we figure out how this stuff works in practice.
I am not sure how I feel about all this being done at once, vs. letting conventions arise.
For example, even today I could reply to twt abc1234 with “(#abc1234) Edit: …” and I think all you humans would understand it as an edit to (#abc1234). Maybe eventually it would become a common enough convention that clients would start to support it explicitly.
Similarly we could just start using 11-digit hashes. We should iron out whether it’s sha256 or whatever but there’s no need to get all the other stuff right at the same time.
I have similar thoughts about how some users could try out location-based replies in a backward-compatible way (append the replyto: stuff after the legacy (#hash) style).
However I recognize that I’m not the one implementing this stuff, and it’s less work to just have everything determined up front.
Misc comments (I haven’t read the whole thing):
Did you mean to make hashes hexadecimal? You lose 11 bits that way compared to base32. I’d suggest gaining 11 bits with base64 instead.
“Clients MUST preserve the original hash” — do you mean they MUST preserve the original twt?
Thanks for phrasing the bit about deletions so neutrally.
I don’t like the MUST in “Clients MUST follow the chain of reply-to references…”. If someone writes a client as a 40-line shell script that requires the user to piece together the threading themselves, IMO we shouldn’t declare the client non-conforming just because they didn’t get to all the bells and whistles.
Similarly I don’t like the MUST for user agents. For one thing, you might want to fetch a feed without revealing your identity. Also, it raises the bar for a minimal implementation (I’m again thinking of the 40-line shell script).
For “who follows” lists: why must the long, random tokens be only valid for a limited time? Do you have a scenario in mind where they could leak?
Why can’t feeds be served over HTTP/1.0? Again, thinking about simple software. I recently tried implementing HTTP/1.1 and it wasn’t too bad, but 1.0 would have been slightly simpler.
Why get into the nitty-gritty about caching headers? This seems like generic advice for HTTP servers and clients.
I’m a little sad about other protocols being not recommended.
I don’t know how I feel about including markdown. I don’t mind too much that yarn users emit twts full of markdown, but I’m more of a plain text kind of person. Also it adds to the length. I wonder if putting it in a separate document would make more sense; that would also help with the length.
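On the hexadecimal vs. base32 vs. base64 comment above, the “11 bits” works out if the reference stays 11 characters long in every encoding. A minimal Python sketch under that assumption (the function name is made up for illustration):

```python
import math

HASH_LENGTH = 11  # assumption: the reference is 11 characters in every encoding


def bits_for(alphabet_size: int, length: int = HASH_LENGTH) -> int:
    """Bits carried by `length` characters drawn from a power-of-two alphabet."""
    bits_per_char = int(math.log2(alphabet_size))
    return bits_per_char * length


for name, size in (("hex", 16), ("base32", 32), ("base64", 64)):
    print(f"{name}: {bits_for(size)} bits")
```

Each step up in alphabet size adds one bit per character, hence 11 bits per step at this length.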
Meanwhile in Florida we are having a very Autumnal Equinox day, with temperatures 10-14° cooler than normal. That, on its own, isn’t normal at all, but I taketh! 😂
@lyse@lyse.isobeef.org Wet and warm, yeah. 🫤 There were flies everywhere, lots of them, on all windows of the apartment. Never seen anything like that. 😳🪰 Like the building was a dead carcass. 😂
@movq@www.uninformativ.de Heaps of mozzies and other stuff that wants to eat you. Yeah, I noticed that as well. But I don’t know if it’s really more than usual. I might just have forgotten how bad it was in the past by now. :-?
With the wet beginning this year, water-loving insects certainly got a head start.
There are so many insects this year. Flies, ants, bugs. This isn’t normal. It’s almost like the ecosystem is getting out of balance.
@lyse@lyse.isobeef.org Nice ! 🙏
@doesnm@doesnm.p.psf.lt Hello! 👋
Hello!
@prologic@twtxt.net Correct. The plan is that operators have to manually trust a peer before it is used for fetching missing conversation roots from. Preview of the horrible UI:
@bender@twtxt.net Yeah, it was nice. 23°C and a bit of wind. Quite acceptable in my opinion. :-)
@lyse@lyse.isobeef.org Yes let’s make UTF-8 mandatory 👌
@lyse@lyse.isobeef.org Agreed
@prologic@twtxt.net @movq@www.uninformativ.de In all reality, even seconds precision would be enough for this new feed announcement bot. It just has to delay or predate its messages. It hopefully does not find new feeds all the time. :-)
Let’s try this pill for Twtxt v2 (no account required)
@prologic@twtxt.net What should happen if the archive chain is detected to be broken? I don’t think that including the hash in the prev field really helps us in practice. What if messages in the archive feed themselves got lost? You can’t detect this unless you’ve already known about them. I reckon we can simply use the relative path and call it good. I know, I know, we have this format already today. But in my opinion, the hash does not add value.
@prologic@twtxt.net The Content-Type should probably even include the charset=utf-8 as we learned recently. :-) Iff you want to keep the UTF-8 encoding mandatory. It doesn’t say anything about it in that document.
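For illustration, here is how a client might check that header, as a minimal Python sketch. It assumes the required media type is text/plain, which the thread does not spell out; the function name is hypothetical.

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def is_utf8_text_feed(content_type: str) -> bool:
    """True if the header declares text/plain with an explicit UTF-8 charset.

    Assumption: text/plain is the expected media type for a feed.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type() == "text/plain" and msg.get_content_charset() == "utf-8"


print(is_utf8_text_feed("text/plain; charset=utf-8"))
print(is_utf8_text_feed("text/plain"))
```

A header without the charset parameter fails the check here, which is the strict reading; a lenient client could instead fall back to assuming UTF-8.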
@lyse@lyse.isobeef.org I’m a bit indifferent whether it’s at the beginning or end tbh.
@prologic@twtxt.net The reply-to can come anywhere in the message text? Most examples even put it at the very end. Why relax that? It currently has to be at the beginning, which I think makes parsing easier. I have to admit, at the end makes reading the raw feed nicer. But multi-line messages with U+2028 ruin the raw feed reading experience very quickly.
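To make the parsing trade-off concrete, a minimal Python sketch of both variants. It assumes the marker looks like (reply-to:VALUE) with no whitespace or closing parenthesis inside VALUE; the regular expression and function name are illustrative, not taken from the draft spec.

```python
import re
from typing import Optional, Tuple

# Assumed marker shape: "(reply-to:VALUE)".
REPLY_TO = re.compile(r"\(reply-to:([^)\s]+)\)")


def split_reply_to(text: str, anywhere: bool = True) -> Tuple[Optional[str], str]:
    """Return (reply target or None, message text without the marker).

    With anywhere=False the marker is only recognised at the very start,
    which is the stricter rule argued for above.
    """
    match = REPLY_TO.search(text) if anywhere else REPLY_TO.match(text)
    if match is None:
        return None, text
    rest = (text[:match.start()] + text[match.end():]).strip()
    return match.group(1), rest


print(split_reply_to("Cool! 👌 (reply-to:242561ce02d)"))
print(split_reply_to("Cool! 👌 (reply-to:242561ce02d)", anywhere=False))
```

The strict variant is a single anchored match; the relaxed one has to scan the whole text, and has to decide what to do when the marker appears more than once.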
This is still a draft! Feel free to edit it 👌
@movq@www.uninformativ.de That’s what I was afraid of 🤣
yarnd
to see how many things would break and how many assumptions there are around the idea of "Content Addressing"; here's where I'm at so far:
@movq@www.uninformativ.de Makes sense 👌 I think it’s fair to implement any spec changes incrementally for sure 👌
And yea since yarnd has a store it’s a bit easier to support edit / delete actions 😅
@prologic@twtxt.net For hash calculation we could maybe rethink the newlines and use tabs instead. This is more in line with the twtxt file format itself. With tabs it also is much closer to the registry format (minus the nick).
What about the timestamp format? Just verbatim as it appears in the feed (what I would recommend) or any other shenanigans with normalization, like +00:00 → Z?
An append style is not required, btw. If one uses prepend style feeds, the new URL simply comes at the beginning of the file, where the old URL is further down.
Clients must use the full-length hash in their storages, but only use the first eleven digits when referencing? This differentiation is a bit odd.
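A minimal Python sketch of the hashing ideas in these comments. The hash function (SHA-256), the hexadecimal encoding, and the field order are assumptions, since the thread has not settled them; the example feed URL is made up.

```python
import hashlib


def twt_hash(feed_url: str, timestamp: str, text: str, sep: str = "\t") -> str:
    """Full-length hash over feed URL, verbatim timestamp and text.

    Assumptions: SHA-256, hex output, tab-separated fields in this order.
    The timestamp is used exactly as it appears in the feed, not normalised.
    """
    payload = sep.join((feed_url, timestamp, text)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def short_ref(full_hash: str, digits: int = 11) -> str:
    """The shortened form used when referencing a twt."""
    return full_hash[:digits]


full = twt_hash("https://example.com/twtxt.txt", "2024-09-22T07:51:16Z", "Cool! 👌")
print(full)
print(short_ref(full))
```

Because the timestamp goes in verbatim, the same instant written as +00:00 instead of Z produces a different hash, which is why the normalisation question matters.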
@prologic@twtxt.net You can’t. The timestamps have to be unique. Including milliseconds or nanoseconds would be an easy way out, that’s allowed in RFC3339: https://datatracker.ietf.org/doc/html/rfc3339#section-5.6
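As a sketch of that way out, here is millisecond-precision formatting in Python. The fractional values below are invented for illustration; the feed itself only shows whole seconds.

```python
from datetime import datetime, timezone


def rfc3339_ms(moment: datetime) -> str:
    """Format an aware datetime as RFC 3339 in UTC with millisecond precision."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# Two twts written within the same second (hypothetical sub-second values).
first = datetime(2024, 9, 7, 12, 55, 56, 120_000, tzinfo=timezone.utc)
second = datetime(2024, 9, 7, 12, 55, 56, 480_000, tzinfo=timezone.utc)

print(rfc3339_ms(first))
print(rfc3339_ms(second))
```

At seconds precision both would collapse to the same timestamp; with the fraction they stay distinct, as long as the writer never emits two twts in the same millisecond.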
@prologic@twtxt.net The multiline example is broken. I don’t see any “pipes”.
So in a location-based system, how exactly do I reply to one of these two Twts from @Yarns@search.twtxt.net? 🤔
2024-09-07T12:55:56Z 🥳 NEW FEED: @<twtxt http://edsu.github.io/twtxt/twtxt.txt>
2024-09-07T12:55:56Z 🥳 NEW FEED: @<kdy https://twtxt.kdy.ch/twtxt.txt>
This scope of changes is much easier to implement for yarnd and I suspect jenny too.
No, (edit:) is a lot of work for jenny and also adds a lot of overhead.
Right now, jenny itself has no idea which twt hashes are present on which feed, because it doesn’t need to. This information only exists in the mail files. This means I can’t check if an (edit:) operation is legal. jenny will have to get an sqlite database (or read/parse/write 1-2 MB of JSON on every invocation, which isn’t great either).
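To illustrate the kind of state this implies, a minimal Python sketch of a hash-to-feed index in SQLite. This is not jenny’s code; the schema and function names are hypothetical, and it assumes an edit is “legal” only when the twt being edited was published by the same feed that issues the edit.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the index of which twt hash was seen on which feed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS twts ("
        " hash TEXT NOT NULL,"
        " feed_url TEXT NOT NULL,"
        " PRIMARY KEY (hash, feed_url))"
    )
    return db


def remember(db: sqlite3.Connection, twt_hash: str, feed_url: str) -> None:
    """Record that a twt with this hash was seen on this feed."""
    db.execute(
        "INSERT OR IGNORE INTO twts (hash, feed_url) VALUES (?, ?)",
        (twt_hash, feed_url),
    )
    db.commit()


def edit_is_legal(db: sqlite3.Connection, target_hash: str, editing_feed_url: str) -> bool:
    """Assumed rule: a feed may only edit twts that it published itself."""
    row = db.execute(
        "SELECT 1 FROM twts WHERE hash = ? AND feed_url = ?",
        (target_hash, editing_feed_url),
    ).fetchone()
    return row is not None


db = open_index()
remember(db, "242561ce02d", "https://example.com/a/twtxt.txt")
print(edit_is_legal(db, "242561ce02d", "https://example.com/a/twtxt.txt"))
print(edit_is_legal(db, "242561ce02d", "https://example.com/b/twtxt.txt"))
```

The index has to be kept in sync with the mail files on every fetch, which is exactly the duplicated state being complained about.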
I’ve already spent several hours this morning rewriting the feed fetching/parsing code in an effort to pave the way to even be able to support any of this. I’ll spare you the details. Until now, twts were individual items in a feed, they could be processed in any order and they didn’t reference each other from jenny’s point of view. A lot of the heavy lifting happens in the mail client.
Honestly, the database thing bugs me the most. The whole concept of “just create some mail files” doesn’t really work anymore. I now have to duplicate state between the mail files and an internal database. This is a big “meh”.
Of course, if we were to switch to location-based addressing, then you would have to do a lot of work. There’s no easy way out.
Maybe I could say jenny does not support (edit:) for now. That’s the good thing about this proposal: I don’t have to implement it right away. Users will see spurious twts (or I could hide them as a workaround) and they won’t see twt updates, but nothing will break.
@prologic@twtxt.net I notice that in your document it says reply-to, where in the ReplyTo Extension it’s without the hyphen. (But they also use different values after the colon. :-))
@lyse@lyse.isobeef.org Yup, this is why you started seeing if you could improve the “trust” of peers right? 😅
@movq@www.uninformativ.de Yeah, I think what I’m proposing here is a more pragmatic approach to improvements that will last much longer than our first iteration (~4 years and going strong, but running into minor issues with edit/identify and some collisions). This scope of changes is much easier to implement for yarnd, and I suspect jenny too, and as indicated in here it’s quite easy to have a reference implementation written in Bash with standard UNIX tools.
Thanks again for typing it up, @movq@www.uninformativ.de! I left a few comments there. Currently, I’m in favor of the location-based addressing, that’s heaps simpler.
It’s even sorta/somewhat compatible with our existing feeds (kind of) 🤣 – Bit too stupid to figure out how to write enough correct Bash to make threads display inline nicely in an indented/tree-like fashion, but oh well 😅
Example:
$ ./twtxt-v2.sh reply 242561ce02d "Cool! 👌"
Posted twt with hash: b2c938f9838
...
$ ./twtxt-v2.sh timeline
...
prologic@twtxt.net [2024-09-22T07:26:37Z] <242561ce02d> Okay folks, I've spent all day on this today, and I _think_ its in "good enough"™ shape to share:
**Twtxt v2**:
- Specification: https://docs.mills.io/uJXuisaYTRWYDrl8A2jADg?both
- implementation: https://gist.mills.io/prologic/afdec15443da4d7aa898f383f171ec1b
![](https://twtxt.net/media/Wb9MtAiQyEkzNQB5dyVvUR.png)
prologic@localhost [2024-09-22T07:51:16Z] <b2c938f9838> Cool! 👌 (reply-to:242561ce02d)
@sorenpeter@darch.dk Excellent point! I agree.
@bender@twtxt.net @prologic@twtxt.net @aelaraji@aelaraji.com Everything entering over Pod Gossiping is only cached temporarily, but never archived. So, it eventually fell off the cache. If my fake feeds were still up, yarnd would have pulled it from me again. I ran into the situation locally as well and then got it back, though.
Okay folks, I’ve spent all day on this today, and I think it’s in “good enough”™ shape to share:
Twtxt v2:
- Specification: https://docs.mills.io/uJXuisaYTRWYDrl8A2jADg?both
- implementation: https://gist.mills.io/prologic/afdec15443da4d7aa898f383f171ec1b
@aelaraji@aelaraji.com No, that is absolutely correct. Without cryptographic identities and signatures there is no way to verify authenticity. And I don’t think we necessarily need to. What I was just showing and proving was that I didn’t write that spoofed Twt in the first place, which was only provable at the time of @lyse@lyse.isobeef.org’s short-lived attack 🤣 He essentially forked yarnd, hosted it temporarily (I think locally) and used it to poison the caches of a few production pods.
Thankfully the gossip protocol used by yarnd as part of its “peering” between pods isn’t fully trusted; twts are not, for example, archived into permanent storage. So the moment my pod re-fetched my own feed, the spoofed Twt was obliterated 😅
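The behaviour described here can be pictured with a toy model in Python. This is not yarnd’s implementation; the class and method names are invented, and it assumes a re-fetch from the origin simply replaces whatever the cache holds for that feed.

```python
from typing import Dict, List


class FeedCache:
    """Toy cache: gossiped twts are held temporarily, never archived."""

    def __init__(self) -> None:
        self._twts: Dict[str, List[str]] = {}

    def accept_gossip(self, feed_url: str, twt: str) -> None:
        """Store a twt received from a peer; its authenticity is unverified."""
        self._twts.setdefault(feed_url, []).append(twt)

    def refetch(self, feed_url: str, twts_from_origin: List[str]) -> None:
        """Replace the cached view with what the feed itself serves."""
        self._twts[feed_url] = list(twts_from_origin)

    def timeline(self, feed_url: str) -> List[str]:
        return list(self._twts.get(feed_url, []))


cache = FeedCache()
feed = "https://example.com/twtxt.txt"  # hypothetical feed URL
cache.accept_gossip(feed, "spoofed twt")
cache.refetch(feed, ["genuine twt"])
print(cache.timeline(feed))
```

The spoofed entry survives only until the next fetch from the real feed, which is the eventual consistency joked about below.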
Eventual consistency 🤣
LOL 😂 Not only have I tried to write up a full Twtxt v2 specification, I’ve also written a Bash shell script that implements the new spec 😅
@movq@www.uninformativ.de Haha 😝 Nice one! And yes I’m also aware of some collisions too!