@prologic@twtxt.net The Content-Type should probably even include the charset=utf-8, as we learned recently. :-) Iff you want to keep the UTF-8 encoding mandatory. It doesn’t say anything about it in that document.
@prologic@twtxt.net The reply-to can come anywhere in the message text? Most examples even put it at the very end. Why relax that? It currently has to be at the beginning, which I think makes parsing easier. I have to admit, at the end makes reading the raw feed nicer. But multi-line messages with U+2028 ruin the raw feed reading experience very quickly.
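For illustration, the difference between the two rules is just an anchored versus an unanchored match. A toy Python sketch, assuming a (#hash) subject syntax; the pattern and function names are hypothetical, not from any spec:

```python
import re

# Hypothetical subject pattern: "(#hash)" with a lowercase base32 hash.
SUBJECT = re.compile(r"\(#([a-z2-7]+)\)")

def subject_at_start(text):
    """Current rule: the subject must lead the message text."""
    m = SUBJECT.match(text)  # match() only succeeds at position 0
    return m.group(1) if m else None

def subject_anywhere(text):
    """Relaxed rule: the subject may appear anywhere, e.g. at the very end."""
    m = SUBJECT.search(text)  # search() scans the whole string
    return m.group(1) if m else None

assert subject_at_start("(#abcdefg) hello") == "abcdefg"
assert subject_at_start("hello (#abcdefg)") is None  # start-anchored parse rejects it
assert subject_anywhere("hello (#abcdefg)") == "abcdefg"
```

The start-anchored variant is the cheaper parse; the relaxed one has to scan the whole (possibly U+2028-joined) message.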
@prologic@twtxt.net For hash calculation we could maybe rethink the newlines and use tabs instead. This is more in line with the twtxt file format itself. With tabs it also is much closer to the registry format (minus the nick).
What about the timestamp format? Just verbatim as it appears in the feed (what I would recommend) or any other shenanigans with normalization, like +00:00 → Z?
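To make that concrete, a small Python sketch. It assumes the current recipe of blake2b-256 over URL, timestamp and text, base32-encoded; the tab separator and the +00:00 → Z normalization are the hypothetical variants in question:

```python
import base64
import hashlib

def twt_hash(url, timestamp, text, sep="\n"):
    """Hash a twt; sep="\n" is the current recipe, sep="\t" the tab idea."""
    payload = sep.join([url, timestamp, text]).encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    return base64.b32encode(digest).decode("ascii").lower().rstrip("=")

url = "https://example.com/twtxt.txt"  # hypothetical feed
text = "Hello twtxt!"

# Newline vs. tab separator yield completely different hashes:
a = twt_hash(url, "2024-09-22T12:00:00+00:00", text, sep="\n")
b = twt_hash(url, "2024-09-22T12:00:00+00:00", text, sep="\t")
assert a != b

# Normalizing the timestamp (+00:00 → Z) also changes the hash,
# so the spec has to pick exactly one form:
c = twt_hash(url, "2024-09-22T12:00:00Z", text)
assert a != c
```

Either choice works; the point is that separator and timestamp form both have to be pinned down, or two clients will disagree on every hash.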
An append style is not required, btw. If one uses prepend style feeds, the new URL simply comes at the beginning of the file, where the old URL is further down.
Clients must use the full-length hash in their storage, but only the first eleven digits when referencing? This differentiation is a bit odd.
@prologic@twtxt.net The multiline example is broken. I don’t see any “pipes”.
@prologic@twtxt.net I notice that in your document it says reply-to, whereas in the ReplyTo Extension it’s without the hyphen. (But they also use different values after the colon. :-))
Thanks again for typing it up, @movq@www.uninformativ.de! I left a few comments there. Currently, I’m in favor of the location-based addressing, that’s heaps simpler.
@sorenpeter@darch.dk Excellent point! I agree.
@bender@twtxt.net @prologic@twtxt.net @aelaraji@aelaraji.com Everything entering over Pod Gossiping is only cached temporarily, but never archived. So, it eventually fell off the cache. If my fake feeds were still up, yarnd would have pulled it from me again. I ran into the situation locally as well and then got it back, though.
@movq@www.uninformativ.de Awesome, thank you very much! I’ll have a look at it tomorrow.
It was beautiful in nature: https://lyse.isobeef.org/waldspaziergang-2024-09-21/
@prologic@twtxt.net Let me try:
Invent anything you want, say feed A writes message text B at timestamp C. You simply create the hash D for it and reply to precisely that D as subject in your own feed E with your message text F at timestamp G. This gets hashed to H.
Now then, some client J fetches your feed E. It sees your response from time G with text F, where in the subject you reference hash D. Since client J does not know about hash D, it simply asks some peers about it. If it happens to query your yarnd for it, you could happily serve it your invention: “You wanna know about hash D? Oh, that’s easy, feed A wrote B at time C.”
Client J then verifies it, and since everything lines up, it looks legitimate and puts this record in its cache or displays it to the user or whatever. It does not even matter if client J follows feed A or not. The message text B at C with hash D could have just been deleted or edited in the meantime.
Congrats, you successfully spread rumors. :-D
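A minimal Python sketch of that rumor attack, using a stand-in for the Twt Hash recipe (blake2b-256, base32, truncated); all feed URLs, texts and timestamps here are invented, which is exactly the point:

```python
import base64
import hashlib

def twt_hash(url, timestamp, text):
    """Stand-in for the hash recipe: blake2b-256, base32, truncated to 7 chars."""
    payload = f"{url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    return base64.b32encode(digest).decode("ascii").lower().rstrip("=")[-7:]

# The attack above: invent feed A, text B and timestamp C out of thin air ...
feed_a = "https://example.com/alice.txt"  # hypothetical feed A
text_b = "I never wrote this."            # invented text B
time_c = "2024-09-20T10:00:00Z"           # invented timestamp C
d = twt_hash(feed_a, time_c, text_b)      # ... and hash D falls right out.

# When client J asks about D, serve it (A, C, B). The client recomputes
# the hash, it matches, and the fabricated record "verifies":
assert twt_hash(feed_a, time_c, text_b) == d
```

The hash binds the tuple (feed, timestamp, text) to itself, but nothing binds it to what the feed actually ever published.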
@prologic@twtxt.net This does not hold if the edit happened before I even got the original.
@falsifian@www.falsifian.org Something similar exists over at https://search.twtxt.net/. But a usable search engine would actually be nice (to be fair, yarns improved a bit). :-) I don’t care about feed changes over time. In fact, it would even feel creepy to me. Of course, anyone could still surveil, but I’m not looking forward to these stats.
@movq@www.uninformativ.de We could still let the client display a warning if it cannot verify it. But yeah.
@movq@www.uninformativ.de Reminds me of this beautiful face recognition failure: https://qz.com/823820/carnegie-mellon-made-a-special-pair-of-glasses-that-lets-you-steal-a-digital-identity :-D
@prologic@twtxt.net Just what @bender@twtxt.net did. :-D If he’d additionally serve the fake message from his yarnd twt endpoint, everybody querying that hash from him (or any other yarnd that synced it in the meantime) would believe that I didn’t like Australians.
In fact, I really don’t. I love ’em! 8-)
We would need to sign each message in a feed, so others could verify that this was actually part of that feed and not made up. But then we end up in the crypto debate for identities again, which I’m not a big fan of. :-)
I just want to highlight that one might get a false sense of message authenticity if one just briefly looks at the hashes.
@movq@www.uninformativ.de Ah, cool. :-)
It just occurs to me we’re now building some kind of control structures or commands with (edit:…) and (delete:…) into feeds. It’s not just a simple “add this to your cache” or “replace the cache with this set of messages” anymore. Hmm. We might need to think about the consequences of that, whether this can be exploited somehow, etc.
@movq@www.uninformativ.de Not sure if I like the idea of keeping the original message around. It goes against the spirit of an edit in my mind.
If thatās what we want to enforce, forget about my other message above in the thread.
@prologic@twtxt.net @movq@www.uninformativ.de I still don’t understand it. If the original message has been replaced with the edited one, I cannot verify that the original was in the same feed. I don’t know the original text.
Hahahahahaahaaaahaaaaaa, brilliant! I love it, @bender@twtxt.net! :’-D
@movq@www.uninformativ.de Thanks for the summary!
So, what would happen if there is no original message in the feed anymore and you encounter an “edit” subject? Since you cannot verify that the feed contained it in the first place, would you obey it?
Some feed could just make a client update something from a different feed. In the cache, the client would need to store a flag that this message was updated, so that when it later encounters the message from the real feed, it has a chance of reverting that bogus edit. Hmm. The devil is in the details.
It’s much easier with a delete subject. When the client finds the message in its cache and the feeds match, remove it. Otherwise, just ignore it.
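That delete rule could be sketched like this; the cache layout is hypothetical, not any client’s actual code:

```python
# Hypothetical client cache: twt hash -> (feed URL, message text).
cache = {
    "abcdefg": ("https://example.com/alice.txt", "Original message"),
}

def handle_delete(cache, requesting_feed, twt_hash):
    """Apply a (delete:<hash>) subject: only the owning feed may delete."""
    entry = cache.get(twt_hash)
    if entry is not None and entry[0] == requesting_feed:
        del cache[twt_hash]  # found and feeds match: remove it
        return True
    return False  # unknown hash or foreign feed: just ignore it

# A foreign feed cannot delete Alice's message:
assert handle_delete(cache, "https://example.com/mallory.txt", "abcdefg") is False
# The owning feed can:
assert handle_delete(cache, "https://example.com/alice.txt", "abcdefg") is True
assert "abcdefg" not in cache
```

The whole decision is local to the cache; no extra flag or history is needed, which is what makes delete so much simpler than edit.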
@movq@www.uninformativ.de Right. That’s why I’d bite the bullet and go for huge URLs. :-)
I haven’t looked at the code and I’m too lazy right now: does jenny also verify the fetched result against the hash?
@movq@www.uninformativ.de Yeah, but hashing also uses the main feed URL or whatever is written in the feed’s first url metadata field. So, it’s not a new problem, it’s exactly the same.
@movq@www.uninformativ.de @david@collantes.us Yeah, he got a bit older but I could still easily recognize him.
Another thing: at the moment, anyone could claim that some feed contained a certain message which was then removed again, simply by computing the hash over a fake message text, said feed’s URL and an invented timestamp themselves. Nobody can ever verify that this message never existed in the first place and was completely made up. So, our twt hashes have to be taken with a grain of salt.
@david@collantes.us Cool idea actually! The hash would also be shorter than the raw URL and timestamp.
@prologic@twtxt.net I get where you’re coming from. But is it really that bad in practice? If you follow any link somewhere on the web, you also don’t know if its contents have been changed in the meantime. Is that a problem? Almost never, in my experience.
Granted, it’s a nice property when one can tell that it was not messed with since the author referenced it.
@movq@www.uninformativ.de The more I think about it, the more I like the location-based addressing. That feels fairly in line with the spirit of twtxt, just like you stated somewhere else.
The big downside for me is that the subjects then become super long.
And if the feed relocates, we end up with broken conversation trees again. Just like nowadays. At least it’s not getting worse. :-)
Using the feed URL in there might become a little challenging for new folks when the twt rotates away into archive feeds. But I reckon we already have a similar situation with the hashes. So, probably not too bad.
@quark@ferengi.one Yeah, let’s see what they reveal!
Nice, @david@collantes.us! The winter palms look nice. And the sky is full of snow.
Yesterday, both temperature and wind picked up. There was even wind in the night, which is rare over here. Today, we also got a lot of sunshine, around 22 °C and heaps of wind. The leaves and twigs were blown against the house door; it reminded me of a snow drift, basically a leaf bank. I should have taken a photo before I swept it, it looked quite bizarre.
But I photographed something else instead:
My mate and I went out in the woods earlier and we came across 08, which broke off at roughly 6 or 7 meters from 09. When it hit the ground, it made a 30 cm deep hole. Quite impressive. https://lyse.isobeef.org/waldspaziergang-2024-09-19/
@falsifian@www.falsifian.org Yeah, delete requests feel very odd.
@prologic@twtxt.net I wish that was true! But I reckon there is still heaps of old stuff out there that was created on a Windows machine. :-D And I wouldn’t be surprised if even today in that environment a new file does not make use of UTF-8.
@quark@ferengi.one I’m not convinced. :-D
@quark@ferengi.one @movq@www.uninformativ.de Yep, they’re all RFC3339. Obviously, +02:00 and +01:00 are best, because I use them! :-P In all seriousness, Z might be the best timezone, as it is the shortest. And regarding privacy, it leaks the least information about the user’s rough location. But of course, one can just look at the activity and narrow down plausible regions, so that’s a weak argument.
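For what it’s worth, different offsets can spell the very same instant; Z is just the shortest rendering. A quick Python check (the dates are arbitrary examples):

```python
from datetime import datetime, timedelta, timezone

# One instant, two RFC3339 spellings:
utc = datetime(2024, 9, 22, 10, 0, tzinfo=timezone.utc)
cest = utc.astimezone(timezone(timedelta(hours=2)))  # +02:00

assert utc == cest  # same point in time, different rendering
assert cest.isoformat() == "2024-09-22T12:00:00+02:00"

# Python renders UTC as +00:00; writing it as Z saves five characters:
assert utc.isoformat() == "2024-09-22T10:00:00+00:00"
assert utc.isoformat().replace("+00:00", "Z") == "2024-09-22T10:00:00Z"
```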
@falsifian@www.falsifian.org I can confirm, it’s fixed. Thank you! Indeed, this is some wild quoting.
I still do not understand why the encoding suddenly broke, though. :-? Anyway, I’ll concentrate on my rewrite and do things the right™ way. ;-) Still a long way to go.
main.go (but it can be done in a template now, so no reason to touch the code):
@bender@twtxt.net I know, I know… A relative time in a static HTML document is questionable at best. ;-)
Now WTF!? Suddenly, @falsifian@www.falsifian.org’s feed renders broken in my tt Python implementation. Exactly what I had with my Go rewrite. I haven’t touched the Python stuff in ages, though. Also, tt and tt2 do not share any data at all.
By any chance, did you remove the ; charset=utf-8 from your Content-Type: text/plain header, falsifian?
@movq@www.uninformativ.de Non-ASCII characters were broken. Like U+2028, degrees (°), etc.
Turns out I used a silly library to detect the encoding and transform to UTF-8 if needed. When there is no Content-Type header, like for local files, it looks at the first 1024 bytes. Since it only saw ASCII in that region, the damn thing assumed the data to be in Windows-1252 (which for web pages kinda makes sense):
// TODO: change default depending on user's locale?
return charmap.Windows1252, "windows-1252", false
https://cs.opensource.google/go/x/net/+/master:html/charset/charset.go;l=102
This default is hardcoded and cannot be changed.
Trying to be smart and adding automatic support for other encodings turned out to be a bad move on my end. At least I can reduce my dependency list again. :-)
I now just reject everything that explicitly specifies a media type other than text/plain or a charset other than utf-8 (ignoring casing). Otherwise I assume it’s in UTF-8 (just like the twtxt file format specification mandates) and hope for the best.
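That acceptance rule could look roughly like this; a sketch under my reading of the rule above, not the actual tt2 code:

```python
def acceptable_content_type(header):
    """Accept a missing header, or text/plain with no charset or charset=utf-8."""
    if header is None or header.strip() == "":
        return True  # no header (e.g. a local file): assume UTF-8 and hope
    media_type, _, params = header.partition(";")
    if media_type.strip().lower() != "text/plain":
        return False  # explicitly something other than text/plain
    for param in params.split(";"):
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset" and \
                value.strip().strip('"').lower() != "utf-8":
            return False  # explicit charset other than utf-8
    return True

assert acceptable_content_type(None)
assert acceptable_content_type("text/plain")
assert acceptable_content_type("text/plain; charset=UTF-8")
assert not acceptable_content_type("text/html")
assert not acceptable_content_type("text/plain; charset=ISO-8859-1")
```

Rejecting loudly beats guessing: the earlier Windows-1252 misadventure came precisely from trying to be clever when no charset was declared.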
Hmmmm, I somehow ran into an encoding problem where my inserted data ends up mangled in the database. But both SQLite and Go use UTF-8. What’s happening here? :-?
@prologic@twtxt.net Correct. :-D
ssh-keygen -Y sign or ssh-keygen -Y verify tools already available? Maybe in combination with @xuu’s idea of generating a random unique ID for your feed, say # id =, and signing it with your ED25519 key?
@prologic@twtxt.net I’m basically with @movq@www.uninformativ.de, but in contrast to him, I’m not looking forward to implementing something like that. :-)
A feed URL is plenty good enough for me. Since I only fetch feeds that I explicitly follow, there is some basic trust in those feeds already. Spoofing, impersonation and whatnot are no issues for me. If I were to find out otherwise, I’d just unsubscribe from the evil feed. Done.
To retrieve public feeds, I just rely on TLS. Most are served via HTTPS. If a feed is down, I’m not trying to fetch it from some other source; I just wait and try again later. So signed messages/feeds are not a use case I’m particularly benefiting from.
To me, it’s just not worth it at all to add this crypto complexity on top.
@prologic@twtxt.net Yeah, but I reckon we can kill both birds with one stone. If we change it to support edits, it should be fairly easy to also tweak it to support feed URL changes. As outlined in my first reply: https://twtxt.net/twt/n4omfvq The URL part sounds way easier to me. :-)
@sorenpeter@darch.dk There was, or maybe still is, a competing proposal for multiline twts that combines all twts with the same timestamp into one logical multiline twt. Not sure what happened to that, if it is used in the wild and whether anyone “here” follows a feed with that convention. “Our” solution for multiline twts is to use the U+2028 Unicode LINE SEPARATOR as a newline: https://dev.twtxt.net/doc/multilineextension.html.
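In code, the U+2028 convention boils down to a replace in both directions (sketch; function names are mine):

```python
LINE_SEP = "\u2028"  # Unicode LINE SEPARATOR

def to_twt(text):
    """Serialize a multiline message onto a single twtxt feed line."""
    return text.replace("\n", LINE_SEP)

def from_twt(line):
    """Restore real newlines for display."""
    return line.replace(LINE_SEP, "\n")

msg = "first line\nsecond line"
assert "\n" not in to_twt(msg)      # the feed stays one-twt-per-line
assert from_twt(to_twt(msg)) == msg  # round-trips losslessly
```

The one-line-per-twt invariant of the feed file survives, which is exactly what the competing same-timestamp proposal gives up.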
@movq@www.uninformativ.de What’s your definition of “complete thread”? ;-) There might be feeds participating in the conversation that you have no idea of.
But yes, this has a nice discoverability bonus. And it’s even simpler than a hash, that’s right.
@movq@www.uninformativ.de Yeah, I think so.
Keys for identity are too much for me. This steps up the complexity by a lot. Simplicity is what made me join twtxt with its extensions. A feed URL is all I need.
Eventually, twt hashes will have to change (lengthen, at least), no doubt about that. But I’d like to keep it equally simple.