Alright, let’s call attention to this fact and let’s hear your opinions on this:
With the (replyto:…) proposal, clients cannot indicate that a twt was edited in the long run. Clients can, of course, show that right now, but when they clean their cache and refetch feeds, the information is lost. This can be abused by malicious actors if sufficient time has passed (clients must have purged their cache): Malicious actors can change root twts and thus change the meaning of thread/replies.
Is this a showstopper for you? 🤔
@prologic@twtxt.net Kind of. But the (edit:) spec has similar problems in its current form:
- Post a normal twt with nonsense content, let’s say the content is just a dot “.”.
- Post an update to that twt, this time filling it with actual content, let’s say: “Birds are great!”
- Wait for people to reply to your twt (which is the edited one). You might get lots of replies along the lines of “ohhhh, yeah!” or “😍😍😍” or other stuff wholeheartedly agreeing with you.
- Post another update to the first twt, again changing the content completely, let’s say: “The earth is flat!”
- Delete your first update from your feed, the one with the birds. Not with (delete:), just remove the line.
- There’s now a thread with lots of people agreeing to a twt that says “The earth is flat!”
You might be able to see that the original content was just a dot “.”, but the twt that people actually replied to is gone for good and there is no way to detect that.
This raises two questions:
- The easy question: What do we do when the twt that an (edit:) line refers to is removed later on from a feed? We would have to delete that original twt from our caches, including the edit operation. This should be part of the spec.
- The result being a thread without a root, just like it is today. That’s fine.
- The hard question: How do we deal with multiple (potentially malicious or misleading) edits? Do we even want to open that can of worms? People only ever use the original twt hash in their replies, so nobody really knows to which edited version they’re replying. That is very similar to the (replyto:) situation, I think. 🤔
I just realized the other big property you lose is:
What if someone completely changes the content of the root of the thread?
Does the Subject reference the feed and timestamp only or the intent too?
Then the content of that root twt changes. Just like it would with (edit:…). The only difference is that you cannot go back to that person’s feed and find out what the original content was.
In other words, we can’t (reliably) show a little star * like on Mastodon to indicate edits.
Regarding the URL changing issue: That is not a new issue and not addressed by either PR. Do you have some plans to solve this that only works with hashes? 🤔 Is it feed signing? I have to admit here, I forgot most about the feed signing ideas. 🙈
- Twt Subjects lose their meaning
You mean existing threads in the past? Yeah.
- Twt Subjects cannot be verified without looking up the feed.
- Which may or may not exist anymore or may change.
Not sure what you mean? 🤔 But yes, things can change (that’s the point).
- Two persons cannot reply to a Twt independently of each other anymore.
How so? 🤔 That would be a total show-stopper, I agree. But are you sure that’s going to happen? For example, if people were to reply to this very twt of yours, they would do this:
(replyto:https://twtxt.net/user/prologic/twtxt.txt,2024-09-21T15:22:18Z) foobar
Am I missing something?
I’m still more in favor of (replyto:…). It’s easier to implement and the whole edits-breaking-threads thing resolves itself in a “natural” way without the need to add stuff to the protocol.
I’d love to try this out in practice to see how well it performs. 🤔 It’s all very theoretical at the moment.
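To make the (replyto:…) form quoted above a bit more concrete, here is a minimal sketch of how a client might parse it. The names `ReplyTo` and `parse_replyto` are made up for illustration and the exact syntax is only what the example line in this thread shows, not a finished spec. The sketch assumes the URL contains no closing parenthesis.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class ReplyTo(NamedTuple):
    # Hypothetical container, not part of any existing client.
    feed_url: str
    timestamp: datetime


def parse_replyto(line: str) -> Optional[ReplyTo]:
    """Parse a leading "(replyto:URL,TIMESTAMP)" subject; return None if absent."""
    prefix = "(replyto:"
    if not line.startswith(prefix):
        return None
    end = line.find(")")
    if end == -1:
        return None
    inner = line[len(prefix):end]
    # Split on the LAST comma: URLs may contain commas, timestamps do not.
    url, sep, stamp = inner.rpartition(",")
    if not sep or not url:
        return None
    try:
        # fromisoformat() on older Pythons does not accept a trailing "Z".
        parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    except ValueError:
        return None
    return ReplyTo(url, parsed)
```

No hashing is involved, which is the point being made: the feed URL and the timestamp are all a client needs to find the twt that is being replied to.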
Guess you could say:
- (replyto:…) is twtxt-style
- (edit:…) and (delete:…) is Yarn-style
Alright, before I go and watch Formula 1 😅, I made two PRs regarding the two “competing” ideas:
- https://git.mills.io/yarnsocial/yarn/pulls/1179 – (replyto:…)
- https://git.mills.io/yarnsocial/yarn/pulls/1180 – (edit:…) and (delete:…)
As a first step, this summarizes my current understanding. Please comment! 😊
@prologic@twtxt.net I only saw your previous twt right now. You said:
In order for this to be true, yarnd would have to be maliciously fabricating a Twt with the Hash D.
Yep, that’s one way.
Now, I have no idea how any of the gossipping stuff in Yarn works, but maybe a malicious pod could also inject such a fabricated twt into your cache by gossipping it?
Either way, hashes are just integrity checks basically, not proof that a certain feed published a certain twt.
@lyse@lyse.isobeef.org Yeah, makes sense. You don’t even need hash collisions for that. 🤔 (I guess only individually signed twts would prevent that. 🙈 Yet another can of worms.)
@falsifian@www.falsifian.org I’m curious myself now and might look it up (or even ask some of our legal guys/gals 😅).
I think none of this matters to people outside the EU anyway. These aren’t your laws. Even if you were to start a company in the US, it would only be a marketing instrument for you: “Hey, look, we follow GDPR!” EU people might then be more inclined to become your customers. But that’s it.
That said, I’m not sure anymore if there are any other treaties between the EU and the US which cover such things …
@lyse@lyse.isobeef.org I think that’s what we would have to enforce – otherwise we’d run into the problem you’ve outlined. 😃
@falsifian@www.falsifian.org I think we’re talking about different ideas here. 🤔
Maybe it’s time to draft all this into a spec or, rather, two different specs. I might do that over the weekend.
@prologic@twtxt.net Nah, just language barrier and/or me being a big stupid. 🥴 All good. 👌
@prologic@twtxt.net Okay, looks like I misunderstood/misinterpreted your previous message then. 👌
@prologic@twtxt.net So, this is either me nit-picking or me not understanding the hash system after all. 😃
An edit twt would look like this:
2024-09-20T14:57:11Z (edit:#123467) foobar
So we now have to verify that #123467 actually exists in this same feed. How do we do that? We must build a list of all twts/hashes of this feed and then check if #123467 is in that list. Right?
You’re kind of implying that it would be possible to cryptographically validate that this hash belongs to this feed. That’s not possible, is it? 🤔
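The check described above (build the list of hashes of this feed, then look the referenced hash up in it) could be sketched like this. It is a plain membership test, not a cryptographic validation. The function name `split_edits` and the tuple layout are assumptions for illustration, and the hashes are taken as already computed by the client.

```python
import re

# Matches a leading "(edit:#hash)" as used in the example twt above.
EDIT_RE = re.compile(r"^\(edit:#([a-z0-9]+)\)")


def split_edits(feed):
    """Split the edit twts of one feed into (valid, invalid) lists.

    `feed` is a list of (twt_hash, timestamp, content) tuples for a single
    feed (a hypothetical in-memory layout).
    """
    known = {twt_hash for twt_hash, _, _ in feed}
    valid, invalid = [], []
    for twt in feed:
        match = EDIT_RE.match(twt[2])
        if match is None:
            continue  # not an edit twt
        (valid if match.group(1) in known else invalid).append(twt)
    return valid, invalid
```

An edit that lands in `invalid` points at a hash this feed never published, so a client would discard it.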
One distinct disadvantage of (replyto:…) over (edit:#): (replyto:…) relies on clients always processing the entire feed – otherwise they wouldn’t even notice when a twt gets updated. a) This is more expensive, b) you cannot edit twts once they get rotated into an archived feed, because there is nothing signalling clients that they have to re-fetch that archived feed.
I guess neither matters that much in practice. It’s still a disadvantage.
Held another “talk” about Git today at work. It was covering some “basics” about what’s going on in the .git directory. Last time I did that was over 11 years ago. 😅 (I often give introductions about Git, but they’re about day to day usage and very high-level.)
I’ve gotta say, Git is one of the very few pieces of software that I love using and teaching. The files on your disk follow a simple enough format/pattern and you can actually teach people how it all works and, for example, why things like rebasing produce a particular result. 👌
@lyse@lyse.isobeef.org I’m gonna do some self-tests on face blindness. 😂
So, what would happen if there is no original message anymore in the feed and you encounter an “edit” subject?
We’d have to classify this as invalid and discard it. If the referenced twt is not present in the feed (or any archived feed), then it might potentially belong to some other feed, and feeds overwriting the contents of other feeds is pretty bad. 😅
As @prologic@twtxt.net said, clients must always check that twts referenced by edit and delete are actually present in that very feed.
What about edits of edits? Do we want to “chain” edits or does the latest edit simply win?
Chained edits:
[#abcd111] [2024-09-20T12:00:00Z] [Hello!]
[#abcd222] [2024-09-20T12:10:00Z] [(edit:#abcd111) Hello World!]
[#abcd333] [2024-09-20T12:20:00Z] [(edit:#abcd222) Hello Birds!]
Latest edit wins:
[#abcd111] [2024-09-20T12:00:00Z] [Hello!]
[#abcd222] [2024-09-20T12:10:00Z] [(edit:#abcd111) Hello World!]
[#abcd333] [2024-09-20T12:20:00Z] [(edit:#abcd111) Hello Birds!]
Does the first version have any benefits? I don’t think so … ?
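To compare the two notations, here is a rough sketch of a resolver that accepts both. The name `final_contents` and the tuple layout are invented for this example, and it assumes all timestamps share one format so that sorting them as strings gives chronological order.

```python
import re

EDIT_RE = re.compile(r"^\(edit:#([a-z0-9]+)\)\s*(.*)$", re.DOTALL)


def final_contents(feed):
    """Return {root_hash: current content} after applying all edits.

    `feed` is a list of (twt_hash, timestamp, content) tuples of one feed.
    An edit may point at the original twt ("latest edit wins") or at a
    previous edit ("chained").
    """
    root_of = {}  # hash of any twt -> hash of the original twt it belongs to
    content = {}  # hash of the original twt -> current content
    # Sort by timestamp so later edits override earlier ones.
    for twt_hash, _, text in sorted(feed, key=lambda twt: twt[1]):
        match = EDIT_RE.match(text)
        if match is None:
            root_of[twt_hash] = twt_hash
            content[twt_hash] = text
            continue
        target, new_text = match.groups()
        if target not in root_of:
            continue  # target unknown in this feed: discard the edit
        root = root_of[target]
        root_of[twt_hash] = root  # only needed so later edits can chain
        content[root] = new_text
    return content
```

Both example feeds above end up as `{"abcd111": "Hello Birds!"}`. The only extra work for chained edits is the `root_of` bookkeeping for edit twts, so in this sketch chaining buys nothing that “latest edit wins” doesn’t already provide.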
@prologic@twtxt.net Yeah, you’re right. That’s an implementation detail of jenny. Right now, the order of twts doesn’t matter at all, because it’s only relevant at display time – and that’s the job of mutt. 😅
@falsifian@www.falsifian.org Oof, yeah, I haven’t even started thinking about supporting two schemes at the same time. 😅 I’d hope not to have to use something like an SQLite database, if it can be avoided.
By the way: Since we have so few modern twtxt/Yarn clients, forking jenny might not be the worst idea. If you wanted to take it into a very different direction, then by all means, go for it. 👍
@lyse@lyse.isobeef.org When it asks a Yarn pod, you mean? Yeah, it does so implicitly. It builds a tiny dummy feed from the JSON response and then looks for the specified twt hash in that feed.
@prologic@twtxt.net Wouldn’t work in what way? Could you elaborate? 🤔
Do you consider crawling archived feeds a problem/failure? 🤔
@david@collantes.us Such a funny picture – we’ve been to Florida once some ~30 years ago and it looked almost exactly like that. 😅
@david@collantes.us Yeah, but it happened so fast with him. 😅 I remember watching some of his talks 1-3 years ago, looked completely different, I think. 😅
Luckily I can still recognize the voice, so I know it’s him, lol.
@lyse@lyse.isobeef.org The hash/thread-id would be shorter, but you’d lose two other benefits of (replyto:…):
- You need a special client again to compute hashes.
- The original feed URL is no longer visible, thus you might need to ask a Yarn pod occasionally for missing twts (I do that surprisingly often, now that I’ve implemented it) – but now you’ve lost the guarantee that Yarn gives you the correct information, because you can no longer verify it.
@lyse@lyse.isobeef.org Right, feed rotation gets ugly. We’d have (replyto:example.com/tw.txt,$timestamp) but maybe that feed doesn’t actually contain that stamp, so you have to go further back … but you should NOT reference an archived feed in your (replyto:…) thingy, it should still be the “main feed URL” (because the contents of archived feeds aren’t stable, see @prologic@twtxt.net’s feeds for example). That’s not too great.
Man, I’m completely torn on this. I’d almost prefer not to decide anything. 😂
… then, of course, I wouldn’t need to ask a Yarn pod for a certain twt if we used (replyto:…) instead of (#123467), because the original source of the twt is no longer obscured by a hash value and I can just pull the original feed. Asking a Yarn pod is only interesting at the moment because I have no idea where to get (#123467) from.
Only when the original feed has gone offline will querying a Yarn pod become relevant again.
I have to admit here that some of the goals/philosophy of Yarn simply don’t apply to my use cases. 😅 I don’t run a daemon that speaks a gossipping protocol with neighboring pods or stuff like that. I think I don’t have a hard time accepting that feeds might go offline in two months, so be it. Digging up ancient twts from some sort of globally distributed file system isn’t one of my goals. It’s a completely different thing for me. Hmmm. 🤔
I’m bad with faces, I know that. But I’m having a really hard time recognizing Linus in this video:
https://www.youtube.com/watch?v=4WCTGycBceg
Basically a different person to me. Is it just me or has he really changed that much? 😳
compressed_subject(msg_singlelined) be configurable, so only a certain number of characters get displayed, ending on ellipses? Right now the entire twtxt is crammed into the Subject:. This request aims to make twtxts display on mutt/neomutt, etc. more like emails do.
@david@collantes.us Glad you like it. 😅
@david@collantes.us Aye, I’ve pushed some commits. (And this is really going to be the last non-trivial change. 😂)
@david@collantes.us Like that, right? https://movq.de/v/80f888d381/s.png
Okay, the recently implemented --fetch-context, which asks a Yarn pod for a twt, wouldn’t break, but jenny would no longer be able to verify that it actually got the correct twt. That’s a concrete example where we would lose functionality.
@david@collantes.us Yeah, I was annoyed by this myself lately. twts have become so long nowadays, it really gets in the way.
@prologic@twtxt.net Can you come up with actual scenarios where it would break? Or is it more of a gut feeling?
The thing that keeps bugging me is this:
If we were to switch to location-based addressing and (replyto:…), the edit problem would resolve itself. Implementations could use that exact string (e.g., https://example.com/tw.txt,2024-09-18T12:45Z) as the internal identifier of a twt and that is pretty much the only change that you have to make. And then you could throw away all code and tests currently required for calculating hashes. (In jenny, I would also be able to and actually have to remove that code that skips over twts with a timestamp older than $last_fetch. This only got added as a workaround “to avoid broken threads all the time”.) The net result would be less code.
Implementing this whole (edit:#hash) thing means more code. (For jenny, specifically, a lot more code, if I want to allow users to create such twts.)
Do you see why I’m so reluctant to jump on this bandwagon? 😅
I haven’t come up yet with good, concrete examples where (replyto:…) would break. As soon as that happens, I’ll change my mind. 🤔
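As a rough illustration of the location-based identifier mentioned above, a client cache could simply be keyed by that exact string. This is a sketch, not jenny’s actual code: `twt_id` and `store_feed` are hypothetical names, and it assumes timestamps are compared as literal strings without any normalization.

```python
def twt_id(feed_url: str, timestamp: str) -> str:
    """Location-based identifier: the exact string used inside (replyto:…)."""
    return f"{feed_url},{timestamp}"


def store_feed(cache: dict, feed_url: str, lines: list) -> None:
    """Insert or overwrite the twts of one feed in `cache`.

    Each line is "TIMESTAMP<TAB>CONTENT", as in a twtxt feed.
    """
    for line in lines:
        timestamp, _, content = line.partition("\t")
        # An edited twt keeps its timestamp, so it lands on the same key
        # and replies that reference this key still match.
        cache[twt_id(feed_url, timestamp)] = content
```

Re-fetching a feed after an edit overwrites the old content under the same key, so the thread stays intact without any hash calculation.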
For implementations, it would be nice if “update twts” always came after the twt they are referring to. So I thought about using this opportunity to mandate append-style feeds. But that’s just me being lazy. Implementations will have to be able to cope with any order, because feeds cannot/should not be trusted. 🫤
Trying to sum up the current proposal (keeping hashes):
- Extend the hash length to avoid collisions.
- Introduce the concept of, what shall we call it, “update twts”.
  - A twt starting with (edit:#3f36byq) tells clients to update the twt #3f36byq with the content of this particular twt.
  - A twt starting with (delete:#3f36byq) advises clients to delete #3f36byq from their storage.
Right?
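A minimal sketch of how a client might apply such “update twts” to its storage, assuming a per-feed `{hash: content}` store. The function name `apply_update` and the store layout are invented here; none of this is a settled spec.

```python
import re

UPDATE_RE = re.compile(r"^\((edit|delete):#([a-z0-9]+)\)\s*(.*)$", re.DOTALL)


def apply_update(store: dict, content: str) -> bool:
    """Apply one twt's content to `store` ({hash: content} for one feed).

    Returns True if the twt was an update twt that changed the store.
    """
    match = UPDATE_RE.match(content)
    if match is None:
        return False  # ordinary twt, nothing to apply
    action, target, new_text = match.groups()
    if target not in store:
        return False  # never touch twts that this feed does not contain
    if action == "edit":
        store[target] = new_text
    else:
        del store[target]
    return True
```

Because the store only holds twts of the feed being processed, an update that references some other feed’s hash is silently ignored.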
you’d never have been able to find let alone pull up that ~3yr old Twt of me (my very first), hell I’d even thought I lost my first feed file or it became corrupted or something
I get what you mean, but to be fair, it’s much less mysterious than that. 😅 The twt in question exists in your archived feed. It’s not like I pulled it out of some cache of an unrelated Yarn pod.
But, yes, I could have done that and I could have verified that it actually is the twt I was looking for. So that’s clearly an advantage of the current system.
(Or maybe I’m talking nonsense. That’s known to happen. I’ll go to bed. 😂)
jenny, a -v switch. That way when you twtxt "That’s an older format that was used before jenny version v23.04", I can go and run jenny -v, and "duh!" myself on the way to a git pull. :-D
@quark@ferengi.one Printing a version? I’ll think about it. 🤔
It would be easy to do for releases, but it’s a little hard to do for all the commits in between – jenny has no build process, so there’s no easy way to incorporate the output of git describe, for example.
I’m not advocating in either direction, btw. I haven’t made up my mind yet. 😅 Just braindumping here.
The (replyto:…) proposal is definitely more in the spirit of twtxt, I’d say. It’s much simpler, anyone can use it even with the simplest tools, no need for any client code. That is certainly a great property, if you ask me, and it’s things like that that brought me to twtxt in the first place.
I’d also say that in our tiny little community, message integrity simply doesn’t matter. Signed feeds don’t matter. I signed my feed for a while using GPG, someone else did the same, but in the end, nobody cares. The community is so tiny, there’s enough “implicit trust” or whatever you want to call it.
If twtxt/Yarn was to grow bigger, then this would become a concern again. But even Mastodon allows editing, so how much of a problem can it really be? 😅
I do have to “admit”, though, that hashes feel better. It feels good to know that we can clearly identify a certain twt. It feels more correct and stable.
Hm.
I suspect that the (replyto:…) proposal would work just as well in practice.
@falsifian@www.falsifian.org @prologic@twtxt.net @lyse@lyse.isobeef.org
- editing, if you don’t care about message integrity
So that’s the big question, because that’s the only real difference between hashes and the (replyto:…) proposal.
Do we care about message integrity?
With (replyto:…), someone could write a twt, then I reply to it, like “you’re absolutely right!”, and then that person could change their twt to something malicious like “the earth is flat!” And then it would look like I’m a nutcase agreeing with that person. 😅
Hashes (in their current form) prevent that. The thread is broken and my reply clearly refers to something else. That’s good, right?
But now take into account that we want to allow editing anyway. Is there even a point to using hashes anymore? Isn’t message integrity ignored anyway now, at least in practice?
There’s no difference (in practice) between someone writing
2024-09-18T12:34Z Brds are great!
and then editing it to either
2024-09-18T12:34Z (original:#12379) Birds are great! (Whoops, fixed a typo.)
or
2024-09-18T12:34Z (original:#12379) The earth is flat!
The actual original message is (potentially) gone. The only thing that we can be sure of now is that the twt was edited in some way. Essentially, the actual twt message is no longer part of the hash, is it? What does #12379 refer to? The edited message or the original one? We want it to refer to the edited one, because we don’t want to break threads, so … what’s the point of using a hash?
Regarding jenny development: There have been enough changes in the last few weeks, imo. I want to let things settle for a while (potential bugfixes aside) and then I’m going to cut a new release.
And I guess the release after that is going to include all the threading/hashing stuff – if we can decide on one of the proposals. 😂
@aelaraji@aelaraji.com Looks like your shell didn’t turn the \n into actual newlines:
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
zq4fgq
$ printf "https://twtxt.net/user/prologic/twtxt.txt\\n2020-07-18T12:39:52Z\\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q
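For comparison, the same pipeline can be sketched in Python, which sidesteps shell quoting. This mirrors the commands shown above (BLAKE2s-256, base32, padding stripped, lowercased) and is not meant as a reference implementation of the twt hash. `short_hash` is a made-up name, and the default length of 6 is an assumption based on `tail -c 7` also counting the trailing newline that base32 prints.

```python
import base64
import hashlib


def short_hash(payload: str, length: int = 6) -> str:
    """BLAKE2s-256 -> base32 -> strip "=" -> lowercase -> last `length` chars."""
    digest = hashlib.blake2s(payload.encode("utf-8"), digest_size=32).digest()
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return encoded[-length:]


url = "https://twtxt.net/user/prologic/twtxt.txt"
stamp = "2020-07-18T12:39:52Z"
text = "Hello World! 😊"

# Real newlines between the three fields (what printf produces).
with_newlines = short_hash(f"{url}\n{stamp}\n{text}")
# Literal backslash + "n" (what the echo -n call above fed into openssl).
with_literal_backslash_n = short_hash(f"{url}\\n{stamp}\\n{text}")
```

The two results differ because the hashed bytes differ, which is the same effect as in the two shell invocations.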
@quark@ferengi.one They’re all RFC3339, unless I’m mistaken: https://ijmacd.github.io/rfc3339-iso8601/ So they’re all correct.
@prologic@twtxt.net So the feed would contain two twts, right?
2024-09-18T23:08:00+10:00 Hllo World
2024-09-18T23:10:43+10:00 (edit:#229d24612a2) Hello World
@prologic@twtxt.net I don’t get paid for “standing by” and “waiting for a call”, that’s right. But I’m fine with that, because I don’t have to be available, either. 😅 If someone were to call me (or send me a text message), I wouldn’t be obliged to help them out. If I have the time and energy, I will do it, though. And that extra time will be paid.
It works for us because there are enough people around and there’s a good chance that someone will be able to help.
Really, I am glad that we have this model. The alternative would be actual on-call duty, like, this week you’re the poor bastard who is legally required to fix shit. That’s just horrible, I don’t want that. 😅
What I was referring to in the OP: Sometimes I check the workphone simply out of curiosity. 😂
@prologic@twtxt.net text/plain without an explicit charset is still just US-ASCII:
The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.
https://www.rfc-editor.org/rfc/rfc2046.html#section-4.1.2
https://www.rfc-editor.org/rfc/rfc6657#section-4
@lyse@lyse.isobeef.org Ouch. 🥴 Well, jenny always decodes as UTF-8 (because the spec says so) and this hasn’t caused any issues – yet.
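For a client that wanted to honour an explicit charset parameter instead of always decoding as UTF-8, a small sketch could look like this. It is not what jenny does (jenny always decodes as UTF-8, as said above), and the helper name `pick_charset` is hypothetical. The fallback is a parameter because the RFC default (US-ASCII) and the twtxt spec (UTF-8) disagree.

```python
from email.message import Message


def pick_charset(content_type: str, fallback: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value, else `fallback`."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is given.
    return msg.get_content_charset() or fallback
```

With a bare text/plain header this returns the fallback, so choosing "utf-8" reproduces the always-UTF-8 behaviour, while passing "us-ascii" would follow the RFC default.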