🧮 USERS:1 FEEDS:2 TWTS:1120 ARCHIVED:79904 CACHE:2492 FOLLOWERS:17 FOLLOWING:14
🧮 USERS:1 FEEDS:2 TWTS:1119 ARCHIVED:79896 CACHE:2501 FOLLOWERS:17 FOLLOWING:14
🧮 USERS:1 FEEDS:2 TWTS:1118 ARCHIVED:79884 CACHE:2512 FOLLOWERS:17 FOLLOWING:14
@2024-10-09T08:11:00Z@twtxt.net It an easy way of twt-adressing by using the timestamp instead of a nick, which is arbitrary anyhow. Just my suggestion for a new reply-model ;)
🧮 USERS:1 FEEDS:2 TWTS:1117 ARCHIVED:79878 CACHE:2526 FOLLOWERS:17 FOLLOWING:14
🧮 USERS:1 FEEDS:2 TWTS:1116 ARCHIVED:79864 CACHE:2542 FOLLOWERS:17 FOLLOWING:14
🧮 USERS:1 FEEDS:2 TWTS:1115 ARCHIVED:79710 CACHE:2585 FOLLOWERS:17 FOLLOWING:14
🧮 USERS:1 FEEDS:2 TWTS:1114 ARCHIVED:79694 CACHE:2581 FOLLOWERS:17 FOLLOWING:14
🧮 USERS:1 FEEDS:2 TWTS:1113 ARCHIVED:79690 CACHE:2578 FOLLOWERS:17 FOLLOWING:14
🧮 USERS:1 FEEDS:2 TWTS:1112 ARCHIVED:79684 CACHE:2598 FOLLOWERS:17 FOLLOWING:14
I share I did write up an algorithm for it at some point I think it is lost in a git comment someplace. I’ll put together a pseudo/go code this week.
Super simple:
Making a reply:
- If yarn has one use that. (Maybe do collision check?)
- Make hash of twt raw no truncation.
- Check local cache for shortest without collision
- in SQL:
select len(subject) where head_full_hash like subject || '%'
- in SQL:
Threading:
- Get full hash of head twt
- Search for twts
- in SQL:
head_full_hash like subject || '%' and created_on > head_timestamp
- in SQL:
The assumption being replies will be for the most recent head. If replying to an older one it will use a longer hash.
🧮 USERS:1 FEEDS:2 TWTS:1111 ARCHIVED:79666 CACHE:2610 FOLLOWERS:17 FOLLOWING:14
If we stuck with Blake2b for Twt Hash(es); what do we think we need to reasonably go to in bit length/size?
=> https://gist.mills.io/prologic/194993e7db04498fa0e8d00a528f7be6
e.g: (turns out @xuu@txt.sour.is is right about Blak2b being easy/simple too!):
$ printf "%s\t%s\t%s" "https://example.com/twtxt.txt" "2024-09-29T13:30:00Z" "Hello World!" | b2sum -l 32 -t | awk '{ print $1 }'
7b8b79dd
🧮 USERS:1 FEEDS:2 TWTS:1110 ARCHIVED:79632 CACHE:2622 FOLLOWERS:17 FOLLOWING:14
🧮 USERS:1 FEEDS:2 TWTS:1109 ARCHIVED:79618 CACHE:2654 FOLLOWERS:17 FOLLOWING:14
@prologic@twtxt.net Regarding the new way of generating twt-hashes, to me it makes more sense to use tabs as separator instead of spaces, since the you can just copy/past a line directly from a twtxt-file that already go a tab between timestamp and message. But tabs might be hard to “type” when you are in a terminal, since it will activate autocomplete…🤔
Another thing, it seems that you sugget we only use the domain in the hash-creation and not the full path to the twtxt.txt
$ echo -e "https://example.com 2024-09-29T13:30:00Z Hello World!" | sha256sum - | awk '{ print $1 }' | base64 | head -c 12
🧮 USERS:1 FEEDS:2 TWTS:1108 ARCHIVED:79604 CACHE:2650 FOLLOWERS:17 FOLLOWING:14
I believe I’d missed an f
:
~/src/jenny $ git diff
diff --git a/jenny b/jenny
index ada8da2..8ae9a06 100755
--- a/jenny
+++ b/jenny
@@ -1194,7 +1194,7 @@ if __name__ == '__main__':
if args.edit:
edit_twt_file(app)
elif args.fetch:
- with DirectoryLock(f'/tmp/jenny-{getuser()}.run'):
+ with DirectoryLock(expanduser(f'~/tmp/jenny-{getuser()}.run')):
retrieve_all(app)
elif args.last_seen:
print('Feeds last seen at (times are local time), oldest first:')
@doesnm@doesnm.p.psf.lt I’ve just given it a try on android/termux and got it to work, I can’t promise it won’t break something else (because i definitely don’t know what I’m doing) but here’s what I broke 😅:
~/src/jenny $ git diff
diff --git a/jenny b/jenny
index ada8da2..8ae9a06 100755
--- a/jenny
+++ b/jenny
@@ -1194,7 +1194,7 @@ if __name__ == '__main__':
if args.edit:
edit_twt_file(app)
elif args.fetch:
- with DirectoryLock(f'/tmp/jenny-{getuser()}.run'):
+ with DirectoryLock(expanduser('~/tmp/jenny-{getuser()}.run')):
retrieve_all(app)
elif args.last_seen:
print('Feeds last seen at (times are local time), oldest first:')
and of course make sure you mkdir ~/tmp
twt
probably isn't the best client I'm afraid. It doesn't really cache twts by their key (hash) to display threads properly. Jenny however does 👌
It has twts cache which used if timeline is set to jew. Maybe i.should fork twet to make wishes like newlines (i see two squares), showing conversations, showing twts if not found in cache and parsing medata to configure url, nick and followers (currenly it duplicated in config and twtxt file)
@doesnm@doesnm.p.psf.lt twt
probably isn’t the best client I’m afraid. It doesn’t really cache twts by their key (hash) to display threads properly. Jenny however does 👌
twet display twts in raw format with some formatting (sadly no newlines). And for reply messages i just seen (#hash). But which text hidden on hash? currenly im open twtxt.net/twt/hash to see this
🧮 USERS:1 FEEDS:2 TWTS:1107 ARCHIVED:79577 CACHE:2662 FOLLOWERS:17 FOLLOWING:14
How to read twts without browser? I dont understand context in reply messages
👋 Thanks for joining us on our Sept monthly Yarn.social meetup today y’all 🙇♂️ We had @david@collantes.us @sorenpeter@darch.dk @doesnm@doesnm.p.psf.lt @falsifian@www.falsifian.org and @xuu@txt.sour.is 💪 Nice turn out! (not all at once of course, as we normally run this over 4 hours as we span many time zones!)
Things we talked about:
- Decentralised vs. Distributed
- Use of SHA256 for Twt Hash(es)
- We solved Edits! 🥳
- UUID(s) probably won’t work! (susceptible to sppofing)
- Helped @sorenpeter@darch.dk write some PHP to process/parse
User-Agent
and service his feed via a custom PHP script 😅
- @falsifian@www.falsifian.org introduced himself 👌
- Talked about Merkle Trees 🌳
Did I miss anything? 🤔
🧮 USERS:1 FEEDS:2 TWTS:1106 ARCHIVED:79449 CACHE:2645 FOLLOWERS:17 FOLLOWING:14
We:
- Drop
# url=
from the spec.
- We don’t adopt
# uuid =
– Something @anth@a.9srv.net also mentioned (see below)
We instead use the @nick@domain
to identify your feed in the first place and use that as the identify when calculating Twt hashes <id> + <timestamp> + <content>
. Now in an ideal world I also agree, use WebFinger for this and expect that for the most part you’ll be doing a WebFinger lookup of @user@domain
to fetch someone’s feed in the first place.
The only problem with WebFinger is should this be mandated or a recommendation?
@bender@twtxt.net Re that broken thread (#bqor23a)
. Its the same one. My pod doesn’t have the Root Twt: https://twtxt.net/twt/bqor23a => 404 Not Found.
How in the hell did you even reply to this in the first place?
🧮 USERS:1 FEEDS:2 TWTS:1105 ARCHIVED:79381 CACHE:2634 FOLLOWERS:17 FOLLOWING:14
Don’t mind this twt!
🧮 USERS:1 FEEDS:2 TWTS:1104 ARCHIVED:79360 CACHE:2633 FOLLOWERS:17 FOLLOWING:14
Good writeup, @anth@a.9srv.net! I agree to most of your points.
3.2 Timestamps: I feel no need to mandate UTC. Timezones are fine with me. But I could also live with this new restriction. I fail to see, though, how this change would make things any easier compared to the original format.
3.4 Multi-Line Twts: What exactly do you think are bad things with multi-lines?
4.1 Hash Generation: I do like the idea with with a new uuid
metadata field! Any thoughts on two feeds selecting the same UUID for whatever reason? Well, the same could happen today with url
.
5.1 Reply to last & 5.2 More work to backtrack: I do not understand anything you’re saying. Can you rephrase that?
8.1 Metadata should be collected up front: I generally agree, but if the uuid
metadata field were a feed URL and no real UUID, there should be probably an exception to change the feed URL mid-file after relocation.
🧮 USERS:1 FEEDS:2 TWTS:1103 ARCHIVED:79345 CACHE:2623 FOLLOWERS:17 FOLLOWING:14
And finally the legibility of feeds when viewing them in their raw form are worsened as you go from a Twt Subject of (#abcdefg12345)
to something like (https://twtxt.net/user/prologic/twtxt.txt 2024-09-22T07:51:16Z)
.
There is also a ~5x increase cost in memory utilization for any implementations or implementors that use or wish to use in-memory storage (yarnd
does for example) and equally a 5x increase in on-disk storage as well. This is based on the Twt Hash going from a 13 bytes (content-addressing) to 63 bytes (on average for location-based addressing). There is roughly a ~20-150% increase in the size of individual feeds as well that needs to be taken into consideration (on the average case).
So really your argument is just that switching to a location-based addressing “just makes sense”. Why? Without concrete pros/cons of each approach this isn’t really a strong argument I’m afraid. In fact I probably need to just sit down and detail the properties of both approaches and the pros/cons of both.
I also don’t really buy the argument of simplicity either personally, because I don’t technically see it much more difficult to take a echo -e "<url>\t<timestamp>\t<content>" | sha256sum | base64
as the Twt Subject or concatenating the <url> <timestamp>
– The “effort” is the same. If we’re going to argue that SHA256 or cryptographic hashes are “too complicated” then I’m not really sure how to support that argument.
Some more arguments for a local-based treading model over a content-based one:
The format:
(#<DATE URL>)
or(@<DATE URL>)
both makes sense: # as prefix is for a hashtag like we allredy got with the(#twthash)
and @ as prefix denotes that this is mention of a specific post in a feed, and not just the feed in general. Using either can make implementation easier, since most clients already got this kind of filtering.Having something like
(#<DATE URL>)
will also make mentions via webmetions for twtxt easier to implement, since there is no need for looking up the#twthash
. This will also make it possible to make 3th part twt-mentions services.Supporting twt/webmentions will also increase discoverability as a way to know about both replies and feed mentions from feeds that you don’t follow.
🧮 USERS:1 FEEDS:2 TWTS:1102 ARCHIVED:79309 CACHE:2611 FOLLOWERS:17 FOLLOWING:14
@falsifian@www.falsifian.org I believe the preserve means to include the original subject hash in the start of the twt such as (#somehash)
Sorry, you’re right, I should have used numbers!
I’m don’t understand what “preserve the original hash” could mean other than “make sure there’s still a twt in the feed with that hash”. Maybe the text could be clarified somehow.
I’m also not sure what you mean by markdown already being part of it. Of course people can already use Markdown, just like presumably nothing stopped people from using (twt subjects) before they were formally described. But it’s not universal; e.g. as a jenny user I just see the plain text.
🧮 USERS:1 FEEDS:2 TWTS:1101 ARCHIVED:79243 CACHE:2591 FOLLOWERS:17 FOLLOWING:14
@prologic@twtxt.net Thanks for writing that up!
I hope it can remain a living document (or sequence of draft revisions) for a good long time while we figure out how this stuff works in practice.
I am not sure how I feel about all this being done at once, vs. letting conventions arise.
For example, even today I could reply to twt abc1234 with “(#abc1234) Edit: …” and I think all you humans would understand it as an edit to (#abc1234). Maybe eventually it would become a common enough convention that clients would start to support it explicitly.
Similarly we could just start using 11-digit hashes. We should iron out whether it’s sha256 or whatever but there’s no need get all the other stuff right at the same time.
I have similar thoughts about how some users could try out location-based replies in a backward-compatible way (append the replyto: stuff after the legacy (#hash) style).
However I recognize that I’m not the one implementing this stuff, and it’s less work to just have everything determined up front.
Misc comments (I haven’t read the whole thing):
Did you mean to make hashes hexadecimal? You lose 11 bits that way compared to base32. I’d suggest gaining 11 bits with base64 instead.
“Clients MUST preserve the original hash” — do you mean they MUST preserve the original twt?
Thanks for phrasing the bit about deletions so neutrally.
I don’t like the MUST in “Clients MUST follow the chain of reply-to references…”. If someone writes a client as a 40-line shell script that requires the user to piece together the threading themselves, IMO we shouldn’t declare the client non-conforming just because they didn’t get to all the bells and whistles.
Similarly I don’t like the MUST for user agents. For one thing, you might want to fetch a feed without revealing your identty. Also, it raises the bar for a minimal implementation (I’m again thinking again of the 40-line shell script).
For “who follows” lists: why must the long, random tokens be only valid for a limited time? Do you have a scenario in mind where they could leak?
Why can’t feeds be served over HTTP/1.0? Again, thinking about simple software. I recently tried implementing HTTP/1.1 and it wasn’t too bad, but 1.0 would have been slightly simpler.
Why get into the nitty-gritty about caching headers? This seems like generic advice for HTTP servers and clients.
I’m a little sad about other protocols being not recommended.
I don’t know how I feel about including markdown. I don’t mind too much that yarn users emit twts full of markdown, but I’m more of a plain text kind of person. Also it adds to the length. I wonder if putting a separate document would make more sense; that would also help with the length.
So I’m a location based system, how exactly do I reply to one of these two Twts from @Yarns@search.twtxt.net ? 🤔
2024-09-07T12:55:56Z 🥳 NEW FEED: @<twtxt http://edsu.github.io/twtxt/twtxt.txt>
2024-09-07T12:55:56Z 🥳 NEW FEED: @<kdy https://twtxt.kdy.ch/twtxt.txt>
Had to build a list of all feeds (that I follow) and all twts in them and there are two collisions already:
$ ./stats
Saw 58263 hashes
7fqcxaa
https://twtxt.net/user/justamoment/twtxt.txt
https://twtxt.net/user/prologic/twtxt.txt
ntnakqa
https://twtxt.net/user/prologic/twtxt.txt
https://twtxt.net/user/thecanine/twtxt.txt
Namely:
$ jenny -D https://twtxt.net/user/justamoment/twtxt.txt | grep 7fqcxaa
[7fqcxaa] [2022-12-28 04:53:30+00:00] [(#pmuqoca) @prologic@twtxt.net I checked the GitHub discussion, it became a request to join forces.
Do you plan on having them join?
Also for the name, how about:
- “progit” or “prologit” (prologic official hard fork)
- “git-stance” (git instance)
- “GitTree” (Gitea inspired, maybe to related)
- “Gitomata” (git automata)
- “Git.Source”
- “Forgor” (forgit is taken so I forgor) 🤣
- “SweetGit” (as salty chat)
- “Pepper Git” (other ingredients) 😉
- “GitHeart” (core of git with a GitHub sounding name)
- “GitTaka” (With music in mind)
Ok, enough fun… Hope this helps sprout some ideas from others if nothing is to your taste.]
$ jenny -D https://twtxt.net/user/prologic/twtxt.txt/5 | grep 7fqcxaa
[7fqcxaa] [2022-02-25 21:14:45+00:00] [(#bqq6fxq) It’s handled by blue Monday]
And:
$ jenny -D https://twtxt.net/user/thecanine/twtxt.txt | grep ntnakqa
[ntnakqa] [2022-01-23 10:24:09+00:00] [(#2wh7r4q) <a href="https://txt.sour.is/external?uri=https://twtxt.net/user/prologic/twtxt.txt">@prologic<em>@twtxt.net</em></a> I know, I was just hoping it might have also gotten fixed by that change, by some kind of backend miracles. 😂]
$ jenny -D https://twtxt.net/user/prologic/twtxt.txt/1 | grep ntnakqa
[ntnakqa] [2024-02-27 05:51:50+00:00] [(#otuupfq) <a href="https://txt.sour.is/external?uri=https://twtxt.net/user/shreyan/twtxt.txt">@shreyan<em>@twtxt.net</em></a> Ahh 👌]
👋 Reminder folks of the upcoming Yarn.social monthly online meetup:
I hope to see @david@collantes.us @movq@www.uninformativ.de @lyse@lyse.isobeef.org @xuu@txt.sour.is @sorenpeter@darch.dk and hopefully others too @aelaraji@aelaraji.com @falsifian@www.falsifian.org and anyone else that sees this! 🙏 We’re hopefully going to primarily discuss the future of Twtxt and the last few weeks of discussions 🤣
- Event: Yarn.social Online Meetup
- When: 28th September 2024 at 12:00pm UTC (midday)
- Where: Mills Meet : Yarn.social
- Cadence: 4th Saturday of every Month
Agenda:
- Let’s talk about the upcoming changes to the Twtxt spec(s)
- See #xgghhnq
- See #xgghhnq
🧮 USERS:1 FEEDS:2 TWTS:1100 ARCHIVED:79197 CACHE:2589 FOLLOWERS:17 FOLLOWING:14
Been thinking about it for the last couple of days and I would say we can make do with the shorter (#<DATETIME URL>)
since it mirrors the twt-mention syntax and simply points to the OP as the topic identified by the time of posting it. Do we really need and (edit:...)
and (delete:...)
also?
@aelaraji@aelaraji.com This is one of the reasons why yarnd
has a couple of settings with some sensible/sane defaults:
I could already imagine a couple of extreme cases where, somewhere, in this peaceful world one’s exercise of freedom of speech could get them in Real trouble (if not danger) if found out, it wouldn’t necessarily have to involve something to do with Law or legal authorities. So, If someone asks, and maybe fearing fearing for… let’s just say ‘Their well being’, would it heart if a pod just purged their content if it’s serving it publicly (maybe relay the info to other pods) and call it a day? It doesn’t have to be about some law/convention somewhere … 🤷 I know! Too extreme, but I’ve seen news of people who’d gone to jail or got their lives ruined for as little as a silly joke. And it doesn’t even have to be about any of this.
There are two settings:
$ ./yarnd --help 2>&1 | grep max-cache
--max-cache-fetchers int set maximum numnber of fetchers to use for feed cache updates (default 10)
-I, --max-cache-items int maximum cache items (per feed source) of cached twts in memory (default 150)
-C, --max-cache-ttl duration maximum cache ttl (time-to-live) of cached twts in memory (default 336h0m0s)
So yarnd
pods by default are designed to only keep Twts around publicly visible on either the anonymous Frontpage or Discover View or your Timeline or the feed’s Timeline for up to 2 weeks with a maximum of 150 items, whichever get exceeded first. Any Twts over this are considered “old” and drop off the active cache.
It’s a feature that my old man @off_grid_living@twtxt.net was very strongly in support of, as was I back in the day of yarnd
’s design (nothing particularly to do with Twtxt per se) that I’ve to this day stuck by – Even though there are some 😉 that have different views on this 🤣
@movq@www.uninformativ.de @falsifian@www.falsifian.org @prologic@twtxt.net Maybe I don’t know what I’m talking about and You’ve probably already read this: Everything you need to know about the “Right to be forgotten” coming straight out of the EU’s GDPR Website itself. It outlines the specific circumstances under which the right to be forgotten applies as well as reasons that trump the one’s right to erasure …etc.
I’m no lawyer, but my uneducated guess would be that:
A) twts are already publicly available/public knowledge and such… just don’t process children’s personal data and MAYBE you’re good? Since there’s this:
… an organization’s right to process someone’s data might override their right to be forgotten. Here are the reasons cited in the GDPR that trump the right to erasure:
- The data is being used to exercise the right of freedom of expression and information.
- The data is being used to perform a task that is being carried out in the public interest or when exercising an organization’s official authority.
- The data represents important information that serves the public interest, scientific research, historical research, or statistical purposes and where erasure of the data would likely to impair or halt progress towards the achievement that was the goal of the processing.
B) What I love about the TWTXT sphere is it’s Human/Humane element! No deceptive algorithms, no Corpo B.S …etc. Just Humans. So maybe … If we thought about it in this way, it wouldn’t heart to be even nicer to others/offering strangers an even safer space.
I could already imagine a couple of extreme cases where, somewhere, in this peaceful world one’s exercise of freedom of speech could get them in Real trouble (if not danger) if found out, it wouldn’t necessarily have to involve something to do with Law or legal authorities. So, If someone asks, and maybe fearing fearing for… let’s just say ‘Their well being’, would it heart if a pod just purged their content if it’s serving it publicly (maybe relay the info to other pods) and call it a day? It doesn’t have to be about some law/convention somewhere … 🤷 I know! Too extreme, but I’ve seen news of people who’d gone to jail or got their lives ruined for as little as a silly joke. And it doesn’t even have to be about any of this.
P.S: Maybe make X
tool check out robots.txt? Or maybe make long-term archives Opt-in? Opt-out?
P.P.S: Already Way too many MAYBE’s in a single twt! So I’ll just shut up. 😅
🧮 USERS:1 FEEDS:2 TWTS:1099 ARCHIVED:79147 CACHE:2577 FOLLOWERS:17 FOLLOWING:14
@david@collantes.us Thanks, that’s good feedback to have. I wonder to what extent this already exists in registry servers and yarn pods. I haven’t really tried digging into the past in either one.
How interested would you be in changes in metadata and other comments in the feeds? I’m thinking of just permanently saving every version of each twtxt file that gets pulled, not just the twts. It wouldn’t be hard to do (though presenting the information in a sensible way is another matter). Compression should make storage a non-issue unless someone does something weird with their feed like shuffle the comments around every time I fetch it.
@falsifian@www.falsifian.org “I was actually thinking about making an Internet Archive style twtxt archiver, letting you explore past twts” — that’s an awesome idea for a project. Something I would certainly use!
(replyto:…)
over (edit:#)
: (replyto:…)
relies on clients always processing the entire feed – otherwise they wouldn’t even notice when a twt gets updated. a) This is more expensive, b) you cannot edit twts once they get rotated into an archived feed, because there is nothing signalling clients that they have to re-fetch that archived feed.
@movq@www.uninformativ.de I don’t think it has to be like that. Just make sure the new version of the twt is always appended to your current feed, and have some convention for indicating it’s an edit and which twt it supersedes. Keep the original twt as-is (or delete it if you don’t want new followers to see it); doesn’t matter if it’s archived because you aren’t changing that copy.
@prologic@twtxt.net Do you have a link to some past discussion?
Would the GDPR would apply to a one-person client like jenny? I seriously hope not. If someone asks me to delete an email they sent me, I don’t think I have to honour that request, no matter how European they are.
I am really bothered by the idea that someone could force me to delete my private, personal record of my interactions with them. Would I have to delete my journal entries about them too if they asked?
Maybe a public-facing client like yarnd needs to consider this, but that also bothers me. I was actually thinking about making an Internet Archive style twtxt archiver, letting you explore past twts, including long-dead feeds, see edit histories, deleted twts, etc.
One distinct disadvantage of (replyto:…)
over (edit:#)
: (replyto:…)
relies on clients always processing the entire feed – otherwise they wouldn’t even notice when a twt gets updated. a) This is more expensive, b) you cannot edit twts once they get rotated into an archived feed, because there is nothing signalling clients that they have to re-fetch that archived feed.
I guess neither matters that much in practice. It’s still a disadvantage.
🧮 USERS:1 FEEDS:2 TWTS:1098 ARCHIVED:79066 CACHE:2530 FOLLOWERS:17 FOLLOWING:14
@prologic@twtxt.net Hi. i have noticed sometimes when i hit the back button i lose all the surrounding layout and just have a list of twts.
BTW this code doesn’t incorporate existing twts into jenny’s database. It’s best used starting from scratch. I’ve been testing it using a custom XDG_CACHE_HOME and XDG_CONFIG_HOME to avoid messing with my “real” jenny data.
I wrote some code to try out non-hash reply subjects formatted as (replyto ), while keeping the ability to use the existing hash style.
I don’t think we need to decide all at once. If clients add support for a new method then people can use it if they like. The downside of course is that this costs developer time, so I decided to invest a few hours of my own time into a proof of concept.
With apologies to @movq@www.uninformativ.de for corrupting jenny’s beautiful code. I don’t write this expecting you to incorporate the patch, because it does complicate things and might not be a direction you want to go in. But if you like any part of this approach feel free to use bits of it; I release the patch under jenny’s current LICENCE.
Supporting both kinds of reply in jenny was complicated because each email can only have one Message-Id, and because it’s possible the target twt will not be seen until after the twt referencing it. The following patch uses an sqlite database to keep track of known (url, timestamp) pairs, as well as a separate table of (url, timestamp) pairs that haven’t been seen yet but are wanted. When one of those “wanted” twts is finally seen, the mail file gets rewritten to include the appropriate In-Reply-To header.
Patch based on jenny commit 73a5ea81.
https://www.falsifian.org/a/oDtr/patch0.txt
Not implemented:
- Composing twts using the (replyto …) format.
- Probably other important things I’m forgetting.
@prologic@twtxt.net Wikipedia claims sha1 is vulnerable to a “chosen-prefix attack”, which I gather means I can write any two twts I like, and then cause them to have the exact same sha1 hash by appending something. I guess a twt ending in random junk might look suspcious, but perhaps the junk could be worked into an image URL like
. If that’s not possible now maybe it will be later.git only uses sha1 because they’re stuck with it: migrating is very hard. There was an effort to move git to sha256 but I don’t know its status. I think there is progress being made with Game Of Trees, a git clone that uses the same on-disk format.
I can’t imagine any benefit to using sha1, except that maybe some very old software might support sha1 but not sha256.
🧮 USERS:1 FEEDS:2 TWTS:1097 ARCHIVED:79000 CACHE:2495 FOLLOWERS:17 FOLLOWING:14
@movq@www.uninformativ.de Agreed that hashes have a benefit. I came up with a similar example where when I twted about an 11-character hash collision. Perhaps hashes could be made optional somehow. Like, you could use the “replyto” idea and then additionally put a hash somewhere if you want to lock in which version of the twt you are replying to.
There is nothing wrong with how we currently run a diff to see what has been removed. if i build a merkle tree off all the twt hashes in a feed i can use that to verify a twt should be in a feed or not. and gossip that to my peers.
So.. basically a rehash of the email “unsend” requests? What if i was to make a (delete: 5vbi2ea)
.. would it delete someone elses twt?
I’m not advocating in either direction, btw. I haven’t made up my mind yet. 😅 Just braindumping here.
The (replyto:…)
proposal is definitely more in the spirit of twtxt, I’d say. It’s much simpler, anyone can use it even with the simplest tools, no need for any client code. That is certainly a great property, if you ask me, and it’s things like that that brought me to twtxt in the first place.
I’d also say that in our tiny little community, message integrity simply doesn’t matter. Signed feeds don’t matter. I signed my feed for a while using GPG, someone else did the same, but in the end, nobody cares. The community is so tiny, there’s enough “implicit trust” or whatever you want to call it.
If twtxt/Yarn was to grow bigger, then this would become a concern again. But even Mastodon allows editing, so how much of a problem can it really be? 😅
I do have to “admit”, though, that hashes feel better. It feels good to know that we can clearly identify a certain twt. It feels more correct and stable.
Hm.
I suspect that the (replyto:…)
proposal would work just as well in practice.
@falsifian@www.falsifian.org “I don’t really mind if the twt gets edited before I even fetch it.”, right, that’s never the problem. Editing a twtxt before anyone fetches it isn’t even editing, right? :-P The problem we are trying to fix is the havoc is causes editing twtxts that have already been replied to, often ad nauseam. That’s the real problem.
@quark@ferengi.one I don’t really mind if the twt gets edited before I even fetch it. I think it’s the idea of my computer discarding old versions it’s fetched, especially if it’s shown them to me, that bugs me.
But I do like @movq@www.uninformativ.de’s suggestion on this thread that feeds could contain both the original and the edited twt. I guess it would be up to the author.
An alternate idea for supporting (properly) Twt Edits is to denoate as such and extend the meaning of a Twt Subject (which would need to be called something better?); For example, let’s say I produced the following Twt:
2024-09-18T23:08:00+10:00 Hllo World
And my feed’s URI is https://example.com/twtxt.txt
. The hash for this Twt is therefore 229d24612a2
:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:08:00+10:00\nHllo World" | sha1sum | head -c 11
229d24612a2
You wish to correct your mistake, so you make an amendment to that Twt like so:
2024-09-18T23:10:43+10:00 (edit:#229d24612a2) Hello World
Which would then have a new Twt hash value of 026d77e03fa
:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:10:43+10:00\nHello World" | sha1sum | head -c 11
026d77e03fa
Clients would then take this edit:#229d24612a2
to mean, this Twt is an edit of 229d24612a2
and should be replaced in the client’s cache, or indicated as such to the user that this is the intended content.
🧮 USERS:1 FEEDS:2 TWTS:1096 ARCHIVED:78954 CACHE:2471 FOLLOWERS:17 FOLLOWING:14
@quark@ferengi.one It looks like the part about traditional topics has been removed from that page. Here is an old version that mentions it: https://web.archive.org/web/20221211165458/https://dev.twtxt.net/doc/twtsubjectextension.html . Still, I don’t see any description of what is actually allowed between the parentheses. May be worth noting that twtxt.net is displaying the twts with the subject stripped, so some piece of code is recognizing it as a subject (or, at least, something to be removed).
@falsifian@www.falsifian.org based on Twt Subject Extension, your subject is invalid. You can have custom subjects, that is, not a valid hash, but you simply can’t put anything, and expect it to be treated as a TwtSubject
, me thinks.
Hmm, but yarnd also isn’t showing these twts as being part of a thread. @prologic@twtxt.net you said yarnd respects customs subjects. Shouldn’t these twts count as having a custom subject, and get threaded together?
yarnd just doesn’t render the subject. Fair enough. It’s (replyto http://darch.dk/twtxt.txt 2024-09-15T12:50:17Z), and if you don’t want to go on a hunt, the twt hash is weadxga: https://twtxt.net/twt/weadxga
@sorenpeter@darch.dk I like this idea. Just for fun, I’m using a variant in this twt. (Also because I’m curious how it non-hash subjects appear in jenny and yarn.)
URLs can contain commas so I suggest a different character to separate the url from the date. Is this twt I’ve used space (also after “replyto”, for symmetry).
I think this solves:
- Changing feed identities: although @mckinley@twtxt.net points out URLs can change, I think this syntax should be okay as long as the feed at that URL can be fetched, and as long as the current canonical URL for the feed lists this one as an alternate.
- editing, if you don’t care about message integrity
- finding the root of a thread, if you’re not following the author
An optional hash could be added if message integrity is desired. (E.g. if you don’t trust the feed author not to make a misleading edit.) Other recent suggestions about how to deal with edits and hashes might be applicable then.
People publishing multiple twts per second should include sub-second precision in their timestamps. As you suggested, the timestamp could just be copied verbatim.
@aelaraji@aelaraji.com this is my change on main.go
(but it can be done on a template now, so no reason to touch the code):
<time class="dt-published" datetime="{{ $twt.Created | date "2006-01-02T15:04:05Z07:00" }}">
{{ $twt.Created | date "2006-01-02 15:04:05 MST" }}
</time>
See https://ferengi.one. I am going to further customise things, but that’s a start.
(#hash;#originalHash)
would also work.
Maybe I’m being a bit too purist/minimalistic here. As I said before (in one of the 1372739 posts on this topic – or maybe I didn’t even send that twt, I don’t remember 😅), I never really liked hashes to begin with. They aren’t super hard to implement but they are kind of against the beauty of the original twtxt – because you need special client support for them. It’s not something that you could write manually in your
twtxt.txt
file. With @sorenpeter@darch.dk’s proposal, though, that would be possible.
Tangentially related, I was a bit disappointed to learn that the twt subject extension is now never used except with hashes. Manually-written subjects sounded so beautifully ad-hoc and organic as a way to disambiguate replies. Maybe I’ll try it some time just for fun.
@aelaraji@aelaraji.com I just added support for passing a custom template file via -T/--template
in case you need a custom template 👌
prologic@JamessMacStudio
Wed Sep 18 01:27:29
~/Projects/yarnsocial/twtxt2html
(main) 130
$ ./twtxt2html --help
Usage: twtxt2html [options] FILE|URL
twtxt2html converts a twtxt feed to a static HTML page
-d, --debug enable debug logging
-l, --limit int limit number ot twts (default all) (default -1)
-n, --noreldate do now show twt relative dates
-r, --reverse reverse the order of twts (oldest first)
-T, --template string path to template file
-t, --title string title of generated page (default "Twtxt Feed")
-v, --version display version information
pflag: help requested
@quark@ferengi.one At the moment, the twt in question exists in the sixth archive:
$ jenny -D https://twtxt.net/user/prologic/twtxt.txt/6 | head
[o6dsrga] [2020-07-18 12:39:52+00:00] [Hello World! 😊]
Does that work for you? 🤔
@prologic@twtxt.net Yeah, that thing with (#hash;#originalHash)
would also work.
Maybe I’m being a bit too purist/minimalistic here. As I said before (in one of the 1372739 posts on this topic – or maybe I didn’t even send that twt, I don’t remember 😅), I never really liked hashes to begin with. They aren’t super hard to implement but they are kind of against the beauty of the original twtxt – because you need special client support for them. It’s not something that you could write manually in your twtxt.txt
file. With @sorenpeter@darch.dk’s proposal, though, that would be possible.
I don’t know … maybe it’s just me. 🥴
I’m also being a bit selfish, to be honest: Implementing (#hash;#originalHash)
in jenny for editing your own feed would not be a no-brainer. (Editing is already kind of unsupported, actually.) It wouldn’t be a problem to implement it for fetching other people’s feeds, though.
@bender@twtxt.net It’s just a simple twtxt2html and scp … it goes like:
twtxt2html $HOME/path/to/local_twtxt_dir/twtxt.txt > $HOME/path/to/local_twtxt_dir/log.html && \
scp $HOME/path/to/local_twtxt_dir/log.html user@remotehost:/path/to/static_files_dir/
I’ve been lazy to add it to my publish_command script, now I can just copy/pasta from the twt 😅
@movq@www.uninformativ.de I’m glad you like it. A mention (@<movq https://www.uninformativ.de/twtxt.txt>
) is also long, but we live with it anyway. In a way a replyto:
is just a mention of a twt instead of a feed/person. Maybe we chould even model the syntax for replies on mentions: (#<2024-09-17T08:39:18Z https://www.eksempel.dk/twtxt.txt>)
?!
This scheme also only support threading off a specific Twt of someone’s feed. What if you’re not replying to anyone in particular?
🧮 USERS:1 FEEDS:2 TWTS:1095 ARCHIVED:78843 CACHE:2434 FOLLOWERS:17 FOLLOWING:14
(replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)
I think I like this a lot. 🤔
The problem with using hashes always was that they’re “one-directional”: You can construct a hash from URL + timestamp + twt, but you cannot do the inverse. When I see “, I have no idea what that could possibly refer to.
But of course something like (replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)
has all the information you need. This could simplify twt/feed discovery quite a bit, couldn’t it? 🤔 That thing that I just implemented – jenny asking some Yarn pod for some twt hash – would not be necessary anymore. Clients could easily and automatically fetch complete threads instead of requiring the user to follow all relevant feeds.
Only using the timestamp to identify a twt also solves the edit problem.
It even is better for non-Yarn clients, because you now don’t have to read, understand, and implement a “twt hash specification” before you can reply to someone.
The only problem, really, is that (replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)
is so long. Clients would have to try harder to hide this. 😅
Bonus: On his Pod/Profile it shows as if his last twt is from 4 Months ago.
🧮 USERS:1 FEEDS:2 TWTS:1094 ARCHIVED:78808 CACHE:2451 FOLLOWERS:17 FOLLOWING:14
Something odd just happened to my twtxt timeline… A bunch of twts dissapered, others were marked to be deleted in mutt. so I nuked my whole twtxt Maildir and deleted my ~/.cache/jenny in order to start with a fresh Pull. I pulled feed as usual. Now like HALF the twts aren’t there 😂 even my my last replay. WTF IS GOING ON? 🤣🤣🤣
Alright, I saw enough broken threads lately to be motivated enough to extend the --fetch-context
thingy: It can now ask Yarn pods for twt hashes.
https://www.uninformativ.de/git/jenny/commit/eefd3fa09083e2206ed0d71887d2ef2884684a71.html
This is only done as a last resort if there’s no other way to find the missing twt. Like, when there’s a twt that begins with just a hash and no user mention, there’s no way for jenny to know on which feed that twt can be found, so it’ll ask some Yarn pod in that case.
@prologic@twtxt.net Brute force. I just hashed a bunch of versions of both tweets until I found a collision.
I mostly just wanted an excuse to write the program. I don’t know how I feel about actually using super-long hashes; could make the twts annoying to read if you prefer to view them untransformed.
🧮 USERS:1 FEEDS:2 TWTS:1093 ARCHIVED:78768 CACHE:2438 FOLLOWERS:17 FOLLOWING:14
@prologic@twtxt.net earlier you suggested extending hashes to 11 characters, but here’s an argument that they should be even longer than that.
Imagine I found this twt one day at https://example.com/twtxt.txt :
2024-09-14T22:00Z Useful backup command: rsync -a “$HOME” /mnt/backup
and I responded with “(#5dgoirqemeq) Thanks for the tip!”. Then I’ve endorsed the twt, but it could latter get changed to
2024-09-14T22:00Z Useful backup command: rm -rf /some_important_directory
which also has an 11-character base32 hash of 5dgoirqemeq. (I’m using the existing hashing method with https://example.com/twtxt.txt as the feed url, but I’m taking 11 characters instead of 7 from the end of the base32 encoding.)
That’s what I meant by “spoofing” in an earlier twt.
I don’t know if preventing this sort of attack should be a goal, but if it is, the number of bits in the hash should be at least two times log2(number of attempts we want to defend against), where the “two times” is because of the birthday paradox.
Side note: current hashes always end with “a” or “q”, which is a bit wasteful. Maybe we should take the first N characters of the base32 encoding instead of the last N.
Code I used for the above example: https://fossil.falsifian.org/misc/file?name=src/twt_collision/find_collision.c
I only needed to compute 43394987 hashes to find it.
HTTPS is supposed to do [verification] anyway.
TLS provides verification that nobody is tampering with or snooping on your connection to a server. It doesn’t, for example, verify that a file downloaded from server A is from the same entity as the one from server B.
I was confused by this response for a while, but now I think I understand what you’re getting at. You are pointing out that with signed feeds, I can verify the authenticity of a feed without accessing the original server, whereas with HTTPS I can’t verify a feed unless I download it myself from the origin server. Is that right?
I.e. if the HTTPS origin server is online and I don’t mind taking the time and bandwidth to contact it, then perhaps signed feeds offer no advantage, but if the origin server might not be online, or I want to download a big archive of lots of feeds at once without contacting each server individually, then I need signed feeds.
feed locations [being] URLs gives some flexibility
It does give flexibility, but perhaps we should have made them URIs instead for even more flexibility. Then, you could use a tag URI,
urn:uuid:*
, or a regular old URL if you wanted to. The spec seems to indicate that theurl
tag should be a working URL that clients can use to find a copy of the feed, optionally at multiple locations. I’m not very familiar with IP{F,N}S but if it ensures you own an identifier forever and that identifier points to a current copy of your feed, it could be a great way to fix it on an individual basis without breaking any specs :)
I’m also not very familiar with IPFS or IPNS.
I haven’t been following the other twts about signatures carefully. I just hope whatever you smart people come up with will be backwards-compatible so it still works if I’m too lazy to change how I publish my feed :-)
🧮 USERS:1 FEEDS:2 TWTS:1092 ARCHIVED:78761 CACHE:2445 FOLLOWERS:17 FOLLOWING:14
🧮 USERS:1 FEEDS:2 TWTS:1091 ARCHIVED:78750 CACHE:2482 FOLLOWERS:17 FOLLOWING:14
🧮 USERS:1 FEEDS:2 TWTS:1090 ARCHIVED:78738 CACHE:2498 FOLLOWERS:17 FOLLOWING:14
@sorenpeter@darch.dk There was a client that would generate a unique hash for each twt. It didn’t get wide adoption.
🧮 USERS:1 FEEDS:2 TWTS:1089 ARCHIVED:78724 CACHE:2505 FOLLOWERS:17 FOLLOWING:14
@prologic@twtxt.net do that mean that for every new post (not replies) the client will have to generate a UUID or similar when posting and add that to to the twt?
🧮 USERS:1 FEEDS:2 TWTS:1088 ARCHIVED:78704 CACHE:2506 FOLLOWERS:17 FOLLOWING:14