Searching txt.sour.is

Twts matching #hash
Sort by: Newest, Oldest, Most Relevant

I have zero mental energy for programming at the moment. 🫤

I’ll try to implement the new hashing stuff in jenny before the ā€œdeadlineā€. But I don’t think you’ll see any texudus development from me in the near future. ā˜¹ļø

⤋ Read More
In-reply-to » I've just released version 1.0 of twtxt.el (the Emacs client), the stable and final version with the current extensions. I'll let the community maintain it, if there are interested in using it. I will also be open to fix small bugs. I don't know if this twt is a goodbye or a see you later. Maybe I will never come back, or maybe I will post a new twt this afternoon. But it's always important to be grateful. Thanks to @prologic @movq @eapl.me @bender @aelaraji @arne @david @lyse @doesnm @xuu @sorenpeter for everything you have taught me. I've learned a lot about #twtxt, HTTP and working in community. It has been a fantastic adventure! What will become of me? I have created a twtxt fork called Texudus (https://texudus.readthedocs.io/). I want to continue learning on my own without the legacy limitations or technologies that implement twtxt. It's not a replacement for any technology, it's just my own little lab. I have also made a fork of my own client and will be focusing on it for a while. I don't expect anyone to use it, but feedback is always welcome. Best regards to everyone. #twtxt #emacs #twtxt-el #texudus

@ About the URL, since it no longer used for hashing there might be no need to change it. I agree that we keep all the parts that already are out there for the most parts. Instead of a contact field you could also just use links like: link = Email mailto:user@example.dk or link = Signal https://signal.me/sthF4raI5Lg_ybpJwB1sOptDla4oU7p[...]

⤋ Read More
In-reply-to » @prologic Not sure I’d attach any if clauses to this. My point is: Every time I see a hash, I’d like to have a hint as to where to find the corresponding twt.

The reason I think this can work so well and I’m in full support of it is that it’s the least disruptive way to resolve the issue of:

where did this hash come from?

⤋ Read More
In-reply-to » If we must stick to hashes for threading, can we maybe make it mandatory to always include a reference to the original twt URL when writing replies?

@prologic@twtxt.net Not sure I’d attach any if clauses to this. My point is: Every time I see a hash, I’d like to have a hint as to where to find the corresponding twt.

⤋ Read More
In-reply-to » If we must stick to hashes for threading, can we maybe make it mandatory to always include a reference to the original twt URL when writing replies?

@movq@www.uninformativ.de If we’re focusing on solving the ā€œmissing rootsā€ problems. I would start to think about ā€œclient recommendationsā€. The first recommendation would be:

  1. Replying to a Twt that has no initial Subject must itself have a Subject of the form (hash; url).

This way it’s a hint to fetching clients that follow B, but not A (in the case of no mentions) that the Subject/Root might (very likely) is in the feed url.

⤋ Read More

If we must stick to hashes for threading, can we maybe make it mandatory to always include a reference to the original twt URL when writing replies?

Instead of

(<a href="https://txt.sour.is/search?q=%23123467">#123467</a>) hello foo bar

you would have

(<a href="https://txt.sour.is/search?q=%23123467">#123467</a> http://foo.com/tw.txt) hello foo bar

or maybe even:

(<a href="https://txt.sour.is/search?q=%23123467">#123467</a> 2025-04-30T12:30:31Z http://foo.com/tw.txt) hello foo bar

This would greatly help in reconstructing broken threads, since hashes are obviously unfortunately one-way tickets. The URL/timestamp would not be used for threading, just for discovery of feeds that you don’t already follow.

I don’t insist on including the timestamp, but having some idea which feed we’re talking about would help a lot.

⤋ Read More
In-reply-to » Finally I propose that we increase the Twt Hash length from 7 to 12 and use the first 12 characters of the base32 encoded blake2b hash. This will solve two problems, the fact that all hashes today either end in q or a (oops) šŸ˜… And increasing the Twt Hash size will ensure that we never run into the chance of collision for ions to come. Chances of a 50% collision with 64 bits / 12 characters is roughly ~12.44B Twts. That ought to be enough! -- I also propose that we modify all our clients and make this change from the 1st July 2025, which will be Yarn.social's 5th birthday and 5 years since I started this whole project and endeavour! 😱 #Twtxt #Update

July 1st. 63 days from now to implement a backward-incompatible change, apparently not open to other ideas like replacing blake with SHA, or discussing implementation challenges for other languages and platforms.
Finally just closing #18, #19 and #20 without starting a proper discussion and ignoring a ā€˜micro consensus’ feels… not right.

I don’t know what to think rather than letting it rest (May will be busy here) and focus on other stuff in the future.

twt-hash-v2.md#implementation-timeline

⤋ Read More
In-reply-to » Finally I propose that we increase the Twt Hash length from 7 to 12 and use the first 12 characters of the base32 encoded blake2b hash. This will solve two problems, the fact that all hashes today either end in q or a (oops) šŸ˜… And increasing the Twt Hash size will ensure that we never run into the chance of collision for ions to come. Chances of a 50% collision with 64 bits / 12 characters is roughly ~12.44B Twts. That ought to be enough! -- I also propose that we modify all our clients and make this change from the 1st July 2025, which will be Yarn.social's 5th birthday and 5 years since I started this whole project and endeavour! 😱 #Twtxt #Update

I will be adding the code in for yarnd very soonā„¢ for this change, with a if the date is >= 2025-07-01 then compute_new_hashes else compute_old_hashes

⤋ Read More

Finally I propose that we increase the Twt Hash length from 7 to 12 and use the first 12 characters of the base32 encoded blake2b hash. This will solve two problems, the fact that all hashes today either end in q or a (oops) šŸ˜… And increasing the Twt Hash size will ensure that we never run into the chance of collision for ions to come. Chances of a 50% collision with 64 bits / 12 characters is roughly ~12.44B Twts. That ought to be enough! – I also propose that we modify all our clients and make this change from the 1st July 2025, which will be Yarn.social’s 5th birthday and 5 years since I started this whole project and endeavour! 😱 #Twtxt #Update

⤋ Read More

I had Chick-fil-A breakfast today (sausage, egg, and cheese biscuit, hash browns, coffee, and orange juice). Then at lunch my work place offered hot dogs. I had two (kosher, if that matters), plus a coke, a macadamia nuts cookie, and a small chocolate brownie.

So, here I am, at home, feeling hungry but guilty and refusing to eat anything else for the rest of the day. To top it off, I have only clocked 4,000 steps today (and I don’t feel like walking). I am going to hell, am I?

⤋ Read More
In-reply-to » @andros maybe create a separate, completely distinct feed for DM? That way, clients do not need to do anything, only those wanted to "talk in private" follow themselves, using their very special dm-only.txt feeds. šŸ˜‚

by commenting out DMs are you giving up on simplicity? See the Metadata extension holding the data inside comments, as the client doesn’t need to show it inside the timeline.

I don’t think that commenting out DMs as we are doing for metadata is giving up on simplicity (it’s a feature already), and it helps to hide unwanted DMs to clients that will take months to add it’s support to something named… an extension.

For some other extensions in https://twtxt.dev/extensions.html (for example the reply-to hash #abcdfeg or the mention @ < example http://example.org/twtxt.txt >) is not a big deal. The twt is still understandable in plain text.
For DM, it’s only interesting for you if you are the recipient, otherwise you see an scrambled message like 1234567890abcdef=. Even if you see it, you’ll need some decryption to read it. I’ve said before that DMs shouldn’t be in the same section that the timeline as it’s confusing.

So my point stands, and as I’ve said before, we are discussing it as a community, so let’s see what other maintainers add to the convo.

⤋ Read More
In-reply-to » @andros maybe create a separate, completely distinct feed for DM? That way, clients do not need to do anything, only those wanted to "talk in private" follow themselves, using their very special dm-only.txt feeds. šŸ˜‚

After reading you, @eapl.me@eapl.me, I’ll tell you my point of view.
In my opinion, a feed does not have to be equivalent to a timeline. A timeline is a representation of the feed adapted to a user. You may not be interested in seeing other people’s threads or DMs. But perhaps they are interested in seeing mentions or DMs directed at them. It is important not to fall into the trap. With that clarification…
I insist, this is my point of view, it is not an absolute truth: I don’t think extensions should be respectful of customers who are no longer maintained.
We cannot have a system that is simple, backwards compatible and extensible all at the same time. We have to give up some of the 3 points. I would not like to give up simplicity because it will then make it harder to maintain the customers who do stay. Therefore, I think it is better to give up backwards compatibility and play with new formulas in the extensions. I don’t think it’s a good idea to make a hash keep so much load: a hashtag, a thread and also a DM.

⤋ Read More
In-reply-to » @bender I noticed that although the Discover view (and your own Timeline) is much improved with a MaxAgeDays configuration at the pod level, that now some profiles are rather empty. This is only because well, they're a bit "inactive" so to speak šŸ—£ļø Not sure what to do about this at the moment... Open to ideas? šŸ’”

yes it used be http:// only and to keep hashes from breaking i added # url = http://... and now we are stock with it due to the curret specs.

⤋ Read More

Hmmm there’s a bug somewhere in the way I’m ingesting archived feeds šŸ¤”

sqlite> select * from twts where content like 'The web is such garbage these days%';
      hash = 37sjhla
  feed_url = https://twtxt.net/user/prologic/twtxt.txt/1
   content = The web is such garbage these days šŸ˜” Or is it the garbage search engines? šŸ¤”
   created = 2024-11-14T01:53:46Z
created_dt = 2024-11-14 01:53:46
   subject = #37sjhla
  mentions = []
      tags = []
     links = []
sqlite>

⤋ Read More

Some A hole has been trying to pull every single Twtxt feed that existed/still exists since forever. How do I know? Welp’ They’ve been querying my Timelineā„¢ instance for all of it, every single twtxt file and twt Hash they can find. šŸ˜†šŸ¤¦ It must have been going on for days and I have just noticed… + it’s all coming from the same ASN AS136907 HWCLOUDS-AS-AP HUAWEI CLOUDS

Thank you Huawei for the DDos you sons of Glitches!!!

⤋ Read More
In-reply-to » @andros maybe create a separate, completely distinct feed for DM? That way, clients do not need to do anything, only those wanted to "talk in private" follow themselves, using their very special dm-only.txt feeds. šŸ˜‚

@bender@twtxt.net @aelaraji@aelaraji.com The client should ignore twts if it’s not compatible or not addressed to me. it’s a simple regex to add! It’s similar to Twt Hash Extension, should they be in another file? They are child messages, not flat twt. Not of course!

⤋ Read More
In-reply-to » (#znf6csa) @prologic What happened here – did I edit my twt or is this hash wrong? 🄓

Doesn’t look like it Hmmm

sqlite> select * from twts where content LIKE '%Linux installation%';
    hash = znf6csa
feed_url = https://www.uninformativ.de/twtxt.txt
 content = I wonder if my current Linux installation will actually make it to 20 years:

    $ head -n 1 /var/log/pacman.log
    [2011-07-07 11:19] installed filesystem (2011.04-1)

It’s not toooo far into the future.

It would be crazy … 20 years without reinstalling once … phew. 🄓
 created = 2025-04-07T19:59:51Z
 subject = (#znf6csa)
mentions = []
    tags = []
   links = []

⤋ Read More
In-reply-to » Oh well. I've gone and done it again! This time I've lost 4 months of data because for some reason I've been busy and haven't been taking backups of all the things I should be?! šŸ¤” Farrrrk 🤬

@prologic@twtxt.net Spring cleanup! That’s one way to encourage people to self-host their feeds. :-D

Since I’m only interested in the url metadata field for hashing, I do not keep any comments or metadata for that matter, just the messages themselves. The last time I fetched was probably some time yesterday evening (UTC+2). I cannot tell exactly, because the recorded last fetch timestamp has been overridden with today’s by now.

I dumped my new SQLite cache into: https://lyse.isobeef.org/tmp/backup.tar.gz This time maybe even correctly, if you’re lucky. I’m not entirely sure. It took me a few attempts (date and time were separated by space instead of T at first, I normalized offsets +00:00 to Z as yarnd does and converted newlines back to U+2028). At least now the simple cross check with the Twtxt Feed Validator does not yield any problems.

⤋ Read More
In-reply-to » Hello, i want to present my new revolution twtxt v3 format - twjson That's why you should use it: 1. It's easy to to parse 2. It's easy to read (in formatted mode :D) 3. It used actually \n for newlines, you don't need unprintable symbols 4. Forget about hash collisions because using full hash Here is my twjson feed: https://doesnm.p.psf.lt/twjson.json And twtxt2json converter: https://doesnm.p.psf.lt/twjson.js

Amazing! It is a good tool for reading feeds. What you used to calculate the hash?

⤋ Read More
In-reply-to » Wow, phishing is just around the corner šŸ‘€

@eapl.me@eapl.me Interesting! Two points stood right out to me:

  1. Why the hell are e-mail newsletters considered a valid option in the first place? Just offer an Atom feed and be done with it! Especially for a blog of this very type. This doesn’t even involve a third party service. Although, in addition he also links to Feedburner, what the fuck!? No e-mail address or the like is needed and subject to being disclosed.

  2. When these spam mailers want to prevent resubscribing, then for fuck’s sake, why don’t they use a hash of the e-mail address (I saw that in yarnd) for that purpose? Storing the e-mail address in clear text after unsubscribing is illegal in my book.

⤋ Read More
In-reply-to » Thanks, @movq!

There are 82.108 read statuses, but only 24.421 messages in the cache. In contrast to the cache with the messages, the read statuses are never cleaned up when a feed was unsubscribed from. And the read statuses also contain old style hashes, before we settled on the what we have today. Still a huge difference. Hmm.

⤋ Read More
In-reply-to » I now subscribed to most feeds in my Go tt reimplementation that I already followed with the old Python tt. Previously, I just had a few feeds for testing purposes in my new config. While transfering, I "dropped" heaps of feeds that appeared to be inactive.

Thanks, @movq@www.uninformativ.de!

My backing SQLite database with indices is 8.7 MiB in size right now.

The twtxt cache is 7.6 MiB, it uses Python’s pickle module. And next to it there is a 16.0 MiB second database with all the read statuses for the old tt. Wow, super inefficient, it shouldn’t contain anything else, it’s a giant, pickled {"$hash": {"read": True/False}, …}. What the heck, why is it so big?! O_o

⤋ Read More
In-reply-to » Dang it, first attempt failed:

(Back in tt.) Well, it kinda worked. At least appending to the file. But my cache database got screwed up. I do not yet support replies, so the subject and and root hash columns have not been set at all, resulting in a message that is just not shown at all. I gotta do something about that next. The good thing is, though, after simply fixing the two columns the message appeared on screen.

⤋ Read More

@movq@www.uninformativ.de I have no doubt that you’re not seeing the images correctly šŸ˜€. It’s just that it’s broken when viewing them, in my case, and analyzing the URLs, I’ve seen everything I mentioned.
Regarding the hash, you’re right. I’ll have to investigate what’s going on. I’m having a hard time getting the hash generation to work properly.

⤋ Read More

@andros@twtxt.andros.dev Hm, looks correct to me. The image to be displayed is a thumbnail and this links to the full-sized image. The thumbnail (JPG) is auto-generated from the full image (PNG), hence the two extensions.

What does look strange, though, is that your client came up with the hash pqsmcka, while it should have been te5quba. šŸ¤”

⤋ Read More

@prologic@twtxt.net We can’t agree on this idea because that makes things even more complicated than it already is today. The beauty of twtxt is, you put one file on your server, done. One. Not five million. Granted, there might be archive feeds, so it might be already a bit more, but still faaaaaaar less than one file per message.

Also, you would need to host not your own hash files, but everybody else’s as well you follow. Otherwise, what is that supposed to achieve? If people are already following my feed, they know what hashes I have, so this is to no use of them (unless they want to look up a message from an archive feed and don’t process them). But the far more common scenario is that an unknown hash originates from a feed that they have not subscribed to.

Additionally, yarnd’s URL schema would then also break, because https://twtxt.net/twt/<hash> now becomes https://twtxt.net/user/prologic/<hash>, https://twtxt.net/user/bender/<hash> and so on. To me, that looks like you would only get hashes if they belonged to this particular user. Of course, you could define rules that if there is a /user/ part in the path, then use a different URL, but this complicates things even more.

Sorry, I don’t like that idea.

⤋ Read More
In-reply-to » @eapl.me There are several points that I like, but I want to highlight number 7. https://text.eapl.mx/a-few-ideas-for-a-next-twtxt-version #twtxt

a few async ideas for later

The editing process needs a lot of consideration and compromises.

From one side, editing and deleting it’s necessary IMO. People will do it anyway, and personally I like to edit my texts, so I’d put some effort on make it work.
Should we keep a history of edits? Should we hash every edit to avoid abuse? Should we mark internally a twt as deleted, but keeping the replies?

I think that’s part of a more complete ā€˜thread’ extension, although I’d say it’s worth to agree on something reflecting the real usage in the wild, along with what people usually do on other platforms.

⤋ Read More
In-reply-to » @eapl.me There are several points that I like, but I want to highlight number 7. https://text.eapl.mx/a-few-ideas-for-a-next-twtxt-version #twtxt

looks good to me!

About alice’s hash, using SHA256, I get 96473b4f or 96473B4F for the last 8 characters. I’ll add it as an implementation example.
The idea of including it besides the follow URL is to avoid calculating it every time we load the file (assuming the client did that correctly), and helps to track replies across the file with a simple search.

Also, watching your example I’m thinking now that instead of {url=96473B4F,id=1} which is ambiguous of which URL we are referring to, it could be something like:
{reply_to=[URL_HASH]_[TWT_ID]} / {reply_to=96473B4F_1}
That way, the ā€˜full twt ID’ could be 96473B4F_1.

⤋ Read More
In-reply-to » @andros I've commented on the ticket: https://git.mills.io/yarnsocial/twtxt.dev/issues/14#issuecomment-19142

True. Though if the idea turns out to be better.. then community will adopt it.

if you look at the subject for that twt you will see that it uses the extended hash format to include a URL address.

⤋ Read More

@bmallred@staystrong.run I forgot one more effect of edits. If clients remember the read status of massages by hash, an edit will mark the updated message as unread again. To some degree that is even the right behavior, because the message was updated, so the user might want to have a look at the updated version. On the other hand, if it’s just a small typo fix, it’s maybe not worth to tell the user about. But the client doesn’t know, at least not with additional logic.

Having said that, it appears that this only affects me personally, noone else. I don’t know of any other client that saves read statuses. But don’t worry about me, all good. Just keep doing what you’ve done so far. I wanted to mention that only for the sake of completeness. :-)

⤋ Read More
In-reply-to » (#oknfufq) @lyse What do you think about this? https://git.mills.io/yarnsocial/twtxt.dev/issues/14

I like this syntax, you have my vote, although I’d change it a bit like
#<Alice https://example.com/twtxt.com#2024-12-18T14:18:26+01:00>

Hashes are not a problem on PHP, I dont know why it’s slow to calculate them from your side, but I agree with your points.

BTW, did you have the chance to read my proposal on twtxt 2.0? I shared a few ideas about possible improvements to discuss:
https://text.eapl.mx/a-few-ideas-for-a-next-twtxt-version
https://text.eapl.mx/reply-to-lyse-about-twtxt

⤋ Read More

@bmallred@staystrong.run Any edit automatically changes the twt hash, because the hash is built over the hash URL, message timestamp and message text. https://twtxt.dev/exts/twt-hash.html So, it is only a problem, if somebody replied to your original message with the old hash. The original message suddenly doesn’t exist anymore and the reply becomes detached, orphaned, whatever you wanna call it. Threading doesn’t break, though, if nobody replied to your message.

⤋ Read More

@andros@twtxt.andros.dev I believe you have just reproduced the bug… it looks like you’ve replayed to a twt but the hash is wrong. I can see the hash here from Jenny, but it doesn’t look like it corresponds to any{twt,thing}. if you check it out on any yarn instance it won’t look like a replay.

⤋ Read More

My hypothesis about that thing breaking my twts is that it might have something to do with the parenthesis surrounding the root twt hash in the replay twt-A when I replay to it with fork-twt-B; I imagine elisp interpreting those as a s-expression thus breaking the generation precess of hash (#twt-A) before prepending it to for-twt-B … but then I’m too ignorant to figure out how to test my theory (heck I couldn’t even recalculate the hashes myself correctly in bash xD). I’ll keep trying tho.

⤋ Read More
In-reply-to » (#56wivca) I suspect the problem is that the content is updated. It looks like a design problem.

@andros@twtxt.andros.dev yes, that usually happens when twts get edited and we just made a gentlemen agreement to avoid edits as much as possible (at least for the time being). But the thing is, That is not what’s happening with my broken twts’ hashes. Since I’ve bee mostly replaying to my own twts as a test and I know for sure that I haven’t edited any. (I usually fork-replay instead of edit a twt when needed)

⤋ Read More

@prologic@twtxt.net Of course you don’t notice it when yarnd only shows at most the last n messages of a feed. As an example, check out mckinley’s message from 2023-01-09T22:42:37Z. It has ā€œ[Scheduled][Scheduled][Scheduled]ā€œā€¦ in it. This text in square brackets is repeated numerous times. If you search his feed for closing square bracket followed by an opening square bracket (][) you will find a bunch more of these. It goes without question he never typed that in his feed. My client saves each twt hash I’ve explicitly marked read. A few days ago, I got plenty of apparently years old, yet suddenly unread messages. Each and every single one of them containing this repeated bracketed text thing. The only conclusion is that something messed up the feed again.

⤋ Read More

@lyse@lyse.isobeef.org Danke! Ja, es gibt noch unzƤhlige Stellschrauben an dem Ding. Deine Anmerkungen werde ich einarbeiten. Eine mobile Ansicht wƤr auch noch schƶn. Derzeit sitzt es auf dem Smartphone doch noch recht stramm.

@Unterhaltungen: Die von gestern zu verschlüsselten Nachrichten war ausschlaggebend für die Umsetzung. In ā€œTimelineā€ und ā€œYarnā€ haben mich die LƶsungsansƤtze bisher nicht überzeugt. Aber wir kƶnnen ja alle etwas von einander lernen.

⤋ Read More
In-reply-to » I want to share a little idea for a new extension with the goal of adding direct messages in #twtxt https://github.com/tanrax/twtxt-direct-message-extension

another one would be to allow changing public keys over time (as it may be a good practice [0]). A syntax like the following could help to know what public key you used to encrypt the message, and which private key the client should use to decrypt it:

!<nick url> <encrypted_message> <public_key_hash_7_chars>

Also I’d remove support for storing the message as hex, only allowing base64 (more compact, aiming for a minimalistic spec, etc.)

[0] https://www.brandonchecketts.com/archives/its-2023-you-should-be-using-an-ed25519-ssh-key-and-other-current-best-practices

⤋ Read More

I’m still making progress with the Emacs client. I’m proud to say that the code that is responsible for reading the feeds is almost finished, including: Twt Hash Extension, Twt Subject Extension, Multiline Extension and Metadata Extension. I’m fine-tuning some tests and will soon do the first buffer that displays the twts.

⤋ Read More

@movq@www.uninformativ.de i’m sorry if I sound too contrarian. I’m not a fan of using an obscure hash as well. The problem is that of future and backward compatibility. If we change to sha256 or another we don’t just need to support sha256. But need to now support both sha256 AND blake2b. Or we devide the community. Users of some clients will still use the old algorithm and get left behind.

Really we should all think hard about how changes will break things and if those breakages are acceptable.

⤋ Read More

I share I did write up an algorithm for it at some point I think it is lost in a git comment someplace. I’ll put together a pseudo/go code this week.

Super simple:

Making a reply:

  1. If yarn has one use that. (Maybe do collision check?)
  2. Make hash of twt raw no truncation.
  3. Check local cache for shortest without collision
    • in SQL: select len(subject) where head_full_hash like subject || '%'

Threading:

  1. Get full hash of head twt
  2. Search for twts
    • in SQL: head_full_hash like subject || '%' and created_on > head_timestamp

The assumption being replies will be for the most recent head. If replying to an older one it will use a longer hash.

⤋ Read More

These collisions aren’t important unless someone tries to fork. So.. for the vast majority its not a big deal. Using the grow hash algorithm could inform the client to add another char when they fork.

⤋ Read More

@prologic@twtxt.net Regarding the new way of generating twt-hashes, to me it makes more sense to use tabs as separator instead of spaces, since the you can just copy/past a line directly from a twtxt-file that already go a tab between timestamp and message. But tabs might be hard to ā€œtypeā€ when you are in a terminal, since it will activate autocompleteā€¦šŸ¤”

Another thing, it seems that you sugget we only use the domain in the hash-creation and not the full path to the twtxt.txt

$ echo -e "https://example.com 2024-09-29T13:30:00Z Hello World!" | sha256sum - | awk '{ print $1 }' | base64 | head -c 12

⤋ Read More
In-reply-to » @doesnm I am not sure I am understanding what you mean. Can you explain?

twet display twts in raw format with some formatting (sadly no newlines). And for reply messages i just seen (#hash). But which text hidden on hash? currenly im open twtxt.net/twt/hash to see this

⤋ Read More

More thoughts about changes to twtxt (as if we haven’t had enough thoughts):

  1. There are lots of great ideas here! Is there a benefit to putting them all into one document? Seems to me this could more easily be a bunch of separate efforts that can progress at their own pace:

1a. Better and longer hashes.

1b. New possibly-controversial ideas like edit: and delete: and location-based references as an alternative to hashes.

1c. Best practices, e.g. Content-Type: text/plain; charset=utf-8

1d. Stuff already described at dev.twtxt.net that doesn’t need any changes.

  1. We won’t know what will and won’t work until we try them. So I’m inclined to think of this as a bunch of draft ideas. Maybe later when we’ve seen it play out it could make sense to define a group of recommended twtxt extensions and give them a name.

  2. Another reason for 1 (above) is: I like the current situation where all you need to get started is these two short and simple documents:
    https://twtxt.readthedocs.io/en/latest/user/twtxtfile.html
    https://twtxt.readthedocs.io/en/latest/user/discoverability.html
    and everything else is an extension for anyone interested. (Deprecating non-UTC times seems reasonable to me, though.) Having a big long ā€œtwtxt v2ā€ document seems less inviting to people looking for something simple. (@prologic@twtxt.net you mentioned an anonymous comment ā€œyou’ve ruined twtxtā€ and while I don’t completely agree with that commenter’s sentiment, I would feel like twtxt had lost something if it moved away from having a super-simple core.)

  3. All that being said, these are just my opinions, and I’m not doing the work of writing software or drafting proposals. Maybe I will at some point, but until then, if you’re actually implementing things, you’re in charge of what you decide to make, and I’m grateful for the work.

⤋ Read More
In-reply-to » (#zqpkfla) @prologic Thanks for writing that up!

@bender@twtxt.net

Sorry, you’re right, I should have used numbers!

I’m don’t understand what ā€œpreserve the original hashā€ could mean other than ā€œmake sure there’s still a twt in the feed with that hashā€. Maybe the text could be clarified somehow.

I’m also not sure what you mean by markdown already being part of it. Of course people can already use Markdown, just like presumably nothing stopped people from using (twt subjects) before they were formally described. But it’s not universal; e.g. as a jenny user I just see the plain text.

⤋ Read More

@prologic@twtxt.net Thanks for writing that up!

I hope it can remain a living document (or sequence of draft revisions) for a good long time while we figure out how this stuff works in practice.

I am not sure how I feel about all this being done at once, vs. letting conventions arise.

For example, even today I could reply to twt abc1234 with ā€œ(#abc1234) Edit: ā€¦ā€ and I think all you humans would understand it as an edit to (#abc1234). Maybe eventually it would become a common enough convention that clients would start to support it explicitly.

Similarly we could just start using 11-digit hashes. We should iron out whether it’s sha256 or whatever but there’s no need get all the other stuff right at the same time.

I have similar thoughts about how some users could try out location-based replies in a backward-compatible way (append the replyto: stuff after the legacy (#hash) style).

However I recognize that I’m not the one implementing this stuff, and it’s less work to just have everything determined up front.

Misc comments (I haven’t read the whole thing):

  • Did you mean to make hashes hexadecimal? You lose 11 bits that way compared to base32. I’d suggest gaining 11 bits with base64 instead.

  • ā€œClients MUST preserve the original hashā€ — do you mean they MUST preserve the original twt?

  • Thanks for phrasing the bit about deletions so neutrally.

  • I don’t like the MUST in ā€œClients MUST follow the chain of reply-to referencesā€¦ā€. If someone writes a client as a 40-line shell script that requires the user to piece together the threading themselves, IMO we shouldn’t declare the client non-conforming just because they didn’t get to all the bells and whistles.

  • Similarly I don’t like the MUST for user agents. For one thing, you might want to fetch a feed without revealing your identty. Also, it raises the bar for a minimal implementation (I’m again thinking again of the 40-line shell script).

  • For ā€œwho followsā€ lists: why must the long, random tokens be only valid for a limited time? Do you have a scenario in mind where they could leak?

  • Why can’t feeds be served over HTTP/1.0? Again, thinking about simple software. I recently tried implementing HTTP/1.1 and it wasn’t too bad, but 1.0 would have been slightly simpler.

  • Why get into the nitty-gritty about caching headers? This seems like generic advice for HTTP servers and clients.

  • I’m a little sad about other protocols being not recommended.

  • I don’t know how I feel about including markdown. I don’t mind too much that yarn users emit twts full of markdown, but I’m more of a plain text kind of person. Also it adds to the length. I wonder if putting a separate document would make more sense; that would also help with the length.

⤋ Read More

I wrote some code to try out non-hash reply subjects formatted as (replyto ), while keeping the ability to use the existing hash style.

I don’t think we need to decide all at once. If clients add support for a new method then people can use it if they like. The downside of course is that this costs developer time, so I decided to invest a few hours of my own time into a proof of concept.

With apologies to @movq@www.uninformativ.de for corrupting jenny’s beautiful code. I don’t write this expecting you to incorporate the patch, because it does complicate things and might not be a direction you want to go in. But if you like any part of this approach feel free to use bits of it; I release the patch under jenny’s current LICENCE.

Supporting both kinds of reply in jenny was complicated because each email can only have one Message-Id, and because it’s possible the target twt will not be seen until after the twt referencing it. The following patch uses an sqlite database to keep track of known (url, timestamp) pairs, as well as a separate table of (url, timestamp) pairs that haven’t been seen yet but are wanted. When one of those ā€œwantedā€ twts is finally seen, the mail file gets rewritten to include the appropriate In-Reply-To header.

Patch based on jenny commit 73a5ea81.

https://www.falsifian.org/a/oDtr/patch0.txt

Not implemented:

  • Composing twts using the (replyto …) format.
  • Probably other important things I’m forgetting.

⤋ Read More
In-reply-to » (#v3lkjca) no my fault your client can't handle a little editing ;)

@prologic@twtxt.net I know the role of the current hash is to allow referencing (replies and, thus, threads), and it also represents a ā€œuniqueā€ way to verify a twtxt hasn’t been tampered with. Is that second so important, if we are trying to allow edits? I know if feels good to be able to verify, but in reality, how often one does it?

⤋ Read More
In-reply-to » (#v3lkjca) no my fault your client can't handle a little editing ;)

@prologic@twtxt.net how about hashing a combination of nick/timestamp, or url/timestamp only, and not the twtxt content? On edit those will not change, so no breaking of threads. I know, I know, just adding noise here. :-P

⤋ Read More

the stem matching is the same as how GIT does its branch hashes. i think you can stem it down to 2 or 3 sha bytes.

if a client sees someone in a yarn using a byte longer hash it can lengthen to match since it can assume that maybe the other client has a collision that it doesnt know about.

⤋ Read More

@prologic@twtxt.net the basic idea was to stem the hash.. so you have a hash abcdef0123456789... any sub string of that hash after the first 6 will match. so abcdef, abcdef012, abcdef0123456 all match the same. on the case of a collision i think we decided on matching the newest since we archive off older threads anyway. the third rule was about growing the minimum hash size after some threshold of collisions were detected.

⤋ Read More

@prologic@twtxt.net Wikipedia claims sha1 is vulnerable to a ā€œchosen-prefix attackā€, which I gather means I can write any two twts I like, and then cause them to have the exact same sha1 hash by appending something. I guess a twt ending in random junk might look suspcious, but perhaps the junk could be worked into an image URL like

Image

. If that’s not possible now maybe it will be later.

git only uses sha1 because they’re stuck with it: migrating is very hard. There was an effort to move git to sha256 but I don’t know its status. I think there is progress being made with Game Of Trees, a git clone that uses the same on-disk format.

I can’t imagine any benefit to using sha1, except that maybe some very old software might support sha1 but not sha256.

⤋ Read More

@movq@www.uninformativ.de going a little sideways on this, ā€œ*If twtxt/Yarn was to grow bigger, then this would become a concern again. But even Mastodon allows editing, so how much of a problem can it really be? šŸ˜…*ā€, wouldn’t it preparing for a potential (even if very, very, veeeeery remote) growth be a good thing? Mastodon signs all messages, keeps a history of edits, and it doesn’t break threads. It isn’t a problem there.šŸ˜‰ It is here.

I think keeping hashes is a must. If anything for that ā€œfeels goodā€ feeling.

⤋ Read More
In-reply-to » (replyto http://darch.dk/twtxt.txt 2024-09-15T12:50:17Z) @sorenpeter I like this idea. Just for fun, I'm using a variant in this twt. (Also because I'm curious how it non-hash subjects appear in jenny and yarn.)

@movq@www.uninformativ.de Agreed that hashes have a benefit. I came up with a similar example where when I twted about an 11-character hash collision. Perhaps hashes could be made optional somehow. Like, you could use the ā€œreplytoā€ idea and then additionally put a hash somewhere if you want to lock in which version of the twt you are replying to.

⤋ Read More

There is nothing wrong with how we currently run a diff to see what has been removed. if i build a merkle tree off all the twt hashes in a feed i can use that to verify a twt should be in a feed or not. and gossip that to my peers.

⤋ Read More

isn’t the benefit of blake2b that it is a more efficient algo than sha1 and has the same or similar entropy to sha3? i thought we had partially solved this with some type of expanding hash size? additionally we could increase bit density by using base36 or base64/url-safe…

⤋ Read More

@prologic@twtxt.net

There’s a simple reason all the current hashes end in a or q: the hash is 256 bits, the base32 encoding chops that into groups of 5 bits, and 256 isn’t divisible by 5. The last character of the base32 encoding just has that left-over single bit (256 mod 5 = 1).

So I agree with #3 below, but do you have a source for #1, #2 or #4? I would expect any lack of variability in any part of a hash function’s output would make it more vulnerable to attacks, so designers of hash functions would want to make the whole output vary as much as possible.

Other than the divisible-by-5 thing, my current intuition is it doesn’t matter what part you take.

  1. Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.

  2. Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.

  3. Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.

  4. Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.

⤋ Read More
In-reply-to » Could someone knowledgable reply with the steps a grandpa will take to calculate the hash of a twtxt from the CLI, using out-of-the-box tools? I swear I read about it somewhere, but can't find it.

@prologic@twtxt.net I saw those, yes. I tried using yarnc, and it would work for a simple twtxt. Now, for a more convoluted one it truly becomes a nightmare using that tool for the job. I know there are talks about changing this hash, so this might be a moot point right now, but it would be nice to have a tool that:

  1. Would calculate the hash of a twtxt in a file.
  2. Would calculate all hashes on a twtxt.txt (local and remote).

Again, something lovely to have after any looming changes occur.

⤋ Read More

Could someone knowledgable reply with the steps a grandpa will take to calculate the hash of a twtxt from the CLI, using out-of-the-box tools? I swear I read about it somewhere, but can’t find it.

⤋ Read More
In-reply-to » (replyto http://darch.dk/twtxt.txt 2024-09-15T12:50:17Z) Hmm, but yarnd also isn't showing these twts as being part of a thread. @prologic you said yarnd respects customs subjects. Shouldn't these twts count as having a custom subject, and get threaded together?

@falsifian@www.falsifian.org based on Twt Subject Extension, your subject is invalid. You can have custom subjects, that is, not a valid hash, but you simply can’t put anything, and expect it to be treated as a TwtSubject, me thinks.

⤋ Read More

@sorenpeter@darch.dk I like this idea. Just for fun, I’m using a variant in this twt. (Also because I’m curious how it non-hash subjects appear in jenny and yarn.)

URLs can contain commas so I suggest a different character to separate the url from the date. Is this twt I’ve used space (also after ā€œreplytoā€, for symmetry).

I think this solves:

  • Changing feed identities: although @mckinley@twtxt.net points out URLs can change, I think this syntax should be okay as long as the feed at that URL can be fetched, and as long as the current canonical URL for the feed lists this one as an alternate.
  • editing, if you don’t care about message integrity
  • finding the root of a thread, if you’re not following the author

An optional hash could be added if message integrity is desired. (E.g. if you don’t trust the feed author not to make a misleading edit.) Other recent suggestions about how to deal with edits and hashes might be applicable then.

People publishing multiple twts per second should include sub-second precision in their timestamps. As you suggested, the timestamp could just be copied verbatim.

⤋ Read More