Not shown here but, this Shape class used on the linked sketch helps eliminate (by adding them to a set) not only Polygons that are visually the same but also shape rotations using a custom .hash() method :)
(A caveat to the reader: The code can be is messy because it sometimes retains remnants of abandoned ideas and lateral explorations. This is creative coding not software engineering)
Not shown here but, this Shape class used on the linked sketch helps eliminate (by adding them to a set) not only Polygons that are visually the same but also shape rotations using a custom .__hash__()
method :)
(A caveat to the reader: The code is messy because it sometimes retains remnants of abandoned ideas and lateral explorations, also, this is creative coding not software engineering)
url
metadata field unequivocally treated as the canon feed url when calculating hashes, or are they ignored if they're not at least proper urls? do you just tolerate it if they're impersonating someone else's feed, or pointing to something that isn't even a feed at all?
@zvava@twtxt.net My clients trusts the first url
field it finds. If there is none, it uses the URL that I’m using for fetching the feed.
No validation, no logging.
In practice, I’ve not seen issues with people messing with this field. (What I do see, of course, is broken threads when people do legitimate edits that change the hash.)
I don’t see a way how anyone can impersonate anybody else this way. 🤔 Sure, you could use my URL in your url
field, but then what? You will still show up as zvava
in my client or, if you also change your nick
field, as movq (zvava)
.
url
metadata field unequivocally treated as the canon feed url when calculating hashes, or are they ignored if they're not at least proper urls? do you just tolerate it if they're impersonating someone else's feed, or pointing to something that isn't even a feed at all?
@zvava@twtxt.net Yes, the specification defines the first url
to be used for hashing. No matter if it points to a different feed or whatever. Just unsubscribe from malicious feeds and you’re done.
Since the first url
is used for hashing, it must never change. Otherwise, it will break threading, as you already noticed. If your feed moves and you wanna keep the old messages in the same new feed, you still have to point to the old url
location and keep that forever. But you can add more url
s. As I said several times in the past, in hindsight, using the first url
was a big mistake. It would have been much better, if the last encountered url
were used for hashing onwards. This way, feed moves would be relatively straightforward. However, that ship has sailed. Luckily, feeds typically don’t relocate.
url
metadata field unequivocally treated as the canon feed url when calculating hashes, or are they ignored if they're not at least proper urls? do you just tolerate it if they're impersonating someone else's feed, or pointing to something that isn't even a feed at all?
@alexonit@twtxt.alessandrocutolo.it prologic has me sold on the idea of hashv2 being served alongside a text fragment, eg. (#abcdefghijkl https://example.com/tw.txt#:~:text=2025-10-01T10:28:00Z)
, because it can be simply hacked in to clients currently on hashv1 and provides an off-ramp to location-based addressing (though i still think the format should be changed to smth like #<abc... http://example.com/...>
so it’s cleaner once we finally drop hashes)
is the first url
metadata field unequivocally treated as the canon feed url when calculating hashes, or are they ignored if they’re not at least proper urls? do you just tolerate it if they’re impersonating someone else’s feed, or pointing to something that isn’t even a feed at all?
and if the first url
metadata field changes, should it be logged with a time so we can still calculate hashes for old posts? or should it never be updated? (in the case of a pod, where the end user has no choice in how such events are treated) or do we redirect all the old hashes to the new ones (probably this, since it would be helpful for edits too)
@movq@www.uninformativ.de You were seeing that mayn hash collisions for you to notice this? 😱
The twtiverse appears to have shrunk. Among the 61 feeds that I follow, I don’t see any hash collisions anymore. 🤔
Exactly, @zvava@twtxt.net, I agree. (Although, in my client at least, I wouldn’t use hashes anywhere.)
@alexonit@twtxt.alessandrocutolo.it Yeah I think we’re overstating the UNIX principles a bit here 🤣 I get what you’re trying to say though @zvava@twtxt.net 😅 If I could go back in time and do it all over again, I would have gotten the Hash length correct and I would have used SHA-256 instead. But someone way smarter than me designed the Twt Hash spec, we adopted it and well here we are today, it works™ 😅
That’s what I’m using right now, while my own client is still in the making.
A simple bash script to write a post in a mktemp
file then clean it with regex.
I don’t even bother to hash the replies, I just open https://twtxt.net and copy the hash by hand since I’m checking the new posts from there anyway (temporarily, as I might end up DoS-ing everyone’s feed in my client right now).
plus, if hashv2 was implemented in combination with text fragments the way you proposed that would solve both scripting and human readability woes!!
…though, the presence of the text fragments then makes reversing the replied-to twt (and therefore its hash) trivial, which could allow clients to tolerate the omission of the hash — and while it would be ‘non-standard’ this would be the best of both worlds; potential to tolerate (or pave a glacial path toward? :o) human writable replies whilst keeping a unique id for twts that is universal across all pods
@prologic@twtxt.net to clarify: i meant the ability to parse feeds using unix command line utilities, as a principal of twtxtv1’s design. im not sure how feasible it is to build a simple feed reader out of common scripting utilities when hashing is in play, and;
i concede, it does make a lot of sense to fix up the hashing spec rather than completely supplant it at this point, just thinking about what the rewrite would be like is dreadful in and of itself x.x
@zvava@twtxt.net Going to have to hard disagree here I’m sorry. a) no-one reads the raw/plain twtxt.txt files, the only time you do is to debug something, or have a stick beak at the comments which most clients will strip out and ignore and b) I’m sorry you’ve completely lost me! I’m old enough to pre-date before Linux became popular, so I’m not sure what UNIX principles you think are being broken or violated by having a Twt Subject (Subject)
whose contents is a cryptographic content-addressable hash of the “thing”™ you’re replying to and forming a chain of other replies (a thread).
I’m sorry, but the simplest thing to do is to make the smallest number of changes to the Spec as possible and all agree on a “Magic Date” for which our clients use the modified function(s).
@prologic@twtxt.net the simplest thing to do is to completely forgo hashing anything because we are communicating using plain text files right now :3 while i agree hashes are incredibly helpful in the backend im not sure it has a place outside of it, it basically eliminates two core design principals of twtxt (human readability and integrating well with unix command line utilities) and makes new clients more difficult to build than it should be
@bender@twtxt.net Well honestly, this is just it. My strong position on this is quite simple:
Do the simplest thing that could work.
It’s one of the age old UNIX philosphies.
Therefore, the simplest thing™ to do here is to just increase the hash length, mark a magic™ date/time as @lyse@lyse.isobeef.org has indicated and call it a day. We’ll then be fine for a few hundred years, at which point there’ll be no-one left alive to give a shit™ anyway 🤣
@prologic@twtxt.net considering other alternatives we have seeing (of which I have lost track already), yes. Why don’t you guys (client makers) take a step at a time and, for now, increase the hash length to deal with the collisions. Then location-based addressing can be added… or not, you know. 😅
Of course we still have to fix the hashing algorithm and length.
@lyse@lyse.isobeef.org I don’t think there’s any point in continuing the discussion of Location vs. Content based addressing.
I want us to preserve Content based addressing.
Let’s improve the user experience and fix the hash commission problems.
@prologic@twtxt.net I know we won’t ever convince each other of the other’s favorite addressing scheme. :-D But I wanna address (haha) your concerns:
I don’t see any difference between the two schemes regarding link rot and migration. If the URL changes, both approaches are equally terrible as the feed URL is part of the hashed value and reference of some sort in the location-based scheme. It doesn’t matter.
The same is true for duplication and forks. Even today, the “cannonical URL” has to be chosen to build the hash. That’s exactly the same with location-based addressing. Why would a mirror only duplicate stuff with location- but not content-based addressing? I really fail to see that. Also, who is using mirrors or relays anyway? I don’t know of any such software to be honest.
If there is a spam feed, I just unfollow it. Done. Not a concern for me at all. Not the slightest bit. And the byte verification is THE source of all broken threads when the conversation start is edited. Yes, this can be viewed as a feature, but how many times was it actually a feature and not more behaving as an anti-feature in terms of user experience?
I don’t get your argument. If the feed in question is offline, one can simply look in local caches and see if there is a message at that particular time, just like looking up a hash. Where’s the difference? Except that the lookup key is longer or compound or whatever depending on the cache format.
Even a new hashing algorithm requires work on clients etc. It’s not that you get some backwards-compatibility for free. It just cannot be backwards-compatible in my opinion, no matter which approach we take. That’s why I believe some magic time for the switch causes the least amount of trouble. You leave the old world untouched and working.
If these are general concerns, I’m completely with you. But I don’t think that they only apply to location-based addressing. That’s how I interpreted your message. I could be wrong. Happy to read your explanations. :-)
Here is just a small list of things™ that I’m aware will break, some quite badly, others in minor ways:
- Link rot & migrations: domain changes, path reshuffles, CDN/mirror use, or moving from txt → jsonfeed will orphan replies unless every reader implements perfect 301/410 history, which they won’t.
- Duplication & forks: mirrors/relays produce multiple valid locations for the same post; readers see several “parents” and split the thread.
- Verification & spam-resistance: content addressing lets you dedupe and verify you’re pointing at exactly the post you meant (hash matches bytes). Location anchors can be replayed or spoofed more easily unless you add signing and canonicalization.
- Offline/cached reading: without the original URL being reachable, readers can’t resolve anchors; with hashes they can match against local caches/archives.
- Ecosystem churn: all existing clients, archives, and tools that assume content-derived IDs need migrations, mapping layers, and fallback logic. Expect long-lived threads to fracture across implementations.
We’ve been discussing the idea of changing the threading model from Content-based Addressing to Location-based addressing for years now. The problem is quite complex, but I feel I have to keep reminding y’all of the potential perils of changing this and the pros/cons of each model:
With content-addressed threading, a reply points at something that’s intrinsically identified (hash of author/feed URI + timestamp + content). That ID never changes as long as the content doesn’t. Switching to location-based anchors makes the reply target extrinsic—it now depends on where the post currently lives. In a pull-based, decentralised network, locations drift. The moment they do, thread identity fragments.
@zvava@twtxt.net @lyse@lyse.isobeef.org I also think a location based reference might be better.
A thread is a single post of a single feed as a root, but the hash has the drawback of not referencing the source, in a distributed network like twtxt it might leave some people out of the whole conversation.
I suggest a simpler format, something like: (#<TIMESTAMP URL>)
This solves three issues:
- Easier referencing: no need to generate a hash, just copy the timestamp and url, it’s also simpler to implement in a client without the rish of collisions when putting things together
- Fetchable source: you can find the source within the reference and construct the thread from there
- Allow editing: If a post is modified the hash becomes invalid since it depends on
[ timestamp, url, content ]
@zvava@twtxt.net There would be only one hash for a message. Some to be defined magic date selects which hash to use. If the message creation timestamp is before this epoch, hash it with v1, otherwise hammer it through v2. Eventually, support for v1 could be dropped as nobody interacts with the old stuff anymore. But I’d keep it around in my client, because why not.
If users choose a client which supports the extensions, they don’t have to mess around with v1 and v2 hashing, just like today.
As for the school of thought, personally, I’d prefer something else, too. I’m in camp location-based addressing, or whatever it is called. There more I think about it, a complete redesign of twtxt and its extensions would be necessary in my opinion. Retrofitting has its limits. Of course, this is much more work, though.
@lyse@lyse.isobeef.org i dont mind if the hash is not backward compatible but im not sure if this is the right way to proceed because the added complexity dealing with two hash versions isnt justified
regular end users wont care to understand how twt hashes are formed, they just want to use twtxt! so i guess i could work in protecting users from themselves by disallowing post edits on old posts or posts with replies, but i’m not fond of this either really. if they want to break a thread, they can just delete the post (though i’ve noticed yarn handling post deletes dubiously…)
on activitypub i do genuinely find myself looking through several month or even year old posts sometimes and deciding to edit/reword them a little to be slightly less confusing, this should be trivial to handle on twtxt which is an infinitely simpler specification
@zvava@twtxt.net I was about to suggest that you post some examples. By now, we’re pretty good at debugging hashing issues, because that happens so often. 😂 But it looks like you figured it out on your own. ✌️
wait why are so many of my post hashes not generating correctly ;w;
edit: i read the spec wrong :3 only +/-00:00 is stripped, not the entire timezone offset >.<
@prologic@twtxt.net im unsure how i feel about the hash v2 proposal, given it is completely backward incompatible with hash v1 it doesn’t really solve any of the problems with it. it only delays collisions, and still fragments threads on post edits
i skimmed through discussions under other the proposals — i agree humans are very bad at keeping the integrity of the web in tact, but hashes in done in this way make it impossible even for systems to rebuild threads if any post edits have occurred prior to their deployment
@zvava@twtxt.net we have to amend the spec and increase the hash length. We just haven’t done so yet 😆
ok so i have found a genuine twt hash collision. what do i do.
internally, bbycll relies on a post lookup table with post hashes as keys, this is really fast but i knew i’d inevitably run into this issue (just not so soon) so now i have to either:
1) pick the newer post over the other
2) break from specification and not lowercase hashes
3) secretly associate canonical urls or additional entropy with post hashes in the backend without a sizeable performance impact somehow
at first i dismissed the idea of likes on twtxt as not sensible…like at all — then i considered they could just be published in a metadata field (though that field could get really unruly after a while)
retwts are plausible, as “RE: https://example.com/twtxt.txt#abcdefg
”, the hash could even be the original timestamp from the feed to make it human readable/writable, though im extremely wary of clogging up timelines
i thought quote twts could be done extremely sensibly, by interpreting a mention+hash at the end of the twt differently to when placed at the beginning — but the twt subject extension requires it be at the beginning, so the clean fallback to a normal reply i originally imagined is out of the question — it could still be possible (reusing the retwt format, just like twitter!) but i’m not convinced it’s worth it at that point
is any of this in the spirit of twtxt? no, not in the slightest, lmao
@zvava@twtxt.net may I recommend to change the mention format upon hitting reply to something similar to what it’s used in Yarn, and perhaps hiding the hash on the post too? Looking good!
@movq@www.uninformativ.de Yeah, we’ve seen how this plays out in practice 🤣 @dce@hashnix.club My advice, do what @movq@www.uninformativ.de has hinted at and don’t change the 1st # url =
field in your feed. I’m not sure if you had already, but the first url field is kind of important in your feed as it is used as the “Hashing URI” for threading.
@dce@hashnix.club Ah, oh, well then. 🥴
My client supports that, if you set multiple url =
fields in your feed’s metadata (the top-most one must be the “main” URL, that one is used for hashing).
But yeah, multi-protocol feeds can be problematic and some have considered it a mistake to support them. 🤔
As in regards to technology…
“Behind the scenes, U.S. Trade Representative Jamieson Greer and EU’s Maroš Šefčovič hashed out technical annexes on automobiles, pharmaceuticals and digital trade.”
“Tech giants Microsoft’s cloud services, Apple’s iPhones and Google’s data solutions gain tariff-free pathway, fueling digital exports growth.”
“Digital Services: Microsoft announces a new 150 MW cloud data center in Berlin backed by tariff-free equipment imports; SAP commits to expanding U.S. R&D hubs in Austin, Texas.”
Per Coredump: Angreifer können unter Linux Passwort-Hashes abgreifen
Mehrere Versionen von Ubuntu, Fedora und RHEL sind angreifbar. Böswillige Akteure können Anwendungen crashen und vertrauliche Daten erbeuten. ( Sicherheitslücke, Ubuntu)
huh.. so not even trying to be compatible with existing hashes?
@movq@www.uninformativ.de Has that hashing change even be accepted? :-?
I have zero mental energy for programming at the moment. 🫤
I’ll try to implement the new hashing stuff in jenny before the “deadline”. But I don’t think you’ll see any texudus development from me in the near future. ☹️
[$] Hash table memory usage and a BPF interpreter bug
Anton Protopopov led a short discussion at the 2025 Linux Storage, Filesystem,
Memory-Management, and BPF Summit about amount of memory used
by hash tables in BPF programs. He thinks that the current memory layout is
inefficient, and wants to split the structure that holds table entries into two
variants for different kinds of maps. When that proposal proved
uncontroversial, he also took the chance to talk about a bug in BPF’s call
instruction. ⌘ Read more
@ About the URL, since it no longer used for hashing there might be no need to change it. I agree that we keep all the parts that already are out there for the most parts. Instead of a contact field you could also just use links like: link = Email mailto:user@example.dk
or link = Signal https://signal.me/sthF4raI5Lg_ybpJwB1sOptDla4oU7p[...]
@lyse@lyse.isobeef.org Yeah to avoid cutting off bits at the end making hashes end in either q
or a
🤣
tt2
from @lyse and Twtxtory from @javivf?
@prologic@twtxt.net if I understand correctly it’s just to increase hash size from 7 to 12 once it gets calculated, isn’t it? BTW is this change already approved? I still don’t understand how a proposal become an implementation in the twtxtverse 🤓
if
clauses to this. My point is: Every time I see a hash, I’d like to have a hint as to where to find the corresponding twt.
The reason I think this can work so well and I’m in full support of it is that it’s the least disruptive way to resolve the issue of:
where did this hash come from?
@prologic@twtxt.net Not sure I’d attach any if
clauses to this. My point is: Every time I see a hash, I’d like to have a hint as to where to find the corresponding twt.
@movq@www.uninformativ.de If we’re focusing on solving the “missing roots” problems. I would start to think about “client recommendations”. The first recommendation would be:
- Replying to a Twt that has no initial Subject must itself have a Subject of the form (hash; url).
This way it’s a hint to fetching clients that follow B, but not A (in the case of no mentions) that the Subject/Root might (very likely) is in the feed url
.
If we must stick to hashes for threading, can we maybe make it mandatory to always include a reference to the original twt URL when writing replies?
Instead of
(<a href="https://txt.sour.is/search?q=%23123467">#123467</a>) hello foo bar
you would have
(<a href="https://txt.sour.is/search?q=%23123467">#123467</a> http://foo.com/tw.txt) hello foo bar
or maybe even:
(<a href="https://txt.sour.is/search?q=%23123467">#123467</a> 2025-04-30T12:30:31Z http://foo.com/tw.txt) hello foo bar
This would greatly help in reconstructing broken threads, since hashes are obviously unfortunately one-way tickets. The URL/timestamp would not be used for threading, just for discovery of feeds that you don’t already follow.
I don’t insist on including the timestamp, but having some idea which feed we’re talking about would help a lot.
7
to 12
and use the first 12
characters of the base32 encoded blake2b hash. This will solve two problems, the fact that all hashes today either end in q
or a
(oops) 😅 And increasing the Twt Hash size will ensure that we never run into the chance of collision for ions to come. Chances of a 50% collision with 64 bits / 12 characters is roughly ~12.44B Twts. That ought to be enough! -- I also propose that we modify all our clients and make this change from the 1st July 2025, which will be Yarn.social's 5th birthday and 5 years since I started this whole project and endeavour! 😱 #Twtxt #Update
July 1st. 63 days from now to implement a backward-incompatible change, apparently not open to other ideas like replacing blake with SHA, or discussing implementation challenges for other languages and platforms.
Finally just closing #18, #19 and #20 without starting a proper discussion and ignoring a ‘micro consensus’ feels… not right.
I don’t know what to think rather than letting it rest (May will be busy here) and focus on other stuff in the future.
7
to 12
and use the first 12
characters of the base32 encoded blake2b hash. This will solve two problems, the fact that all hashes today either end in q
or a
(oops) 😅 And increasing the Twt Hash size will ensure that we never run into the chance of collision for ions to come. Chances of a 50% collision with 64 bits / 12 characters is roughly ~12.44B Twts. That ought to be enough! -- I also propose that we modify all our clients and make this change from the 1st July 2025, which will be Yarn.social's 5th birthday and 5 years since I started this whole project and endeavour! 😱 #Twtxt #Update
I will be adding the code in for yarnd
very soon™ for this change, with a if the date is >= 2025-07-01 then compute_new_hashes else compute_old_hashes
Finally I propose that we increase the Twt Hash length from 7
to 12
and use the first 12
characters of the base32 encoded blake2b hash. This will solve two problems, the fact that all hashes today either end in q
or a
(oops) 😅 And increasing the Twt Hash size will ensure that we never run into the chance of collision for ions to come. Chances of a 50% collision with 64 bits / 12 characters is roughly ~12.44B Twts. That ought to be enough! – I also propose that we modify all our clients and make this change from the 1st July 2025, which will be Yarn.social’s 5th birthday and 5 years since I started this whole project and endeavour! 😱 #Twtxt #Update
I had Chick-fil-A breakfast today (sausage, egg, and cheese biscuit, hash browns, coffee, and orange juice). Then at lunch my work place offered hot dogs. I had two (kosher, if that matters), plus a coke, a macadamia nuts cookie, and a small chocolate brownie.
So, here I am, at home, feeling hungry but guilty and refusing to eat anything else for the rest of the day. To top it off, I have only clocked 4,000 steps today (and I don’t feel like walking). I am going to hell, am I?
dm-only.txt
feeds. 😂
by commenting out DMs are you giving up on simplicity? See the Metadata extension holding the data inside comments, as the client doesn’t need to show it inside the timeline.
I don’t think that commenting out DMs as we are doing for metadata is giving up on simplicity (it’s a feature already), and it helps to hide unwanted DMs to clients that will take months to add it’s support to something named… an extension.
For some other extensions in https://twtxt.dev/extensions.html (for example the reply-to hash #abcdfeg
or the mention @ < example http://example.org/twtxt.txt >
) is not a big deal. The twt is still understandable in plain text.
For DM, it’s only interesting for you if you are the recipient, otherwise you see an scrambled message like 1234567890abcdef=
. Even if you see it, you’ll need some decryption to read it. I’ve said before that DMs shouldn’t be in the same section that the timeline as it’s confusing.
So my point stands, and as I’ve said before, we are discussing it as a community, so let’s see what other maintainers add to the convo.
dm-only.txt
feeds. 😂
After reading you, @eapl.me@eapl.me, I’ll tell you my point of view.
In my opinion, a feed does not have to be equivalent to a timeline. A timeline is a representation of the feed adapted to a user. You may not be interested in seeing other people’s threads or DMs. But perhaps they are interested in seeing mentions or DMs directed at them. It is important not to fall into the trap. With that clarification…
I insist, this is my point of view, it is not an absolute truth: I don’t think extensions should be respectful of customers who are no longer maintained.
We cannot have a system that is simple, backwards compatible and extensible all at the same time. We have to give up some of the 3 points. I would not like to give up simplicity because it will then make it harder to maintain the customers who do stay. Therefore, I think it is better to give up backwards compatibility and play with new formulas in the extensions. I don’t think it’s a good idea to make a hash keep so much load: a hashtag, a thread and also a DM.
MaxAgeDays
configuration at the pod level, that now some profiles are rather empty. This is only because well, they're a bit "inactive" so to speak 🗣️ Not sure what to do about this at the moment... Open to ideas? 💡
yes it used be http://
only and to keep hashes from breaking i added # url = http://...
and now we are stock with it due to the curret specs.
Hmmm there’s a bug somewhere in the way I’m ingesting archived feeds 🤔
sqlite> select * from twts where content like 'The web is such garbage these days%';
hash = 37sjhla
feed_url = https://twtxt.net/user/prologic/twtxt.txt/1
content = The web is such garbage these days 😔 Or is it the garbage search engines? 🤔
created = 2024-11-14T01:53:46Z
created_dt = 2024-11-14 01:53:46
subject = #37sjhla
mentions = []
tags = []
links = []
sqlite>
Some A hole has been trying to pull every single Twtxt feed that existed/still exists since forever. How do I know? Welp’ They’ve been querying my Timeline™ instance for all of it, every single twtxt file and twt Hash they can find. 😆🤦 It must have been going on for days and I have just noticed… + it’s all coming from the same ASN AS136907 HWCLOUDS-AS-AP HUAWEI CLOUDS
Thank you Huawei for the DDos you sons of Glitches!!!
@quark@ferengi.one No editing old Twts that are the root of a thread with replies in the ecosystem. Just results in a fork. Unless the client has an implementation that does not store Twts keyed by Hash.
Ha! I stand corrected, didn’t scrolled long enough. Indeed, it should be added (you will need an account on Mills’ Gitea), noted.
si4er3q
. See https://twtxt.dev/exts/twt-hash.html, a timezone offset of +00:00
or -00:00
must be replaced by Z
.
@eaplme@eapl.me you wrote:
“That PHP snippet could be merged into https://twtxt.dev/exts/twt-hash.html”
Why, though? AFAIK @andros@twtxt.andros.dev’s client is on Emacs, @lyse@lyse.isobeef.org’s is on Python (and Golang, for tt2
), @movq@www.uninformativ.de’s is on Python, and @prologic@twtxt.net’s is on Golang. All the client creator needs to know is in the documentation already, coding language agnostic.
si4er3q
. See https://twtxt.dev/exts/twt-hash.html, a timezone offset of +00:00
or -00:00
must be replaced by Z
.
just a note that we are doing that on PHP: https://github.com/eapl-gemugami/twtxt-php/blob/master/docs/03-hash-extension.md#php-72
That PHP snippet could be merged into https://twtxt.dev/exts/twt-hash.html
@david@collantes.us @andros@twtxt.andros.dev The correct hash would be si4er3q
. See https://twtxt.dev/exts/twt-hash.html, a timezone offset of +00:00
or -00:00
must be replaced by Z
.
(That said, there’s a bug in jenny as well. It only replaces +00:00
, not -00:00
. 🤡)
@prologic@twtxt.net @bender@twtxt.net
What is the hash of the last message from?: https://aelaraji.com/twtxt.txt
dm-only.txt
feeds. 😂
@bender@twtxt.net @aelaraji@aelaraji.com The client should ignore twts if it’s not compatible or not addressed to me. it’s a simple regex to add! It’s similar to Twt Hash Extension, should they be in another file? They are child messages, not flat twt. Not of course!
@prologic@twtxt.net interesting. What would happen on a hash collision? 🤔
@bender@twtxt.net It’s a bug in the UI for sure. The hash is the primary key.
@david@collantes.us Yeah, we’ve been debugging that a bit yesterday. Looks like the wrong input (sometimes) gets fed to the hash function → broken threads.
@movq@www.uninformativ.de @kat@yarn.girlonthemoon.xyz Heck yeah, that’s crazy! :-) Fingers crossed! (tt
also agrees with the right™ hash)
./yarnc debug <your feed url>
:
The actual hash is fs7673q
.
./yarnc debug <your feed url>
:
@prologic@twtxt.net that’s not what I see. The hash znf6csa
cannot be found.
@prologic@twtxt.net There was no edit according to my Git history. 🤔 On my end, the hash is fs7673q
and that’s also what kat used to reply.
Doesn’t look like it Hmmm
sqlite> select * from twts where content LIKE '%Linux installation%';
hash = znf6csa
feed_url = https://www.uninformativ.de/twtxt.txt
content = I wonder if my current Linux installation will actually make it to 20 years:
$ head -n 1 /var/log/pacman.log
[2011-07-07 11:19] installed filesystem (2011.04-1)
It’s not toooo far into the future.
It would be crazy … 20 years without reinstalling once … phew. 🥴
created = 2025-04-07T19:59:51Z
subject = (#znf6csa)
mentions = []
tags = []
links = []
@movq@www.uninformativ.de Apparently you wrote it :D The hash doesn’t lie? 🤣 https://twtxt.net/twt/znf6csa
@prologic@twtxt.net What happened here – did I edit my twt or is this hash wrong? 🥴
@prologic@twtxt.net Spring cleanup! That’s one way to encourage people to self-host their feeds. :-D
Since I’m only interested in the url
metadata field for hashing, I do not keep any comments or metadata for that matter, just the messages themselves. The last time I fetched was probably some time yesterday evening (UTC+2). I cannot tell exactly, because the recorded last fetch timestamp has been overridden with today’s by now.
I dumped my new SQLite cache into: https://lyse.isobeef.org/tmp/backup.tar.gz This time maybe even correctly, if you’re lucky. I’m not entirely sure. It took me a few attempts (date and time were separated by space instead of T
at first, I normalized offsets +00:00
to Z
as yarnd does and converted newlines back to U+2028
). At least now the simple cross check with the Twtxt Feed Validator does not yield any problems.
@andros@twtxt.andros.dev sha256 hash of twt in json. Look at converter script
Amazing! It is a good tool for reading feeds. What you used to calculate the hash?
Hello, i want to present my new revolution twtxt v3 format - twjson
That’s why you should use it:
- It’s easy to to parse
- It’s easy to read (in formatted mode :D)
- It used actually \n for newlines, you don’t need unprintable symbols
- Forget about hash collisions because using full hash
Here is my twjson feed: https://doesnm.p.psf.lt/twjson.json
And twtxt2json converter: https://doesnm.p.psf.lt/twjson.js
@eapl.me@eapl.me Interesting! Two points stood right out to me:
Why the hell are e-mail newsletters considered a valid option in the first place? Just offer an Atom feed and be done with it! Especially for a blog of this very type. This doesn’t even involve a third party service. Although, in addition he also links to Feedburner, what the fuck!? No e-mail address or the like is needed and subject to being disclosed.
When these spam mailers want to prevent resubscribing, then for fuck’s sake, why don’t they use a hash of the e-mail address (I saw that in yarnd) for that purpose? Storing the e-mail address in clear text after unsubscribing is illegal in my book.
There are 82.108 read statuses, but only 24.421 messages in the cache. In contrast to the cache with the messages, the read statuses are never cleaned up when a feed was unsubscribed from. And the read statuses also contain old style hashes, before we settled on the what we have today. Still a huge difference. Hmm.
tt
reimplementation that I already followed with the old Python tt
. Previously, I just had a few feeds for testing purposes in my new config. While transfering, I "dropped" heaps of feeds that appeared to be inactive.
Thanks, @movq@www.uninformativ.de!
My backing SQLite database with indices is 8.7 MiB in size right now.
The twtxt
cache is 7.6 MiB, it uses Python’s pickle
module. And next to it there is a 16.0 MiB second database with all the read statuses for the old tt
. Wow, super inefficient, it shouldn’t contain anything else, it’s a giant, pickled {"$hash": {"read": True/False}, …}
. What the heck, why is it so big?! O_o
(Back in tt
.) Well, it kinda worked. At least appending to the file. But my cache database got screwed up. I do not yet support replies, so the subject and and root hash columns have not been set at all, resulting in a message that is just not shown at all. I gotta do something about that next. The good thing is, though, after simply fixing the two columns the message appeared on screen.
@bender@twtxt.net Yeah, as you mentioned in the other thread, @andros@twtxt.andros.dev’s hashes appear to be not quite right. 🤔
@movq@www.uninformativ.de I have no doubt that you’re not seeing the images correctly 😀. It’s just that it’s broken when viewing them, in my case, and analyzing the URLs, I’ve seen everything I mentioned.
Regarding the hash, you’re right. I’ll have to investigate what’s going on. I’m having a hard time getting the hash generation to work properly.
@andros@twtxt.andros.dev Hm, looks correct to me. The image to be displayed is a thumbnail and this links to the full-sized image. The thumbnail (JPG) is auto-generated from the full image (PNG), hence the two extensions.
What does look strange, though, is that your client came up with the hash pqsmcka
, while it should have been te5quba
. 🤔
ditatompel releases ‘xmr-remote-nodes’ v0.2.1
ditatompel1 has released xmr-remote-nodes 2 version 0.2.13 with a fix for CVE-2024-453384, new features and updates:
”`
- fix: CVE-2024-45338 in #173
- feat: Added tor hidden service via HTTP header
- feat: Added more information on monero node details page
- feat: Added curl example command to Node details modal and page
- feat: Store hashed user IP address when submitting new node
- build(de … ⌘ Read more”`
Why not just use registry? It can be personal or hosted by someone like registry.twtxt.org. Just need to be adapt to support hashes
@prologic@twtxt.net We can’t agree on this idea because that makes things even more complicated than it already is today. The beauty of twtxt is, you put one file on your server, done. One. Not five million. Granted, there might be archive feeds, so it might be already a bit more, but still faaaaaaar less than one file per message.
Also, you would need to host not your own hash files, but everybody else’s as well you follow. Otherwise, what is that supposed to achieve? If people are already following my feed, they know what hashes I have, so this is to no use of them (unless they want to look up a message from an archive feed and don’t process them). But the far more common scenario is that an unknown hash originates from a feed that they have not subscribed to.
Additionally, yarnd’s URL schema would then also break, because https://twtxt.net/twt/<hash>
now becomes https://twtxt.net/user/prologic/<hash>
, https://twtxt.net/user/bender/<hash>
and so on. To me, that looks like you would only get hashes if they belonged to this particular user. Of course, you could define rules that if there is a /user/
part in the path, then use a different URL, but this complicates things even more.
Sorry, I don’t like that idea.
One of the biggest gripes of the community with the way the threading model currently works with Twtxt v1.2 (https://twtxt.dev) is this notion of:
What is this hash?
What does it refer to?
Idea: Why can’t we all agree to implement a simple URI scheme where we host our Twtxt feeds?
That is, if you host your feed at https://example.com/twtxt.txt
– Why can’t or could you not also host various JSON files (let’s agree on the spec of course) at https://example.com/twt/<hash>
? 🤔
That way we solve this problem in a truly decentralised way, rather than every relying on yarnd
pods alone.
Go-redis:執行 Lua 腳本
go-redis (github.com/redis/go-redis) 支持 Lua 腳本 redis.Script,本文在這裏簡單展示其在秒殺場景中使用的代碼片段。秒殺場景在秒殺場景中,一個商品的庫存對應了兩個信息,分別是總庫存量和已秒殺量。可以使用一個 Hash 類型的鍵值對來保存庫存的這兩個信息,如下所示:key: productid value: {total: N, ordered: ⌘ Read more
a few async ideas for later
The editing process needs a lot of consideration and compromises.
From one side, editing and deleting it’s necessary IMO. People will do it anyway, and personally I like to edit my texts, so I’d put some effort on make it work.
Should we keep a history of edits? Should we hash every edit to avoid abuse? Should we mark internally a twt as deleted, but keeping the replies?
I think that’s part of a more complete ‘thread’ extension, although I’d say it’s worth to agree on something reflecting the real usage in the wild, along with what people usually do on other platforms.
looks good to me!
About alice’s hash, using SHA256, I get 96473b4f
or 96473B4F
for the last 8 characters. I’ll add it as an implementation example.
The idea of including it besides the follow URL is to avoid calculating it every time we load the file (assuming the client did that correctly), and helps to track replies across the file with a simple search.
Also, watching your example I’m thinking now that instead of {url=96473B4F,id=1}
which is ambiguous of which URL we are referring to, it could be something like:
{reply_to=[URL_HASH]_[TWT_ID]}
/ {reply_to=96473B4F_1}
That way, the ‘full twt ID’ could be 96473B4F_1
.
True. Though if the idea turns out to be better.. then community will adopt it.
if you look at the subject for that twt you will see that it uses the extended hash format to include a URL address.
@bmallred@staystrong.run I forgot one more effect of edits. If clients remember the read status of massages by hash, an edit will mark the updated message as unread again. To some degree that is even the right behavior, because the message was updated, so the user might want to have a look at the updated version. On the other hand, if it’s just a small typo fix, it’s maybe not worth to tell the user about. But the client doesn’t know, at least not with additional logic.
Having said that, it appears that this only affects me personally, noone else. I don’t know of any other client that saves read statuses. But don’t worry about me, all good. Just keep doing what you’ve done so far. I wanted to mention that only for the sake of completeness. :-)
I like this syntax, you have my vote, although I’d change it a bit like
#<Alice https://example.com/twtxt.com#2024-12-18T14:18:26+01:00>
Hashes are not a problem on PHP, I dont know why it’s slow to calculate them from your side, but I agree with your points.
BTW, did you have the chance to read my proposal on twtxt 2.0? I shared a few ideas about possible improvements to discuss:
https://text.eapl.mx/a-few-ideas-for-a-next-twtxt-version
https://text.eapl.mx/reply-to-lyse-about-twtxt
@bmallred@staystrong.run Any edit automatically changes the twt hash, because the hash is built over the hash URL, message timestamp and message text. https://twtxt.dev/exts/twt-hash.html So, it is only a problem, if somebody replied to your original message with the old hash. The original message suddenly doesn’t exist anymore and the reply becomes detached, orphaned, whatever you wanna call it. Threading doesn’t break, though, if nobody replied to your message.
@andros@twtxt.andros.dev I believe you have just reproduced the bug… it looks like you’ve replayed to a twt but the hash is wrong. I can see the hash here from Jenny, but it doesn’t look like it corresponds to any{twt,thing}. if you check it out on any yarn instance it won’t look like a replay.
My hypothesis about that thing breaking my twts is that it might have something to do with the parenthesis surrounding the root twt hash in the replay twt-A
when I replay to it with fork-twt-B
; I imagine elisp interpreting those as a s-expression thus breaking the generation precess of hash (#twt-A) before prepending it to for-twt-B
… but then I’m too ignorant to figure out how to test my theory (heck I couldn’t even recalculate the hashes myself correctly in bash xD). I’ll keep trying tho.
@andros@twtxt.andros.dev yes, that usually happens when twts get edited and we just made a gentlemen agreement to avoid edits as much as possible (at least for the time being). But the thing is, That is not what’s happening with my broken twts’ hashes. Since I’ve bee mostly replaying to my own twts as a test and I know for sure that I haven’t edited any. (I usually fork-replay instead of edit a twt when needed)
@aelaraji@aelaraji.com Can you give me examples of hashes that you have detected wrong between Emacs client and twtxt.net?
Perhaps there is some character, some space, that is creating the discrepancy.
@prologic@twtxt.net Agreed! But clients can hallucinate and generate wrong hashes
aka Lies
🤣 Also, If you chheck your own twt on twtxt.net, it looks like a root twt instead of a replay.