@david@collantes.us We’ll get there soon™ 🔜
@david@collantes.us Hah Welcome back! 😅
Finally @lyse@lyse.isobeef.org ’s idea of updating metadata changes in a feed “inline” where the change happened (with respect to other Twts in whatever order the file is written in) is used to drive things like “Oh this feed now has a new URI, let’s use that from now on as the feed’s identity for the purposes of computing Twt hashes”. This could extend to # nick =
as preferential indicators to clients as well as even other updates such as # description =
– Not just # url =
Likewise we could also support delete:229d24612a2
, which would indicate to clients that fetch the feed to delete any cached Twt matching the hash 229d24612a2
if the author wishes to “unpublish” that Twt permanently, rather than just deleting the line from the feed (which does nothing for clients really).
An alternate idea for supporting (properly) Twt Edits is to denoate as such and extend the meaning of a Twt Subject (which would need to be called something better?); For example, let’s say I produced the following Twt:
2024-09-18T23:08:00+10:00 Hllo World
And my feed’s URI is https://example.com/twtxt.txt
. The hash for this Twt is therefore 229d24612a2
:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:08:00+10:00\nHllo World" | sha1sum | head -c 11
229d24612a2
You wish to correct your mistake, so you make an amendment to that Twt like so:
2024-09-18T23:10:43+10:00 (edit:#229d24612a2) Hello World
Which would then have a new Twt hash value of 026d77e03fa
:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:10:43+10:00\nHello World" | sha1sum | head -c 11
026d77e03fa
Clients would then take this edit:#229d24612a2
to mean, this Twt is an edit of 229d24612a2
and should be replaced in the client’s cache, or indicated as such to the user that this is the intended content.
@bender@twtxt.net Just replace the echo
with something like pbpaste
or similar. You’d just need to shell escape things like "
and such. That’s all. Alternatives you can shove the 3 lines into a small file and cat file.txt | ...
With a SHA1 encoding the probability of a hash collision becomes, at various k (number of twts):
>>> import math
>>>
>>> def collision_probability(k, bits):
... n = 2 ** bits # Total unique hash values based on the number of bits
... probability = 1 - math.exp(- (k ** 2) / (2 * n))
... return probability * 100 # Return as percentage
...
>>> # Example usage:
>>> k_values = [100000, 1000000, 10000000]
>>> bits = 44 # Number of bits for the hash
>>>
>>> for k in k_values:
... print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 44 bits: 0.0284%
Probability of collision for 1000000 hashes with 44 bits: 2.8022%
Probability of collision for 10000000 hashes with 44 bits: 94.1701%
>>> bits = 48
>>> for k in k_values:
... print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 48 bits: 0.0018%
Probability of collision for 1000000 hashes with 48 bits: 0.1775%
Probability of collision for 10000000 hashes with 48 bits: 16.2753%
>>> bits = 52
>>> for k in k_values:
... print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 52 bits: 0.0001%
Probability of collision for 1000000 hashes with 52 bits: 0.0111%
Probability of collision for 10000000 hashes with 52 bits: 1.1041%
>>>
If we adopted this scheme, we could have to increase the no. of characters (first N) from 11
to 12
and finally 13
as we approach globally larger enough Twts across the space. I think at least full crawl/scrape it was around ~500k (maybe)? https://search.twtxt.net/ says only ~99k
@quark@ferengi.one My money is on a SHA1SUM hash encoding to keep things much simpler:
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | sha1sum | head -c 11
87fd9b0ae4e
I think it was a mistake to take the last n base32 encoded characters of the blake2b 256bit encoded hash value. It should have been the first n. where n is >= 7
Taking the last n characters of a base32 encoded hash instead of the first n can be problematic for several reasons:
Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.
Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.
Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.
Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.
In summary, using the first n characters generally preserves the intended randomness and collision resistance of the hash, making it a safer choice in most cases.
@quark@ferengi.one Bloody good question 🙋 God only knows 🤣
@movq@www.uninformativ.de Haha 😝
What I was referring to in the OP: Sometimes I check the workphone simply out of curiosity. 😂
@movq@www.uninformativ.de Fair 👌
Current Twt Hash spec and probability of hash collision:
The probability of a Twt Hash collision depends on the size of the hash and the number of possible values it can take. For the Twt Hash, which uses a Blake2b 256-bit hash, Base32 encoding, and takes the last 7 characters, the space of possible hash values is significantly reduced.
Breakdown:- Base32 encoding: Each character in the Base32 encoding represents 5 bits of information (since ( 2^5 = 32 )).
- 7 characters: With 7 characters, the total number of possible hashes is:
[ 32^7 = 3,518,437,208 ] This gives about 3.5 billion possible hash values.
The probability of a hash collision depends on the number of hashes generated and can be estimated using the Birthday Paradox. The paradox tells us that collisions are more likely than expected when hashing a large number of items.
The approximate formula for the probability of at least one collision after generating n
hashes is:
[
P(\text{collision}) \approx 1 - e^{-\frac{n^2}{2M}}
]
Where:
- (n) is the number of generated Twt Hashes.
- (M = 32^7 = 3,518,437,208) is the total number of possible hash values.
For practical purposes, here are some example probabilities for different numbers of hashes (n
):
- For 1,000 hashes:
[ P(\text{collision}) \approx 1 - e^{-\frac{1000^2}{2 \cdot 3,518,437,208}} \approx 0.00014 \, \text{(0.014%)}
]
- For 10,000 hashes:
[ P(\text{collision}) \approx 1 - e^{-\frac{10000^2}{2 \cdot 3,518,437,208}} \approx 0.14 \, \text{(14%)}
]
- For 100,000 hashes:
[ P(\text{collision}) \approx 1 - e^{-\frac{100000^2}{2 \cdot 3,518,437,208}} \approx 0.999 \, \text{(99.9%)}
]
- For small to moderate numbers of hashes (up to around 1,000–10,000), the collision probability is quite low.
- However, as the number of Twts grows (above 100,000), the likelihood of a collision increases significantly due to the relatively small hash space (3.5 billion).
@quark@ferengi.one Add here:
* a0826a65 - Add debug sub-command to yarnc (7 weeks ago) <James Mills>
I’d recommend a git pull && make build
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | sha1sum | head -c 11
87fd9b0ae4e
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | sha256sum | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 12
tdqmjaeawqu
Just experimenting…
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | sha256sum | base64 | tr -d '=' | tail -c 12
NWY4MSAgLQo
It would appear that the blake2b
256bit digest algorithm is no longer supported by the openssl
tool, however blake2s256
is; I’m not sure why 🤔
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
zq4fgq
Obviously produce the wrong hash, which should be o6dsrga
as indicated by the yarnc hash
utility:
$ yarnc hash -u https://twtxt.net/user/prologic/twtxt.txt -t 2020-07-18T12:39:52Z "Hello World! 😊"
o6dsrga
But at least the shell pipeline is “correct”.
FWIW the standard UNIX tools for Blake2b are openssl
and b2sum
– Just trying to figure out how to make a shell pipeline again (if you really want that); as tools keep changing god damnit 🤣
@quark@ferengi.one Do you mean something like this?
$ ./yarnc debug ~/Public/twtxt.txt | tail -n 1
kp4zitq 2024-09-08T02:08:45Z (#wsdbfna) @<aelaraji https://aelaraji.com/twtxt.txt> My work has this thing called "compressed work", where you can **buy** extra time off (_as much as 4 additional weeks_) per year. It comes out of your pay though, so it's not exactly a 4-day work week but it could be useful, just haven't tired it yet as I'm not entirely sure how it'll affect my net pay
@quark@ferengi.one Watch TV at the moment so unsure about “out of the box tools” but there are 3 implementations on the spec page https://dev.twtxt.net/doc/twthashextension.html
@aelaraji@aelaraji.com Sometimes that’s the most valuable though! Feedback!
I’ve been using Codeium too the last week or so ! It’s pretty good and like @xuu said is a pretty desent Junior assistant, it helps me write good docs and the tab completion is amazing!
It of course completely sucks at doing anything “intelligent” or complex, but if you just use it as a fancier auto complete it’s actually half way decent 👌
@quark@ferengi.one I admit I find the general “click here to share blah” generally wasteful, useless and unengaging really. Not just Wordle.
I admittedly however, I’ve been guilty of doing this sometimes myself. 🤦♂️ sometimes though I think it’s OK to show your achievements. 👌
@quark@ferengi.one LOL 😂 No worries! 😉
@quark@ferengi.one All fixed! 🥳 pleasure doing business with y’all. 🤣
@lyse@lyse.isobeef.org Just as an aside, shouldn’t you assume utf-8
anyway these days if not specified? 🤔 I mean basically everything almost always uses utf-8
encoding right? 😅
jenny
nor yarnd
support it very well. Only at a very basic level.
@quark@ferengi.one Okay fair 👌
@movq@www.uninformativ.de So basically on-call, but you don’t get paid for it? 🤔
@quark@ferengi.one aye aye captain 🤣
main.go
(but it can be done on a template now, so no reason to touch the code):
@quark@ferengi.one It’s a good thin yarnd
makes this user configurable via a preference 🤣 And displays both 😅
@quark@ferengi.one LOL 😂 Thanks!
yarnd
(at least) doesn't support creating such a custom TwtSubject, but it will reply and respect and thread one if one was constructed.
@quark@ferengi.one You are right, whilst it technically works, its not well supported. Too much of the code would have to change to support that, and it’s not worth the value.
So yeah no, whilst it technically works, neither jenny
nor yarnd
support it very well. Only at a very basic level.
It actually has treated this as a thread, but it gets a bit weird, because we thread based on the content address of a root twt.
@movq@www.uninformativ.de for oncall?
jenny,
ttand
yarnd...
@movq@www.uninformativ.de Haha 🤣
@movq@www.uninformativ.de this is why I don’t have a work phone 🤣
-T/--template
in case you need a custom template 👌
@bender@twtxt.net I should put the template that is used by default as a file in the repo. Look at the source for now and you’ll see 😅
jenny,
ttand
yarnd...
@movq@www.uninformativ.de I never implemented UI support for their creation 🤣
Just that yarnd
(at least) doesn’t support creating such a custom TwtSubject, but it will reply and respect and thread one if one was constructed.
@falsifian@www.falsifian.org You will actually find that they work perfectly fine. They just tend to have little value I think. Most of our clients do support them, jenny,
ttand
yarnd...
At least you took the time to file an issue on a decentralised Git server 💪 Thank you! 🙏
@movq@www.uninformativ.de (#3xpygsq) @movq@www.uninformativ.de I have a slightly different but compatible view:
What makes twtxt unique is its radical technical simplicity. And that means you have to be a tech-savvy person to appreciate twtxt and that means mass-appeal is pretty much out of the question to begin with. 😅
You see, if you recall my old man ain’t all that great with tech these days, though he used to be and that’s how I got into it, encouraged as a young lad. Anyway… I built yarnd
for that purpose, so a) I could use it as my daily driver (think of it like Jenny/tt but for the web with a little server) and b) so others could use it too (admitedly that hasn’t been well adopted because reasons)
Anyway my view is that Yarn/Twtxt is designed to be a slow social media without distraction. I like that a lot. Forget the simplicity for a second, if you think about how we use this, and how damn well fucking effective it is, without all the ads, tracking, god knows what useless-ass features, all the nonsense multi-Megabytes your browser has to download, just to post what you ate for breakfast, I like what we’ve built 😅
@aelaraji@aelaraji.com If you’re talking about me, I’m notorious for typos 🤣 I type too fast, and I think I need a new keyboard, these keys are getting stuck 😅 Either that or I need to take my blowvac to it and clean the dust, skin and muck under the butterfly keys 🎹
@aelaraji@aelaraji.com I just added support for passing a custom template file via -T/--template
in case you need a custom template 👌
prologic@JamessMacStudio
Wed Sep 18 01:27:29
~/Projects/yarnsocial/twtxt2html
(main) 130
$ ./twtxt2html --help
Usage: twtxt2html [options] FILE|URL
twtxt2html converts a twtxt feed to a static HTML page
-d, --debug enable debug logging
-l, --limit int limit number ot twts (default all) (default -1)
-n, --noreldate do now show twt relative dates
-r, --reverse reverse the order of twts (oldest first)
-T, --template string path to template file
-t, --title string title of generated page (default "Twtxt Feed")
-v, --version display version information
pflag: help requested
@movq@www.uninformativ.de Oh! 🤣 For some reason I thought my first feed was lost and gone 🤣 Hahahaha it’s been there the whole time 😅 Fuck we write good specs and software 🥳
@falsifian@www.falsifian.org I like this idea too as a completing solution to the “identify problem”:
Or maybe url changes could somehow be combined with the archive feeds extension? Could the url metadata field be local to each archive file, so that to switch to a new url all you need to do is archive everything you’ve got and start a new file at the new url?
👌