falsifian

www.falsifian.org

James Cook. Time-space trader and software hipster.

Recent twts from falsifian

@sorenpeter@darch.dk I like this idea. Just for fun, I’m using a variant in this twt. (Also because I’m curious how non-hash subjects appear in jenny and yarn.)

URLs can contain commas, so I suggest a different character to separate the url from the date. In this twt I’ve used a space (also after “replyto”, for symmetry).

I think this solves:

  • Changing feed identities: although @mckinley@twtxt.net points out that URLs can change, I think this syntax should be okay as long as the feed at that URL can be fetched, and as long as the current canonical URL for the feed lists this one as an alternate.
  • Editing, if you don’t care about message integrity
  • Finding the root of a thread, if you’re not following the author

An optional hash could be added if message integrity is desired. (E.g. if you don’t trust the feed author not to make a misleading edit.) Other recent suggestions about how to deal with edits and hashes might be applicable then.

People publishing multiple twts per second should include sub-second precision in their timestamps. As you suggested, the timestamp could just be copied verbatim.
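
A minimal sketch of how a client might parse the space-separated variant. It assumes a subject shaped like (replyto <url> <timestamp>) at the start of the twt; that exact layout is my assumption, since the raw subject isn’t reproduced here, and parse_replyto is a hypothetical helper. Splitting on whitespace is safe because a valid URL never contains a raw space, while commas inside the URL survive intact.

```python
import re
from typing import NamedTuple, Optional


class ReplyTo(NamedTuple):
    url: str
    timestamp: str


# Hypothetical subject format: "(replyto <feed-url> <timestamp>) ...".
# Fields are whitespace-separated: URLs may contain commas but never raw spaces.
_REPLYTO = re.compile(r"^\(replyto\s+(\S+)\s+([^\s)]+)\)")


def parse_replyto(text: str) -> Optional[ReplyTo]:
    """Return the parent's feed url and timestamp from a leading replyto subject."""
    m = _REPLYTO.match(text)
    if m is None:
        return None
    # The timestamp stays an opaque string copied verbatim from the parent,
    # so any sub-second precision the author used is preserved.
    return ReplyTo(url=m.group(1), timestamp=m.group(2))


reply = parse_replyto(
    "(replyto https://example.com/a,b/twtxt.txt 2024-09-14T22:00:00.25Z) Agreed!"
)
print(reply)
```

Keeping the timestamp as a string (rather than parsing it) is what makes “copy it verbatim” work: no precision is lost or reformatted on the way.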

In-reply-to » Now WTF!? Suddenly, @falsifian's feed renders broken in my tt Python implementation. Exactly what I had with my Go rewrite. I haven't touched the Python stuff in ages, though. Also, tt and tt2 do not share any data at all.

@lyse@lyse.isobeef.org Sorry, I don’t think I ever had charset=utf8. I just noticed that a few days ago. OpenBSD’s httpd might not support including a parameter with the MIME type, unfortunately. I’m going to look into it.

In-reply-to » @prologic Yeah, that thing with (#hash;#originalHash) would also work.

@movq@www.uninformativ.de

Maybe I’m being a bit too purist/minimalistic here. As I said before (in one of the 1372739 posts on this topic – or maybe I didn’t even send that twt, I don’t remember 😅), I never really liked hashes to begin with. They aren’t super hard to implement but they are kind of against the beauty of the original twtxt – because you need special client support for them. It’s not something that you could write manually in your twtxt.txt file. With @sorenpeter@darch.dk’s proposal, though, that would be possible.

Tangentially related, I was a bit disappointed to learn that the twt subject extension is now never used except with hashes. Manually written subjects sounded so beautifully ad-hoc and organic as a way to disambiguate replies. Maybe I’ll try it some time just for fun.

In-reply-to » @falsifian TLS won't help you if you change your domain name. How will people know if it's really you? Maybe that's not the biggest problem for something with such low stakes as twtxt, but it's a reasonable concern that could be solved using signatures from an unchanging cryptographic key.

@prologic@twtxt.net

(#w4chkna) @falsifian@www.falsifian.org You mean the idea of being able to inline # url = changes in your feed?

Yes, that one. But @lyse@lyse.isobeef.org pointed out that it suffers from a compatibility issue, since currently the first listed url is used for hashing, not the last (unless your feed is in reverse chronological order). Heh, I guess another metadata field could indicate which version to use.
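
To make the compatibility issue concrete, here is a rough sketch (helper names are mine, and it assumes the usual “# key = value” metadata comments) that records both the first url in the file, which is what clients hash with today, and the url that would be in effect at each twt if inline changes were honored. For a feed in chronological order, the two disagree for every twt written after the change.

```python
from typing import List, Optional, Tuple


def scan_urls(feed_text: str) -> Tuple[Optional[str], List[Tuple[str, Optional[str]]]]:
    """Return (first '# url' in the file, [(twt line, url in effect above it)])."""
    first_url = None
    current_url = None
    twts = []
    for line in feed_text.splitlines():
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep and key.strip() == "url":
                current_url = value.strip()
                if first_url is None:
                    first_url = current_url
        elif line.strip():
            twts.append((line, current_url))
    return first_url, twts


feed = (
    "# url = https://old.example.com/twtxt.txt\n"
    "2024-01-01T00:00:00Z\tWritten before the move\n"
    "# url = https://new.example.com/twtxt.txt\n"
    "2024-09-01T00:00:00Z\tWritten after the move\n"
)
first, twts = scan_urls(feed)
# Today's rule hashes every twt with `first`; an inline-change rule would
# hash the second twt with the new url instead.
print(first)
print(twts[1][1])
```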

Or maybe url changes could somehow be combined with the archive feeds extension? Could the url metadata field be local to each archive file, so that to switch to a new url all you need to do is archive everything you’ve got and start a new file at the new url?

I don’t think it’s that likely my feed url will change.

In-reply-to » @falsifian TLS won't help you if you change your domain name. How will people know if it's really you? Maybe that's not the biggest problem for something with such low stakes as twtxt, but it's a reasonable concern that could be solved using signatures from an unchanging cryptographic key.

@mckinley@twtxt.net Yes, changing domains is a problem if you tie your identity to an https url. But I also worry about being stuck with a key I can’t rotate. Whatever gets used, it would be nice to be able to rotate identities. I like @lyse@lyse.isobeef.org’s idea for that.

In-reply-to » @prologic earlier you suggested extending hashes to 11 characters, but here's an argument that they should be even longer than that.

@prologic@twtxt.net Brute force. I just hashed a bunch of versions of both tweets until I found a collision.

I mostly just wanted an excuse to write the program. I don’t know how I feel about actually using super-long hashes; they could make the twts annoying to read if you prefer to view them untransformed.

@prologic@twtxt.net earlier you suggested extending hashes to 11 characters, but here’s an argument that they should be even longer than that.

Imagine I found this twt one day at https://example.com/twtxt.txt :

2024-09-14T22:00Z Useful backup command: rsync -a “$HOME” /mnt/backup

and I responded with “(#5dgoirqemeq) Thanks for the tip!”. Then I’ve endorsed the twt, but it could later get changed to

2024-09-14T22:00Z Useful backup command: rm -rf /some_important_directory

which also has an 11-character base32 hash of 5dgoirqemeq. (I’m using the existing hashing method with https://example.com/twtxt.txt as the feed url, but I’m taking 11 characters instead of 7 from the end of the base32 encoding.)

That’s what I meant by “spoofing” in an earlier twt.
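
For reference, here is a sketch of the hashing as I understand the twt hash extension: BLAKE2b with a 32-byte digest over the feed url, timestamp and text joined by newlines, base32-encoded, lowercased, padding stripped, keeping the last N characters (7 today, 11 in the example above). It assumes the timestamp has already been normalized the way the extension specifies and skips that step, so don’t expect it to reproduce 5dgoirqemeq exactly.

```python
import base64
import hashlib


def twt_hash(feed_url: str, timestamp: str, text: str, length: int = 7) -> str:
    """Hash url, timestamp and text (newline-joined) with BLAKE2b-256.

    Returns the last `length` characters of the lowercase, unpadded base32
    encoding. `timestamp` is assumed to be normalized already.
    """
    payload = "\n".join([feed_url, timestamp, text]).encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    encoded = base64.b32encode(digest).decode("ascii").lower().rstrip("=")
    return encoded[-length:]


url = "https://example.com/twtxt.txt"
ts = "2024-09-14T22:00:00Z"  # assumed normalized form of 2024-09-14T22:00Z
text = "Useful backup command: rsync -a “$HOME” /mnt/backup"
short = twt_hash(url, ts, text)               # today's 7-character hash
longer = twt_hash(url, ts, text, length=11)   # the 11-character variant
print(short, longer)
```

Because both lengths are suffixes of the same encoding, the 11-character hash always ends with the 7-character one.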

I don’t know if preventing this sort of attack should be a goal, but if it is, the number of bits in the hash should be at least two times log2(number of attempts we want to defend against), where the “two times” is because of the birthday paradox.
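
A back-of-the-envelope version of that rule, as a sketch. One detail worth accounting for: a 256-bit digest is 51 full base32 characters plus one character carrying a single bit, so the last N characters hold 5(N - 1) + 1 bits. For N = 11 that is 51 bits, and the birthday estimate of roughly 2^(51/2), about 4.7 × 10^7 attempts, is in line with the 43394987 hashes it actually took.

```python
import math

BITS_PER_CHAR = 5  # base32: each character encodes 5 bits


def suffix_bits(n_chars: int) -> int:
    """Meaningful bits in the last n base32 chars of a 256-bit digest.

    256 = 51 * 5 + 1, so the final character carries 1 bit and the others 5.
    """
    return BITS_PER_CHAR * (n_chars - 1) + 1


def birthday_attempts(bits: float) -> float:
    """Roughly how many hashes until a collision becomes likely: 2**(bits / 2)."""
    return 2 ** (bits / 2)


def chars_needed(attempts: float) -> int:
    """Full 5-bit characters needed so that bits >= 2 * log2(attempts)."""
    return math.ceil(2 * math.log2(attempts) / BITS_PER_CHAR)


print(suffix_bits(7), suffix_bits(11))             # 31 51
print(round(birthday_attempts(suffix_bits(11))))   # about 4.7e7
print(chars_needed(2 ** 40))                       # 16
```

chars_needed assumes every character carries a full 5 bits, i.e. characters taken from the start of the encoding rather than the end.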

Side note: current hashes always end with “a” or “q”, which is a bit wasteful: the 256-bit digest leaves only one meaningful bit in the last base32 character. Maybe we should take the first N characters of the base32 encoding instead of the last N.
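
A quick empirical check of the side note, using made-up payloads and the same BLAKE2b-256 plus base32 recipe: the last character only ever comes out as a or q, while the first character ranges over the whole alphabet, so a prefix of the same length carries more bits.

```python
import base64
import hashlib


def b32_digest(payload: bytes) -> str:
    """Full lowercase, unpadded base32 encoding of a BLAKE2b-256 digest."""
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    return base64.b32encode(digest).decode("ascii").lower().rstrip("=")


encodings = [b32_digest(f"sample twt {i}".encode("utf-8")) for i in range(2000)]
last_chars = {e[-1] for e in encodings}
first_chars = {e[0] for e in encodings}

print(sorted(last_chars))  # only 'a' and/or 'q'
print(len(first_chars))    # many distinct symbols (up to 32)
```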

Code I used for the above example: https://fossil.falsifian.org/misc/file?name=src/twt_collision/find_collision.c
I only needed to compute 43394987 hashes to find it.

In-reply-to » Interesting.. QUIC isn't very quick over fast internet.

@prologic@twtxt.net

They’re in Section 6:

  • Receivers should adopt UDP GRO. (Something about saving CPU when processing UDP packets; I’m a bit fuzzy about it.) And they have suggestions for making GRO more useful for QUIC.

  • Some other receiver-side suggestions: “sending delayed QUIC ACKs”; “using recvmsg to read multiple UDP packets in a single system call”.

  • Use multiple threads when receiving large files.

In-reply-to » @prologic Some criticisms and a possible alternative direction:

@mckinley@twtxt.net

HTTPS is supposed to do [verification] anyway.

TLS provides verification that nobody is tampering with or snooping on your connection to a server. It doesn’t, for example, verify that a file downloaded from server A is from the same entity as the one from server B.

I was confused by this response for a while, but now I think I understand what you’re getting at. You are pointing out that with signed feeds, I can verify the authenticity of a feed without accessing the original server, whereas with HTTPS I can’t verify a feed unless I download it myself from the origin server. Is that right?

I.e. if the HTTPS origin server is online and I don’t mind taking the time and bandwidth to contact it, then perhaps signed feeds offer no advantage, but if the origin server might not be online, or I want to download a big archive of lots of feeds at once without contacting each server individually, then I need signed feeds.

feed locations [being] URLs gives some flexibility

It does give flexibility, but perhaps we should have made them URIs instead for even more flexibility. Then, you could use a tag URI, urn:uuid:*, or a regular old URL if you wanted to. The spec seems to indicate that the url tag should be a working URL that clients can use to find a copy of the feed, optionally at multiple locations. I’m not very familiar with IP{F,N}S but if it ensures you own an identifier forever and that identifier points to a current copy of your feed, it could be a great way to fix it on an individual basis without breaking any specs :)

I’m also not very familiar with IPFS or IPNS.

I haven’t been following the other twts about signatures carefully. I just hope whatever you smart people come up with will be backwards-compatible so it still works if I’m too lazy to change how I publish my feed :-)

In-reply-to » Interesting.. QUIC isn't very quick over fast internet.

@xuu Thanks for the link. I found a PDF on one of the authors’ home pages: https://ahmadhassandebugs.github.io/assets/pdf/quic_www24.pdf . I wonder how the protocol was evaluated closer to the time it became a standard, and whether anything has changed. I wonder if network speeds have grown faster than CPU speeds since then. The paper says the performance is about the same below roughly 600 Mbps.

To be fair, I don’t think QUIC was ever expected to be faster for transferring a single stream of data. I think QUIC is supposed to reduce the impact of a dropped packet by making sure it only affects the stream it’s part of. I imagine QUIC still has that advantage, and this paper is showing the other side of a tradeoff.

In-reply-to » @prologic Some criticisms and a possible alternative direction:

@lyse@lyse.isobeef.org This looks like a nice way to do it.

Another thought: if clients can’t agree on the url (for example, if we switch to this new way, but some old clients still do it the old way), that could be mitigated by computing many hashes for each twt: one for every url in the feed. So, if a feed has three URLs, every twt is associated with three hashes when it comes time to put threads together.

A client still needs to choose one url to use for the hash when composing a reply, but this might add some breathing room if there’s a period when clients are doing different things.

(From what I understand of jenny, this would be difficult to implement there since each pseudo-email can only have one msgid to match to the in-reply-to headers. I don’t know about other clients.)
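
A sketch of the one-hash-per-url idea for a client that keeps its own index: each twt is filed under the hash computed with every url its feed lists, so a reply hashed with any of them finds the parent. The feed urls and twt are made up, build_index is a hypothetical helper, and twt_hash follows the hash extension’s recipe with the timestamp assumed already normalized.

```python
import base64
import hashlib
from collections import defaultdict


def twt_hash(feed_url, timestamp, text, length=7):
    """BLAKE2b-256 over url, timestamp, text (newline-joined); last base32 chars."""
    payload = f"{feed_url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    return base64.b32encode(digest).decode("ascii").lower().rstrip("=")[-length:]


def build_index(feeds):
    """Map every candidate hash to the twts it could refer to.

    `feeds` is a list of (urls, twts) pairs, where `twts` is a list of
    (timestamp, text) tuples. Each twt is indexed once per listed url.
    """
    index = defaultdict(list)
    for urls, twts in feeds:
        for timestamp, text in twts:
            for url in urls:
                index[twt_hash(url, timestamp, text)].append((timestamp, text))
    return index


feed_urls = ["https://example.com/twtxt.txt", "https://mirror.example.net/twtxt.txt"]
twt = ("2024-09-20T12:00:00Z", "Hello from two urls")
index = build_index([(feed_urls, [twt])])

# A reply whose client hashed with either url still finds the parent.
for url in feed_urls:
    print(url, index[twt_hash(url, *twt)])
```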

In-reply-to » All this hash breakage made me wonder if we should try to introduce “message IDs” after all. 😅

@movq@www.uninformativ.de Another idea: just hash the feed url and time, without the message content. And don’t twt more than once per second.

Maybe you could even just use the time, and rely on @-mentions to disambiguate. Not sure how that would work out.

Though I kind of like the idea of twts being immutable. At least, it’s clear which version of a twt you’re replying to (assuming nobody is engineering hash collisions).
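
For comparison, a sketch of the content-free ID (my own variant of the hash extension’s recipe, not anything specified; time_only_id is hypothetical): the text is deliberately left out of the payload, so an edit keeps the same ID. That is exactly the trade-off against immutability, and it is also why two twts in the same second would collide.

```python
import base64
import hashlib


def time_only_id(feed_url: str, timestamp: str, text: str = "", length: int = 7) -> str:
    """Hypothetical twt ID from feed url and timestamp; `text` is deliberately ignored."""
    payload = f"{feed_url}\n{timestamp}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    return base64.b32encode(digest).decode("ascii").lower().rstrip("=")[-length:]


url = "https://example.com/twtxt.txt"
before = time_only_id(url, "2024-09-14T22:00:00Z", "Useful backup commnad")
after = time_only_id(url, "2024-09-14T22:00:00Z", "Useful backup command")  # typo fixed
next_second = time_only_id(url, "2024-09-14T22:00:01Z", "Another twt")
print(before, after, next_second)
```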

In-reply-to » On the Subject of Feed Identities; I propose the following:

In fact, maybe your public key idea is compatible with my last point. Just come up with a url scheme that means “this feed’s primary URL is actually a public key”, and then feed authors can optionally switch to that.

In-reply-to » On the Subject of Feed Identities; I propose the following:

@prologic@twtxt.net Some criticisms and a possible alternative direction:

  1. Key rotation. I’m not a security person, but my understanding is that it’s good to be able to give keys an expiry date and replace them with new ones periodically.

  2. It makes maintaining a feed more complicated. Now instead of just needing to put a file on a web server (and scan the logs for user agents) I also need to do this. What brought me to twtxt was its radical simplicity.

Instead, maybe we should think about a way to allow old urls to be rotated out? Like, my metadata could somehow say that X used to be my primary URL, but from date D onward my primary url is Y. (Or, if you really want to use public key cryptography, maybe something similar could be used for key rotation there.)
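
One hypothetical shape for such metadata: a sorted list of (start time, url) entries, with the client hashing each twt using whichever url was primary at the twt’s timestamp. The data layout and primary_url_at are illustrations only; no spec defines them.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical rotation history: "from this instant on, the primary url is ...".
# Entries must be sorted by start time.
ROTATIONS = [
    (datetime.fromisoformat("2020-01-01T00:00:00+00:00"), "https://x.example/twtxt.txt"),
    (datetime.fromisoformat("2024-10-01T00:00:00+00:00"), "https://y.example/twtxt.txt"),
]


def primary_url_at(when: datetime, rotations=ROTATIONS) -> str:
    """Return the url that was primary at `when` (X before date D, Y from D onward)."""
    starts = [start for start, _ in rotations]
    i = bisect_right(starts, when) - 1
    if i < 0:
        raise ValueError("timestamp predates the first known url")
    return rotations[i][1]


print(primary_url_at(datetime.fromisoformat("2024-09-14T22:00:00+00:00")))
```

Using bisect_right means a twt stamped exactly at D already uses the new url, matching “from date D onward”.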

It’s nice that your scheme would add a way to verify the twts you download, but https is supposed to do that anyway. If you don’t trust https to do that (maybe you don’t like relying on root CAs?) then maybe your preferred solution should be reflected by your primary feed url. E.g. if you prefer the security offered by IPFS, then maybe an IPNS url would do the trick. The fact that feed locations are URLs gives some flexibility. (But then rotation is still an issue, if I understand IPNS right.)

In-reply-to » All this hash breakage made me wonder if we should try to introduce “message IDs” after all. 😅

@movq@www.uninformativ.de @prologic@twtxt.net Another option would be: when you edit a twt, prefix the new one with (#[old hash]) and some indication that it’s an edited version of the original tweet with that hash. E.g. if the hash used to be abcd123, the new version should start “(#abcd123) (redit)”.

What I like about this is that clients that don’t know this convention will still stick it in the same thread. And I feel it’s in the spirit of the old pre-hash (subject) convention, though that’s before my time.

I guess it may not work when the edited twt itself is a reply, and there are replies to it. Maybe that could be solved by letting twts have more than one (subject) prefix.
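
A sketch of how a client might read such prefixes, including the multiple-subject case. The (redit) marker comes from the proposal above; the rule that the hash right before it names the edited twt is my own assumption, and parse_prefixes is a hypothetical helper.

```python
import re
from typing import List, Optional, Tuple

# Leading "(#hash)" subjects and the proposed "(redit)" edit marker.
_SUBJECT = re.compile(r"\(#([a-z0-9]+)\)\s*")
_EDIT_MARKER = re.compile(r"\(redit\)\s*")


def parse_prefixes(text: str) -> Tuple[List[str], Optional[str], str]:
    """Split a twt into (subject hashes, hash being replaced if an edit, body).

    Assumption: in an edit, the hash immediately before "(redit)" names the
    original twt; any earlier hashes are ordinary reply subjects.
    """
    hashes: List[str] = []
    pos = 0
    while (m := _SUBJECT.match(text, pos)) is not None:
        hashes.append(m.group(1))
        pos = m.end()
    replaces = None
    if hashes and (m := _EDIT_MARKER.match(text, pos)) is not None:
        replaces = hashes[-1]
        pos = m.end()
    return hashes, replaces, text[pos:]


print(parse_prefixes("(#parent7) (#abcd123) (redit) New text"))
```

A client unaware of the convention still sees every hash as a subject, so the edit lands in the original thread either way.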

But the great thing about the current system is that nobody can spoof message IDs.

I don’t think twtxt hashes are long enough to prevent spoofing.

In-reply-to » Serious open (for anyone) question: what makes you follow someone on twtxt? Will you just follow anyone that you come across, simply because that someone using the "decentralised, minimalist microblogging service for hackers" microblog?

@bender@twtxt.net So far I’ve been following feeds fairly liberally. I’ll check to see if we have anything in common and lean toward following, just because this is new to me and it feels like a small community. But I’m still figuring out what I want. Later I’ll probably either trim my follow list or come up with some way to prioritize the feeds I’m more interested in.

In-reply-to » I guess I can configure neomutt to hide the feeds I don't care about.

@prologic@twtxt.net One of your twts begins with (#st3wsda): https://twtxt.net/twt/bot5z4q

Based on the twtxt.net web UI, it seems to be in reply to a twt by @cuaxolotl@sunshinegardens.org which begins “I’ve been sketching out…”.

But jenny thinks the hash of that twt is 6mdqxrq. At least, there’s a twt in their feed with that hash that has the same text as appears on yarn.social (except with ‘ instead of ’).

Based on this, it appears jenny and yarnd disagree about the hash of the twt, or perhaps the twt was edited (though I can’t see any difference, assuming ‘ vs ’ is just a rendering choice).

In-reply-to » @movq Is there a good way to get jenny to do a one-off fetch of a feed, for when you want to fill in missing parts of a thread? I just added @slashdot to my private follow file just because @prologic keeps responding to the feed :-P and I want to know what he's commenting on even though I don't want to see every new slashdot twt.

@prologic@twtxt.net I believe you when you say registries as designed today do not crawl. But when I first read the spec, it conjured in my mind a search engine. Now I don’t know how things work out in practice, but just based on reading, I don’t see why it can’t be an API for a crawling search engine. (In fact I don’t see anything in the spec indicating registry servers shouldn’t crawl.)

(I also noticed that https://twtxt.readthedocs.io/en/latest/user/registry.html recommends “The registries should sync each others user list by using the users endpoint”. If I understood that right, registering with one should be enough to appear on others, even if they don’t crawl.)

Does yarnd provide an API for finding twts? Is it similar?

In-reply-to » @movq Is there a good way to get jenny to do a one-off fetch of a feed, for when you want to fill in missing parts of a thread? I just added @slashdot to my private follow file just because @prologic keeps responding to the feed :-P and I want to know what he's commenting on even though I don't want to see every new slashdot twt.

@prologic@twtxt.net I guess I thought they were search engines. Anyway, the registry API looks like a decent one for searching for tweets. Could/should yarn.social pods implement the same API?

In-reply-to » I guess I can configure neomutt to hide the feeds I don't care about.

I just manually followed the steps at https://dev.twtxt.net/doc/twthashextension.html and got 6mdqxrq. I wonder what happened. Did @cuaxolotl@sunshinegardens.org edit the twt in some subtle way after twtxt.net downloaded it? I couldn’t spot a diff, other than ‘ appearing as ’ on yarn.social, which I assume is a transformation done by twtxt.net.
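
To illustrate why a single punctuation difference matters, here is a sketch that hashes the same made-up text with ‘ (U+2018) and with ’ (U+2019), using the hash extension’s recipe with a placeholder url and timestamp (assumed already normalized). The hashes differ, so if either side hashed a typographically transformed copy, that alone would be one possible explanation for the mismatch; it doesn’t prove that’s what happened here.

```python
import base64
import hashlib


def twt_hash(feed_url, timestamp, text, length=7):
    """BLAKE2b-256 over url, timestamp, text (newline-joined); last base32 chars."""
    payload = f"{feed_url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    return base64.b32encode(digest).decode("ascii").lower().rstrip("=")[-length:]


url = "https://example.org/twtxt.txt"  # placeholder feed url
ts = "2024-09-01T12:00:00Z"            # placeholder, assumed normalized
left = twt_hash(url, ts, "I\u2018ve been sketching out")   # left single quote
right = twt_hash(url, ts, "I\u2019ve been sketching out")  # right single quote
print(left, right)  # two different 7-character hashes
```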

In-reply-to » @movq Is there a good way to get jenny to do a one-off fetch of a feed, for when you want to fill in missing parts of a thread? I just added @slashdot to my private follow file just because @prologic keeps responding to the feed :-P and I want to know what he's commenting on even though I don't want to see every new slashdot twt.

@prologic@twtxt.net What’s the difference between search.twtxt.net and the /api/plain/tweets endpoint of a registry? In my mind, a registry is a twtxt search engine. Or are registries not supposed to do their own crawling to discover new feeds?

In-reply-to » @movq Is there a good way to get jenny to do a one-off fetch of a feed, for when you want to fill in missing parts of a thread? I just added @slashdot to my private follow file just because @prologic keeps responding to the feed :-P and I want to know what he's commenting on even though I don't want to see every new slashdot twt.

@prologic@twtxt.net How does yarn.social’s API fix the problem of centralization? I still need to know whose API to use.

Say I see a twt beginning (#hash) and I want to look up the start of the thread. Is the idea that if that twt is hosted by a yarn.social pod, it is likely to know the thread start, so I should query that particular pod for the hash? But what if no yarn.social pods are involved?

The community seems small enough that a registry server should be able to keep up, and I can have a couple of others as backups. Or I could crawl the list of feeds followed by whoever emitted the twt that prompted my query.

I have successfully used registry servers a little bit, e.g. to find a feed that mentioned a tag I was interested in. Was even thinking of making my own, if I get bored of my too many other projects :-)

In-reply-to » I guess I can configure neomutt to hide the feeds I don't care about.

@movq@www.uninformativ.de Thanks, it works!

But when I tried it out on a twt from @prologic@twtxt.net, I discovered jenny and yarn.social seem to disagree about the hash of this twt: https://twtxt.net/twt/st3wsda . jenny assigned it a hash of 6mdqxrq but the URL and prologic’s reply suggest yarn.social thinks the hash is st3wsda. (And as a result, jenny --fetch-context didn’t work on prologic’s twt.)

In-reply-to » @movq Is there a good way to get jenny to do a one-off fetch of a feed, for when you want to fill in missing parts of a thread? I just added @slashdot to my private follow file just because @prologic keeps responding to the feed :-P and I want to know what he's commenting on even though I don't want to see every new slashdot twt.

@prologic@twtxt.net Yes, fetching the twt by hash from some service could be a good alternative, in case the twt I have does not @-mention the source. (Besides yarnd, maybe this should be part of the registry API? I don’t see fetch-by-hash in the registry API docs.)

In-reply-to » @movq Is there a good way to get jenny to do a one-off fetch of a feed, for when you want to fill in missing parts of a thread? I just added @slashdot to my private follow file just because @prologic keeps responding to the feed :-P and I want to know what he's commenting on even though I don't want to see every new slashdot twt.

@movq@www.uninformativ.de I don’t know if I’d want to discard the twts. I think what I’m looking for is a command “jenny -g https://host.org/twtxt.txt” to fetch just that one feed, even if it’s not in my follow list. I could wrap that in a shell script so that when I see a twt in reply to a feed I don’t follow, I can just tap a key and the feed will get added to my maildir. I guess the script would look for a mention at the start of a selected twt and call jenny -g on the feed.
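
The mention-lookup half of that script could start with something like this sketch, which pulls the feed url out of a leading mention, assuming the raw twtxt mention syntax @<nick url> (optionally after a (#hash) subject). The jenny -g invocation is the command wished for above, not an option I’m assuming jenny has, so the sketch only builds the argument list instead of running it; the helper names and the example url are made up.

```python
import re
from typing import List, Optional

# Raw twtxt mentions look like "@<nick url>" (or "@<url>"); an optional leading
# "(#hash)" subject is skipped so replies are handled too.
_LEADING_MENTION = re.compile(r"^(?:\(#[^)]*\)\s*)?@<(?:[^\s>]+\s+)?([^\s>]+)>")


def leading_mention_url(text: str) -> Optional[str]:
    """Return the feed url of the mention at the start of a twt, if there is one."""
    m = _LEADING_MENTION.match(text)
    return m.group(1) if m else None


def fetch_command(text: str) -> Optional[List[str]]:
    """Build (but do not run) the wished-for one-off fetch command."""
    url = leading_mention_url(text)
    return ["jenny", "-g", url] if url else None


twt = "(#abc1234) @<alice https://example.com/user/alice/twtxt.txt> Good point!"
print(fetch_command(twt))
```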

In-reply-to » @bender I'm not a yarnd user, but automatically unfollowing on 404 doesn't seem right. Besides @lyse's example, I could imagine just accidentally renaming my own twtxt file, or forgetting to push it when I point my DNS to a new web server. I'd rather not lose all my yarnd followers in a situation like that (and hopefully they feel the same).

(@anth@a.9srv.net’s feed almost never works, but I keep it because they told me they want to fix their server some time.)

In-reply-to » @movq Is there a good way to get jenny to do a one-off fetch of a feed, for when you want to fill in missing parts of a thread? I just added @slashdot to my private follow file just because @prologic keeps responding to the feed :-P and I want to know what he's commenting on even though I don't want to see every new slashdot twt.

I guess I can configure neomutt to hide the feeds I don’t care about.

In-reply-to » @bender I'm not a yarnd user, but automatically unfollowing on 404 doesn't seem right. Besides @lyse's example, I could imagine just accidentally renaming my own twtxt file, or forgetting to push it when I point my DNS to a new web server. I'd rather not lose all my yarnd followers in a situation like that (and hopefully they feel the same).

@bender@twtxt.net Based on my experience so far, as a user, I would be upset if my client dropped someone from my follow list, i.e. stopped fetching their feed, without me asking for that to happen.

In-reply-to » I'm wrong! Both 404 and 410, among others, are considered dead feeds: https://git.mills.io/yarnsocial/yarn/src/branch/main/internal/cache.go#L1343 Whatever that actually means.

@bender@twtxt.net I’m not a yarnd user, but automatically unfollowing on 404 doesn’t seem right. Besides @lyse@lyse.isobeef.org’s example, I could imagine just accidentally renaming my own twtxt file, or forgetting to push it when I point my DNS to a new web server. I’d rather not lose all my yarnd followers in a situation like that (and hopefully they feel the same).

In-reply-to » New Research Reveals AI Lacks Independent Learning, Poses No Existential Threat ZipNada writes: New research reveals that large language models (LLMs) like ChatGPT cannot learn independently or acquire new skills without explicit instructions, making them predictable and controllable. The study dispels fears of these models developing complex reasoning abilities, emphasizing that while LLMs can genera ... ⌘ Read more

@prologic@twtxt.net The headline is interesting and sent me down a rabbit hole understanding what the paper (https://aclanthology.org/2024.acl-long.279/) actually says.

The result is interesting, but the Neuroscience News headline greatly overstates it. If I’ve understood right, they are arguing (with strong evidence) that the simple technique of making neural nets bigger and bigger isn’t quite as magically effective as people say, at least if you use it on its own. In particular, they evaluate LLMs without two common enhancements, in-context learning and instruction tuning. Both of those involve using a small number of examples of the particular task to improve the model’s performance, and they turn them off because they are not part of what is called “emergence”: “an ability to solve a task which is absent in smaller models, but present in LLMs”.

They show that these restricted LLMs only outperform smaller models (i.e. demonstrate emergence) on certain tasks, and then (end of Section 4.1) discuss the nature of those few tasks that showed emergence.

I’d love to hear more from someone more familiar with this stuff. (I’ve done research that touches on ML, but neural nets and especially LLMs aren’t my area at all.) In particular, how compelling is this finding that zero-shot learning (i.e. without in-context learning or instruction tuning) remains hard as model size grows?

In-reply-to » I love shell scripts because they’re so pragmatic and often allow me to get jobs done really quickly.

@movq@www.uninformativ.de Variable names used with -eq in [[ ]] are automatically expanded even without $ as explained in the “ARITHMETIC EVALUATION” section of the bash man page. Interesting. Trying this on OpenBSD’s ksh, it seems “set -u” doesn’t affect that substitution.

In-reply-to » @movq The success of large neural nets. People love to criticize today's LLMs and image models, but if you compare them to what we had before, the progress is astonishing.

@prologic@twtxt.net I don’t know what you mean when you call them stochastic parrots, or how you define understanding. It’s certainly true that current language models show an obvious lack of understanding in many situations, but I find the trend impressive. I would love to see someone achieve similar results with much less power or training data.
