@bender@twtxt.net Hahaha, I had to look this idiom up, but you’re spot on. :-D
I heard a funny saying today: Democracy is when three foxes and a bunny decide what to have for dinner.
I can’t make it as I’m on a hike with a mate.
@prologic@twtxt.net How is nick@domain any better than a feed URL? Changing the nick now also breaks threading. That’s even worse than the current approach. Also, there might be multiple feeds with the same nick on one domain, e.g. on free hosters.
Phew! I’ve finally called it a day as well. Our customer wanted me to start implementing some changes on an emergency basis. I’ve got an initial version with unit tests, but the final testing has to wait until Monday.
`-P` is a life saver when running `rsync` over spotty connections. In my very illiterate opinion, it should always be a default.
@mckinley@twtxt.net I could have sworn that it resumed even a partial file the other week. But maybe that was because the first attempt used `scp` when the connection broke, and then `rsync` detected that only the last part of that file was incomplete and transferred the missing bits. So, lucky by accident. In any case, I will always include `-P` from now on. :-)
Ah, I see! Thanks, @bender@twtxt.net.
@david@collantes.us Sounds lovely. :-)
We had rain all day long and my mate and I still went for a walk with our umbrellas. It was a bit wet. But now I can send my drying rack over the tub on its maiden voyage. Should have built a second rod for more capacity.
@david@collantes.us Enjoy the day off, and fingers crossed that you get through it without damage. Stay safe!
Good writeup, @anth@a.9srv.net! I agree with most of your points.
3.2 Timestamps: I feel no need to mandate UTC. Timezones are fine with me. But I could also live with this new restriction. I fail to see, though, how this change would make things any easier compared to the original format.
3.4 Multi-Line Twts: What exactly do you think is bad about multi-line twts?
4.1 Hash Generation: I do like the idea of a new `uuid` metadata field! Any thoughts on two feeds selecting the same UUID for whatever reason? Well, the same could happen today with `url`.
5.1 Reply to last & 5.2 More work to backtrack: I do not understand anything you’re saying. Can you rephrase that?
8.1 Metadata should be collected up front: I generally agree, but if the `uuid` metadata field were a feed URL rather than a real UUID, there should probably be an exception that allows changing the feed URL mid-file after a relocation.
I passed a mountain biker with a helmet camera in the forest, saw a four-centimeter-long black beetle that rolled over onto its side to change direction, and finally spotted three deer in the paddock. An hour well spent, I reckon.
Finally! After hours I figured out my problems.
The clever Go code that filtered out completely read conversations got in the way now that the filtering has moved into SQL. Yeah, I also did not think that this could ever conflict. But it did. Initializing the `completeConversationRead` flag to `true` now got in my way and caused a conversation to be removed. Simply deleting all the code around that flag solved it.
Generation of missing conversation roots in SQL simply used the oldest (smallest) timestamp from any direct reply in the tree. To find the missing roots, I grouped by subject and then aggregated using `min(created_at)`. Now that I have optimized this to only take unread messages into consideration in the first place, I do not necessarily see the smallest child anymore (when it’s already read), so the timestamp moves forward to the next oldest unread reply. As I do not care too much about an accurate timestamp for something made up, I just adjusted my test case accordingly. Good enough for me. :-)
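A minimal sketch of that aggregation with Python's built-in `sqlite3` module, to show why the made-up root's timestamp moves forward once read replies are filtered out. The schema (`messages` with `hash`, `subject`, `created_at`, `read`), the convention that `subject` holds the bare root hash, and the helper names `demo_db` and `missing_roots` are assumptions for illustration, not the actual client code.

```python
import sqlite3

def demo_db():
    # Hypothetical schema: subject holds the bare hash of the conversation root,
    # or '' for messages that start a conversation themselves.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (hash TEXT PRIMARY KEY, subject TEXT NOT NULL,"
        " created_at TEXT NOT NULL, read INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        [
            ("aaa", "", "2024-06-01T10:00:00Z", 1),     # root that we have
            ("bbb", "aaa", "2024-06-01T10:30:00Z", 0),  # reply to a known root
            ("ccc", "zzz", "2024-06-01T09:00:00Z", 1),  # replies to missing root zzz
            ("ddd", "zzz", "2024-06-01T11:00:00Z", 0),
            ("eee", "zzz", "2024-06-01T12:00:00Z", 0),
        ],
    )
    return conn

def missing_roots(conn, unread_only=True):
    # One made-up root per subject whose root message is absent locally,
    # dated by the oldest reply that survives the (optional) unread filter.
    read_filter = "AND read = 0" if unread_only else ""
    sql = f"""
        SELECT subject, min(created_at)
        FROM messages
        WHERE subject > ''
          AND subject NOT IN (SELECT hash FROM messages)
          {read_filter}
        GROUP BY subject
        ORDER BY subject
    """
    return conn.execute(sql).fetchall()

conn = demo_db()
print(missing_roots(conn, unread_only=False))  # [('zzz', '2024-06-01T09:00:00Z')]
print(missing_roots(conn))                     # [('zzz', '2024-06-01T11:00:00Z')]
```

The string comparison in `min(created_at)` only works here because all timestamps share one format and timezone; mixed offsets would need normalizing first.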
It’s an interesting experiment with SQLite so far. I certainly did learn a few things along the way. Mission accomplished.
@prologic@twtxt.net Ta! Somehow, my unit tests break, though. Running the same query manually seems to produce a plausible result. I do not understand it.
@david@collantes.us As far as I understand it, auto-completion is working; that’s the issue. :-D Instead of spamming the terminal with bucketloads of possibilities, zsh’s auto-completion is nice enough to ask whether to proceed or not.
@david@collantes.us Weird, I always thought that `rsync` automatically resumes the upload or download when aborted. But the manual indicates otherwise with `--partial` (`-P` is `--partial --progress`).
@prologic@twtxt.net I reckon I could just hash the subject internally to get a shorter version.
Three feeds (prologic, movq and mine) and my database is already 1.3 MiB in size. Hmm. I actually got the read filter working. More on that later after polishing it.
`rsync(1)` but, whenever I press `Tab` for completion, I get this:
@aelaraji@aelaraji.com @mckinley@twtxt.net `rsync -avzr` with an optional `--progress` is what I always use. Ah, I could use the shorter `-P`, thanks @movq@www.uninformativ.de.
@movq@www.uninformativ.de Interesting, it’s always good to know how things work under the hood. But I’m very glad that I do not have to deal with this low-level stuff. :-)
@prologic@twtxt.net @movq@www.uninformativ.de Luckily, the thunderstorm cell only grazed us. Even though the sky lit up a bunch and the thunder roared, there were no lightning strikes close by. But it rained cats and dogs. The air smelled lovely.
@eapl.me@eapl.me All the best, see you around in the next life. :-) On Twtxt, I only meet my online friends. I’m staying in touch with some of my real-life mates on IRC or e-mail. But that’s fine. That’s just how it goes.
Thanks, @bender@twtxt.net. :-)
@aelaraji@aelaraji.com Hahaha, brilliant! :-D
We’re now having a thunderstorm with rain, lightning and thunder and the severe weather map shows all green. I’d expect it to be violet.
Okay, I figured out the cause of the broken output. I had also replaced the first `subject = ''` for the existing conversation roots with `subject > ''`. Somehow, my brain must have read `subject <> ''`. That equality check should not have been touched at all. I just updated the archive for anyone who is interested in following along: https://lyse.isobeef.org/tmp/tt2cache.tar.bz2 (151.1 KiB)
@prologic@twtxt.net Yeah, relational databases are definitely not the perfect fit for trees, but I want to give it a shot anyway. :-)
Using `EXPLAIN QUERY PLAN`, I was able to create two indices to avoid some table scans:
CREATE INDEX parent ON messages (hash, subject);
CREATE INDEX subject_created_at ON messages (subject, created_at);
Also, since strings are sortable, instead of `str_col <> ''` I now use `str_col > ''` to allow the use of an index.
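To check that claim, one can ask SQLite for the plan of both variants. This is a sketch with Python's `sqlite3`; the table layout and sample rows are assumed, while the two indices are the ones from above. The exact wording of the plan text varies between SQLite versions, but the `>` variant should show a `SEARCH` using `subject_created_at`, whereas `<>` falls back to a `SCAN`.

```python
import sqlite3

def query_plan(conn, sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (hash TEXT PRIMARY KEY, subject TEXT, created_at TEXT);
    CREATE INDEX parent ON messages (hash, subject);
    CREATE INDEX subject_created_at ON messages (subject, created_at);
    INSERT INTO messages VALUES
        ('aaa', '', '2024-06-01T10:00:00Z'),
        ('bbb', 'aaa', '2024-06-01T10:30:00Z'),
        ('ccc', NULL, '2024-06-01T11:00:00Z');
    """
)

not_equal = "SELECT subject, created_at FROM messages WHERE subject <> ''"
greater = "SELECT subject, created_at FROM messages WHERE subject > ''"

print(query_plan(conn, not_equal))  # a SCAN of the table or of an index
print(query_plan(conn, greater))    # a SEARCH using subject_created_at (subject>?)

# For text values both filters match the same rows: '' sorts before every other
# string, and NULL fails both comparisons.
assert sorted(conn.execute(not_equal).fetchall()) == sorted(conn.execute(greater).fetchall())
```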
But I just noticed that my output seems to be broken at the end for some reason. :-? Hmm.
The read status still gives me a headache. I think I either have to filter in the application or create more metadata structures in the database.
I’m wondering if anyone here has already used particular storage approaches for tree data.
@prologic@twtxt.net I see. I reckon it makes sense to combine 1 and 2, because if we change the hashing anyway, we don’t break things twice.
This organigram example got me started: https://www.sqlite.org/lang_with.html#controlling_depth_first_versus_breadth_first_search_of_a_tree_using_order_by
But I feel execution times get worse rather quickly the more data I add. Also, caching helps tremendously: executing it for the first time took over 600 ms, and from then on I’m down to 40 ms.
I think it’s particularly bad that parents might be missing. Thus, I cannot use an index, because there is no parent to reference. But my database knowledge is fairly limited, so I have to read up on that.
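For comparison, here is a sketch of a reply-tree walk with a recursive CTE in Python's `sqlite3`. Instead of the `ORDER BY` on the recursion queue from the SQLite organigram example, it swaps in a materialized path (timestamp plus hash per level) and sorts by that, which yields a depth-first order with siblings sorted by time. Messages whose parent is missing locally are simply treated as roots. Schema, data and the names `THREAD_SQL` and `thread_order` are assumptions; the path trick relies on fixed-width timestamps and hashes.

```python
import sqlite3

THREAD_SQL = """
WITH RECURSIVE tree(hash, depth, path) AS (
    -- Roots: messages without a subject or whose parent we do not have.
    SELECT hash, 0, created_at || '/' || hash
    FROM messages
    WHERE subject = '' OR subject NOT IN (SELECT hash FROM messages)
    UNION ALL
    -- Children: attach each message below its parent's tree row.
    SELECT m.hash, tree.depth + 1, tree.path || ' ' || m.created_at || '/' || m.hash
    FROM messages AS m
    JOIN tree ON m.subject = tree.hash
)
SELECT hash, depth FROM tree ORDER BY path
"""

def thread_order(conn):
    return conn.execute(THREAD_SQL).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (hash TEXT PRIMARY KEY, subject TEXT NOT NULL, created_at TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("a", "", "2024-06-01T10:00:00Z"),
        ("b", "a", "2024-06-01T10:05:00Z"),
        ("c", "b", "2024-06-01T10:10:00Z"),
        ("d", "a", "2024-06-01T10:20:00Z"),
        ("x", "zzz", "2024-06-01T09:00:00Z"),  # parent zzz is missing
    ],
)
for hash_, depth in thread_order(conn):
    print("  " * depth + hash_)
```

This prints `x`, then `a` with `b`, `c` and `d` indented below it. Sorting by a long string is not free either, so whether this beats the queue ordering on a larger cache is something to measure.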
There you go, @prologic@twtxt.net, the SQLite database (with a bit more data now) and the sqlitebrowser project file containing the query: https://lyse.isobeef.org/tmp/tt2cache.tar.bz2 (133.9 KiB)
@falsifian@www.falsifian.org I agree. It’s an optional header.
@movq@www.uninformativ.de Oha! @bender@twtxt.net Happy cooling off!
@prologic@twtxt.net Well, mentions are also quite lengthy as they always include the feed URL. I know, that’s not a good argument.
I just got a very, very wild idea that I have not put any brain power into, so it might be totally stupid: Since many replies also mention the original feed, maybe a mention and thread identifier could be combined, something like `@<nick url timestamp>`. But then we would also need another style if one does not want to mention the original author.
So, scratch that. But I put it out there anyway. Maybe this inspires someone else to come up with something neat.
@prologic@twtxt.net Not sure how many actually care about a 140 character limit. I don’t. Not at all.
@prologic@twtxt.net I’m wondering what exactly you mean by incremental changes. What are the individual steps? What do you have in mind?
@prologic@twtxt.net I find it quite hard to rank the facets. Some go hand in hand or depend on the protocol a feed is offered over. I feel some are only relevant to specific clients. I’m sure people interpret some of them differently.
I’m curious, is it possible to see each individual poll submission?
I’m experimenting with SQLite and trees. It’s going well so far, with only my own main feed of 439 messages from a few days ago in the cache. Fetching these 632 rows took 20 ms:
Now comes the really tricky part: how do I exclude completely read threads?
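One possible answer, sketched with Python's `sqlite3` under the simplifying assumption that every reply's `subject` is the bare hash of its conversation root (flat threads, as in the grouping by subject): map each message to its root, collect the roots that still have an unread message, and keep only those conversations. With nested subjects, the root would have to come from a recursive walk instead. The table layout and the name `UNREAD_CONVERSATIONS_SQL` are hypothetical.

```python
import sqlite3

UNREAD_CONVERSATIONS_SQL = """
WITH conv AS (
    -- Every message belongs to the conversation named by its subject,
    -- or to itself if it is a root.
    SELECT hash, read, CASE WHEN subject = '' THEN hash ELSE subject END AS root
    FROM messages
)
SELECT hash FROM conv
WHERE root IN (SELECT root FROM conv WHERE read = 0)
ORDER BY hash
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (hash TEXT PRIMARY KEY, subject TEXT NOT NULL, read INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("a", "", 1), ("b", "a", 1),  # conversation a: completely read
        ("c", "", 1), ("d", "c", 0),  # conversation c: one unread reply
        ("e", "zzz", 0),              # reply to a root we do not have
    ],
)
visible = [hash_ for (hash_,) in conn.execute(UNREAD_CONVERSATIONS_SQL)]
print(visible)  # ['c', 'd', 'e']
```

Read messages of a partially read conversation (here `c`) stay visible, so the thread keeps its context.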
@movq@www.uninformativ.de Heaps of mozzies and other stuff that wants to eat you. Yeah, I noticed that as well. But I don’t know if it’s really more than usual. I might just have forgotten by now how bad it was in the past. :-?
With the wet beginning this year, water-loving insects certainly got a head start.
@prologic@twtxt.net Correct. The plan is that operators have to manually trust a peer before it is used for fetching missing conversation roots from. Preview of the horrible UI:
@bender@twtxt.net Yeah, it was nice. 23°C and a bit of wind. Quite acceptable in my opinion. :-)
@prologic@twtxt.net @movq@www.uninformativ.de In reality, even second precision would be enough for this new feed announcement bot. It just has to delay or predate its messages. Hopefully, it does not find new feeds all the time. :-)
@prologic@twtxt.net What should happen if the archive chain is detected to be broken? I don’t think that including the hash in the `prev` field really helps us in practice. What if messages in the archive feed themselves got lost? You can’t detect this unless you already knew about them. I reckon we can simply use the relative path and call it good. I know, I know, we have this format already today. But in my opinion, the hash does not add value.
@prologic@twtxt.net The `Content-Type` should probably even include `charset=utf-8`, as we learned recently. :-) That applies if you want to keep the UTF-8 encoding mandatory. The document doesn’t say anything about it.
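On the client side, honoring that parameter could look like the following sketch. It uses `email.message.Message` from Python's standard library to parse the header parameters; falling back to UTF-8 when the parameter is missing is an assumption about what a spec might mandate, and the helper names `feed_charset` and `decode_feed` are made up.

```python
from email.message import Message

def feed_charset(content_type, default="utf-8"):
    # Let the stdlib parse the header parameters; get_content_charset()
    # lowercases the value and returns failobj when there is no charset.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)

def decode_feed(body, content_type):
    return body.decode(feed_charset(content_type))

print(feed_charset("text/plain; charset=UTF-8"))  # utf-8
print(feed_charset("text/plain"))                 # utf-8 (fallback)
```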
@prologic@twtxt.net The `reply-to` can come anywhere in the message text? Most examples even put it at the very end. Why relax that? It currently has to be at the beginning, which I think makes parsing easier. I have to admit, putting it at the end makes reading the raw feed nicer. But multi-line messages with U+2028 ruin the raw feed reading experience very quickly.
@prologic@twtxt.net For hash calculation, we could maybe rethink the newlines and use tabs instead. This is more in line with the twtxt file format itself. With tabs, it is also much closer to the registry format (minus the nick).
What about the timestamp format? Just verbatim as it appears in the feed (what I would recommend) or any other shenanigans with normalization, like `+00:00` → `Z`?
An append style is not required, btw. If one uses prepend style feeds, the new URL simply comes at the beginning of the file, where the old URL is further down.
Clients must use the full-length hash in their storage, but only use the first eleven digits when referencing? This differentiation is a bit odd.
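To see what the separator and timestamp questions mean in practice, here is a sketch of a hash computation. It assumes the BLAKE2b-256 digest and lowercase, unpadded base32 encoding of the current Twt Hash extension, joins URL, timestamp and text with a configurable separator, and takes the first eleven characters as the reference form discussed here. The function names and inputs are hypothetical; the point is only that tabs vs. newlines, and `Z` vs. `+00:00`, each yield a different hash.

```python
import base64
import hashlib

def twt_hash(url, timestamp, text, sep="\n"):
    # The timestamp goes in verbatim, so any normalization would change the hash.
    payload = sep.join([url, timestamp, text]).encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    # 32 bytes encode to 52 base32 characters once the '=' padding is stripped.
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()

def short_ref(full_hash, length=11):
    # Stored: the full hash. Referenced in replies: only a prefix of it.
    return full_hash[:length]

url = "https://example.com/twtxt.txt"
full = twt_hash(url, "2024-09-01T12:00:00Z", "Hello")
print(full, short_ref(full))
print(twt_hash(url, "2024-09-01T12:00:00Z", "Hello", sep="\t") == full)  # False
print(twt_hash(url, "2024-09-01T12:00:00+00:00", "Hello") == full)       # False
```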
@prologic@twtxt.net The multiline example is broken. I don’t see any “pipes”.
@prologic@twtxt.net I notice that in your document it says `reply-to`, whereas in the ReplyTo Extension it’s without the hyphen. (But they also use different values after the colon. :-))
Thanks again for typing it up, @movq@www.uninformativ.de! I left a few comments there. Currently, I’m in favor of location-based addressing; that’s heaps simpler.
@sorenpeter@darch.dk Excellent point! I agree.
@bender@twtxt.net @prologic@twtxt.net @aelaraji@aelaraji.com Everything entering over Pod Gossiping is only cached temporarily, but never archived. So, it eventually fell off the cache. If my fake feeds were still up, yarnd would have pulled it from me again. I ran into the situation locally as well and then got it back, though.
@movq@www.uninformativ.de Awesome, thank you very much! I’ll have a look at it tomorrow.