@movq@www.uninformativ.de OK, to be more specific: it does, to the point of adding twts to the correct file. I’ve not checked actual file rotation. With max_twts_per_rotation set to 100 and me posting ~ once a week, the first rotation will take place in two years ;-)
I feel like README will need a rework soon. There are a lot of options now. Or maybe a manpage instead. For example, that local_twtxt_dir MUST end in a path separator should be mentioned somewhere ;-)
@movq@www.uninformativ.de Works like a charm!
@movq@www.uninformativ.de Great work! I wish we could get all those BIG twtxt writers to use it ;-)
I’ve a problem with local_twtxt_file not being supported any more. Being forced to use twtxt.txt as the file name breaks at least my URL.
@movq@www.uninformativ.de I always understood it as good practice to catch hardware errors early.
@movq@www.uninformativ.de Indeed! I’m sorry for that!
@movq@www.uninformativ.de Manpage says
The user is supposed to run it manually or via a periodic system
service. The recommended period is a month but could be less.
So me doing it weekly is a bit overcautious. It’s often overlooked by users that they are supposed to perform this task regularly.
Not that easy to decide when coming home from work: which site do I visit first?
@movq@www.uninformativ.de Don’t forget to btrfs scrub, e.g. once a week. I’m using btrfs scrub -B /dev/xyz and mail the result to myself.
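For illustration, automating that could look roughly like the following cron entry (device path, schedule, and mail address are placeholders; `btrfs scrub start -B` stays in the foreground until the scrub finishes, so its summary can be piped to mail; a working local MTA is assumed):

```
# /etc/cron.d/btrfs-scrub -- weekly scrub, summary mailed to the admin
# (illustrative device and address; adjust to your setup)
0 3 * * 0  root  btrfs scrub start -B /dev/xyz 2>&1 | mail -s "btrfs scrub report" admin@example.org
```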
@will@twtxt.net At work we are using KeePass with Multi Cert KeyProvider Plugin.
https://www.creative-webdesign.de/en/software/keepass-plugins/multi-cert-keyprovider
We leave the master password empty. Each person needs their own certificate to access the database file.
Not using a master password makes it easy to add or remove people with access w/o changing (and sharing) a master password.
@prologic@twtxt.net Very nice board and figures. Do they actually fit in the drawer?
@movq@www.uninformativ.de Thank you very much for implementing this! It’s very useful (at least for me)!
Makefile question?
@adi@f.adi.onl What about this one?
```make
SRCFILES = $(wildcard *)
# strip existing *.gz (they would otherwise double the entries)
CLEANSRC = $(SRCFILES:.gz=)
DSTFILES = $(addsuffix .gz, $(CLEANSRC))

%.gz: %
	gzip -c $< > $@

all: $(DSTFILES)
```
You must not have subdirectories in that folder, though.
:q!
?
@xuu@txt.sour.is Well, the point is, things do not work like this.
Actually in nano you would have to ctrl-k ctrl-k ctrl-x y to discard your reply.
@movq@www.uninformativ.de I don’t buy your example (rebasing behaviour), sorry.
Writing a twt is more similar to writing a commit message. Git does quite some checking to detect that nothing new was written and happily discards a commit if you just leave the editor. You don’t need any special action, just quit your editor. Git will take care of the rest.
But it’s OK as it is. I just didn’t expect that I have to select and delete all to discard a twt. So it’s C-x h C-w C-x C-c for me.
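Git’s check is easy to demonstrate in a throwaway repo (paths under /tmp are illustrative; `true` stands in for an editor that quits without writing anything):

```shell
mkdir -p /tmp/empty-msg-demo && cd /tmp/empty-msg-demo
git init -q .
git config user.email demo@example.org
git config user.name Demo
echo hello > file.txt
git add file.txt
# `true` exits without editing; the prepared message contains only
# comment lines, so git strips them, finds an empty message, and aborts.
GIT_EDITOR=true git commit || echo "commit was discarded"
```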
@movq@www.uninformativ.de Yes, this may be enough to check.
I only know this “feature” from my revision control software, where I get “abort: empty commit message” or “Aborting commit due to empty commit message” when I do not change whatever is already in there. That can be quite some text about which files changed and so on.
@movq@www.uninformativ.de My workflow is as follows.
I hit “reply” hotkey and my editor comes up.
With or without writing something I close my editor without saving the content.
Of course I close it by C-x C-c, not by :q! ;-)
Jenny finds the temp file unchanged, i.e. its content is the same as it was when my editor was started. I would like jenny to discard the reply then.
Autosaving is no problem either. Real editors do this to a temporary (kind of backup) file. Only in case of a crash that file is consulted and the user is asked if she would like to continue with that stored content.
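A sketch of the desired behavior (names and flow are made up, this is not jenny’s actual code): hash the prefilled draft before and after the editor runs, and discard the twt when nothing changed.

```python
import hashlib
import subprocess

def file_digest(path):
    """SHA-256 digest of a file's content."""
    with open(path, "rb") as fp:
        return hashlib.sha256(fp.read()).hexdigest()

def compose_twt(draft_path, editor=("nano",)):
    """Open the prefilled draft in an editor.

    Returns the draft's content, or None when the user quit without
    changing anything -- in that case the caller should discard the twt.
    """
    before = file_digest(draft_path)
    subprocess.run([*editor, draft_path], check=False)
    if file_digest(draft_path) == before:
        return None
    with open(draft_path, "r", encoding="utf-8") as fp:
        return fp.read()
```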
@movq@www.uninformativ.de Your scenario would produce the observed behaviour, agreed. On the other hand, I’m sure I’ve set every URL in lasttwt to > 1630000000.0 (manually, in my editor).
But I can’t reproduce any weird behaviour right now. I’ve tried to “blackhole” twt.nfld.uk temporarily. That does not have any effect.
I’ve also tried to force twt.nfld.uk to deliver an empty twtxt. That does not have any effect either.
So I guess everything is fine with jenny.
I have wrapped jenny into some shell script to versionize ~/.cache/jenny. This way I have better data if anything unexpected shows up again.
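Such a wrapper might look roughly like this (a sketch, not my exact script; the commit message is illustrative, and the jenny call is guarded so the script also runs where jenny is not installed):

```shell
#!/bin/sh
# Hypothetical wrapper around `jenny -f`: keep ~/.cache/jenny under git
# so unexpected cache changes can be inspected with `git log -p` later.
CACHE="$HOME/.cache/jenny"
mkdir -p "$CACHE"
[ -d "$CACHE/.git" ] || git -C "$CACHE" init -q
# Fetch only if jenny is actually installed.
command -v jenny >/dev/null && jenny -f
git -C "$CACHE" add -A
git -C "$CACHE" commit -q -m "state after jenny -f" >/dev/null 2>&1 || true
```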
@prologic@twtxt.net I’ve deleted eleven and utf8test, https://search.twtxt.net is the only follower. Maybe you can stop it from following those twtxts? They were meant for testing purposes only.
Funny bug in LG TV: last Saturday I scheduled a film airing yesterday for recording. The actual recording yesterday started one hour late. It looks like, although the TV knows the actual time perfectly well, it was not capable of translating the schedule from CEST to CET.
@movq@www.uninformativ.de Yes, it was exactly those twts. I don’t think I’ve managed to “match” the downtime while fetching twts. But even if I had, how can this lead to inserting old twts?
@movq@www.uninformativ.de Another feature request: sometimes I start writing a twt but then would like to discard it. It would be great if jenny could detect that I did not write (or save) anything and then discard the twt instead of creating an “empty” one.
@movq@www.uninformativ.de Today I had unexpected old twts after jenny -f. Have now jenny’s cache under revision control, automatically committing changes after each fetch. Let’s see if this helps finding a (possible) bug.
jenny has never failed me 😂. It is so neat, powerful, and streamlined, not even funny! Thank you very much, @movq, for it! 💛
I want to second that!
track-lasttwts: I’ve started implementing that “don’t recreate deleted mail files” thingy. So when you delete/move/archive twts, they should no longer reappear when you run jenny -f. Feel free to give this branch a try. 👌 (Bugs may lurk, it’s very fresh.)
@movq@www.uninformativ.de What do you think about this?
```diff
diff --git a/jenny b/jenny
index b47c78e..20cf659 100755
--- a/jenny
+++ b/jenny
@@ -278,7 +278,8 @@ def prefill_for(email, reply_to_this, self_mentions):
 def process_feed(config, nick, url, content, lasttwt):
     nick_address, nick_desc = decide_nick(content, nick)
     url_for_hash = decide_url_for_hash(content, url)
-    new_lasttwt = parse('1800-01-01T12:00:00+00:00').timestamp()
+    # new_lasttwt = parse('1800-01-01T12:00:00+00:00').timestamp()
+    new_lasttwt = None
     for line in twt_lines_from_content(content):
         res = twt_line_to_mail(
@@ -296,7 +297,7 @@ def process_feed(config, nick, url, content, lasttwt):
         twt_stamp = twt_date.timestamp()
         if lasttwt is not None and lasttwt >= twt_stamp:
             continue
-        if twt_stamp > new_lasttwt:
+        if not new_lasttwt or twt_stamp > new_lasttwt:
             new_lasttwt = twt_stamp
         mailname_new = join(config['maildir_target'], 'new', twt_hash)
```
@movq@www.uninformativ.de I just observed unexpected old twts coming back. It looks like lasttwts is reset to -5364619200.0 every time no new content was fetched, for example when if-modified-since did not produce new twts?
tt really sucks, it's terrible!
@lyse@lyse.isobeef.org I’m seeing your response as reply to #p522joq, where it doesn’t seem to belong to. Did this happen by accident or is there a bug hiding somewhere?
@prologic@twtxt.net I’m seeing your response as reply to #p522joq, where it doesn’t seem to belong to. Did this happen by accident or is there a bug hiding somewhere?
@movq@www.uninformativ.de Ha, but when you control lastmods, lastseen and lasttwts, it’s easy to test.
Works like a charm!
@movq@www.uninformativ.de Not that easy to test when pods honor if-modified-since ;-)
I’ve almost only timestamps -5364619200.0…
Diff looks good to me!
@movq@www.uninformativ.de I’ll test it tomorrow. Thanks for starting this feature!
(#el7d3ja) I believe glob() is an O(n) algorithm.
Yes, I see. But don’t underestimate OS caching for files and directories!
If you look up files in the same directory many times, the OS may use cached results from earlier lookups. I’m not totally sure, but I believe this is how things work on both Windows and Linux, at least.
@movq@www.uninformativ.de
When I look in my twtxt maildir for duplicated messages, they all have F in their name. I see that in mail_file_exists jenny does not consider flagged messages when testing whether a message already exists.
I understand that looking up only 12 combinations is faster than reading huge directories. I’m astonished that globbing would be slower. Learning something new every day…
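To illustrate the trade-off (this is a sketch of the idea, not jenny’s actual code; the flag set and naming are simplified): probing the possible Maildir flag suffixes directly costs a fixed number of path lookups, while glob() must read every entry in the directory.

```python
import os
from itertools import combinations

# Simplified set of Maildir flags; real messages may carry more (D, P, T, ...).
FLAGS = "FRS"

def mail_file_exists(maildir, name):
    """Probe every flag combination directly instead of globbing.

    This needs a fixed number of lookups (2**len(FLAGS) + 1 here),
    independent of how many messages sit in the directory, whereas
    glob(name + "*") has to scan all directory entries.
    """
    candidates = [name]  # a message may carry no ":2," info at all
    for r in range(len(FLAGS) + 1):
        for combo in combinations(FLAGS, r):
            candidates.append(name + ":2," + "".join(combo))
    return any(os.path.exists(os.path.join(maildir, c)) for c in candidates)
```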
@movq@www.uninformativ.de
I just pulled it, works like a charm (as expected) ;-)
@movq@www.uninformativ.de
I’m not a Python programmer, so please bear with me.
The doc about encodings also mentions:
If you require a different encoding, you can manually set the Response.encoding property
Wouldn’t that be a one-liner like this (Ruby example)? 'some text'.force_encoding('utf-8')
I understand that you do not want to interfere with requests. On the other hand, we know that received data must be UTF-8 (by the twtxt spec), and it does burden “publishers” to somehow add a charset property to the content-type header. But again, I’m not sure what “the right thing to do”™ is.
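For what it’s worth, the requests equivalent of that one-liner would be something like this sketch (a hypothetical fetch helper, not jenny’s actual code):

```python
import requests

def fetch_twtxt(url):
    """Fetch a twtxt feed and force UTF-8 decoding.

    Without a charset in Content-Type, requests falls back to
    ISO-8859-1 for text/* responses; since the twtxt spec mandates
    UTF-8, override Response.encoding before reading .text.
    """
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    r.encoding = "utf-8"
    return r.text
```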
@prologic@twtxt.net @movq@www.uninformativ.de
Exactly, you see the correct UTF-8 encoded version (even with content-type: text/plain leaving out the charset declaration).
After following the utf8test twtxt myself, I now see that jenny does not handle it as UTF-8 when the charset is missing from the HTTP header, just like @quark@ferengi.one has observed. So should jenny always treat twtxt files as UTF-8 encoded? I’m not sure about this.
@lyse@lyse.isobeef.org
Sorry, I should have mentioned your twt #vjjdara where you already described the same idea.
@movq@www.uninformativ.de
Applause!
I believe Yarn assumes utf-8 anyway which is why we don’t see encoding issues
Are you sure? I think in #kj2c5oa @quark@ferengi.one mentioned exactly that problem. My logs say “jenny/latest” was fetching my twtxt for quark.
All I did to fix this was adding AddCharset utf-8 .txt to .htaccess. In particular, I did not change the encoding of stackeffect.txt.
Don’t miss step 0 (I should have made this a separate point): having a meta header promising that twts are appended with strictly monotonically increasing timestamps.
(Also, I’d first like to see the pagination thingy implemented.)
In jenny I would like to see “don’t process previously fetched twts” AKA “Allow the user to archive/delete old twts” feature implemented ;-)
What about a meta header for setting the charset? I myself stumbled upon .txt files not being delivered with charset: utf-8 by default. I had to set/modify .htaccess to correct that. It would have been easier if there had been a charset header entry “overwriting” what the HTTP server delivers.
What do you think?
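For illustration, such a header could sit next to the existing metadata fields, e.g. (the charset field name is hypothetical and not part of any spec; nick and twt are placeholders):

```
# nick    = example
# charset = utf-8
2021-10-21T20:00:00+02:00	Hello, wörld!
```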
My thoughts about range requests
In addition to pagination, range requests should also be used to reduce traffic.
I understand that there are corner cases making this a complicated matter.
I would like to see a meta header saying that the given twtxt is append-only with increasing timestamps, so that a simple strategy can detect valid content fetched per range request:
1. Read the meta part per range request.
2. Read the last fetched twt at the expected range (as known from the last fetch).
3. If the fetched content starts with the expected twt, process the rest of the data.
4. If the fetched content doesn’t start with the expected twt, discard all of it and fall back to fetching the whole twtxt.
Pagination (e.g. archiving old content in a different file) will lead to point 4.
Of course, pods especially should support range requests, correct @prologic@twtxt.net?
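The decision logic of these steps could be sketched like this (hypothetical helper, not part of any client; the actual HTTP request with a Range: bytes=&lt;offset&gt;- header is omitted):

```python
def process_range_response(status, body, last_twt_line):
    """Validate the reply to a range request against an append-only feed.

    status: HTTP status code of the response.
    body: decoded response body.
    last_twt_line: raw line of the last twt seen in the previous fetch.
    Returns (new_content, need_full_refetch).
    """
    if status != 206:
        # Server ignored the Range header and sent the whole file anyway.
        return body, True
    if body.startswith(last_twt_line):
        # The expected twt sits at the expected offset: the rest is new.
        return body[len(last_twt_line):], False
    # Mismatch -- the feed was rewritten or archived (pagination):
    # discard and fall back to fetching the whole twtxt.
    return None, True
```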
My thoughts about pagination (paging)
Following the discussion about pagination (paging) I think that’s the right thing to do.
Fetching the same content again and again with only a marginal portion of actually new twts is unbearable and does not scale in any way. It’s not only a waste of bandwidth but with increasing number of fetchers it will also become a problem for pods to serve all requests.
Because it’s so easy to implement and simple to understand, splitting the twtxt file into parts with next and prev pointers seems a really amazing solution. As in RFC 5005, there should also be a meta header pointing to the main URL, e.g. current or baseurl or something like that. This way hashes can be calculated correctly even for archived twts.
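An archived page’s meta section could then look something like this (field names and URLs are illustrative only; none of this is specified yet):

```
# url  = https://example.org/twtxt.txt
# prev = https://example.org/twtxt-2021-09.txt
# next = https://example.org/twtxt-2021-11.txt
```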
I’m curious, what is your use case for deleting twts?
Not just deleting, also sorting into other folders is impossible.
It also doesn’t scale in the long term. When I cannot delete twts then I have a full copy of every twtxt I follow - forever. That’s a waste of bandwidth and disk space.
@movq@www.uninformativ.de How is deletion supposed to work? In mutt I deleted with D~d>1m and then fetched with !jenny -f. This brings back all deleted twts. Isn’t lastmods used to skip older twts?
No, it would be sufficient to skip avatar discovery when the metadata contains an avatar.
@prologic@twtxt.net
Thank you, that’s the correct one.
Still I have this in my logs (first access of “eleven” by yarnd):
ip.ip.ip.ip - - [21/Oct/2021:20:05:36 +0000] "GET /eleven.txt HTTP/2.0" 200 344 "-" "yarnd/0.2.0@46bea3f (Pod: twtxt.net Support: https://twtxt.net/support)"
ip.ip.ip.ip - - [21/Oct/2021:20:05:36 +0000] "HEAD /avatar.png HTTP/2.0" 200 0 "-" "yarnd/0.2.0@46bea3f (Pod: twtxt.net Support: https://twtxt.net/support)"
And I guess without avatar.png sitting there I would have seen even more requests like /eleven.txt/avatar.png.
I’ve copied stackeffect.png to avatar.png to make yarnd happy when accessing stackeffect.txt.
So in this setup yarnd fetched eleven.txt along with avatar.png which belongs to another twtxt. This feels buggy.