@movq@www.uninformativ.de Non-ASCII characters were broken, like U+2028, the degree sign (°), etc.

Turns out I used a silly library to detect the encoding and transcode to UTF-8 if needed. When there is no Content-Type header, like for local files, it sniffs the first 1024 bytes. Since it only saw ASCII in that region, the damn thing assumed the data was in Windows-1252 (which for web pages kinda makes sense):

// TODO: change default depending on user's locale?
return charmap.Windows1252, "windows-1252", false

https://cs.opensource.google/go/x/net/+/master:html/charset/charset.go;l=102
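For illustration, a minimal sketch of how that default kicks in; the sample bytes are made up, but charset.DetermineEncoding is that package's actual entry point:

package main

import (
	"fmt"

	"golang.org/x/net/html/charset"
)

func main() {
	// A purely ASCII prefix, as in a typical twtxt feed: no BOM,
	// no <meta charset>, and no Content-Type header to go by.
	body := []byte("2024-01-01T00:00:00Z\tHello world\n")

	// An empty content type forces the sniffing path.
	_, name, certain := charset.DetermineEncoding(body, "")
	fmt.Println(name, certain)
	// Prints: windows-1252 false
	// Any non-ASCII bytes later in the feed then get mangled.
}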

This default is hardcoded and cannot be changed.

Trying to be smart and adding automatic support for other encodings turned out to be a bad move on my end. At least I can reduce my dependency list again. :-)

I now just reject everything whose Content-Type explicitly specifies something other than text/plain, or a charset other than utf-8 (ignoring case). Otherwise I assume it’s UTF-8 (just like the twtxt file format specification mandates) and hope for the best.
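In Go, that acceptance rule could look roughly like this; acceptContentType is a hypothetical helper for illustration, not the actual tt2 code:

package main

import (
	"fmt"
	"mime"
	"strings"
)

// acceptContentType is a hypothetical helper: it accepts a missing
// Content-Type header, or text/plain with either no charset parameter
// or an explicit utf-8 charset (case-insensitive). Everything else is
// rejected.
func acceptContentType(header string) bool {
	if header == "" {
		return true // no header at all: assume UTF-8, as the twtxt spec mandates
	}
	mediaType, params, err := mime.ParseMediaType(header)
	if err != nil {
		return false
	}
	if !strings.EqualFold(mediaType, "text/plain") {
		return false
	}
	cs, ok := params["charset"]
	if !ok {
		return true // text/plain without a charset: assume UTF-8
	}
	return strings.EqualFold(cs, "utf-8")
}

func main() {
	fmt.Println(acceptContentType("text/plain; charset=UTF-8")) // true
	fmt.Println(acceptContentType("text/html"))                 // false
}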


Now WTF!? Suddenly, @falsifian@www.falsifian.org’s feed renders broken in my tt Python implementation. Exactly the same problem I had with my Go rewrite. I haven’t touched the Python stuff in ages, though. Also, tt and tt2 do not share any data at all.

By any chance, did you remove the ; charset=utf-8 from your Content-Type: text/plain header, falsifian?

[Screenshot: feed interpreted in some crappy Windows charset]
