I recently got an email with this byte sequence:
\xf0\x9f\x8e\x81\xf0\x9f\x95\xaf\xef\xb8\x8f
That’s U+1F381, U+1F56F, U+FE0F. The last one is a “variation selector”:
https://unicodeplus.com/U+FE0F
My toolkit renders this incorrectly – and so do tmux and GNU screen.
Unicode ain’t easy. 🥴
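For anyone who wants to check the decoding themselves, here is a minimal Python sketch (not part of the original post) that turns the byte sequence above into code points using only the standard library. The helper name `code_points` is made up for illustration.

```python
import unicodedata


def code_points(data: bytes) -> list[str]:
    """Decode UTF-8 bytes and return each code point formatted as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in data.decode("utf-8")]


# The byte sequence quoted in the post.
raw = b"\xf0\x9f\x8e\x81\xf0\x9f\x95\xaf\xef\xb8\x8f"

# Print every code point next to its Unicode name (if the local
# Unicode database knows one).
for cp, ch in zip(code_points(raw), raw.decode("utf-8")):
    print(cp, unicodedata.name(ch, "<unnamed>"))
```

The variation selector shows up as its own code point here, which is probably why renderers that count "characters" naively get the width wrong.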
Could it be that Source Sans Pro changed recently? No… Somehow at some point ✳ was replaced with ⚹ in my markdown files… I have no idea how this happened.
#Unicode #Typography
@bender@twtxt.net I’m already using it for tracktivity (meant for tracking activities and events, like weather, food consumption, stuff like that), which is basically a somewhat-fancy CSV editor:
https://movq.de/v/f26eb836ee/s.png
I have a couple of other projects where I could use it, because they are plain curses at the moment. Like, one of them has an “edit box”, but you can’t enter Unicode, because it was too complicated. That would benefit from the framework.
Either way, it’s the most satisfying project in a long time and I’m learning a ton of stuff.
tcell.Key constants and typing different key combinations in the terminal to see the generated tcell.EventKeys in the debug log. Until I pressed Ctrl+Alt+Backspace… :-D Yep, suddenly there went my X…
@movq@www.uninformativ.de Yeah, I know that terminals are super weird and messy. In both the KDE Konsole (identifying itself as TERM=xterm-256color) and xterm (TERM=xterm) it just works flawlessly. My urxvt (TERM=rxvt-unicode-256color) just doesn’t. I also tried messing with TERM in urxvt, but no luck so far.
More widget system progress:
https://movq.de/v/87e2bce376/vid-1767467193.mp4
I like the oldschool shadow effect. 😅 Not sure if I’ll keep it, but it’s neat.
The menu bar is still fake.
Had to spend quite a bit of time optimizing the rendering today. This can get really slow really quickly.
Unicode is Pain.
I might be able to start porting my first program (currently uses urwid) soon. 🤔
Why have these Unicode smilies never caught on, I wonder? 🤪
Well, you girls and guys are making cool things, and I have some progress to show as well. 😅
https://movq.de/v/c0408a80b1/movwin.mp4
Scrolling widgets appears to work now. This is (mostly) Unicode-aware: Note how emojis like “😅” are double-width “characters” and the widget system knows this. It doesn’t try to place a “😅” in a location where there’s only one cell available.
Same goes for that weird “ä” thingie, which is actually “a” followed by U+0308 (a combining diacritic). Python itself thinks of this as two “characters”, but they only occupy one cell on the screen. (Assuming your terminal supports this …)
This library does the heavy Unicode lifting: https://github.com/jquast/wcwidth (Take a look at its implementation to learn how horrible Unicode and human languages are.)
The program itself looks like this, it’s a proper widget hierarchy:
https://movq.de/v/1d155106e2/s.png
(There is no input handling yet, hence some things are hardwired for the moment.)
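The gap between Python's idea of length and the number of terminal cells can be sketched with the standard library alone. This is only a rough approximation and not how wcwidth works internally; `rough_cell_width` is a hypothetical helper, and real terminals may disagree with it.

```python
import unicodedata


def rough_cell_width(text: str) -> int:
    """Estimate how many terminal cells `text` occupies (approximation only)."""
    width = 0
    for ch in text:
        # Combining marks, enclosing marks and format characters
        # are assumed to take no cell of their own.
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide/Fullwidth characters (this includes most emoji)
        # are assumed to take two cells, everything else one.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


decomposed = "a\u0308"  # "a" followed by COMBINING DIAERESIS
print(len(decomposed), rough_cell_width(decomposed))
print(len("\U0001F605"), rough_cell_width("\U0001F605"))
```

A layout routine would use the cell width, not `len()`, when deciding whether a character still fits in the remaining space.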
@movq@www.uninformativ.de I see. Yeah, all the Unicode stuff certainly doesn’t help here, that’s for sure.
Maybe “speedcurses” could be a name. Or just select any Palatinate curse. ;-)
@lyse@lyse.isobeef.org I’m toying with the idea of making a widget/window system on top of Python’s ncurses. I’ve never really been happy with the existing ones (like urwid, textual, pytermgui, …). I mean, they’re not horrible, it’s mostly the performance that’s bugging me – I don’t want to wait an entire second for a terminal program to start up.
Not sure if I’ll actually see it through, though. Unicode makes this kind of thing extremely hard. 🫤
Fun video about #Unicode #UTF8. I knew about the historical context and fundamental implementation ideas already, but I didn’t know about the Hangul combinations block trick mentioned in the end… clever stuff.
@lyse@lyse.isobeef.org The underlines are a bit much, yes. It appears to be related to my font (Helvetica) … Maybe they do some Unicode trickery these days, I don’t know. 🫤
fn sub(foo: &String) {
    println!("We got this string: [{}]", foo);
}

fn main() {
    // "Hello", 0x00, 0x00, "!"
    let buf: [u8; 8] = [0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x00, 0x00, 0x21];

    // Create a string from the byte array above, interpreted as UTF-8.
    // Invalid sequences are replaced with U+FFFD instead of causing an error.
    let lossy_unicode = String::from_utf8_lossy(&buf).to_string();

    sub(&lossy_unicode);
}
Create a string from a byte array, but the result isn’t a string, it’s a cow 🐮, so you need another to_string() to convert your “string” into a string.
- https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8_lossy
- https://doc.rust-lang.org/std/borrow/enum.Cow.html
I still have a lot to learn.
(into_owned() instead of to_string() also works and makes more sense to me, it’s just that the compiler suggested to_string() first, which led to this funny example.)
(Why is there no bass emoji in Unicode? Pah.)
Hmmm, when I Ctrl+Left to jump a word left, I get 1;5D in my tt2 message text. My TERM is set to rxvt-unicode-256color. In tt, it works just fine. When I change to TERM=xterm-256color, it also works in tt2. I have to read up on that. Maybe even try to capture these sequences and rewrite them.
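The `1;5D` fragment looks like the tail of the xterm-style sequence `ESC [ 1 ; 5 D`, where the number encodes the modifiers as 1 plus a bitmask (Shift=1, Alt=2, Ctrl=4). Whether that is really what happens in tt2 is a guess; the Python sketch below only shows how such sequences could be decoded, and `decode_modified_arrow` is a hypothetical name.

```python
import re

# xterm-style modified arrow keys: ESC [ 1 ; <modifier> <A|B|C|D>
CSI_MODIFIED_ARROW = re.compile(r"\x1b\[1;(\d+)([A-D])")
DIRECTIONS = {"A": "Up", "B": "Down", "C": "Right", "D": "Left"}
MODIFIER_BITS = ((1, "Shift"), (2, "Alt"), (4, "Ctrl"))


def decode_modified_arrow(seq: str):
    """Return e.g. 'Ctrl+Left' for a modified arrow sequence, else None."""
    match = CSI_MODIFIED_ARROW.fullmatch(seq)
    if match is None:
        return None
    # The parameter is 1 + bitmask, so subtract 1 to get the raw bits.
    bits = int(match.group(1)) - 1
    mods = [name for bit, name in MODIFIER_BITS if bits & bit]
    return "+".join(mods + [DIRECTIONS[match.group(2)]])


print(decode_modified_arrow("\x1b[1;5D"))
```

If the leading `ESC [` is consumed as an unknown key and the rest is passed through as text, that would leave exactly `1;5D` in the input field.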
@arne@uplegger.eu Oh dear, TYPO3! O_o Let me run away screaming!
We once did a university project with this absolute catastrophe of a software. It more than lived up to every prejudice. It started with error pages that were always served with HTTP 200 instead of 4xx or the like, and with generated HTML that was simply invalid. Then there was deletion implemented via a deleted flag in the database, passwords stored in plain text, and thoroughly convoluted usage concepts. Everything always took a brutal number of steps. The fiddling with line numbers in the TYPO-Script was more reminiscent of Basic. It also seemed to us that you couldn’t seriously do anything useful with it.
To top it all off, someone had dug up a truly abysmal book that was supposed to serve as preparation. Fortunately, I can remember neither the title nor the author, but I still remember how completely inconsistently it was written. At the beginning there were several pages on Unicode, and UTF-8 was praised, but all the examples then relied on ISO-8859-1. The example code shown was often bottom of the barrel. Rarely have I read such strange explanations: “If the security warnings bother you, please comment out the die() function in line $ZEILE in the source code.” Or another classic: “Written out, the code would probably do the following…”. So the author wasn’t quite sure whether his code snippet might actually do something entirely different.
Since this gigantic trauma (it left a truly lasting impression on me of how not to do things) I have successfully steered clear of the TYPO3 universe.
I can only hope that it has gotten a little better in the meantime. But according to your short report, something still seems to be rotten there. My condolences! :-(
Righto, @eapl.me@eapl.me, ta for the writeup. Here we go. :-)
Metadata on individual twts are too much for me. I do like the simplicity of the current spec. But I understand where you’re coming from.
Numbering twts in a feed is basically an attempt at generating message IDs. It’s an interesting idea, but I reckon it is not even needed. I’d simply use location-based addressing (feed URL + ‘#’ + timestamp) instead of content addressing. If one really wanted to, one could hash the feed URL and timestamp, but the raw form would actually improve discoverability and would not even require a richer client. But the majority of twtxt users in the last poll wanted to stick with content addressing.
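A minimal Python sketch of the location-based addressing idea, under stated assumptions: the function names are hypothetical, and SHA-256 with a 7-character prefix is an arbitrary choice for illustration, not something taken from any twtxt spec.

```python
import hashlib


def location_id(feed_url: str, timestamp: str) -> str:
    """Raw location-based ID: feed URL + '#' + timestamp."""
    return f"{feed_url}#{timestamp}"


def hashed_location_id(feed_url: str, timestamp: str, length: int = 7) -> str:
    """Optional hashed form of the same ID (algorithm and length are assumptions)."""
    digest = hashlib.sha256(location_id(feed_url, timestamp).encode("utf-8"))
    return digest.hexdigest()[:length]


# Example feed URL and timestamp are placeholders.
print(location_id("https://example.com/twtxt.txt", "2024-01-01T12:00:00Z"))
print(hashed_location_id("https://example.com/twtxt.txt", "2024-01-01T12:00:00Z"))
```

The raw form tells a reader where to find the twt without any lookup, which is the discoverability argument; the hashed form loses that.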
yarnd actually sends If-Modified-Since request headers. Not only can I observe heaps of 304 responses for yarnds in my access log, but in Cache.FetchFeeds(…) we can actually see If-Modified-Since being deployed when the feed has been retrieved with a Last-Modified response header before: https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/cache.go#L1278
Turns out etags with If-None-Match are only supported when yarnd serves avatars (https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/handlers.go#L158) and media uploads (https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/media_handlers.go#L71). However, it ignores possible etags when fetching feeds.
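For illustration, this is roughly how a client could send both validators when refetching a feed. It is a Python sketch with hypothetical helper names, not yarnd’s actual (Go) implementation, and it assumes the client has cached the Last-Modified and ETag values from an earlier response.

```python
import urllib.request


def conditional_headers(last_modified=None, etag=None) -> dict:
    """Build conditional request headers from previously cached validators."""
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return headers


def build_request(url: str, last_modified=None, etag=None) -> urllib.request.Request:
    """Create (but do not send) a conditional GET request for a feed."""
    return urllib.request.Request(url, headers=conditional_headers(last_modified, etag))


# Placeholder URL and validator values.
req = build_request(
    "https://example.com/twtxt.txt",
    last_modified="Wed, 21 Oct 2015 07:28:00 GMT",
    etag='"abc123"',
)
print(req.header_items())
```

A server that honors either header can answer with 304 and an empty body when nothing has changed; the caller then keeps its cached copy.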
I don’t understand how the discovery URLs should work to replace the User-Agent header in HTTP(S) requests. Would you mind elaborating?
Different protocols are basically just a client thing.
I reckon it’s best to just avoid mixing several languages in one feed in the first place. Personally, I find it okay to occasionally write messages in other languages, but if that happens on a more regular basis, I’d definitely create a different feed for other languages.
Isn’t the emoji thing “just” a client feature? So, feeds do not even have to state any emojis. As a user I’d configure my client to use a certain symbol for feed ABC. Currently, I can do a similar thing in tt where I assign colors to feeds. On the other hand, what if a user wants to control what symbol should be displayed, similar to the feed’s nick? Hmm. But still, my terminal font doesn’t even render most emojis. So, Unicode boxes everywhere. This makes me think it should actually be a client-only feature.
(#fmnhewq) @bender@bender Which feed has Unicode newlines in the desc? Hmm 🧐
@bender@twtxt.net @prologic@twtxt.net I’m not exactly asking yarnd to change. If you are okay with the way it displayed my twts, then by all means, leave it as is. I hope you won’t mind if I continue to write things like 1/4 to mean “first out of four”.
What has text/markdown got to do with this? I don’t think Markdown says anything about replacing 1/4 with ¼, or other similar transformations. It’s not needed, because ¼ is already a unicode character that can simply be directly inserted into the text file.
What’s wrong with my original suggestion of doing the transformation before the text hits the twtxt.txt file? @prologic@twtxt.net, I think it would achieve what you are trying to achieve with this content-type thing: if someone writes 1/4 on a yarnd instance or any other client that wants to do this, it would get transformed, and other clients simply wouldn’t do the transformation. Every client that supports displaying unicode characters, including Jenny, would then display ¼ as ¼.
Alternatively, if you prefer yarnd to pretty-print all twts nicely, even ones from simpler clients, that’s fine too and you don’t need to change anything. My 1/4 -> ¼ thing is nothing more than a minor irritation which probably isn’t worth overthinking.
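The “transform before it hits twtxt.txt” suggestion could look something like this on the writing client’s side. It is a Python sketch under assumptions: the mapping covers only three fractions, and `prettify_fractions` is a hypothetical name, not a function of yarnd or Jenny.

```python
import re

# Only fractions that have a precomposed Unicode character are listed here.
FRACTIONS = {"1/4": "¼", "1/2": "½", "3/4": "¾"}

# Match a fraction only when it is not part of a longer number or a date
# such as 11/4 or 1/4/2024.
_FRACTION_RE = re.compile(r"(?<![\d/])(?:1/4|1/2|3/4)(?![\d/])")


def prettify_fractions(text: str) -> str:
    """Replace ASCII fractions with Unicode characters before writing the twt."""
    return _FRACTION_RE.sub(lambda m: FRACTIONS[m.group(0)], text)


print(prettify_fractions("Add 1/4 cup of sugar"))
```

Because this runs once at write time, the feed file already contains ¼ and every client displays it the same way. A “1/4” meant as “first out of four” would still be rewritten, so the step would have to be opt-in for the author.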
Unicode doesn’t distinguish between a dollar sign with one and a dollar sign with two strokes, which makes me sad.
Account Problems
Weird Unicode Math Symbols
someday i will descend upon the unicode consortium and add sub/superscript version of the whole latin alphabet
@lyse@lyse.isobeef.org What the heck? no emoji? do you even Unicode!
@prologic@twtxt.net Yeah like normally I’m just a little annoyed and just say “whatever” and shrug it off, but come on I am searching for emojis here. Do you really need to harvest my user data for what is essentially a fuzzy search in the Unicode table?
@prologic@twtxt.net lol. just testing some Unicode.
On the blog: Where Have All the Emoji Gone? https://john.colagioia.net/blog/2021/09/29/emoji.html #programming #techtips #unicode #blog
https://metacpan.org/release/WOLFSAGE/perl-5.35.4/changes#Unicode-14.0-is-supported As of Perl 5.35.4, the supported Unicode version has advanced to 14.0.0.
@prologic@twtxt.net should we enable all unicode glyphs for tags? https://txt.sour.is/conv/55yrura
I wrote a ‘banner’-like program for Plan 9 (and p9p) that uses the Unicode box drawing characters: http://txtpunk.com/banner/index.html
huh, txtnish seems to have problems with linebreaks & unicode.
Teletext graphics characters among those added to Unicode – Teletext Art http://teletextart.co.uk/teletext-graphics-characters-among-those-added-to-unicode
Because of the use of ‘rune’ to refer to Unicode code points in Go, a futhark transliteration program might have somewhat confusing source…
A Spectre is Haunting Unicode https://www.dampfkraft.com/ghost-characters.html
unum - Interconvert numbers, Unicode, and HTML/XHTML characters http://www.fourmilab.ch/webtools/unum/
Does unicode also work? 💚☎
@quite@lublin.se there is also a unicode symbol in there, maybe that?
All fonts I tried are either ugly or the unicode glyphs are so small that they become unreadable. #fonts
The pretty format is very similar to twtxt without the unicode glyphs and the relative date.