Fun video about #Unicode #UTF8. I knew about the historical context and fundamental implementation ideas already, but I didn’t know about the Hangul combinations block trick mentioned at the end… clever stuff.
@lyse@lyse.isobeef.org The underlines are a bit much, yes. It appears to be related to my font (Helvetica) … Maybe they do some Unicode trickery these days, I don’t know. 🫤
fn sub(foo: &String) {
    println!("We got this string: [{}]", foo);
}
fn main() {
    // "Hello", 0x00, 0x00, "!"
    let buf: [u8; 8] = [0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x00, 0x00, 0x21];
    // Create a string from the byte array above, interpret as UTF-8.
    // Invalid sequences are replaced with U+FFFD instead of causing an error.
    let lossy_unicode = String::from_utf8_lossy(&buf).to_string();
    sub(&lossy_unicode);
}
Create a string from a byte array, but the result isn’t a string, it’s a cow 🐮, so you need another to_string() to convert your “string” into a string.
- https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8_lossy
- https://doc.rust-lang.org/std/borrow/enum.Cow.html
I still have a lot to learn.
(into_owned() instead of to_string() also works and makes more sense to me, it’s just that the compiler suggested to_string() first, which led to this funny example.)
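A minimal sketch of why the result is a cow in the first place: as far as the docs linked above say, from_utf8_lossy() only has to build a new String when it actually replaces something, otherwise it can hand back a borrowed view of the input bytes. The helper name decode_lossy and the returned flag are made up for illustration.

```rust
use std::borrow::Cow;

/// Decode bytes as UTF-8, replacing invalid sequences with U+FFFD.
/// The bool reports whether from_utf8_lossy() had to repair the input
/// (Cow::Owned) or could simply borrow it unchanged (Cow::Borrowed).
fn decode_lossy(bytes: &[u8]) -> (String, bool) {
    let cow: Cow<'_, str> = String::from_utf8_lossy(bytes);
    let was_repaired = matches!(cow, Cow::Owned(_));
    // into_owned() turns either variant into a plain String.
    (cow.into_owned(), was_repaired)
}
```

The NUL bytes in the example above are valid UTF-8, so that input comes back borrowed; only something like a stray 0xFF byte forces the owned variant.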
(Why is there no bass emoji in Unicode? Pah.)
Hmmm, when I Ctrl+Left to jump a word left, I get 1;5D in my tt2 message text. My TERM is set to rxvt-unicode-256color. In tt, it works just fine. When I change to TERM=xterm-256color, it also works in tt2. I have to read up on that. Maybe even try to capture these sequences and rewrite them.
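A rough sketch of what capturing these sequences could look like, assuming the terminal sends the xterm-style form ESC [ 1 ; <m> <letter> for modified arrow keys (the stray 1;5D would then be the tail of ESC [ 1 ; 5 D). In that scheme the parameter is 1 plus a bitmask (Shift = 1, Alt = 2, Ctrl = 4), so 5 means Ctrl. The names Mods and parse_modified_arrow are hypothetical and not taken from tt2.

```rust
/// Modifier keys carried by an xterm-style cursor key sequence.
#[derive(Debug, PartialEq)]
struct Mods {
    shift: bool,
    alt: bool,
    ctrl: bool,
}

/// Parse sequences of the form ESC [ 1 ; <m> <final>, e.g. "\x1b[1;5D".
/// Returns the final byte (A/B/C/D = up/down/right/left) and the modifiers,
/// or None if the input does not have that shape.
fn parse_modified_arrow(seq: &[u8]) -> Option<(char, Mods)> {
    let body = seq.strip_prefix(b"\x1b[1;")?;
    let (&fin, digits) = body.split_last()?;
    if !matches!(fin, b'A'..=b'D') {
        return None;
    }
    let param: u8 = std::str::from_utf8(digits).ok()?.parse().ok()?;
    // The parameter is 1 + bitmask, so subtract 1 to get the bits.
    let bits = param.checked_sub(1)?;
    Some((
        fin as char,
        Mods {
            shift: bits & 1 != 0,
            alt: bits & 2 != 0,
            ctrl: bits & 4 != 0,
        },
    ))
}
```

With something like this, a client could recognise Ctrl+Left itself regardless of what the terminfo entry for the current TERM claims.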
@arne@uplegger.eu Oh dear, TYPO3! O_o Let me run away screaming!
We once did a university project with this absolute disaster of a piece of software. It more than lived up to every prejudice. Starting with error pages that were always delivered with HTTP 200 instead of 4xx or the like, or the fact that the generated HTML was simply invalid. Then the implementation of deletion via a deleted flag in the database, the storing of passwords in plain text, all the way to utterly cumbersome usage concepts. Everything always took a brutal number of steps. The line-number fiddling in TYPO-Script was more reminiscent of BASIC. It also seemed to us that you couldn’t seriously do anything useful with it.
To top it all off, somebody had dug up a truly abysmal book that was supposed to serve as preparation. Fortunately, I can remember neither the title nor the author, but I do remember how completely inconsistently it was written. At the start there were several pages on Unicode, and UTF-8 was praised, but then all the examples relied on ISO-8859-1. The example code shown was often bottom of the barrel. Rarely have I read such strange explanations: “If the security warnings bother you, please just comment out the die() function in $LINE in the source code.” Or another classic: “Written out, the code would probably do the following…”. So the author wasn’t quite sure whether his code snippet might in fact do something entirely different.
Ever since this gigantic trauma (it really left a lasting mark on me regarding how not to do things), I have successfully steered clear of the TYPO3 universe.
I can only hope that it has gotten a bit better in the meantime. But judging by your short report, it still seems to be riddled with problems. My condolences! :-(
Righto, @eapl.me@eapl.me, ta for the writeup. Here we go. :-)
Metadata on individual twts is too much for me. I do like the simplicity of the current spec. But I understand where you’re coming from.
Numbering twts in a feed is basically an attempt at generating message IDs. It’s an interesting idea, but I reckon it is not even needed. I’d simply use location based addressing (feed URL + ‘#’ + timestamp) instead of content addressing. If one really wanted to, one could hash the feed URL and timestamp, but the raw form would actually improve discoverability and would not even require a richer client. But the majority of twtxt users in the last poll wanted to stick with content addressing.
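To illustrate how little a client would need for the location based addressing described above, here is a minimal sketch. The function names and the example URL are hypothetical and not part of any spec.

```rust
/// Build a location-based twt reference: feed URL + '#' + timestamp.
fn twt_location(feed_url: &str, timestamp: &str) -> String {
    format!("{}#{}", feed_url, timestamp)
}

/// Split such a reference back into (feed URL, timestamp).
/// Splits at the last '#', since the timestamp itself contains none.
fn parse_twt_location(reference: &str) -> Option<(&str, &str)> {
    reference.rsplit_once('#')
}
```

For example, twt_location("https://example.com/twtxt.txt", "2024-09-18T23:08:00Z") gives a reference that a human can read and a client can resolve by fetching the feed and looking up the timestamp.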
yarnd actually sends If-Modified-Since request headers. Not only can I observe heaps of 304 responses for yarnds in my access log, but in Cache.FetchFeeds(…) we can actually see If-Modified-Since being deployed when the feed has been retrieved with a Last-Modified response header before: https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/cache.go#L1278
Turns out etags with If-None-Match are only supported when yarnd serves avatars (https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/handlers.go#L158) and media uploads (https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/media_handlers.go#L71). However, it ignores possible etags when fetching feeds.
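For comparison, a client that honoured both validators when fetching feeds might assemble its request headers roughly like this. This is only a sketch and not yarnd’s code (which is written in Go); the CachedFeed type and conditional_headers function are invented for illustration.

```rust
/// Validators remembered from the previous successful fetch of a feed.
struct CachedFeed {
    last_modified: Option<String>,
    etag: Option<String>,
}

/// Build the conditional request headers for re-fetching a cached feed.
/// A server that recognises a validator can answer 304 Not Modified
/// instead of sending the whole feed again.
fn conditional_headers(cached: &CachedFeed) -> Vec<(&'static str, String)> {
    let mut headers = Vec::new();
    if let Some(lm) = &cached.last_modified {
        headers.push(("If-Modified-Since", lm.clone()));
    }
    if let Some(tag) = &cached.etag {
        headers.push(("If-None-Match", tag.clone()));
    }
    headers
}
```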
I don’t understand how the discovery URLs should work to replace the User-Agent header in HTTP(S) requests. Would you mind elaborating?
Different protocols are basically just a client thing.
I reckon it’s best to just avoid mixing several languages in one feed in the first place. Personally, I find it okay to occasionally write messages in other languages, but if that happens on a more regular basis, I’d definitely create a different feed for other languages.
Isn’t the emoji thing “just” a client feature? So, feeds do not even have to state any emojis. As a user I’d configure my client to use a certain symbol for feed ABC. Currently, I can do a similar thing in tt where I assign colors to feeds. On the other hand, what if a user wants to control what symbol should be displayed, similar to the feed’s nick? Hmm. But still, my terminal font doesn’t even render most emojis. So, Unicode boxes everywhere. This makes me think it should actually be a client-only feature.
@bender@twtxt.net @prologic@twtxt.net I’m not exactly asking yarnd to change. If you are okay with the way it displayed my twts, then by all means, leave it as is. I hope you won’t mind if I continue to write things like 1/4 to mean “first out of four”.
What has text/markdown got to do with this? I don’t think Markdown says anything about replacing 1/4 with ¼, or other similar transformations. It’s not needed, because ¼ is already a unicode character that can simply be directly inserted into the text file.
What’s wrong with my original suggestion of doing the transformation before the text hits the twtxt.txt file? @prologic@twtxt.net, I think it would achieve what you are trying to achieve with this content-type thing: if someone writes 1/4 on a yarnd instance or any other client that wants to do this, it would get transformed, and other clients simply wouldn’t do the transformation. Every client that supports displaying unicode characters, including Jenny, would then display ¼ as ¼.
Alternatively, if you prefer yarnd to pretty-print all twts nicely, even ones from simpler clients, that’s fine too and you don’t need to change anything. My 1/4 -> ¼ thing is nothing more than a minor irritation which probably isn’t worth overthinking.
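A minimal sketch of the write-time transformation suggested above, assuming a client wants to do it at all. The function name prettify_fractions is made up, and only three fractions are handled. It only touches space-separated words that are exactly 1/4, 1/2 or 3/4, so 11/4 or a fraction followed by punctuation is left alone.

```rust
/// Replace standalone ASCII fractions with their single-character
/// Unicode forms before the text is written to the feed file.
fn prettify_fractions(text: &str) -> String {
    text.split(' ')
        .map(|word| match word {
            "1/4" => "¼",
            "1/2" => "½",
            "3/4" => "¾",
            other => other,
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Such a function cannot know whether 1/4 was meant as “first out of four”, which is one more reason to run it in the writing client, where the author sees the result, rather than when rendering other people’s twts.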
Unicode doesn’t distinguish between a dollar sign with one and a dollar sign with two strokes, which makes me sad.
Account Problems
Weird Unicode Math Symbols
someday i will descend upon the unicode consortium and add sub/superscript version of the whole latin alphabet
@lyse@lyse.isobeef.org What the heck? no emoji? do you even Unicode!
@prologic@twtxt.net Yeah like normally I’m just a little annoyed and just say “whatever” and shrug it off, but come on I am searching for emojis here. Do you really need to harvest my user data for what is essentially a fuzzy search in the Unicode table?
@prologic@twtxt.net lol. just testing some Unicode.
On the blog: Where Have All the Emoji Gone? https://john.colagioia.net/blog/2021/09/29/emoji.html #programming #techtips #unicode #blog
https://metacpan.org/release/WOLFSAGE/perl-5.35.4/changes#Unicode-14.0-is-supported As of Perl 5.35.4, the supported Unicode version has advanced to 14.0.0.
@prologic@twtxt.net should we enable all unicode glyphs for tags? https://txt.sour.is/conv/55yrura
I wrote a ‘banner’-like program for Plan 9 (and p9p) that uses the Unicode box drawing characters: http://txtpunk.com/banner/index.html
huh, txtnish seems to have problems with linebreaks & unicode.
Teletext graphics characters among those added to Unicode – Teletext Art http://teletextart.co.uk/teletext-graphics-characters-among-those-added-to-unicode
Because of the use of ‘rune’ to refer to Unicode code points in Go, a futhark transliteration program might have somewhat confusing source…
A Spectre is Haunting Unicode https://www.dampfkraft.com/ghost-characters.html
unum - Interconvert numbers, Unicode, and HTML/XHTML characters http://www.fourmilab.ch/webtools/unum/
Does unicode also work? 💚☎
@quite@lublin.se there is also a unicode symbol in there, maybe that?
All fonts I tried are either ugly or the unicode glyphs are so small that they become unreadable. #fonts
The pretty format is very similar to twtxt without the unicode glyphs and the relative date.