Fun video about #Unicode #UTF8. I knew about the historical context and fundamental implementation ideas already, but I didnāt know about the Hangul combinations block trick mentioned in the end⦠clever stuff.
@lyse@lyse.isobeef.org The underlines are a bit much, yes. It appears to be related to my font (Helvetica) ⦠Maybe they do some Unicode trickery these days, I donāt know. š«¤
fn sub(foo: &String) {
println!("We got this string: [{}]", foo);
}
fn main() {
// "Hello", 0x00, 0x00, "!"
let buf: [u8; 8] = [0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x00, 0x00, 0x21];
// Create a string from the byte array above, interpret as UTF-8, ignore decoding errors.
let lossy_unicode = String::from_utf8_lossy(&buf).to_string();
sub(&lossy_unicode);
}
Create a string from a byte array, but the result isnāt a string, itās a cow š®, so you need another to_string()
to convert your āstringā into a string.
- https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8_lossy
- https://doc.rust-lang.org/std/borrow/enum.Cow.html
I still have a lot to learn.
(into_owned()
instead of to_string()
also works and makes more sense to me, itās just that the compiler suggested to_string()
first, which led to this funny example.)
(Where is there no bass emoji in Unicode? Pah.)
Hmmm, when I Ctrl+Left
to jump a word left, I get 1;5D
in my tt2 message text. My TERM
is set to rxvt-unicode-256color
. In tt
, it works just fine. When I change to TERM=xterm-256color
, it also works in tt2
. I have to read up on that. Maybe even try to capture these sequences and rewrite them.
@arne@uplegger.eu Ohjemine, TYPO3! O_o Lass mich schreiend davonlaufen!
Mit dieser absoluten Katastrophensoftware vor dem Herrn haben wir mal ein Studienprojekt gemacht. Die hat alle Vorurteile komplett übererfüllt. Angefangen von Fehlerseiten, die statt 4xx oder dergleichen immer mit HTTP 200 ausgeliefert wurden oder auch, dass das generierte HTML leider einfach ungültig war. Ćber die Implementierung von Lƶschen durch einen Deleted-Schalter in der Datenbank, das Speichern von Passwƶrtern im Klartext bis hin zu vƶllig umstƤndlichen Bedienungskonzepten. Alles hat immer brutal viele Schritte gebraucht. Das Zeilennummernrumgeeier im TYPO-Script erinnerte eher an Basic. Uns kam es auch so vor, als ob man damit nicht ernsthaft was sinnvolles machen kƶnnte.
Zu allem Ćberfluss hatte irgendwer noch ein ganz hundsmiserables Buch ausgegraben, das als Vorbereitung dienen sollte. Ich kann mich zum Glück weder an den Titel noch den Autor erinnern, aber ich weiĆ noch, wie das komplett inkonsistent geschrieben war. Anfangs gabs mehrere Seiten zu Unicode und UTF-8 wurde angepriesen, aber alle Beispiele haben dann auf ISO-8859-1 gesetzt. Gezeigter Beispielcode war hƤufig unterste Schublade. Selten hab ich so merkwürdige ErklƤrungen gelesen: āWenn Sie die Sicherheitswarnhinweise stƶren, kommentieren Sie doch bitte im Quelltext die die()
-Funktion in $ZEILE
aus.ā Oder ein anderer Klassiker: āAusgeschrieben würde der Code wohl folgendes tunā¦ā. War sich der Autor also nicht ganz sicher, ob sein Codeschnipsel vllt. doch in Wahrheit was ganz anderes tut.
Seit diesem gigantischen Trauma (das hat mich wirklich sehr nachhaltig geprƤgt, wie man Dinge nicht machen sollte) hab ich erfolgreich einen Bogen um das TYPO3-Universum gemacht.
Ich kann nur hoffen, dass es zwischenzeitlich ein wenig besser geworden ist. Aber Deinem Kurzbericht zufolge scheint da ja immer noch der Wurm drin zu sein. Mein Beileid! :-(
Righto, @eapl.me@eapl.me, ta for the writeup. Here we go. :-)
Metadata on individual twts are too much for me. I do like the simplicity of the current spec. But I understand where youāre coming from.
Numbering twts in a feed is basically the attempt of generating message IDs. Itās an interesting idea, but I reckon it is not even needed. Iād simply use location based addressing (feed URL + ā#ā + timestamp) instead of content addressing. If one really wanted to, one could hash the feed URL and timestamp, but the raw form would actually improve disoverability and would not even require a richer client. But the majority of twtxt users in the last poll wanted to stick with content addressing.
yarnd actually sends If-Modified-Since
request headers. Not only can I observe heaps of 304 responses for yarnds in my access log, but in Cache.FetchFeeds(ā¦)
we can actually see If-Modified-Since
being deployed when the feed has been retrieved with a Last-Modified
response header before: https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/cache.go#L1278
Turns out etags with If-None-Match
are only supported when yarnd serves avatars (https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/handlers.go#L158) and media uploads (https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/media_handlers.go#L71). However, it ignores possible etags when fetching feeds.
I donāt understand how the discovery URLs should work to replace the User-Agent
header in HTTP(S) requests. Do you mind to elaborate?
Different protocols are basically just a client thing.
I reckon itās best to just avoid mixing several languages in one feed in the first place. Personally, I find it okay to occasionally write messages in other languages, but if that happens on a more regularly basis, Iād definitely create a different feed for other languages.
Isnāt the emoji thing ājustā a client feature? So, feed do not even have to state any emojis. As a user Iād configure my client to use a certain symbol for feed ABC. Currently, I can do a similar thing in tt
where I assign colors to feeds. On the other hand, what if a user wants to control what symbol should be displayed, similar to the feedās nick? Hmm. But still, my terminal font doesnāt even render most of emojis. So, Unicode boxes everywhere. This makes me think it should actually be a only client feature.
@bender@twtxt.net @prologic@twtxt.net Iām not exactly asking yarnd to change. If you are okay with the way it displayed my twts, then by all means, leave it as is. I hope you wonāt mind if I continue to write things like 1/4
to mean āfirst out of fourā.
What has text/markdown
got to do with this? I donāt think Markdown says anything about replacing 1/4
with ¼, or other similar transformations. Itās not needed, because ¼ is already a unicode character that can simply be directly inserted into the text file.
Whatās wrong with my original suggestion of doing the transformation before the text hits the twtxt.txt file? @prologic@twtxt.net, I think it would achieve what you are trying to achieve with this content-type thing: if someone writes 1/4
on a yarnd instance or any other client that wants to do this, it would get transformed, and other clients simply wouldnāt do the transformation. Every client that supports displaying unicode characters, including Jenny, would then display ¼ as ¼.
Alternatively, if you prefer yarnd to pretty-print all twts nicely, even ones from simpler clients, thatās fine too and you donāt need to change anything. My 1/4
-> ¼ thing is nothing more than a minor irritation which probably isnāt worth overthinking.
Unicode doesnāt distinguish between a dollar sign with one and a dollar sign with two strokes, which makes me sad.
someday i will descend upon the unicode consortium and add sub/superscript version of the whole latin alphabet
@lyse@lyse.isobeef.org What the heck? no emoji? do you even Unicode!
@prologic@twtxt.net Yeah like normally Iām just a little annoyed and just say āwhateverā and shrug it off, but come on I am searching for emojis here. Do you really need to harvest my user data for what is essentially a fuzzy search in the Unicode table?
@prologic@twtxt.net lol. just testing some Unicode.
On the blog: Where Have All the Emoji Gone? https://john.colagioia.net/blog/2021/09/29/emoji.html #programming #techtips #unicode #blog
https://metacpan.org/release/WOLFSAGE/perl-5.35.4/changes#Unicode-14.0-is-supported Perl 5.35.4 ēä¹å¾ęå°ęē Unicode ēę¬å·²ē¶ęØé²å° 14.0.0 äŗć
@prologic@twtxt.net should we enable all unicode glyphs for tags? https://txt.sour.is/conv/55yrura
I wrote a ābannerā-like program for Plan 9 (and p9p) that uses the Unicode box drawing characters: http://txtpunk.com/banner/index.html
huh, txtnish seems to have problems with linebreaks & unicode;.
Teletext graphics characters among those added to Unicode ā Teletext Art http://teletextart.co.uk/teletext-graphics-characters-among-those-added-to-unicode
Because of the use of āruneā to refer to unicode codepoints in go, a fulthark transliteration program might have somewhat confusing sourceā¦
A Spectre is Haunting Unicode https://www.dampfkraft.com/ghost-characters.html
unum - Interconvert numbers, Unicode, and HTML/XHTML characters http://www.fourmilab.ch/webtools/unum/
Does unicode also work? šā
@quite@lublin.se there is also a unicode symbol in there, maybe that?
All fonts I tried are either ugly or the unicode glyphs are so small that they become unreadable. #fonts
All fonts I tried are either ugly or the unicode glyphs are so small that they become unreadable. #fonts
The pretty format is very similar to twtxt without the unicode glyphs and the relative date.
The pretty format is very similar to twtxt without the unicode glyphs and the relative date.