txt.sour.is lyse@ "Alright, I didn't try it, but in Go regexes one should be able to just replace `\w` with `\p{L}` and in Python regexes `\w` just needs the `re.U ..."

Mon, Dec 6 10:45 2021 (2y ago)

↳ In-reply-to » Let's see, how tt's parser and lextwt behave. Nothing better than using PROD to do some experiments: @äöüß. Looks like at least tt will accept it as a nick. I have to admit, I'm quite surprised. Would have bet against it.

Alright, I didn’t try it, but in Go regexes one should be able to just replace `\w` with `\p{L}` and in Python regexes `\w` just needs the `re.UNICODE` flag. At least, that’s the theory. Now, before actually implementing this, we should carefully think about all the implications we’re creating with that. Also, I don’t know whether this is enough for scripts like Japanese or the like where there are no letters but “syllables”. (And now you know that I don’t have a clue about Asian scripts.)
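
A quick way to check the Python half of that theory is sketched below. It is only a sketch: the `@(\w+)` pattern and the `find_nick` helper are made up for illustration and are not taken from tt or lextwt. It relies on the fact that in Python 3, `str` patterns are already Unicode-aware (so `re.UNICODE` is effectively the default), while `re.ASCII` gives the old ASCII-only `\w`.

```python
import re

# Hypothetical mention patterns, NOT the ones tt or lextwt actually use.
# In Python 3, \w in a str pattern matches Unicode word characters by default.
UNICODE_NICK = re.compile(r"@(\w+)")
# re.ASCII restricts \w to [a-zA-Z0-9_], for comparison.
ASCII_NICK = re.compile(r"@(\w+)", re.ASCII)


def find_nick(pattern, text):
    """Return the first nick matched by `pattern` in `text`, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in ("hello @äöüß!", "hello @日本語", "hello @lyse_1"):
        print(sample, find_nick(UNICODE_NICK, sample), find_nick(ASCII_NICK, sample))
```

With precomposed characters, the Unicode variant picks up both the umlaut nick and the Japanese one, since kana and kanji count as letters in Unicode. One implication to think about for the Go side: `\w` also covers digits and the underscore, whereas `\p{L}` matches letters only, so a plain swap would change which ASCII nicks are accepted.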
