In-reply-to » I played around with parsers. This time I experimented with parser combinators for twt message text tokenization. Basically, extract mentions, subjects, URLs, media and regular text. It's kinda nice, although my solution is not completely elegant, I have to say. Especially my communication protocol between different steps for intermediate results is really ugly. Not sure about performance, I reckon a hand-written state machine parser would be quite a bit faster. I need to write a second parser and then benchmark them.

Very cool. I like the chain rules. I wonder how it performs against lextwt.

⤋ Read More