In-reply-to » I played around with parsers. This time I experimented with parser combinators for twt message text tokenization. Basically, extract mentions, subjects, URLs, media and regular text. It's kinda nice, although my solution is not completely elegant, I have to say. Especially my communication protocol between different steps for intermediate results is really ugly. Not sure about performance, I reckon a hand-written state machine parser would be quite a bit faster. I need to write a second parser and then benchmark them.

@lyse@lyse.isobeef.org This is very cool indeed ๐Ÿ‘Œ

โค‹ Read More