[LRUG] Parsing text
Tim Cowlishaw
tim at timcowlishaw.co.uk
Wed May 14 01:11:24 PDT 2014
It sounds like you need Parsing Expression Grammars, a nice, declarative
way of solving exactly this sort of problem.
Treetop is probably the most-used Ruby implementation:
http://treetop.rubyforge.org/
However, the canonical use-case for this sort of thing is writing parsers
for programming languages, and I've been unable to find documentation or
examples for the use-cases you describe. Still, the principles should be the
same.
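To give a flavour of those principles, here is a minimal sketch for the
nested-CSS case. It is plain Ruby using the standard library's StringScanner
rather than Treetop, so it is a hand-written recursive-descent parser and not
a generated one. The class name NestedCssFlattener and the tiny grammar in the
comments are my own assumptions for illustration, and it ignores comments,
@-rules and comma-separated selectors. The point is only that each method
corresponds to one grammar rule, and that "try a declaration, otherwise try a
nested block" is the ordered choice a PEG gives you.

```ruby
require "strscan"

# Hypothetical helper (not part of Treetop): each private method below plays
# the role of one rule in a small PEG-style grammar.
class NestedCssFlattener
  def initialize(source)
    @scanner = StringScanner.new(source)
    @rules = [] # [full selector, declarations] pairs, parents before children
  end

  # stylesheet <- block* end-of-input
  def flatten
    skip_space
    parse_block("") until @scanner.eos?
    @rules.reject { |_, decls| decls.empty? }
          .map { |selector, decls| "#{selector} { #{decls.join(' ')} }" }
          .join("\n")
  end

  private

  # block <- selector "{" (declaration / block)* "}"
  def parse_block(parent)
    selector = @scanner.scan(/[^{};]+/) || fail_at("a selector")
    @scanner.scan(/\{/) || fail_at("'{'")
    skip_space
    full = [parent, selector.strip].reject(&:empty?).join(" ")
    decls = []
    @rules << [full, decls]
    until @scanner.scan(/\}/)
      # Ordered choice: a failed scan leaves the position untouched, so we
      # can simply fall through to the next alternative.
      if (declaration = parse_declaration)
        decls << declaration
      else
        parse_block(full)
      end
    end
    skip_space
  end

  # declaration <- property ":" value ";"
  def parse_declaration
    return nil unless @scanner.scan(/([\w-]+)\s*:\s*([^;{}]+);/)
    declaration = "#{@scanner[1]}: #{@scanner[2].strip};"
    skip_space
    declaration
  end

  def skip_space
    @scanner.skip(/\s*/)
  end

  def fail_at(expected)
    raise ArgumentError, "expected #{expected} at offset #{@scanner.pos}"
  end
end

puts NestedCssFlattener.new(".nav { color: red; a { color: blue; } }").flatten
# prints:
# .nav { color: red; }
# .nav a { color: blue; }
```

Because the structure of the code follows the structure of the grammar, adding
a new construct means adding a rule and a method, rather than another value
for a state variable.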
Hope this helps!
Cheers,
Tim
On 14 May 2014 08:49, Andrew Stewart <boss at airbladesoftware.com> wrote:
> Hello El Rug!
>
> From time to time I encounter a situation where I would like to parse
> (semi-)structured text. I'm sure this is trivial with the correct
> approach. Regrettably I don't know anything about parsers/compilers/etc
> and I end up hand-rolling fragile, line-based state machines which are soon
> impossible to reason about.
>
> I'd like to know how to do this properly but I don't know where to begin.
>
> Here are a couple of specific examples:
>
> - In 2006 before SASS etc existed, I wrote a Rails plugin for nested CSS.
> It read a nested stylesheet and flattened it into normal CSS. Back then I
> wasn't sure how to parse a nested stylesheet...and I still don't know how.
> (Stop laughing at the back!)
>
> - A few months ago I needed to convert hospital admissions records from a
> PDF to CSV. Each record had fields like id, name, various dates, clinical
> history, attending doctor, etc. The fields weren't always in the same
> order due to the layout of text in the PDF, and some fields were optional.
> Sometimes there were several fields on a line, and a field could be spread
> over several lines. I did my usual thing of looping over each line,
> matching field names with regular expressions, and trying to keep track of
> where I was with a state variable. Its sole virtue was that it (sort of)
> worked; otherwise it was horrible: hard to understand, hard to modify, hard
> to extend, and very hard to debug.
>
> Please could someone enlighten me?
>
> Cheers,
> Andy Stewart