<div dir="ltr">I think your two examples are quite different: one is a formal language, which you could definitely handle with a parser, whereas the other sounds a lot messier. I'm not sure you're going to be able to do better than hackily stringing together regexps in that case.<div>
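<div>For the messy case, one way to make the regexps a bit less hacky is to stop working line by line: treat the whole record as one string and split it on the field labels, so field order and line breaks stop mattering. A minimal sketch, assuming every field is introduced by a label followed by a colon; the labels in FIELDS and the parse_record helper are made up for illustration, since the real PDF layout isn't shown here.</div>

```ruby
# Hypothetical field labels; the labels in the real PDF are an assumption.
FIELDS = ["ID", "Name", "Admitted", "Attending doctor", "Clinical history"]

# One pattern that matches any known label followed by a colon.
LABEL = /\b(#{Regexp.union(FIELDS).source}):/

# Splits the whole record on the labels. Because the pattern has a capture
# group, String#split keeps the labels in the result, alternating with values.
def parse_record(text)
  parts = text.split(LABEL)
  parts.shift                        # drop whatever came before the first label
  parts.each_slice(2).to_h do |label, value|
    # Collapse newlines and runs of spaces so multi-line values become one line.
    [label, value.to_s.split.join(" ")]
  end
end

record = parse_record("ID: 42 Name: Jane Doe\nClinical history: asthma,\n  hay fever\nAttending doctor: Dr Smith\n")
puts record["Clinical history"]
```

<div>It will still misfire if one of the labels followed by a colon turns up inside a value (say, in the clinical history), so this is a tidier hack rather than a real parser.</div>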
<br></div><div>I did a presentation on treetop back in the dark days of LRUG 2009, if you're interested: <a href="http://www.slideshare.net/knaveofdiamonds/treetop-id-rather-have-one-problem?type=presentation">http://www.slideshare.net/knaveofdiamonds/treetop-id-rather-have-one-problem?type=presentation</a>. Also have a look at parslet: <a href="http://kschiess.github.io/parslet/">http://kschiess.github.io/parslet/</a>. It has much nicer error reporting than treetop, and the benefit of being real Ruby rather than odd sort-of-Ruby.</div>
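<div>To give a feel for the parser approach on the nested CSS example, here is a rough hand-written recursive descent sketch. It uses only StringScanner from the standard library, not treetop or parslet (those let you declare the grammar instead of coding it by hand). The NestedCss class and the tiny grammar in the comment are assumptions for illustration: no comments, no @rules, no "&amp;", and no comma-separated selectors.</div>

```ruby
require "strscan"

# Grammar handled by this sketch (deliberately tiny):
#   block       := (declaration | rule)*
#   rule        := selector "{" block "}"
#   declaration := property ":" value ";"
class NestedCss
  def initialize(source)
    @scanner = StringScanner.new(source)
    @rules = []   # pairs of [flattened selector, declarations], in source order
  end

  # Returns the flattened stylesheet as a string, one rule per line.
  def flatten
    parse_block(nil)
    @rules.reject { |_, decls| decls.empty? }
          .map { |selector, decls| "#{selector} { #{decls.join(' ')} }" }
          .join("\n")
  end

  private

  # Parses one block. `selector` is the full selector of the enclosing rule,
  # or nil at the top level of the stylesheet.
  def parse_block(selector)
    declarations = []
    @rules.push([selector, declarations]) if selector
    loop do
      @scanner.skip(/\s+/)
      if @scanner.eos?
        raise "missing }" if selector
        return
      end
      if @scanner.skip(/\}/)
        raise "unexpected } at #{@scanner.pos}" unless selector
        return
      end
      head = @scanner.scan(/[^{};]+/).to_s.strip
      raise "unexpected #{@scanner.peek(1).inspect} at #{@scanner.pos}" if head.empty?
      if @scanner.skip(/\{/)
        # A "{" means `head` was a selector: recurse with the parent prefixed.
        parse_block([selector, head].compact.join(" "))
      else
        # Otherwise `head` was a declaration; the ";" is optional before "}".
        raise "declaration outside a rule: #{head}" unless selector
        @scanner.skip(/;/)
        declarations.push("#{head};")
      end
    end
  end
end

puts NestedCss.new("nav { margin: 0; ul { list-style: none; li { display: inline-block } } }").flatten
```

<div>The state you were tracking by hand in a variable lives in the call stack instead: each nested rule is just another call to parse_block, which is most of why this style is easier to reason about than a line-based state machine.</div>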
<div><br></div><div>Cheers,</div><div>Roland</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, May 14, 2014 at 8:49 AM, Andrew Stewart <span dir="ltr">&lt;<a href="mailto:boss@airbladesoftware.com" target="_blank">boss@airbladesoftware.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello El Rug!<br>
<br>
From time to time I encounter a situation where I would like to parse (semi-)structured text. I'm sure this is trivial with the correct approach. Regrettably I don't know anything about parsers/compilers/etc and I end up hand-rolling fragile, line-based state machines which are soon impossible to reason about.<br>
<br>
I'd like to know how to do this properly but I don't know where to begin.<br>
<br>
Here are a couple of specific examples:<br>
<br>
- In 2006 before SASS etc existed, I wrote a Rails plugin for nested CSS. It read a nested stylesheet and flattened it into normal CSS. Back then I wasn't sure how to parse a nested stylesheet...and I still don't know how. (Stop laughing at the back!)<br>
<br>
- A few months ago I needed to convert hospital admissions records from a PDF to CSV. Each record had fields like id, name, various dates, clinical history, attending doctor, etc. The fields weren't always in the same order due to the layout of text in the PDF, and some fields were optional. Sometimes there were several fields on a line, and a field could be spread over several lines. I did my usual thing of looping over each line, matching field names with regular expressions, and trying to keep track of where I was with a state variable. Its sole virtue was that it (sort of) worked; otherwise it was horrible: hard to understand, hard to modify, hard to extend, and very hard to debug.<br>
<br>
Please could someone enlighten me?<br>
<br>
Cheers,<br>
Andy Stewart<br>
</blockquote></div><br></div>