[LRUG] Parsing text

Mark van Harmelen mark at hedtek.com
Wed May 14 02:22:23 PDT 2014


I recently enjoyed using parslet (once I learned about using the output
transformations).

http://kschiess.github.io/parslet/

I haven't used treetop so can't compare parslet with treetop. Anyone?
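
To give a flavour of what these libraries automate, here is a minimal
hand-written recursive-descent sketch for the nested stylesheet example
quoted below. It uses only StringScanner from Ruby's standard library, not
parslet or treetop. The method names (flatten_css, parse_block) and the
simplified grammar in the comments are assumptions made up for
illustration; this is not real CSS parsing and not how Andy's plugin worked.

```ruby
require 'strscan'

# Assumed, simplified grammar (hypothetical, far from full CSS):
#   block       := (declaration | rule)*
#   rule        := selector '{' block '}'
#   declaration := property ':' value ';'

# Hypothetical helper: flatten a nested stylesheet into plain CSS rules.
def flatten_css(source)
  scanner = StringScanner.new(source)
  rules = []
  parse_block(scanner, nil, rules)
  # parse_block only stops early on '}', so leftover input means a stray brace.
  raise ArgumentError, "unexpected '}' at #{scanner.pos}" unless scanner.eos?
  rules.reject { |_, decls| decls.empty? }
       .map { |selector, decls| "#{selector} { #{decls.join(' ')} }" }
       .join("\n")
end

# Parse one block; the nesting is tracked by recursion, not a state variable.
def parse_block(scanner, parent, rules)
  declarations = []
  # Reserve this rule's slot first so parents are emitted before children.
  rules << [parent, declarations] if parent
  loop do
    scanner.skip(/\s+/)
    break if scanner.eos? || scanner.check(/\}/)
    head = scanner.scan(/[^{};]+/)
    raise ArgumentError, "parse error at #{scanner.pos}" unless head
    head = head.strip
    if scanner.skip(/;/)
      raise ArgumentError, "declaration outside a rule: #{head}" unless parent
      declarations << "#{head};"
    elsif scanner.skip(/\{/)
      # A nested selector is prefixed with its parent's selector.
      parse_block(scanner, [parent, head].compact.join(' '), rules)
      raise ArgumentError, "missing '}' at #{scanner.pos}" unless scanner.skip(/\}/)
    else
      raise ArgumentError, "expected ';' or '{' at #{scanner.pos}"
    end
  end
end

puts flatten_css("ul { margin: 0; li { color: red; a { text-decoration: none; } } }")
# prints:
#   ul { margin: 0; }
#   ul li { color: red; }
#   ul li a { text-decoration: none; }
```

Each grammar rule becomes one method (or one branch) that calls itself for
the nested case. A PEG library lets you declare those rules rather than
write the scanning by hand.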

regards
mark


On Wed, May 14, 2014 at 9:11 AM, Tim Cowlishaw <tim at timcowlishaw.co.uk> wrote:

> It sounds like you need Parsing Expression Grammars, a nice, declarative
> way of solving exactly this sort of problem.
>
> Treetop is probably the most-used ruby implementation:
> http://treetop.rubyforge.org/
>
> However, the canonical use-case for this sort of thing is writing parsers
> for programming languages, and I've been unable to find documentation or
> examples for the use-cases you describe. Still the principles should be the
> same.
>
> Hope this helps!
>
> Cheers,
>
> Tim
>
>
>
> On 14 May 2014 08:49, Andrew Stewart <boss at airbladesoftware.com> wrote:
>
>> Hello El Rug!
>>
>> From time to time I encounter a situation where I would like to parse
>> (semi-)structured text.  I'm sure this is trivial with the correct
>> approach.  Regrettably I don't know anything about parsers/compilers/etc
>> and I end up hand-rolling fragile, line-based state machines which are soon
>> impossible to reason about.
>>
>> I'd like to know how to do this properly but I don't know where to begin.
>>
>> Here are a couple of specific examples:
>>
>> - In 2006 before SASS etc existed, I wrote a Rails plugin for nested CSS.
>> It read a nested stylesheet and flattened it into normal CSS.  Back then I
>> wasn't sure how to parse a nested stylesheet...and I still don't know how.
>> (Stop laughing at the back!)
>>
>> - A few months ago I needed to convert hospital admissions records from a
>> PDF to CSV.  Each record had fields like id, name, various dates, clinical
>> history, attending doctor, etc.  The fields weren't always in the same
>> order due to the layout of text in the PDF, and some fields were optional.
>> Sometimes there were several fields on a line, and a field could be spread
>> over several lines.  I did my usual thing of looping over each line,
>> matching field names with regular expressions, and trying to keep track of
>> where I was with a state variable.  Its sole virtue was that it (sort of)
>> worked; otherwise it was horrible: hard to understand, hard to modify, hard
>> to extend, and very hard to debug.
>>
>> Please could someone enlighten me?
>>
>> Cheers,
>> Andy Stewart
>> _______________________________________________
>> Chat mailing list
>> Chat at lists.lrug.org
>> http://lists.lrug.org/listinfo.cgi/chat-lrug.org
>>