[LRUG] Reading a "large" spreadsheet

Graham Ashton graham at effectif.com
Fri Mar 15 06:49:07 PDT 2013


On 15 Mar 2013, at 13:40, Andrew Stewart <boss at airbladesoftware.com> wrote:

> How can I read the 900 actual rows in a few seconds?  

I suspect you'll need a SAX parser. Rather than trying to load the whole thing into RAM, a SAX parser fires events as it reads a stream of bytes, such as "found an opening <foo> tag" and "found a closing </foo> tag". You just need to write the code that responds to the events. They're much faster on large documents than XML parsers that attempt to load the entire document into memory.
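For illustration, this is roughly what responding to those events looks like in Ruby, using Nokogiri's SAX interface (a minimal sketch; the file name is made up):

    require 'nokogiri'

    # Prints an event for each tag as the stream is read; nothing is
    # kept in memory beyond the current event.
    class EventLogger < Nokogiri::XML::SAX::Document
      def start_element(name, attrs = [])
        puts "found an opening <#{name}> tag"
      end

      def end_element(name)
        puts "found a closing </#{name}> tag"
      end
    end

    Nokogiri::XML::SAX::Parser.new(EventLogger.new).parse(File.open('sheet1.xml'))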

Nokogiri does ship a SAX parser (Nokogiri::XML::SAX) alongside its DOM parser, though I've never dug into its internals. Maybe Roo is using Nokogiri's DOM mode to load everything before trying to operate on it. Again, I don't know.

My first thought would be to write a simple SAX handler that reads just enough of the file to let you write out a new, smaller XML file from the input. The handler just needs to know when to stop reading (e.g. once it's seen too many consecutive empty nodes of a certain type).
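As a rough sketch of that idea (the <row> element name, the empty-row limit, and the file name are all assumptions about the format; one way to bail out of a Nokogiri SAX parse early is to raise from a callback):

    require 'nokogiri'

    # Raised to abandon parsing once we've seen enough empty rows.
    class EnoughRows < StandardError; end

    class TruncatingDoc < Nokogiri::XML::SAX::Document
      EMPTY_ROW_LIMIT = 50  # assumed threshold for "no more real data"

      def initialize
        @empty_streak = 0
        @row_has_data = false
      end

      def start_element(name, attrs = [])
        @row_has_data = false if name == 'row'
      end

      def characters(string)
        # Any non-blank text inside the current row counts as data.
        @row_has_data = true unless string.strip.empty?
      end

      def end_element(name)
        return unless name == 'row'
        if @row_has_data
          @empty_streak = 0
        else
          @empty_streak += 1
          raise EnoughRows if @empty_streak >= EMPTY_ROW_LIMIT
        end
      end
    end

    begin
      Nokogiri::XML::SAX::Parser.new(TruncatingDoc.new).parse(File.open('sheet1.xml'))
    rescue EnoughRows
      # We've seen the real rows; everything after is padding.
    end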

That may all be too hard, and Jon's suggestion of hacking the file up with Unix tools may be far more pragmatic. I think it'll depend on the structure of the file (i.e. do all the empty records actually occur at the end of the file, or is there valuable data towards the end too?).

Admittedly that's not hugely actionable, but it's what sprang to mind...

-- 
Graham Ashton
Founder, Agile Planner
https://www.agileplannerapp.com | @agileplanner | @grahamashton



