[LRUG] Reading a "large" spreadsheet

Andrew Stewart boss at airbladesoftware.com
Fri Mar 15 07:51:28 PDT 2013


On 15 Mar 2013, at 14:43, Jon Wood wrote:
> I'm not sure how practical this is in the context of xlsx files, but if it's XML, and therefore text, could you run the file through grep to strip out any empty rows at the end of the sheet before trying to parse it?

Bingo.  It turns out all those 900,000 empty rows looked like <row blah><c blah/></row>.

-   File.open(item) do |file|
-     Nokogiri::XML(file)
-   end
+   f = File.read(item)
+   # Strip rows that contain nothing but empty, self-closing cells.
+   f.gsub!(/<row [^>]+>(?:<c [^>]+\/>)*<\/row>/, '')
+   Nokogiri::XML(f)
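
To show what that substitution does, here is a minimal sketch run against a made-up worksheet fragment. The sample XML, the EMPTY_ROW constant and the strip_empty_rows helper are assumptions for illustration only; they are not taken from the real spreadsheet or from the code above.

```ruby
# Matches a <row> whose only children are self-closing (value-less) <c/> cells.
EMPTY_ROW = /<row [^>]+>(?:<c [^>]+\/>)*<\/row>/

# Hypothetical helper: returns a copy of the worksheet XML without empty rows.
def strip_empty_rows(xml)
  xml.gsub(EMPTY_ROW, '')
end

# Made-up sample: one row holding a value, followed by two empty rows.
sample = '<sheetData>' \
  '<row r="1"><c r="A1"><v>42</v></c></row>' \
  '<row r="2"><c r="A2" s="1"/><c r="B2" s="1"/></row>' \
  '<row r="3"><c r="A3" s="1"/></row>' \
  '</sheetData>'

cleaned = strip_empty_rows(sample)
puts cleaned
# prints: <sheetData><row r="1"><c r="A1"><v>42</v></c></row></sheetData>
```

Note that the pattern also drops rows whose cells carry only attributes such as styling and no value, which is the intent here but may not suit every sheet. The whole file is still read into memory as a string before parsing.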

Problem solved!  Now I can read the spreadsheet in seconds.

The SAX version can wait for another day, phew...

Thanks everybody for all the ideas,

Andy

