[LRUG] Reading a "large" spreadsheet

David Burrows david at designsuperbuild.com
Fri Mar 15 06:47:50 PDT 2013


One of the tricks i've used lately with temperamental Excel files is
importing them into Google Docs and then exporting back from there. Fixes
no end of problems including some weird Excel encoding problems.

Is there any reason your not just using CSV?



-- 
David Burrows
079 1234 2125
@dburrows

http://www.designsuperbuild.com/ | @dsgnsprbld


On Fri, Mar 15, 2013 at 1:40 PM, Andrew Stewart
<boss at airbladesoftware.com>wrote:

> Hello El Rug,
>
> In the past I have successfully used roo [1] to read xlsx spreadsheets.
>  However I have run into a stumbling block with the latest spreadsheet I
> need to read – it takes tens of minutes or more, rather than a few seconds.
>
>     require 'rubygems'
>     require 'roo'
>     s = Roo::Excelx.new 'spreadsheet.xlsx'
>
> When I open the spreadsheet in Numbers I can see there are ~900 rows and
> ~40 columns in the first worksheet (the other two worksheets have next to
> nothing).  On disc that worksheet is 53MB of XML.  roo uses Nokogiri to
> read the XML and sure enough when I try to open the worksheet XML file
> directly with Nokogiri it takes ~1.5min – still a long time.
>
>     require 'rubygems'
>     require 'nokogiri'
>     doc = Nokogiri::XML File.open('spreadsheet.xlsx')
>
> The one time I did manage to read the spreadsheet with roo, it told me
> there were ~900,000 rows.  Presumably 899,000 are empty.  And presumably
> this is the problem.
>
> Unfortunately I cannot make the spreadsheet available because it's
> confidential to a customer.
>
> How can I read the 900 actual rows in a few seconds?  (I'm going to ask my
> customer if they can somehow save their Excel file differently but in the
> meantime...)  I'd prefer a pure-Ruby solution but at the end of the day I
> just need to get it read so I don't mind calling out to something else and
> getting the results back as, say, CSV.
>
> Any suggestions?
>
> Many thanks in advance!
>
> Cheers,
> Andy Stewart
>
> [1] https://github.com/Empact/roo
>
> _______________________________________________
> Chat mailing list
> Chat at lists.lrug.org
> http://lists.lrug.org/listinfo.cgi/chat-lrug.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lrug.org/pipermail/chat-lrug.org/attachments/20130315/00eb2dda/attachment.html>


More information about the Chat mailing list