<div dir="ltr"><br><div style>One of the tricks I've used lately with temperamental Excel files is importing them into Google Docs and then exporting them back out again. It fixes no end of problems, including some weird Excel encoding issues.</div>
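<div style><br></div><div style>If the export out of Google Docs is done as CSV rather than back to xlsx (an assumption on my part, not something I've tried on this particular file), the result can be read with Ruby's standard csv library and any empty rows dropped on the way in. A rough sketch only: rows_with_data is a made-up helper name, and the inline sample string stands in for the exported file.</div>
<pre>
```ruby
require 'csv'

# Hypothetical helper: parse CSV text and keep only the rows that
# contain at least one non-blank cell.
def rows_with_data(csv_text)
  CSV.parse(csv_text).reject do |row|
    row.all? { |cell| cell.nil? || cell.strip.empty? }
  end
end

# Stand-in for the exported file; in practice this might come from
# File.read on whatever the export is called.
sample = "name,qty\nwidget,3\n,\n,\n"

rows = rows_with_data(sample)
puts rows.length
```
</pre>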
<div style><br></div><div style>Is there any reason you're not just using CSV?</div><div style><br></div><div style><br></div></div><div class="gmail_extra"><br clear="all"><div><div>-- </div><div>David Burrows</div><div>079 1234 2125</div>
<div>@dburrows</div><div><br></div><div><a href="http://www.designsuperbuild.com/" target="_blank">http://www.designsuperbuild.com/</a> | @dsgnsprbld</div></div>
<br><br><div class="gmail_quote">On Fri, Mar 15, 2013 at 1:40 PM, Andrew Stewart <span dir="ltr">&lt;<a href="mailto:boss@airbladesoftware.com" target="_blank">boss@airbladesoftware.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello El Rug,<br>
<br>
In the past I have successfully used roo [1] to read xlsx spreadsheets. However I have run into a stumbling block with the latest spreadsheet I need to read – it takes tens of minutes or more, rather than a few seconds.<br>
<br>
require 'rubygems'<br>
require 'roo'<br>
s = Roo::Excelx.new 'spreadsheet.xlsx'<br>
<br>
When I open the spreadsheet in Numbers I can see there are ~900 rows and ~40 columns in the first worksheet (the other two worksheets have next to nothing). On disc that worksheet is 53MB of XML. roo uses Nokogiri to read the XML and sure enough when I try to open the worksheet XML file directly with Nokogiri it takes ~1.5min – still a long time.<br>
<br>
require 'rubygems'<br>
require 'nokogiri'<br>
doc = Nokogiri::XML File.open('xl/worksheets/sheet1.xml') # worksheet XML from the unzipped .xlsx; path assumed<br>
<br>
The one time I did manage to read the spreadsheet with roo, it told me there were ~900,000 rows. Presumably 899,000 are empty. And presumably this is the problem.<br>
<br>
Unfortunately I cannot make the spreadsheet available because it's confidential to a customer.<br>
<br>
How can I read the 900 actual rows in a few seconds? (I'm going to ask my customer if they can somehow save their Excel file differently but in the meantime...) I'd prefer a pure-Ruby solution but at the end of the day I just need to get it read so I don't mind calling out to something else and getting the results back as, say, CSV.<br>
<br>
Any suggestions?<br>
<br>
Many thanks in advance!<br>
<br>
Cheers,<br>
Andy Stewart<br>
<br>
[1] <a href="https://github.com/Empact/roo" target="_blank">https://github.com/Empact/roo</a><br>
<br>
</blockquote></div><br></div>