<div style="font-family: Helvetica; font-size: 13px; ">I'm not sure how practical this is in the context of xlsx files, but if it's XML, and therefore text, could you run the file through grep to strip out any empty rows at the end of the sheet before trying to parse it?</div><div style="font-family: Helvetica; font-size: 13px; "><br></div><div style="font-family: Helvetica; font-size: 13px; ">Jon</div>
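<div style="font-family: Helvetica; font-size: 13px; ">For what it's worth, here is a rough sketch of that idea in plain Ruby rather than grep, since worksheet XML often has no line breaks for grep to work with. It assumes the worksheet XML has already been extracted from the xlsx archive, and that an "empty" row is a <code>row</code> element with no <code>v</code> (value) or <code>is</code> (inline string) child. I haven't seen the actual file, so that may not match what's in it. <code>strip_empty_rows</code> and the file names are made up for illustration.</div>

```ruby
# Sketch only: drops <row> elements that carry no cell values.
# Assumption: empty rows look like <row .../> or <row ...><c .../></row>,
# i.e. they contain no <v> (value) or <is> (inline string) element.

# Matches either a self-closing <row .../> or a <row ...>...</row> pair.
ROW = %r{<row\b[^>]*?/>|<row\b[^>]*>(.*?)</row>}m

def strip_empty_rows(xml)
  xml.gsub(ROW) do |row|
    inner = Regexp.last_match(1) # nil for a self-closing row
    inner && inner.match?(/<(?:v|is)\b/) ? row : ''
  end
end

# Hypothetical usage: ruby strip_rows.rb sheet1.xml sheet1.stripped.xml
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  source, target = ARGV
  File.write(target, strip_empty_rows(File.read(source)))
end
```

<div style="font-family: Helvetica; font-size: 13px; ">This reads the whole file into memory, which should be tolerable at 53MB. The stripped XML would then need to go back into the archive (or straight into Nokogiri) before parsing.</div>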
<div></div>
<p style="color: #A0A0A8;">On Friday, 15 March 2013 at 13:40, Andrew Stewart wrote:</p>
<blockquote type="cite" style="border-left-style:solid;border-width:1px;margin-left:0px;padding-left:10px;">
<span><div><div><div>Hello El Rug,</div><div><br></div><div>In the past I have successfully used roo [1] to read xlsx spreadsheets. However I have run into a stumbling block with the latest spreadsheet I need to read – it takes tens of minutes or more, rather than a few seconds.</div><div><br></div><div> require 'rubygems'</div><div> require 'roo'</div><div> s = Roo::Excelx.new 'spreadsheet.xlsx'</div><div><br></div><div>When I open the spreadsheet in Numbers I can see there are ~900 rows and ~40 columns in the first worksheet (the other two worksheets have next to nothing). On disc that worksheet is 53MB of XML. roo uses Nokogiri to read the XML and sure enough when I try to open the worksheet XML file directly with Nokogiri it takes ~1.5min – still a long time.</div><div><br></div><div> require 'rubygems'</div><div> require 'nokogiri'</div><div> doc = Nokogiri::XML File.open('spreadsheet.xlsx')</div><div><br></div><div>The one time I did manage to read the spreadsheet with roo, it told me there were ~900,000 rows. Presumably 899,000 are empty. And presumably this is the problem.</div><div><br></div><div>Unfortunately I cannot make the spreadsheet available because it's confidential to a customer.</div><div><br></div><div>How can I read the 900 actual rows in a few seconds? (I'm going to ask my customer if they can somehow save their Excel file differently but in the meantime...) 
I'd prefer a pure-Ruby solution but at the end of the day I just need to get it read so I don't mind calling out to something else and getting the results back as, say, CSV.</div><div><br></div><div>Any suggestions?</div><div><br></div><div>Many thanks in advance!</div><div><br></div><div>Cheers,</div><div>Andy Stewart</div><div><br></div><div>[1] <a href="https://github.com/Empact/roo">https://github.com/Empact/roo</a></div></div></div></span>
</blockquote>