<div dir="ltr">> You could try tinkering with Encoding.default_external – maybe set it to Encoding::ASCII_8BIT and see if that helps.<br><br>Theory: a plain ASCII file is already valid UTF-8, so if Ruby is complaining there must be some weird non-ASCII bytes in the file. Doing the above will leave the weird bytes in, but they'll show up as whatever character each individual byte maps to in ASCII/ISO-8859-1. So, for example, if the weird bytes are a UTF-16 byte order mark, you might see "þÿ" in the output. I need to get out and spend more time with nature.<br>
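<br>For what it's worth, here is a rough sketch of that theory in Ruby. Everything in it is made up for illustration: the sample bytes, the temp file, and the helper names matching_lines and bytes_as_latin1 are not from the original script. It reads one file as ASCII-8BIT (a narrower variant of changing Encoding.default_external globally), runs the same regex, and then shows what a UTF-16 byte order mark looks like when its bytes are read as ISO-8859-1.<br>

```ruby
require "tempfile"

# Hypothetical helper: return the lines of a file that start with four word
# characters, reading the file as raw bytes so no UTF-8 validation happens.
def matching_lines(path)
  File.open(path, "r:ASCII-8BIT") do |fh|
    fh.each_line.select { |line| line =~ /^\w{4}\b/ }
  end
end

# Hypothetical helper: show raw bytes the way an ISO-8859-1 reader would.
def bytes_as_latin1(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Made-up sample data: one clean ASCII line, then a line that starts with a
# UTF-16 byte order mark (0xFE 0xFF). Those two bytes are never valid UTF-8.
SAMPLE = "Paul wrote\n\xFE\xFFJohn wrote\n".b

Tempfile.create("mail") do |tmp|
  tmp.binmode
  tmp.write(SAMPLE)
  tmp.flush
  # Only the clean line starts with four word characters.
  matching_lines(tmp.path).each { |line| print line }
end

# The second line, with its byte order mark rendered as ISO-8859-1 characters.
puts bytes_as_latin1(SAMPLE.lines.last)
```

If the theory holds, the regex runs without raising, the clean line is printed, and the byte order mark shows up as "þÿ" in front of the second line.<br>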
<div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Aug 22, 2013 at 5:05 PM, Leo Cassarani <span dir="ltr"><<a href="mailto:leonardo.cassarani@gmail.com" target="_blank">leonardo.cassarani@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Ruby 2.0 has changed the default encoding of Ruby files to be UTF-8. So the magic comment you suggested should be implicit in all files, unless specified otherwise.<div>
<br></div><div>It sounds like, because your script is running under UTF-8 encoding, it's treating the strings it reads as UTF-8, and throwing a fit when it finds bytes that aren't valid UTF-8.</div><div>
<br></div><div>You could try tinkering with Encoding.default_external – maybe set it to Encoding::ASCII_8BIT and see if that helps.</div><span class="HOEnZb"><font color="#888888"><div><br></div></font></span><div><span class="HOEnZb"><font color="#888888">Leo</font></span><div>
<div class="h5"><br><div><br><div><div>On 22 Aug 2013, at 16:24, George Drummond <<a href="mailto:drummond@rentify.com" target="_blank">drummond@rentify.com</a>> wrote:</div><br><blockquote type="cite"><div style="word-wrap:break-word">
Try running it with the magic UTF-8 comment at the top of the file<div><br></div><div><pre style="border:none!important;margin-top:0px!important;margin-bottom:0px!important;width:auto!important;clear:none!important;overflow:visible!important;font-size:12px!important;line-height:16px!important;padding:0px 4px!important;border-top-left-radius:0px!important;border-top-right-radius:0px!important;border-bottom-right-radius:0px!important;border-bottom-left-radius:0px!important;color:rgb(17,0,0);text-align:left">
<span style="color:rgb(0,128,0);font-style:italic"># encoding: UTF-8</span></pre><div><br></div><div><br></div><div><div>On 22 Aug 2013, at 16:21, gvim <<a href="mailto:gvimrc@gmail.com" target="_blank">gvimrc@gmail.com</a>> wrote:</div>
<br><blockquote type="cite">I'm encountering some UTF-8 errors in Ruby 2.0. When installing gems I often see non-fatal errors relating to conversion of ASCII characters to UTF-8. The following script is designed to search a large Maildir folder for lines beginning with 4 word characters:<br>
<br>---------------------------------------------------------<br>dir = 'my/maildir/path'<br>Dir.chdir(dir)<br><br>Dir.foreach(dir) do |file|<br> next unless file =~ /^\d{4}/<br> print "\n\n************* Opening #{file} *************\n"<br>
fh = File.open(file)<br> while fh.gets do<br> print if $_ =~ /^\w{4}\b/<br> end<br> fh.close<br>end<br><br>-------------------------------------------------------------<br><br><br>After successfully scanning 7 email files it dies with a UTF-8 error:<br>
<br><br>************* Opening 1270516984.M407293P18051.mac,S=1601,W=1645:2,Sb *************<br>Paul<br>./1.rb:13:in `block in &lt;main&gt;': invalid byte sequence in UTF-8 (ArgumentError)<br><span style="white-space:pre-wrap"> </span>from ./1.rb:8:in `foreach'<br>
<span style="white-space:pre-wrap"> </span>from ./1.rb:8:in `&lt;main&gt;'<br><br><br>The equivalent Perl script parses the whole directory without any errors:<br><br>------------------------------------------------------------<br>
use 5.016;<br>use autodie;<br><br>my $dir = 'my/mail/path';<br>chdir $dir;<br>opendir my $dh, $dir;<br><br>while (readdir $dh) {<br> next unless /^\d{4}/;<br> open my $fh, '<', $_;<br> say "\n\n************* Opening $_ *************";<br>
while (<$fh>) {<br> chomp;<br> say if /^\w{4}\s/;<br> }<br> close $fh;<br>}<br>closedir $dh;<br><br>-------------------------------------------------------------<br><br>gvim<br>_______________________________________________<br>
Chat mailing list<br><a href="mailto:Chat@lists.lrug.org" target="_blank">Chat@lists.lrug.org</a><br><a href="http://lists.lrug.org/listinfo.cgi/chat-lrug.org" target="_blank">http://lists.lrug.org/listinfo.cgi/chat-lrug.org</a><br>
</blockquote></div><br></div></div>
<br>
</blockquote></div><br></div></div></div></div></div><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Ali, <a href="http://happybearsoftware.com" target="_blank">http://happybearsoftware.com</a></div>
</div></div>