<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Try running it with the magic UTF-8 comment at the top of the file:<div><br></div><div><pre class="ruby" style="border: none !important; margin-top: 0px !important; margin-bottom: 0px !important; width: auto !important; clear: none !important; overflow: visible !important; font-size: 12px !important; line-height: 16px !important; padding: 0px 4px !important; -webkit-box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px !important; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px !important; border-top-left-radius: 0px !important; border-top-right-radius: 0px !important; border-bottom-right-radius: 0px !important; border-bottom-left-radius: 0px !important; color: rgb(17, 0, 0); text-align: left; "><span style="color: rgb(0, 128, 0); font-style: italic;"># encoding: UTF-8</span></pre><div><br></div><div><br></div><div><div>On 22 Aug 2013, at 16:21, gvim &lt;<a href="mailto:gvimrc@gmail.com">gvimrc@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">I'm encountering some UTF-8 errors in Ruby 2.0. When installing gems I often see non-fatal errors relating to conversion of ASCII characters to UTF-8. 
The following script is designed to search a large Maildir folder for lines beginning with 4 word characters:<br><br>---------------------------------------------------------<br>dir = 'my/maildir/path'<br>Dir.chdir(dir)<br><br>Dir.foreach(dir) do |file|<br> next unless file =~ /^\d{4}/<br> print "\n\n************* Opening #{file} *************\n"<br> fh = File.open(file)<br> while fh.gets do<br> print if $_ =~ /^\w{4}\b/<br> end<br> fh.close<br>end<br><br>-------------------------------------------------------------<br><br><br>After successfully scanning 7 email files it dies with a UTF-8 error:<br><br><br>************* Opening 1270516984.M407293P18051.mac,S=1601,W=1645:2,Sb *************<br>Paul<br>./1.rb:13:in `block in &lt;main&gt;': invalid byte sequence in UTF-8 (ArgumentError)<br><span class="Apple-tab-span" style="white-space:pre"> </span>from ./1.rb:8:in `foreach'<br><span class="Apple-tab-span" style="white-space:pre"> </span>from ./1.rb:8:in `&lt;main&gt;'<br><br><br>The equivalent Perl script parses the whole directory without any errors:<br><br>------------------------------------------------------------<br>use 5.016;<br>use autodie;<br><br>my $dir = 'my/mail/path';<br>chdir $dir;<br>opendir my $dh, $dir;<br><br>while (readdir $dh) {<br> next unless /^\d{4}/;<br> open my $fh, '&lt;', $_;<br> say "\n\n************* Opening $_ *************";<br> while (&lt;$fh&gt;) {<br> chomp;<br> say if /^\w{4}\s/;<br> }<br> close $fh;<br>}<br>closedir $dh;<br><br>-------------------------------------------------------------<br><br>gvim<br>_______________________________________________<br>Chat mailing list<br><a href="mailto:Chat@lists.lrug.org">Chat@lists.lrug.org</a><br>http://lists.lrug.org/listinfo.cgi/chat-lrug.org<br></blockquote></div><br></div></body></html>
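<div>If the magic comment alone does not help (it sets the encoding of string literals in the source file, not of data read from disk), one option worth trying is to read each mail file as raw bytes, so the regexp never sees the data as UTF-8. A minimal sketch, assuming the mail files contain stray non-UTF-8 bytes; the helper name <code>four_word_char_lines</code> and the sample data are made up for illustration and are not from the thread:</div>

```ruby
require 'tempfile'

# Hypothetical helper (not from the thread): returns the lines of +path+
# that begin with exactly four word characters. The file is opened in
# binary mode, so every line is tagged ASCII-8BIT rather than UTF-8 and
# invalid UTF-8 byte sequences cannot raise ArgumentError while matching.
def four_word_char_lines(path)
  matches = []
  File.open(path, 'rb') do |fh|
    fh.each_line do |line|
      # An ASCII-only regexp literal can be matched against binary strings.
      matches << line if line =~ /^\w{4}\b/
    end
  end
  matches
end

# Made-up sample data: the second line contains the byte 0xE9, which is
# not a valid UTF-8 sequence on its own (it is Latin-1 for an accented e).
sample = Tempfile.new('mail')
sample.binmode
sample.write("Paul wrote\ncaf\xE9 au lait\nFrom: someone\nabc\n")
sample.close

result = four_word_char_lines(sample.path)
puts result
```

<div>In binary mode <code>\w</code> matches only ASCII word characters, so accented letters no longer count as word characters. If that matters, transcoding each line with <code>String#encode</code> and its <code>invalid: :replace</code> option may be a better fit.</div>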