[LRUG] UTF8 errors parsing mail file
gvim
gvimrc at gmail.com
Thu Aug 22 08:21:04 PDT 2013
I'm encountering some UTF-8 errors in Ruby 2.0. When installing gems I
often see non-fatal errors relating to conversion of ASCII characters to
UTF-8. The following script is designed to search a large Maildir folder
for lines that begin with a word of exactly four characters:
---------------------------------------------------------
dir = 'my/maildir/path'
Dir.chdir(dir)
Dir.foreach(dir) do |file|
  next unless file =~ /^\d{4}/
  print "\n\n************* Opening #{file} *************\n"
  fh = File.open(file)
  while fh.gets do
    print if $_ =~ /^\w{4}\b/
  end
  fh.close
end
-------------------------------------------------------------
After successfully scanning 7 email files it dies with a UTF-8 error:
************* Opening 1270516984.M407293P18051.mac,S=1601,W=1645:2,Sb *************
Paul
./1.rb:13:in `block in <main>': invalid byte sequence in UTF-8 (ArgumentError)
        from ./1.rb:8:in `foreach'
        from ./1.rb:8:in `<main>'
The equivalent Perl script parses the whole directory without any errors:
------------------------------------------------------------
use 5.016;
use autodie;
my $dir = 'my/mail/path';
chdir $dir;
opendir my $dh, $dir;
while (readdir $dh) {
    next unless /^\d{4}/;
    open my $fh, '<', $_;
    say "\n\n************* Opening $_ *************";
    while (<$fh>) {
        chomp;
        say if /^\w{4}\s/;
    }
    close $fh;
}
closedir $dh;
-------------------------------------------------------------
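One difference that may explain the behaviour: Perl's open without an encoding layer reads raw bytes, whereas Ruby 2.0 tags each line as UTF-8 by default and refuses to run a regex over bytes that do not form valid UTF-8. A minimal sketch of a possible workaround, assuming the mail contains a stray Latin-1 byte, is to open each file in binary mode. The helper name matching_lines and the sample mail contents are made up for illustration and are not part of 1.rb:

```ruby
require 'tempfile'

PATTERN = /^\w{4}\b/

# Hypothetical helper (not part of 1.rb): collect the lines of +path+ that
# match PATTERN. Mode 'rb' makes each_line return ASCII-8BIT strings, so the
# regex compares raw bytes and the lines are never validated as UTF-8.
def matching_lines(path)
  File.open(path, 'rb') do |fh|
    fh.each_line.select { |line| line =~ PATTERN }
  end
end

# Made-up sample mail: the second line carries a Latin-1 e-acute (0xE9),
# which is not a valid UTF-8 sequence on its own.
Tempfile.create('sample-mail') do |tmp|
  tmp.binmode
  tmp.write("Paul\nCaf\xE9 menu\nJohn Smith\n".b)
  tmp.flush

  # The same bytes tagged as UTF-8 fail validation, which is the likely
  # trigger for the ArgumentError in the trace above.
  bad = "Caf\xE9 menu\n".b.force_encoding(Encoding::UTF_8)
  puts "valid UTF-8? #{bad.valid_encoding?}"

  puts matching_lines(tmp.path)
end
```

In binary mode \w only matches ASCII word characters, so a line such as the accented one in the sample is skipped rather than raising. If the matched text needs to be printed as UTF-8 afterwards, it would still have to be transcoded from whatever charset each mail declares.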
gvim