[LRUG] UTF8 errors parsing mail file

Leo Cassarani leonardo.cassarani at gmail.com
Thu Aug 22 09:05:54 PDT 2013


Ruby 2.0 has changed the default encoding of Ruby files to be UTF-8. So the magic comment you suggested should be implicit in all files, unless specified otherwise.

It sounds like, because your script is running with UTF-8 as its default external encoding, Ruby is treating every line it reads from those mail files as UTF-8, and throwing a fit when the regexp match hits a byte sequence that isn't valid UTF-8.

You could try tinkering with Encoding.default_external – maybe set it to Encoding::ASCII_8BIT and see if that helps.
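
Something along these lines might be a starting point. It's only a rough sketch and I haven't run it against a real Maildir. The helper name matching_lines is made up for illustration and isn't part of your script. The idea is to open each file in binary mode so that every line is tagged ASCII-8BIT rather than UTF-8, which should stop the regexp match from raising:

```ruby
# Hypothetical helper (not from the original script): returns the lines of
# `path` that begin with exactly four word characters.
# Mode "rb" tags each line as ASCII-8BIT (binary), so matching against an
# ASCII-only regexp should not raise "invalid byte sequence in UTF-8".
def matching_lines(path)
  File.open(path, "rb") do |fh|
    fh.each_line.select { |line| line =~ /^\w{4}\b/ }
  end
end

# Alternative: change the process-wide default once, near the top of the
# script, instead of touching each File.open call.
# Encoding.default_external = Encoding::ASCII_8BIT
```

Inside the Dir.foreach loop you would then call something like `print(*matching_lines(file))` in place of the while/gets block. One caveat: on binary strings \w only matches ASCII word characters, so lines starting with accented letters won't be picked up.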

Leo

On 22 Aug 2013, at 16:24, George Drummond <drummond at rentify.com> wrote:

> Try running it with the magic UTF-8 comment at the top of the file
> 
> # encoding: UTF-8
> 
> 
> On 22 Aug 2013, at 16:21, gvim <gvimrc at gmail.com> wrote:
> 
>> I'm encountering some UTF-8 errors in Ruby 2.0. When installing gems I often see non-fatal errors relating to conversion of ASCII characters to UTF-8. The following script is designed to search a large Maildir folder for lines beginning with 4 word characters:
>> 
>> ---------------------------------------------------------
>> dir = 'my/maildir/path'
>> Dir.chdir(dir)
>> 
>> Dir.foreach(dir) do |file|
>>  next unless file =~ /^\d{4}/
>>  print "\n\n************* Opening #{file} *************\n"
>>  fh = File.open(file)
>>  while fh.gets do
>>    print if $_ =~ /^\w{4}\b/
>>  end
>>  fh.close
>> end
>> 
>> -------------------------------------------------------------
>> 
>> 
>> After successfully scanning 7 email files it dies with a UTF-8 error:
>> 
>> 
>> ************* Opening 1270516984.M407293P18051.mac,S=1601,W=1645:2,Sb *************
>> Paul
>> ./1.rb:13:in `block in <main>': invalid byte sequence in UTF-8 (ArgumentError)
>> 	from ./1.rb:8:in `foreach'
>> 	from ./1.rb:8:in `<main>'
>> 
>> 
>> The equivalent Perl script parses the whole directory without any errors:
>> 
>> ------------------------------------------------------------
>> use 5.016;
>> use autodie;
>> 
>> my $dir = 'my/mail/path';
>> chdir $dir;
>> opendir my $dh, $dir;
>> 
>> while (readdir $dh) {
>>  next unless /^\d{4}/;
>>  open my $fh, '<', $_;
>>  say "\n\n************* Opening $_ *************";
>>  while (<$fh>) {
>>    chomp;
>>    say if /^\w{4}\s/;
>>  }
>>  close $fh;
>> }
>> closedir $dh;
>> 
>> -------------------------------------------------------------
>> 
>> gvim
>> _______________________________________________
>> Chat mailing list
>> Chat at lists.lrug.org
>> http://lists.lrug.org/listinfo.cgi/chat-lrug.org
