[LRUG] UTF8 errors parsing mail file

Frederick Cheung frederick.cheung at gmail.com
Thu Aug 22 08:55:50 PDT 2013


On 22 Aug 2013, at 16:46, gvim <gvimrc at gmail.com> wrote:

> On 22/08/2013 16:27, Frederick Cheung wrote:
>> 
>> Are those files guaranteed to contain only valid UTF-8? If not, you might be able to get away with opening them as ASCII-8BIT (assuming that you don't need to work with them in a Unicode-aware way).
>> 
>> Fred
>> 
> 
> The email file opens fine in Vim and shows no UTF-8 characters. I'm experimenting with Ruby 2.0 and don't understand why Perl parses the file and Ruby doesn't.


Presumably Perl doesn't validate encoding as strictly as Ruby (or defaults to an encoding in which there can be no invalid byte sequences).
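
To make that concrete, here is a rough sketch of what I'd expect to be going on. The sample bytes and the probe helper are made up for illustration (I haven't seen your file): reading the file as UTF-8 succeeds, but the first regexp match against the invalid bytes raises, whereas the same bytes read as ASCII-8BIT match fine.

```ruby
require "tempfile"

# Hypothetical sample: the byte 0xE9 (Latin-1 "é") is not valid UTF-8 on its own.
SAMPLE = "Subject: caf\xE9\n".b

# Returns [valid_as_utf8, error_class_or_nil, binary_match_index] for a file.
def probe(path)
  # Reading succeeds: the bytes are only tagged as UTF-8, not validated.
  text = File.read(path, encoding: "UTF-8")

  # Validation happens on use, e.g. when matching a regexp.
  error = begin
    text.match(/Subject/)
    nil
  rescue ArgumentError => e
    e.class
  end

  # Same bytes, tagged ASCII-8BIT: any byte sequence is acceptable.
  bytes = File.binread(path)

  [text.valid_encoding?, error, bytes =~ /Subject/]
end

file = Tempfile.new("mail")
file.binmode
file.write(SAMPLE)
file.close
result = probe(file.path)
file.unlink

p result
```

File.binread (or File.open(path, "rb")) is only reasonable if you treat the contents as bytes; anything that needs real characters still has to know the actual encoding.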

I don't think you can assess UTF-8 correctness just by eyeballing the file in Vim. For example, the non-ASCII characters could be invisible ones (e.g. the many Unicode whitespace characters), or ones that look very similar to ASCII ones (e.g. curly quotes instead of straight quotes).
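
Rather than eyeballing, you could have Ruby list every non-ASCII character with its position. This is only a sketch: the sample text and the non_ascii_chars helper are invented for illustration, and it assumes the text is already valid in its tagged encoding (check valid_encoding? first, since ord raises on a broken byte sequence).

```ruby
# Hypothetical sample text: curly quotes (U+201C, U+201D) and a no-break space (U+00A0).
text = "He said \u201Chello\u201D\nfoo\u00A0bar\n"

# Returns [line_number, column, "U+XXXX"] for every non-ASCII character.
# Assumes text.valid_encoding? is true.
def non_ascii_chars(text)
  found = []
  text.each_line.with_index(1) do |line, lineno|
    line.each_char.with_index(1) do |ch, col|
      found << [lineno, col, format("U+%04X", ch.ord)] unless ch.ascii_only?
    end
  end
  found
end

hits = non_ascii_chars(text)
hits.each { |lineno, col, cp| puts "line #{lineno}, col #{col}: #{cp}" }
```

An empty result means the text really is plain ASCII; anything listed is a character that can easily pass unnoticed in an editor.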

Fred

