
Issue reading double-byte characters


lequebecois


I'm writing a piece of code that reads the CSV files exported by various address books (such as Yahoo, Outlook, and GMail).

 

I have an issue with the file created by GMail, as its encoding is not what I'm used to. It looks like normal ASCII, except that each character is followed by a zero byte (00). Here is, for example, the first line (excluding the first 2 bytes of the file, which confuse me even more):

 

4e 00 61 00 6d 00 65 00 2c 00 45 00 2d 00 6d 00 61 00 69 00 6c 00 2c 00 4e 00 6f 00 74 00 65 00 73 00 0d 00 0a 00

 

After splitting this line at the commas (using explode), I end up with strings on which stristr doesn't work. For example, when I use stristr to look for "name" in the field that contains "name", it finds no match.

 

I'm sure there is a simple trick for working with this kind of data that I simply don't know. If anyone can point me in the right direction, I'd really appreciate it.

 

Thank you.


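The first two bytes are probably a Unicode byte order mark (BOM), and the remainder would be UTF-16. That's my guess. The bytes you posted have the low byte first (4e 00 for "N"), so it would be little-endian. As for how to deal with it, try using mb_convert_encoding($str, 'UTF-8', 'UTF-16'); if that produces garbage, name the byte order explicitly with 'UTF-16LE'. UTF-8 will be much nicer to deal with, as it matches ASCII exactly for the ASCII subset.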
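Here is a rough sketch of that conversion, in case it helps. The helper name utf16_to_utf8 is made up for illustration, and the sample bytes are the header line from the question with a little-endian BOM (FF FE) assumed in front, since the actual first two bytes weren't posted. It checks for a BOM itself and falls back to little-endian when none is present, which is an assumption based on the dump above rather than anything guaranteed about GMail exports.

```php
<?php
// Hypothetical helper: convert raw UTF-16 file contents to UTF-8.
function utf16_to_utf8(string $raw): string
{
    $bom = substr($raw, 0, 2);
    if ($bom === "\xFF\xFE") {
        // Little-endian BOM: strip it and decode as UTF-16LE.
        $from = 'UTF-16LE';
        $raw = substr($raw, 2);
    } elseif ($bom === "\xFE\xFF") {
        // Big-endian BOM: strip it and decode as UTF-16BE.
        $from = 'UTF-16BE';
        $raw = substr($raw, 2);
    } else {
        // Assumption: no BOM means little-endian, as in the posted dump.
        $from = 'UTF-16LE';
    }
    return mb_convert_encoding($raw, 'UTF-8', $from);
}

// The header line from the question, with an assumed little-endian BOM in front.
$raw = "\xFF\xFE"
     . "N\x00a\x00m\x00e\x00,\x00E\x00-\x00m\x00a\x00i\x00l\x00,\x00"
     . "N\x00o\x00t\x00e\x00s\x00\r\x00\n\x00";

$line   = utf16_to_utf8($raw);
$fields = explode(',', rtrim($line, "\r\n"));

// After conversion the usual string functions behave as expected.
var_dump(stristr($fields[0], 'name') !== false); // bool(true)
```

The important part is to convert the whole file (or each line) before calling explode. Splitting the raw UTF-16 bytes at "," leaves a stray 00 byte at the start of every field after the first, which is one reason the string functions stop matching.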
