lequebecois
Posted February 14, 2007

I'm writing a piece of code that reads the CSV files exported by various address books (Yahoo, Outlook, GMail, ...). I have an issue with the file created by GMail, as the encoding is not what I'm used to. It looks like normal ASCII, except that there is a zero byte (00) after each character. Here is, for example, the first line in hex (excluding the first 2 bytes of the file, which confuse me even more):

4e 00 61 00 6d 00 65 00 2c 00 45 00 2d 00 6d 00 61 00 69 00 6c 00 2c 00 4e 00 6f 00 74 00 65 00 73 00 0d 00 0a 00

After splitting this string at the commas (using explode), I end up with strings on which stristr doesn't work. For example, when I use stristr to find the string "name" in the string that contains "name", it finds nothing.

I'm sure there is a simple trick to working with this kind of data that I simply don't know. If anyone can point me in the right direction, I'd really appreciate it. Thank you.
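The failure described in the question can be reproduced from the quoted bytes alone. The following is only a sketch: the variable names are made up for illustration, and hex2bin requires PHP 5.4 or later, which is newer than this thread.

```php
<?php
// Bytes of the first line exactly as quoted in the post (leading 2 bytes excluded).
$hex = '4e 00 61 00 6d 00 65 00 2c 00 45 00 2d 00 6d 00 61 00 69 00 6c 00 '
     . '2c 00 4e 00 6f 00 74 00 65 00 73 00 0d 00 0a 00';
$line = hex2bin(str_replace(' ', '', $hex));

// explode() splits on the single byte 0x2c, so the 00 bytes stay inside each field.
$fields = explode(',', $line);
$first  = $fields[0];

// The first field holds N 00 a 00 m 00 e 00, so the bytes of "name" are never adjacent.
$found = stristr($first, 'name');

var_dump(strlen($first)); // byte length of the first field, not its character count
var_dump($found);
```

The first field is twice as long in bytes as it looks on screen, which is why a plain byte-wise search for "name" has nothing to match.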
fert
Posted February 14, 2007

That looks like hex, not ASCII.

http://us2.php.net/manual/en/function.str-replace.php
http://us2.php.net/manual/en/function.base_convert.php
btherl
Posted February 14, 2007

The first two bytes are probably a Unicode byte order mark (BOM), and the remainder would be UTF-16. That's my guess.

As for how to deal with it, try using mb_convert_encoding($str, 'UTF-8', 'UTF-16');. UTF-8 will be much nicer to deal with, as it matches ASCII exactly for the ASCII subset.
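A minimal sketch of that suggestion, under some assumptions not confirmed in the thread: the two unexplained leading bytes are taken to be the little-endian BOM ff fe (consistent with the quoted bytes, where the 00 follows each character), the mbstring extension is assumed to be loaded, and the variable names are made up for illustration. The BOM is checked by hand so that the byte order is chosen explicitly and the BOM does not end up in the first field.

```php
<?php
// Sample input: an ASSUMED ff fe BOM followed by the header line quoted in the question.
$hex = '4e 00 61 00 6d 00 65 00 2c 00 45 00 2d 00 6d 00 61 00 69 00 6c 00 '
     . '2c 00 4e 00 6f 00 74 00 65 00 73 00 0d 00 0a 00';
$raw = "\xFF\xFE" . hex2bin(str_replace(' ', '', $hex));

// Choose the byte order from the BOM and strip it.
$bom = substr($raw, 0, 2);
if ($bom === "\xFF\xFE") {
    $from = 'UTF-16LE';
    $raw  = substr($raw, 2);
} elseif ($bom === "\xFE\xFF") {
    $from = 'UTF-16BE';
    $raw  = substr($raw, 2);
} else {
    // No BOM: little-endian is an assumption based on the quoted sample.
    $from = 'UTF-16LE';
}

// Convert once, up front; after this the usual string functions behave as expected.
$utf8   = mb_convert_encoding($raw, 'UTF-8', $from);
$fields = explode(',', rtrim($utf8, "\r\n"));

$hasName = stristr($fields[0], 'name') !== false;

print_r($fields);
var_dump($hasName);
```

Converting the whole file before calling explode matters: splitting the raw UTF-16 data on a one-byte comma cuts characters in half, which is what left the stray 00 bytes in the fields from the original question.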