NotionCommotion, posted January 12, 2015 (edited January 12, 2015):
Major breakthrough. utf8_encode() allows me to view utf-8 characters in the browser. Therefore, my source data file must be ISO-8859-1 encoded text, right? Am I understanding this correctly?

    <?php
    //mb_internal_encoding("UTF-8");
    header('Content-type: text/html; charset=utf-8');
    $file = fopen('some_csv_file_created_by_excel.csv', "r");
    ob_start();
    while (($spec = fgetcsv($file, 100000, ",")) !== FALSE){
        echo($spec[0].' '.utf8_encode($spec[0]).'<br>');
    }
    $string=ob_get_clean();
    ?>
    <!DOCTYPE html>
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>utf</title>
    </head>
    <body>
    <p><?php echo($string);?></p>
    </body>
    </html>
Jacques1, posted January 12, 2015 (edited January 12, 2015):
utf8_encode() is poorly named. It actually transcodes data from one encoding (ISO-8859-1) to another (UTF-8). So it only makes sense if your source data has the “wrong” encoding and can only be fixed at runtime. If the source data is already encoded as UTF-8, or if there's any chance you can convert it to UTF-8 beforehand, the function is not necessary. Transcoding the same data on every request is wasted work, so it should be avoided whenever possible.
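To illustrate the "fix the source once instead of transcoding at runtime" advice, here is a minimal sketch of a one-off conversion. The function name transcodeToUtf8(), the output file name, and the ISO-8859-1 default are assumptions for illustration, not something from the thread. It uses mb_convert_encoding() from the mbstring extension rather than utf8_encode(), which has been deprecated since PHP 8.2. Excel on Western Windows systems often writes CSV files as Windows-1252 rather than strict ISO-8859-1, so passing 'Windows-1252' as the source encoding may be the better guess for such files.

```php
<?php
// Sketch only: transcodeToUtf8() and the file names below are hypothetical.
// Requires the mbstring extension.

function transcodeToUtf8(string $bytes, string $from = 'ISO-8859-1'): string
{
    // Input that is already valid UTF-8 (plain ASCII included) is returned
    // unchanged, so running the conversion twice does not double-encode it.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // Otherwise assume the bytes are in $from and transcode them.
    return mb_convert_encoding($bytes, 'UTF-8', $from);
}

// One-off conversion, run once from the command line (hypothetical file names):
// $raw = file_get_contents('some_csv_file_created_by_excel.csv');
// file_put_contents('some_csv_file_utf8.csv', transcodeToUtf8($raw));
```

Once the converted file exists, the page script can read it with fgetcsv() and echo the fields directly. Note that the validity check is a heuristic: a legacy-encoded file can, in rare cases, happen to be valid UTF-8 as well.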
NotionCommotion (author), posted January 12, 2015:
Thanks Jacques. Yes, I had recently become aware of the poor name of utf8_encode(). As far as I can tell, Excel cannot export text with UTF-8 encoding. It is possible to take several steps to do so (export to the Google equivalent, etc.), but that is not ideal. Excel can export to Unicode text, and since I have only one column this might work, but it mysteriously puts quotes around some of the entries. Obviously, not a PHP topic and no need to respond unless you want to. My main reason for my original post was making sure I understood what I was witnessing. If, without utf8_encode(), it displays � for non-ASCII characters, then the source file was ISO-8859-1 (or at least not UTF-8)?
Jacques1, posted January 12, 2015:
Yes, those symbols mean “not a valid character”. UTF-8 uses a specific byte pattern, and most non-ASCII ISO-encoded characters don't conform to that pattern, so you get no character at all, not just a wrong character. If you try it the other way round (UTF-8 misinterpreted as ISO), you'll see cryptic characters instead, because UTF-8 is formally valid ISO.
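Both directions described above can be reproduced with a single character. The following is a small sketch, assuming the mbstring extension is available; the variable names are made up for the example. It takes "é" once as the single ISO-8859-1 byte 0xE9 and once as the two-byte UTF-8 sequence 0xC3 0xA9.

```php
<?php
// "é" in two encodings (variable names are illustrative).
$latin1 = "\xE9";      // ISO-8859-1: one byte
$utf8   = "\xC3\xA9";  // UTF-8: two bytes

// In UTF-8, 0xE9 is a lead byte that must be followed by continuation bytes.
// On its own it is invalid, which is why a browser shows the replacement character.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// The other way round: treating the UTF-8 bytes as ISO-8859-1 never fails,
// because every byte maps to some character. The result is just the wrong text.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $misread, PHP_EOL; // Ã©
```

This also shows the limit of the diagnosis: a failed UTF-8 check tells you the data is not UTF-8, but not which legacy encoding it actually is.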
NotionCommotion (author), posted January 12, 2015:
I see my previous response even committed an encoding faux pas. I typed UTF-8 followed by a parenthesis, and it displayed a strange smiley face.
Jacques1, posted January 12, 2015:
That's the fault of the forum software. You can deactivate smilies in the “advanced reply” view. I had to do the same thing.
NotionCommotion (author), posted January 12, 2015:
What? You never smile?