Zypherone Posted October 21, 2009

I know most of you have probably come across this topic before, but I am not sure anyone has actually answered the question. I have searched Google and Yahoo with no luck so far. Everyone goes around in circles, and in most cases when an "answer" is submitted it is wrong, in the wrong language, or based on a misunderstanding.

So, my questions, in relation to UTF-8 and ANSI: can we convert a file's encoding using PHP? If the answer is yes, then please help me out here. Is it possible for a UTF-8 encoded file to be re-encoded as ANSI?

The reason is that I have a program that saves its output as a CSV file, but it saves it with the UTF-8 BOM. This causes problems when extracting the data into a database, because the database does not recognize it. Two obvious solutions have come up: re-save the file with ANSI encoding, or manually adjust the data to match when extracting it. Both work, but I have to do this every day (including weekends) and repeat the process for every file I receive that day. Everything is automated up to that point, and it is annoying.

Be kind and help a guy out. If it is not possible, then thanks for reading this message.
Daniel0 Posted October 21, 2009

"Now is it possible for a UTF 8 encoding be encoded as ANSI?"

ANSI (aka Windows-1252) uses 8 bits per character, while UTF-8 uses up to 32 bits (four bytes) per character, so that depends on the contents of your file. Any character that has no Windows-1252 equivalent cannot be represented after the conversion.

"Two solutions has come up, the obvious ones, re-save as ANSI encoding or when extracing the data, manually adjust it to match."

You can also just remove the BOM from the file. According to this page, the BOM is represented by the byte sequence EF BB BF (in hexadecimal) in UTF-8, so you can just remove the first three bytes from the file:

    $contents = substr($contents, 3);

Note that the length argument is omitted on purpose: substr($contents, 3, 0) would return an empty string.

If you want to make sure you only remove the BOM when it exists, you can do it like this:

    if (substr($contents, 0, 3) == pack("CCC", 0xEF, 0xBB, 0xBF)) {
        // The file starts with a UTF-8 BOM, so drop its three bytes.
        $contents = substr($contents, 3);
    }
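Since the original poster wants this to run unattended every day, the BOM check above could be wrapped in a small function. The following is only a minimal sketch: the function name strip_utf8_bom and the file path in the usage note are hypothetical and do not come from the thread.

```php
<?php
// Hypothetical helper (name is an assumption, not from the thread):
// returns $contents without a leading UTF-8 BOM (bytes EF BB BF).
// Input that does not start with a BOM is returned unchanged.
function strip_utf8_bom(string $contents): string
{
    $bom = "\xEF\xBB\xBF";

    // Compare only the first three bytes against the BOM.
    if (strncmp($contents, $bom, 3) === 0) {
        // No length argument: keep everything after the BOM.
        // The cast guards against older PHP versions returning false
        // when the input consists of the BOM alone.
        return (string) substr($contents, 3);
    }

    return $contents;
}
```

In a daily script this might be used as file_put_contents($path, strip_utf8_bom(file_get_contents($path))), where $path is whatever CSV file the export program produced. Reading the whole file into memory is assumed to be acceptable for files of this size.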
Zyx Posted October 21, 2009

To perform an encoding conversion, you can use the iconv extension. Conversion from UTF-8 is possible, and iconv supports two modes for characters that cannot be converted:
- TRANSLIT - attempts to transliterate Unicode symbols that do not have a representation in the output encoding.
- IGNORE - ignores such symbols.
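As a rough illustration of the iconv approach, the sketch below converts a UTF-8 string to Windows-1252 (the encoding the thread calls "ANSI"). The function name utf8_to_ansi and its $mode parameter are assumptions made for this example. The mode is appended to the target encoding name as a suffix, and how //TRANSLIT and //IGNORE behave depends on the iconv implementation PHP was built against, so treat this as a starting point rather than a guaranteed result.

```php
<?php
// Hypothetical helper (name and $mode parameter are assumptions):
// converts UTF-8 text to Windows-1252 using PHP's iconv() function.
// $mode is '//TRANSLIT', '//IGNORE', or '' for a strict conversion.
function utf8_to_ansi(string $text, string $mode = '//TRANSLIT'): string
{
    // iconv() returns false when the conversion fails,
    // for example on malformed UTF-8 input.
    $converted = iconv('UTF-8', 'Windows-1252' . $mode, $text);

    if ($converted === false) {
        throw new RuntimeException('iconv could not convert the input');
    }

    return $converted;
}
```

For example, utf8_to_ansi("café") should turn the two-byte UTF-8 "é" into the single Windows-1252 byte 0xE9. If the file also begins with a BOM, it is probably simplest to remove the BOM first and convert afterwards.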
Zypherone Posted October 21, 2009

Zyx, I originally looked at iconv but got lost after the first few lines of the explanation. Thanks anyway.

Daniel0, I have applied the suggested code in my script. We will see how it goes tomorrow, which is when it next runs. If all goes well, this is solved. If not, I will be back.

Thanks for the help so far.