
[SOLVED] PHP UTF8 Encoding and ANSI


Zypherone


Now, I know most of you have probably come across a topic like this, but I am not sure anyone has been able to answer the question. I have Googled it and Yahooed it, and so far no luck. Everyone is going around in circles, and in most cases when an "answer" is submitted it is wrong, in the wrong language, or based on a misunderstanding.

 

So, my questions, in relation to UTF-8 and ANSI:

 

Can we convert a file's encoding using PHP? If the answer is yes, then please help me out here.

 

Now, is it possible for a UTF-8 encoded file to be re-encoded as ANSI? The reason is that I have a program that saves a file as "csv", but it saves it with the UTF-8 BOM, which causes problems when extracting that data into a database, as the database does not recognize it.

 

Two solutions have come up, the obvious ones: re-save the file with ANSI encoding, or manually adjust the data to match when extracting it.

 

All good, but I have to do this every day (including weekends) and repeat the process for each file I have that day. Everything is automated up until that point, and it is annoying.

 

Be kind and help a guy out. If it's not possible, then thanks for reading this message. ;)

 

 


Now, is it possible for a UTF-8 encoded file to be re-encoded as ANSI?

 

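ANSI (aka Windows-1252) uses 8 bits per character, while UTF-8 uses up to 32 bits (four bytes) per character, so that depends on the contents of your file: any character that does not exist in Windows-1252 cannot be carried over as-is.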
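As a rough way to check whether a given file's contents fit, here is a minimal sketch that assumes the iconv extension is available. The function name `fits_in_windows_1252` is made up for illustration, and it relies on iconv() returning false when it meets a character the target encoding cannot represent.

```php
<?php
// Hypothetical helper: true if every character of a UTF-8 string
// also exists in Windows-1252.
function fits_in_windows_1252(string $utf8): bool
{
    // Without a //TRANSLIT or //IGNORE suffix, iconv() returns false
    // (and raises a notice, silenced here with @) as soon as it meets
    // a character that the target encoding cannot represent.
    return @iconv('UTF-8', 'Windows-1252', $utf8) !== false;
}
```

A string such as "café" should pass this check, while text containing, say, Japanese characters should not.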

 

Two solutions have come up, the obvious ones: re-save the file with ANSI encoding, or manually adjust the data to match when extracting it.

 

You can also just remove the BOM from the file. According to this page, the BOM is represented by the byte sequence EF BB BF (in hexadecimal) in UTF-8, so you can just remove those three bytes from the start of the file:

 

$contents = substr($contents, 3); // drop the first three bytes, keep the rest

 

If you want to make sure you only remove the BOM if it exists, you can do it like this:

// Only strip the first three bytes if they really are the UTF-8 BOM
if (substr($contents, 0, 3) === pack("CCC", 0xEF, 0xBB, 0xBF)) {
    $contents = substr($contents, 3);
}
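For the daily automated run, the same check could be wrapped so it works on a file path. This is only a sketch: the names `UTF8_BOM`, `strip_utf8_bom` and `strip_bom_from_file` are made up for illustration, and it assumes the files are small enough to read into memory in one go.

```php
<?php
// The three bytes that make up the UTF-8 byte order mark.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: return $contents without a leading UTF-8 BOM.
function strip_utf8_bom(string $contents): string
{
    if (strncmp($contents, UTF8_BOM, 3) === 0) {
        // The cast covers older PHP versions, where substr() returns
        // false when nothing is left after the offset.
        return (string) substr($contents, 3);
    }
    return $contents; // no BOM, leave the data untouched
}

// Hypothetical helper: rewrite a file in place without its BOM.
// Returns false if the file could not be read or written.
function strip_bom_from_file(string $path): bool
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        return false;
    }
    return file_put_contents($path, strip_utf8_bom($contents)) !== false;
}
```

Calling `strip_bom_from_file()` on each CSV before the database import should leave files without a BOM unchanged, so it is safe to run on every file.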


To perform an encoding conversion, you can use the iconv extension. Conversion from UTF-8 is possible, and iconv supports two modes for handling characters that the output encoding lacks:

 

- TRANSLIT - attempts to transliterate Unicode symbols that do not have a representation in the output encoding.

- IGNORE - ignores such symbols.
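In PHP, the two modes are selected by appending //TRANSLIT or //IGNORE to the target encoding name passed to iconv(). A minimal sketch follows; the wrapper name `utf8_to_windows_1252` is made up for illustration, and the exact transliteration results (and how reliably //IGNORE behaves) depend on the iconv implementation PHP was built against, so treat this as a starting point rather than a guarantee.

```php
<?php
// Hypothetical wrapper around iconv() for UTF-8 -> Windows-1252 ("ANSI").
// $mode is 'TRANSLIT' (approximate unsupported characters) or
// 'IGNORE' (drop them). Returns the converted string, or false on failure.
function utf8_to_windows_1252(string $utf8, string $mode = 'TRANSLIT')
{
    if ($mode !== 'TRANSLIT' && $mode !== 'IGNORE') {
        throw new InvalidArgumentException('Mode must be TRANSLIT or IGNORE');
    }
    // The mode is given as a suffix on the output encoding name.
    return iconv('UTF-8', 'Windows-1252//' . $mode, $utf8);
}
```

Characters that exist in both encodings, such as "é", are converted directly in either mode; the mode only matters for characters Windows-1252 does not have. A leading BOM is best removed before converting.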


Zyx, I originally looked at "iconv" but got lost after the first few lines of the explanation. Thanks anyway.

 

Daniel0, I have applied the suggested code in my script. We will see how it goes tomorrow, when it next runs. If all goes well, then it is solved. If not, I will be back. Thanks so far for the help.

