
Help needed: PHP utf8 writing to file problem


willapp


Hi,

 

I really need help asap to fix this annoying problem! What my code has to do (don't ask why) is read data from a UTF-8 encoded XML file, parse that data using PHP, write it out to a text file, then load this text file into a database.

 

The problem is that when I come to write the text file, it's messing up the encoding so that instead of £ signs, I get "Â£". I know this is a problem related to UTF-8/ISO encoding, but I don't know how to stop it writing these html entities to my file.

 

The code looks roughly like this (relevant bits only):

 

foreach ($data as $item) {
    $xml = simplexml_load_string('<?xml version="1.0" encoding="utf-8" ?>'.$item);
    // manipulation of xml...
}

...

$fp = fopen($file_path, 'wb');

foreach ($hash as $data) {
    $output = $data; // placeholder: the real code does some more parsing of $data here
    
    fwrite($fp, $output);
}

fclose($fp);

 

If I echo out $output I don't get any of the "Â£" rubbish, so why is it in my file? How can I remove it?
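A likely explanation, sketched here as an assumption rather than a diagnosis of this exact setup: in UTF-8 the £ sign is stored as two bytes (C2 A3). A browser told the page is UTF-8 shows them as one character, but anything that reads the same two bytes as ISO-8859-1 shows "Â£". The snippet below only illustrates that byte-level effect; it uses `iconv()` to simulate the misreading.

```php
<?php
// "£" (U+00A3) written as an escape so this source file's own encoding does not matter.
$pound = "\u{00A3}";

// UTF-8 stores the pound sign as two bytes.
$utf8Hex = bin2hex($pound);
echo $utf8Hex, "\n";

// Simulate a reader that wrongly treats those two bytes as ISO-8859-1:
// each byte becomes its own character, giving "Â" followed by "£".
$misread = iconv('ISO-8859-1', 'UTF-8', $pound);
echo $misread, "\n";
```

If this is what is happening, the file itself is valid UTF-8 and nothing is being "added" by fwrite(); the extra character only appears when the bytes are interpreted with the wrong character set.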

 

Once the file is created, the PHP then executes this sql:

 

'LOAD DATA INFILE "'.$file_path.'" REPLACE INTO TABLE mytablename ('.$columnList.');'

 

Obviously the result of this is that my database contains this spurious "Â" entity which is then screwing up other parts of the application.
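One option worth trying, assuming the file really does contain UTF-8 bytes and the MySQL version supports it: LOAD DATA INFILE accepts a CHARACTER SET clause that tells the server how to interpret the file. A minimal sketch of the modified statement, with hypothetical values standing in for $file_path and $columnList:

```php
<?php
// Hypothetical values; in the real code these come from elsewhere.
$file_path  = '/tmp/import.txt';
$columnList = 'id, price';

// Same statement as above, with a CHARACTER SET clause placed
// after the table name and before the column list.
$sql = 'LOAD DATA INFILE "' . $file_path . '" REPLACE INTO TABLE mytablename '
     . 'CHARACTER SET utf8 (' . $columnList . ');';

echo $sql, "\n";
```

The table's columns still need a character set that can hold the data, and the file path should be escaped properly before being placed in the query.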

 

I've tried various functions like utf8_encode/utf8_decode, but nothing will stop these entities from appearing in the file.

 

(FYI I'm a novice PHP coder (obviously), but have to get this working as part of my job. I don't normally write PHP though I am a software developer).

A quick question I have is why are you reading, writing, and then storing?

 

Something to remember is that PHP's native database connections default to Latin-1 (latin1). I believe I heard that in PHP 6.0 they will allow it to be a customizable setting. Immediately after connecting to the database, try running these 2 queries before you do anything else:

 

SET NAMES utf8;

SET CHARACTER SET utf8;

 

I am not sure if that will help or not.

Hi,

 

Thanks for the reply, but I don't think that will help.

 

I know it's stupid reading xml, writing to a file and then loading to a DB, but this is how the existing software was written and I don't have the time/expertise to re-write it all so I really just need to make it compatible.

 

I think the root of the problem is that this XML file is UTF-8, whereas previous files have been ISO-8859-1 and these have worked fine. What I need to do is load the XML file, somehow convert the content into ISO-8859-1 without damaging it, and then write the file as ISO-8859-1 (I think this is how it's written anyway).

 

I think this should get rid of the £ being turned into "Â£" issue, I just don't know how to do it!!  :-[
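A minimal sketch of that conversion, assuming the parsed output is valid UTF-8 and every character in it exists in ISO-8859-1. The helper name `to_latin1` and the temporary file are made up for illustration; `iconv()` is the standard PHP function doing the work.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to ISO-8859-1 bytes before writing.
function to_latin1(string $utf8): string
{
    // iconv() returns false if a character cannot be represented in ISO-8859-1.
    $latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);
    if ($latin1 === false) {
        throw new RuntimeException('Input contains characters outside ISO-8859-1');
    }
    return $latin1;
}

// Stand-in for the parsed output; "£" is written as an escape for safety.
$output = "Price: \u{00A3}5\n";

// Stand-in for $file_path from the original code.
$file_path = tempnam(sys_get_temp_dir(), 'latin1_');

$fp = fopen($file_path, 'wb');
fwrite($fp, to_latin1($output));
fclose($fp);

// Read the bytes back to confirm the pound sign is now a single byte (A3).
$written = file_get_contents($file_path);
echo bin2hex($written), "\n";
unlink($file_path);
```

In the original loop this would amount to calling the helper on $output just before fwrite(). Characters with no ISO-8859-1 equivalent (the euro sign, for example) would make the helper throw, so this only fits data known to stay within that range.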
