
SimpleXML UTF-8 encoding issue


Twelvefootsnowman


I've been completely stumped by an issue with special characters in XML files and was hoping for a bit of help!

 

I'm using SimpleXML to convert XML files into HTML tables. It works fine for 99% of the files I use it for but that annoying 1% chucks up this error:

 

Warning: simplexml_load_file() [function.simplexml-load-file]: http://www.xml-feed-site.com/xml.php :1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA9 0x3C 0x2F 0x6F on line 3

 

All the XML files have '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' as the first line which I thought meant they were in UTF-8 format (though, this is my first attempt at working with XML so I'm not really sure  :-\).

 

From what I understand, the issue is that the XML files in question have a few annoying special characters that appear as "�" in them, which stops my simplexml_load_file function from converting them. Is there any way I can get SimpleXML to delete, change or remove these non-standard characters while it's converting the XML?

 

 

PLEASE NOTE: http://www.xml-feed-site.com/xml.php isn't the real URL I'm using, it's just an example!
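
For what it's worth, the bytes in that warning (0xA9 0x3C 0x2F 0x6F) read as "©</o" if the feed is really ISO-8859-1 rather than the UTF-8 it declares. That is only a guess from one byte, not something the feed confirms. Under that assumption, a minimal sketch is to fetch the feed as a string, re-encode it only when it isn't valid UTF-8, and then parse it. The helper name load_xml_assuming_latin1 and the sample document are made up for illustration, and the mbstring extension is assumed to be available:

```php
<?php
// Hypothetical helper, not part of SimpleXML.
function load_xml_assuming_latin1(string $raw)
{
    // Only re-encode when the bytes are not already valid UTF-8.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        // Assumption: the stray bytes are ISO-8859-1 (0xA9 would then be the copyright sign).
        $raw = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
    }
    // Returns a SimpleXMLElement, or false if the document still fails to parse.
    return simplexml_load_string($raw);
}

// Sample document containing a raw 0xA9 byte, standing in for the real feed.
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<feed><o>Acme \xA9</o></feed>";

$xml = load_xml_assuming_latin1($raw);
echo $xml->o, "\n";
```

With a real feed, $raw would come from something like file_get_contents($request_url) instead of the sample string. If the feed turns out to use a different encoding, the third argument to mb_convert_encoding would need to change accordingly.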

https://forums.phpfreaks.com/topic/126460-simplexml-utf-8-encoding-issue/

Looking at http://uk3.php.net/manual/en/function.simplexml-load-file.php, it states that:

Convert the well-formed XML document in the given file to an object.

 

I think well formed probably refers to the structure itself and not the encoding (but makes you think of pre-checking).

 

However, looking at the options argument, you might be able to use LIBXML_NOERROR or LIBXML_NOWARNING, and then parse the crud out later.

 

I use a combo of these later anyway...

strip_tags()

htmlspecialchars()

Thanks for the suggestion, but no luck.

 

I tried:

 

$xml = simplexml_load_file($request_url, NULL, LIBXML_NOERROR | LIBXML_NOWARNING);

 

which removed the error message but didn't convert anything - the $xml variable was empty.
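
That fits how the function behaves: simplexml_load_file() and simplexml_load_string() return false when parsing fails, so hiding the message does not produce a document. A rough sketch of how the failure could be inspected instead of hidden, using libxml_use_internal_errors() and libxml_get_errors(). The sample string is an assumption standing in for the real feed:

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Sample document with a raw 0xA9 byte, standing in for the real feed.
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<feed><o>Acme \xA9</o></feed>";

$xml = simplexml_load_string($raw);

if ($xml === false) {
    // Parsing failed: list what libxml complained about.
    foreach (libxml_get_errors() as $error) {
        echo 'Line ', $error->line, ': ', trim($error->message), "\n";
    }
    // Empty the error buffer so later parses start clean.
    libxml_clear_errors();
}
```

Checking for false explicitly makes it clear whether the feed failed to parse or simply contained no data.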

 

 

Could it just be down to badly written XML files?
