SimpleXML UTF-8 encoding issue


Twelvefootsnowman

I've been completely stumped by an issue with special characters in XML files and was hoping for a bit of help!

I'm using SimpleXML to convert XML files into HTML tables. It works fine for 99% of the files I use it on, but that annoying 1% throws up this error:

Warning: simplexml_load_file() [function.simplexml-load-file]: http://www.xml-feed-site.com/xml.php :1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA9 0x3C 0x2F 0x6F on line 3

All the XML files have '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' as the first line, which I thought meant they were in UTF-8 format (though this is my first attempt at working with XML, so I'm not really sure :-\).

From what I understand, the issue is that the XML files in question have a few special characters that appear as "�" in them, which stops simplexml_load_file() from parsing them. Is there any way I can get SimpleXML to strip or replace these non-standard characters while it's converting the XML?

PLEASE NOTE: http://www.xml-feed-site.com/xml.php isn't the real URL I'm using, it's just an example!

Looking at http://uk3.php.net/manual/en/function.simplexml-load-file.php, it states that:

Convert the well-formed XML document in the given file to an object.

I think "well-formed" probably refers to the structure itself rather than the encoding, but it does make you think about pre-checking the file before loading it.
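
For what it's worth, here is a rough sketch of what that pre-check could look like. It assumes the mbstring extension is available and, more importantly, that the stray bytes are Windows-1252 (0xA9 is the copyright sign in that encoding), which is only a guess based on the bytes in the error message. The helper name load_xml_lenient() is made up for illustration.

```php
<?php
// Hypothetical helper: returns a SimpleXMLElement, or false if parsing still fails.
function load_xml_lenient(string $raw)
{
    // mb_check_encoding() reports whether the bytes form valid UTF-8.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        // Assumption: the file is really Windows-1252 despite its UTF-8 declaration.
        $raw = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
    }
    return simplexml_load_string($raw);
}

// A document declared as UTF-8 but containing a raw 0xA9 byte, like the one in the error.
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><o>\xA9</o>";
$xml = load_xml_lenient($raw);

echo $xml === false ? "parse failed\n" : (string) $xml . "\n";
```

For a remote feed you would fetch the body first (for example with file_get_contents()) and pass the string in. Note that converting the whole document will mangle any characters that were already valid UTF-8, so this only suits files that are wrongly labelled from start to finish.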

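However, looking at the options argument, you might be able to pass LIBXML_NOERROR or LIBXML_NOWARNING to quieten the parser, and then clean the crud out later. As far as I know those flags only suppress the messages, though, so the load itself may still fail on the bad bytes.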
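
A minimal sketch of passing those flags, using simplexml_load_string() on an inline sample so it runs without a real feed. The sample document is invented to match the bytes in the error message. The second half uses libxml_use_internal_errors(), which is not mentioned above but is a standard way to collect the parser's complaints instead of having them printed as warnings.

```php
<?php
// Invented sample: declared as UTF-8 but containing a raw 0xA9 byte.
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><o>\xA9</o>";

// The third argument takes libxml option flags, OR-ed together.
$quiet = simplexml_load_string($raw, 'SimpleXMLElement', LIBXML_NOERROR | LIBXML_NOWARNING);
if ($quiet === false) {
    echo "Still failed to parse, just silently.\n";
}

// Alternative: have libxml store its errors so they can be inspected afterwards.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($raw);
$errors = libxml_get_errors();
libxml_clear_errors();

foreach ($errors as $error) {
    // Each entry is a LibXMLError object with message, line and column properties.
    echo trim($error->message), " (line {$error->line})\n";
}
```

Either way the invalid bytes still have to be dealt with before the document will load, so this is more about controlling the noise than fixing the feed.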

I use a combination of these on the values later anyway:

strip_tags()

htmlspecialchars()
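
Roughly like this when writing a value into a table cell. The sample string is invented, and the ENT_QUOTES flag and explicit 'UTF-8' charset are my own choices rather than anything the feed requires.

```php
<?php
// Invented value, standing in for the text of one feed element.
$value = "Copyright \u{A9} <b>Example</b> & Co";

// strip_tags() removes the markup; htmlspecialchars() escapes what is left for HTML output.
$clean = htmlspecialchars(strip_tags($value), ENT_QUOTES, 'UTF-8');

echo "<td>{$clean}</td>\n";
```

Bear in mind that htmlspecialchars() returns an empty string if the input is not valid in the given charset (unless ENT_SUBSTITUTE or ENT_IGNORE is set), so it will not repair a badly encoded feed on its own.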
