Twelvefootsnowman

SimpleXML UTF-8 encoding issue

I've been completely stumped by an issue with special characters in XML files and was hoping for a bit of help!

 

I'm using SimpleXML to convert XML files into HTML tables. It works fine for 99% of the files I use it for but that annoying 1% chucks up this error:

 

Warning: simplexml_load_file() [function.simplexml-load-file]: http://www.xml-feed-site.com/xml.php :1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA9 0x3C 0x2F 0x6F on line 3

 

All the XML files have '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' as the first line which I thought meant they were in UTF-8 format (though, this is my first attempt at working with XML so I'm not really sure  :-\).

 

From what I understand, the issue is that the XML files in question have a few annoying special characters that appear as "�" in them, which stops my simplexml_load_file function from converting them. Is there any way I can get SimpleXML to delete, change or remove these non-standard characters while it's converting the XML?

 

 

PLEASE NOTE: http://www.xml-feed-site.com/xml.php isn't the real URL I'm using, it's just an example!
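
For anyone reading this later: the first byte in the warning, 0xA9, is the copyright sign in ISO-8859-1 / Windows-1252, whereas UTF-8 encodes that character as the two bytes 0xC2 0xA9. That suggests the feed declares UTF-8 but is actually saved in a single-byte encoding. A minimal sketch of checking this, assuming PHP 7 or later; `is_valid_utf8` is a made-up helper name, not part of PHP or SimpleXML:

```php
<?php
// Hypothetical helper: true when $bytes is a valid UTF-8 byte sequence.
function is_valid_utf8(string $bytes): bool
{
    // With the /u modifier PCRE validates the subject as UTF-8, and
    // preg_match() returns false (not 0) when the subject is malformed.
    return preg_match('//u', $bytes) === 1;
}

$fromWarning = "\xA9</o";      // the four bytes quoted in the warning
$sameAsUtf8  = "\xC2\xA9</o";  // the same text with the copyright sign encoded as UTF-8

var_dump(is_valid_utf8($fromWarning)); // bool(false)
var_dump(is_valid_utf8($sameAsUtf8));  // bool(true)
```

The XML declaration is only a claim about the encoding. The parser still checks the actual bytes, which is why a file that says UTF-8 can be rejected.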


Looking at http://uk3.php.net/manual/en/function.simplexml-load-file.php, it states that:

Convert the well-formed XML document in the given file to an object.

 

I think "well-formed" probably refers to the structure itself and not the encoding, but it does make you think about pre-checking the file.

 

However, looking at the options argument (here), you might be able to use LIBXML_NOERROR or LIBXML_NOWARNING, and then parse the crud out later.

 

I use a combo of these later anyway...

strip_tags()

htmlspecialchars()

Share on other sites

Thanks for the suggestion, but no luck.

 

I tried:

 

$xml = simplexml_load_file($request_url, NULL, LIBXML_NOERROR | LIBXML_NOWARNING);

 

which removed the error message but didn't convert anything - the $xml variable was empty.

 

 

Could it just be down to badly written XML files?
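
A note on why the variable came back empty: LIBXML_NOERROR and LIBXML_NOWARNING only hide the messages. The parse still fails, and the loader returns false. A rough sketch of collecting the parser's messages instead of printing or hiding them, assuming PHP 7.1 or later; `load_xml_with_errors` and the sample feed are made up for illustration:

```php
<?php
// Hypothetical helper: parse a string and return [result, list of parser messages].
function load_xml_with_errors(string $xmlString): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $xml = simplexml_load_string($xmlString); // SimpleXMLElement, or false on failure

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Empty the error buffer and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$xml, $messages];
}

// Made-up sample: declares UTF-8 but contains a lone 0xA9 byte.
$bad = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<feed><owner>Acme \xA9</owner></feed>";

[$xml, $messages] = load_xml_with_errors($bad);
var_dump($xml);      // bool(false): the document was rejected
print_r($messages);  // the parser's own description of what went wrong
```

For a real feed the string would come from file_get_contents($request_url) rather than a literal.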

Share on other sites

Just pre-parse out anything that isn't true ASCII before loading it with SimpleXML.

 

Errr.. I don't know how to! Could you point me in the right direction, please?

 

As far as I knew, the XML file is just a URL until it's loaded through SimpleXML.
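
One way to read the "pre-parse" suggestion is to fetch the feed into a string first (for example with file_get_contents) and clean that string before handing it to simplexml_load_string. A minimal sketch under that assumption; `strip_non_ascii` is a made-up name and the sample feed is invented:

```php
<?php
// Hypothetical helper: drop every byte outside the 7-bit ASCII range.
function strip_non_ascii(string $bytes): string
{
    // No /u modifier, so the pattern is applied byte by byte.
    return preg_replace('/[^\x00-\x7F]/', '', $bytes);
}

// Made-up sample standing in for file_get_contents($request_url).
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<feed><owner>Acme \xA9</owner></feed>";

$xml = simplexml_load_string(strip_non_ascii($raw));
var_dump((string) $xml->owner); // string(5) "Acme "
```

This is lossy: it also removes correctly encoded accented letters and symbols, so it only suits feeds where those characters don't matter.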

Share on other sites

Not that I've done it, but you could put it within here...

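function rss_xml_fetch($url)
{
	// Fetch the raw XML as a string so it can be cleaned before parsing.
	$xml = file_get_contents($url);
	if ($xml === false)
	{
		return -1;	// The download itself failed.
	}

	//	Do some parsing here (e.g. strip or convert the invalid bytes)
	//	-->

	// Returns a SimpleXMLElement, or false if the string still isn't valid XML.
	return simplexml_load_string($xml);
}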
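
The "Do some parsing here" step could be filled in along these lines. This is only a sketch: it assumes the mbstring extension is available and guesses that the bad bytes are ISO-8859-1, which fits the 0xA9 in the warning but isn't confirmed anywhere in the thread. `xml_string_to_utf8` is a made-up name.

```php
<?php
// Hypothetical helper: return $bytes as UTF-8, converting only when needed.
// $assumedSource is a guess at the real encoding of the feed.
function xml_string_to_utf8(string $bytes, string $assumedSource = 'ISO-8859-1'): string
{
    // Already valid UTF-8: leave it untouched to avoid double encoding.
    if (preg_match('//u', $bytes) === 1) {
        return $bytes;
    }
    // Otherwise treat the whole string as the assumed single-byte encoding.
    return mb_convert_encoding($bytes, 'UTF-8', $assumedSource);
}

// Made-up sample standing in for the string returned by file_get_contents($url).
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<feed><owner>Acme \xA9</owner></feed>";

$xml = simplexml_load_string(xml_string_to_utf8($raw));
var_dump((string) $xml->owner === "Acme \xC2\xA9"); // bool(true)
```

Converting keeps the characters instead of deleting them. A feed that mixes valid UTF-8 with stray single-byte characters would need finer handling than this all-or-nothing check.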
