Twelvefootsnowman, posted September 30, 2008:

I've been completely stumped by an issue with special characters in XML files and was hoping for a bit of help!

I'm using SimpleXML to convert XML files into HTML tables. It works fine for 99% of the files I use it for, but that annoying 1% throws this error:

    Warning: simplexml_load_file() [function.simplexml-load-file]: http://www.xml-feed-site.com/xml.php :1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA9 0x3C 0x2F 0x6F on line 3

All the XML files have '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' as the first line, which I thought meant they were in UTF-8 format (though this is my first attempt at working with XML, so I'm not really sure :-\).

From what I understand, the issue is that the XML files in question contain a few special characters that appear as "�", and these stop my simplexml_load_file call from converting the file. Is there any way I can get SimpleXML to delete, change or remove these non-standard characters while it's converting the XML?

PLEASE NOTE: http://www.xml-feed-site.com/xml.php isn't the real URL I'm using, it's just an example!

Thread: https://forums.phpfreaks.com/topic/126460-simplexml-utf-8-encoding-issue/

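For context on the bytes quoted in that warning: 0x3C 0x2F 0x6F are the ASCII characters "</o", and a lone 0xA9 is how ISO-8859-1 and Windows-1252 encode "©". In UTF-8 the same character is the two bytes 0xC2 0xA9, so a bare 0xA9 is not valid UTF-8 no matter what the XML declaration says. The snippet below is a small sketch of that check. The sample strings are made up to mirror the warning, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical samples mirroring the bytes in the warning (0xA9 0x3C 0x2F 0x6F).
$latin1Bytes = "\xA9</o";     // "©" as a single ISO-8859-1 byte, then "</o"
$utf8Bytes   = "\xC2\xA9</o"; // the same text with "©" encoded as UTF-8

// mb_check_encoding() reports whether a byte string is valid in the given encoding.
var_dump(mb_check_encoding($latin1Bytes, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8Bytes, 'UTF-8'));   // bool(true)
```

So the likely situation is a feed that declares UTF-8 but was actually written in a single-byte encoding; that is an inference from the reported bytes, not something the feed's owner has confirmed.
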
rarebit, posted September 30, 2008:

Looking at http://uk3.php.net/manual/en/function.simplexml-load-file.php, it states: "Convert the well-formed XML document in the given file to an object."

I think "well-formed" probably refers to the structure itself and not the encoding (though it does make you think about pre-checking). However, looking at the options argument, you might be able to use LIBXML_NOERROR or LIBXML_NOWARNING and then parse the crud out later. I use a combination of these later anyway:

    strip_tags()
    htmlspecialchars()

Twelvefootsnowman, posted September 30, 2008:

Thanks for the suggestion, but no luck. I tried:

    $xml = simplexml_load_file($request_url, NULL, LIBXML_NOERROR | LIBXML_NOWARNING);

which removed the error message but didn't convert anything - the $xml variable was empty. Could it just be down to badly written XML files?

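A note on why $xml comes back empty: LIBXML_NOERROR and LIBXML_NOWARNING only hide the messages, they do not make the parser accept the input, so the load still fails and returns false. A minimal sketch of how the failure could be inspected instead of silenced is below. The sample document is hypothetical (it declares UTF-8 but contains a raw 0xA9 byte), and the exact error text depends on the installed libxml2 version.

```php
<?php
// Hypothetical malformed input: declares UTF-8 but contains a raw 0xA9 byte.
$badXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<o>\xA9</o>";

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($badXml);

if ($xml === false) {
    // Print each parser error with the line it was reported on.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), ' (line ', $error->line, ")\n";
    }
    libxml_clear_errors();
}
```

Checking the return value with === false makes the failure explicit, which is easier to debug than an empty variable.
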
rarebit, posted September 30, 2008:

Just pre-parse out anything that isn't true ASCII before loading it as XML.

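One way this suggestion might look in practice is a byte-level regular expression that drops everything outside the 7-bit ASCII range. This is only a sketch with a made-up input string, and it is a blunt tool: it also deletes legitimate accented or multibyte characters, not just the broken ones.

```php
<?php
// Hypothetical input containing a stray 0xA9 byte.
$raw = "<o>Copyright \xA9 2008</o>";

// Without the /u modifier PCRE works on bytes, so this removes
// every byte in the range 0x80-0xFF.
$asciiOnly = preg_replace('/[^\x00-\x7F]/', '', $raw);

echo $asciiOnly; // <o>Copyright  2008</o>
```
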
Twelvefootsnowman, posted September 30, 2008:

rarebit wrote: "just pre parse out any not true ascii before loading with xml"

Errr... I don't know how to! Could you point me in the right direction, please? As far as I knew, the XML file is just a URL until it's loaded through SimpleXML.

rarebit, posted September 30, 2008:

Not that I've done it myself, but you could put it in here:

    function rss_xml_fetch($url)
    {
        // Fetch the raw document as a string first, so it can be cleaned up before parsing
        if ($xml = file_get_contents($url)) {
            // Do some parsing here
            // -->
            $xml = simplexml_load_string($xml);
            return $xml;
        }
        // The fetch failed (or returned an empty document)
        return -1;
    }

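As a sketch of what the "Do some parsing here" step could contain, the snippet below re-encodes the fetched string to UTF-8 before handing it to simplexml_load_string(). The helper name to_utf8 is hypothetical, and the source encoding ISO-8859-1 is an assumption based on the 0xA9 byte in the original warning; a real feed may use something else. It also assumes the mbstring extension. The sample string stands in for the result of file_get_contents($url).

```php
<?php
/**
 * Hypothetical helper: returns $raw unchanged if it is already valid UTF-8,
 * otherwise re-encodes it on the ASSUMPTION that it is $assumedEncoding.
 */
function to_utf8(string $raw, string $assumedEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', $assumedEncoding);
}

// Stand-in for file_get_contents($url): declares UTF-8 but holds a raw 0xA9 byte.
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<o>\xA9</o>";

$xml = simplexml_load_string(to_utf8($raw));

// The text content of <o> is now the UTF-8 encoding of the copyright sign.
echo (string) $xml;
```

Converting rather than deleting keeps the original characters, at the cost of having to guess the source encoding correctly.
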
ravinggenius, posted September 30, 2008:

Twelvefootsnowman wrote: "$xml = simplexml_load_file($request_url, NULL, LIBXML_NOERROR | LIBXML_NOWARNING);"

I believe the second parameter needs to be a valid class name: http://us3.php.net/manual/en/function.simplexml-load-file.php

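A short sketch of passing the class name explicitly is below. It uses simplexml_load_string(), which takes the same second and third parameters as simplexml_load_file(), so that the example runs without a network request; the sample document is made up. Note that this only addresses the argument itself and would not by itself fix the encoding error.

```php
<?php
// Hypothetical sample document.
$data = '<feed><item>ok</item></feed>';

// Second argument: the class of the returned object (SimpleXMLElement is the default).
// Third argument: libxml option flags.
$xml = simplexml_load_string($data, 'SimpleXMLElement', LIBXML_NOERROR | LIBXML_NOWARNING);

var_dump($xml instanceof SimpleXMLElement); // bool(true)
echo $xml->item;                            // ok
```
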
Twelvefootsnowman, posted October 1, 2008:

Thanks for all the help, guys! Your suggestions put me on the right track and I've got it solved now.