Mario_Party
Posted August 2, 2011

Hi everyone,

I'm currently downloading XML files that I have to fix before processing them, otherwise I get lots of errors from XMLReader.

One of the fixes is scanning for a double-encoded entity that may look something like "&amp;amp;" or "&amp;quot;" and replacing it with what it should be, i.e. "&amp;" or "&quot;". I think this step works fine and is not the problem (the function is ascii_fix_feed).

The next step converts these to the numeric equivalent using the function convert_ascii_to_numeric_entity, so "&amp;" becomes "&#38;". Again, I think this step works fine, as I've tested it (but only by viewing the output in a web browser).

One of the lines in the XML is:

<prod id="41462197"><pId>602-0015-01</pId><text><name>Pony Themed 3D Bedroom Wallpaper</name><desc>- Wow, have a complete pony themed bedroom Transform your room into your very own horse and pony stables. With this must have magical mural, with beautiful and wonderful varieties of horses and the cutest prancing ponies and much, much more. This Walltastic has been specially designed with a horse in a picture frame within the mural to help you learn all about the external anatomy of a horse. Every child dreams of having their own stables, and now their dreams can come true with the perfect gift for any horse and pony mad child Walltastic's Horse and Pony Stables. This wallpaper is a wipeable wall covering that covers any wall area up to 10ft x 8ft. Each product is in 12 pieces which means it is easily applicable and flexible according to how much space needs covering.
It comes rolled up similar to wallpaper in a postal tube, includeing fitting instructions and is a great addition to any child’s bedroom, nursery, playroom or games room.</desc></text><price><buynow>31.25</buynow><delivery>4.95</delivery></price><cat><awCatId>631</awCatId><awCat>Novelty Gifts</awCat><mCat>Pony Bedroom Wallpaper for Girls who love ponies and who are horse and pony mad</mCat></cat><brand/></prod>

Now the little bit somewhere in the middle is "’". When I run my tests with these functions in a separate code file and output the result to the browser, it works fine and I get what it is meant to be (some kind of single quote). However, when I run the code that fixes the feed line by line and rewrites the whole file, I run into problems because the output is not the expected "’" but instead I get "’"

Functions that I am using:

    // Callback: rebuild a single entity from a double-encoded match,
    // e.g. "&amp;#8217;" becomes "&#8217;" and "&amp;quot;" becomes "&quot;"
    function ascii_fix_feed_return($str) {
        $string = "&";
        if(is_numeric($str[4])) {
            $string .= "#";
        }
        $string .= $str[4] . ";";
        return $string;
    }

    // Collapse double-encoded entities repeatedly until none are left
    function ascii_fix_feed($str) {
        preg_match_all('/&(#)?([\w]+);(#)?([\w]+);/i', $str, $count);
        if(isset($count[0][0])) {
            $count = $count[0][0];
        } else {
            unset($count);
        }
        while(!empty($count)) {
            $str = preg_replace_callback('/&(#)?([\w]+);(#)?([\w]+);/i','ascii_fix_feed_return', $str);
            preg_match_all('/&(#)?([\w]+);(#)?([\w]+);/i', $str, $count);
            if(isset($count[0][0])) {
                $count = $count[0][0];
            } else {
                unset($count);
            }
        }
        return $str;
    }

    // Callback: turn a named entity into a numeric one using ord()
    function convertAlphaEntitysToNumericEntity($entity) {
        return '&#'.ord(html_entity_decode($entity[0])).';';
    }

    // Callback: turn a character with a code above 127 into a numeric entity
    function convertAsciiOver127toNumericEntity($entity) {
        if(($asciiCode = ord($entity[0])) > 127) {
            return '&#'.$asciiCode.';';
        } else {
            return $entity[0];
        }
    }

    function convert_ascii_to_numeric_entity($str) {
        $str = preg_replace_callback('/&([\w]+);/i','convertAlphaEntitysToNumericEntity', $str);
        $str = preg_replace_callback('/[^\w]/i','convertAsciiOver127toNumericEntity', $str);
        return $str;
    }

Code that actually uses the functions:

    function xml_clean_up($file, $xml_full_file_name, $message, $fail_message, $display_safe = 0) {
        global $log_file;

        $handle = @fopen($xml_full_file_name, "r");
        $handle2 = @fopen($xml_full_file_name.".tmp", "w");

        while (!feof($handle)) {
            // Read the file line by line
            $line = stream_get_line($handle, 10000, "\n");

            // Convert to UTF-8
            $line = iconv("UTF-8", "UTF-8//IGNORE", $line);

            // Fix the XML e.g. replace "&amp;#8217;" with "&#8217;"
            $line = ascii_fix_feed($line);

            // Convert all ASCII characters to numeric entity equivalents
            $line = convert_ascii_to_numeric_entity($line);

            // If we want to convert the characters back then use this function as well
            if($display_safe) {
                // $line = advert_display_safe($line);
            }

            // End the line
            $line .= "\n";
            fwrite($handle2, $line, 10000);
        }

        fclose($handle);
        fclose($handle2);

        // Rename the XML
        if(rename($xml_full_file_name . ".tmp", $xml_full_file_name)) {
            flog($log_file, $file . " " . $message . "\r\n");
            echo $file . " " . $message . ".\r\n";
        } else {
            flog($log_file, $file . " " . $fail_message . "\r\n\r\n");
            exit($file . " " . $fail_message . ".\r\n\r\n");
        }
    }

So if anyone knows what is going on I would be very grateful, and if you need any more information just let me know.

Thanks in advance.

Link to comment: https://forums.phpfreaks.com/topic/243585-ascii-utf-8-xml-problems/
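One possible explanation for the symptom described above, offered as a guess rather than a confirmed diagnosis: ord() looks at a single byte, while a UTF-8 "’" is three bytes (0xE2 0x80 0x99). A byte-wise conversion would therefore produce "&#226;&#128;&#153;" instead of "&#8217;". The sketch below works on code points instead. It assumes the mbstring extension is available and that the input is valid UTF-8; the function names named_entity_to_numeric and utf8_to_numeric_entities are hypothetical and not part of the original code.

```php
<?php
// Hypothetical callback (not from the original post): convert one named
// entity such as "&rsquo;" into its numeric form, based on the code point.
function named_entity_to_numeric(array $match)
{
    $decoded = html_entity_decode($match[0], ENT_QUOTES, 'UTF-8');
    if ($decoded === $match[0]) {
        // Unknown entity: leave it untouched
        return $match[0];
    }
    if (strlen($decoded) === 1) {
        // Plain ASCII result, e.g. "&amp;" decodes to "&", so ord() is safe
        return '&#' . ord($decoded) . ';';
    }
    // Multibyte result: encode the whole code point, not its first byte
    return mb_encode_numericentity($decoded, array(0x80, 0x10FFFF, 0, 0x1FFFFF), 'UTF-8');
}

// Hypothetical replacement for convert_ascii_to_numeric_entity()
function utf8_to_numeric_entities($str)
{
    // Step 1: named entities become numeric entities
    $str = preg_replace_callback('/&[A-Za-z][A-Za-z0-9]*;/', 'named_entity_to_numeric', $str);
    // Step 2: every remaining non-ASCII code point becomes a numeric entity
    return mb_encode_numericentity($str, array(0x80, 0x10FFFF, 0, 0x1FFFFF), 'UTF-8');
}

// "\xE2\x80\x99" is the UTF-8 byte sequence for U+2019 (right single quote)
echo utf8_to_numeric_entities("child\xE2\x80\x99s &rsquo; &amp;"), "\n";
// prints: child&#8217;s &#8217; &#38;
```

This might also explain why the browser test looked fine: a browser decodes the UTF-8 bytes itself, whereas the rewritten file shows exactly what the byte-wise conversion produced.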