mattspriggs28 Posted April 20, 2011

Hi, I'm extracting my product data from the database to store in an XML file for the Google Product Listings. The problem I'm having is that a number of incompatible characters in some of the data are causing the generated XML file to be invalid. There are a huge number of products in the database (60,000+), so it would take an age to go through each product and remove any rogue characters, which is why I'm using preg_replace. I've found the rogue characters that occur in the data and used preg_replace to strip them out. Below is a code sample:

$narrLong = strip_tags(nl2br($productArr['narrLong']));
$narrLong = htmlentities(html_entity_decode(preg_replace("/\t/"," ",$narrLong)),ENT_QUOTES,'UTF-8');
$narrLong = preg_replace("/£/","�",$narrLong);
$narrLong = preg_replace("/Â/","",$narrLong);
$narrLong = preg_replace("/ /"," ",$narrLong);
$narrLong = html_entity_decode($narrLong);
$narrLong = preg_replace("/ & /"," &amp; ",$narrLong);
$narrLong = preg_replace("/(\w)&(\w)/","$1&amp;$2",$narrLong);
$narrLong = preg_replace("/\"/","&quot;",$narrLong);
$narrLong = preg_replace("/'|`/","'",$narrLong);
$narrLong = preg_replace("/</","&lt;",$narrLong);
$narrLong = preg_replace("/>/","&gt;",$narrLong);
$narrLong = preg_replace("/“/","'",$narrLong);
$narrLong = preg_replace("/”/","'",$narrLong);
$narrLong = preg_replace("/’/","'",$narrLong);
$narrLong = preg_replace("/½/"," 1/2",$narrLong);
$narrLong = preg_replace("/®/","",$narrLong);
$narrLong = preg_replace("/£/","",$narrLong);
$narrLong = preg_replace("/µ/", "u",$narrLong);
$narrLong = preg_replace("/é/", "e",$narrLong);
$narrLong = preg_replace("/è/", "e",$narrLong);
$narrLong = preg_replace("/’/", "'",$narrLong);
$narrLong = preg_replace("/…/", "...",$narrLong);
$narrLong = preg_replace("/°/", "",$narrLong);
$narrLong = preg_replace("/™/", "",$narrLong);
$narrLong = preg_replace("/—/", "-",$narrLong);
$narrLong = preg_replace("/Ã/", "",$narrLong);
$narrLong = preg_replace("/ƒ/", "",$narrLong);
$narrLong = preg_replace("/¨/", "",$narrLong);
$narrLong = preg_replace("/©/","",$narrLong);
$narrLong = preg_replace("/¤/", "",$narrLong);
$narrLong = str_replace("·", "",$narrLong);
$narrLong = preg_replace("/‰/", "",$narrLong);
$narrLong = preg_replace("/¹/", "",$narrLong);

I then plug my XML open and close tags onto either end, etc. However, once the XML file is generated and saved, none of the characters I've asked to replace have been replaced; they all remain in the XML file. Is there something I'm doing wrong? Are some of the characters not being recognised in my script in the first place? Thanks for your help.
Muddy_Funster Posted April 20, 2011
What encoding are you setting in the header of the XML file?
mattspriggs28 Posted April 20, 2011 (Author)
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
Muddy_Funster Posted April 20, 2011
You should probably have a look at this: http://www.w3schools.com/xml/xml_encoding.asp for more info and options.
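To make the encoding suggestion concrete, here is a minimal sketch of one way to declare the encoding in the XML prolog and escape the text instead of stripping characters one by one. It assumes the product text in the database is already valid UTF-8, which the thread does not confirm. The function names xmlSafeText and buildFeed, and the channel/item/description layout, are hypothetical and only for illustration; this is not a diagnosis of why the replacements above had no effect.

```php
<?php
// Hypothetical helper names; nothing here comes from the original script.
// Assumption: $text is valid UTF-8 (htmlspecialchars returns '' on invalid input).
function xmlSafeText(string $text): string
{
    // Drop markup and turn HTML entities (e.g. &nbsp;, &pound;) into real characters.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');

    // Remove control characters that XML 1.0 forbids (tab, LF and CR are allowed).
    // These byte values never occur inside multi-byte UTF-8 sequences.
    $text = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text);

    // Escape only the XML special characters; other characters stay as UTF-8 text.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

function buildFeed(array $descriptions): string
{
    // The declaration tells the parser which encoding the following bytes use.
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">' . "\n";
    $xml .= "<channel>\n";
    foreach ($descriptions as $description) {
        $xml .= '<item><description>' . xmlSafeText($description) . "</description></item>\n";
    }
    $xml .= "</channel>\n</rss>\n";
    return $xml;
}
```

With a declared UTF-8 encoding, characters such as the pound sign or accented letters can usually stay in the feed as they are; only &, <, > and quotes need escaping. If the stored data turns out to be in another encoding (for example ISO-8859-1), it would need converting to UTF-8 first, or the declaration would need to name that encoding instead.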