robbyc Posted February 1, 2007

Hi,

I have an XML document stored as a DOMDocument. I want to check that the header is <?xml version="1.0" encoding="UTF-8"?>. If it isn't, for example if it is <?xml version="1.0" encoding="ISO-8859-1"?>, I want to strip the header off and change it to UTF-8. I believe this can be done with regular expressions, but I have no idea how.

Any help will be greatly appreciated.

Link to comment: https://forums.phpfreaks.com/topic/36627-solved-xml-encoding/

effigy Posted February 1, 2007

What if the file contains "high ascii" characters that need to be converted to UTF-8?

robbyc Posted February 1, 2007 (Author)

Yeah, that as well. I don't know much about encodings, so any help will be greatly appreciated. My problem is that I am processing XML feeds using PHP, and in some feeds encoded as ISO-8859-1, symbols such as £ end up scrambled.

effigy Posted February 1, 2007

Try something like this:

<pre>
<?php
// This is the original ISO-8859-1 file that would really be
// a file_get_contents for you (as long as the file isn't huge).
$data = '<?xml version="1.0" encoding="ISO-8859-1"?>' . "\n";
$data .= '<root><pound>£</pound></root>' . "\n";
echo htmlspecialchars($data);

// Convert to UTF-8.
$data = utf8_encode($data);

// Change the literal encoding.
$data = preg_replace('/(<\?xml.+?encoding=")[^"]+(".*\?>)/', '\1UTF-8\2', $data);
echo htmlspecialchars($data);
?>
</pre>
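
For readers on newer PHP versions: utf8_encode() is deprecated as of PHP 8.2. A minimal sketch of the same two steps using mb_convert_encoding() might look like the following. The function name to_utf8_xml is hypothetical and not from the thread, and the sketch assumes the mbstring extension is loaded and the input really is ISO-8859-1.

```php
<?php
// Hypothetical helper (not from the thread): convert an ISO-8859-1 XML
// string to UTF-8 and rewrite the encoding named in the XML declaration.
// Assumes the mbstring extension is available and the input is ISO-8859-1.
function to_utf8_xml(string $data): string
{
    // Re-encode the bytes. utf8_encode() does the same job for
    // ISO-8859-1 input but is deprecated as of PHP 8.2.
    $data = mb_convert_encoding($data, 'UTF-8', 'ISO-8859-1');

    // Change the literal encoding with the same pattern used in the post,
    // limited to one replacement. Fall back to the converted data if
    // preg_replace() reports an error (returns null).
    return preg_replace(
        '/(<\?xml.+?encoding=")[^"]+(".*\?>)/',
        '\1UTF-8\2',
        $data,
        1
    ) ?? $data;
}

// "\xA3" is the single ISO-8859-1 byte for the pound sign.
$latin1 = '<?xml version="1.0" encoding="ISO-8859-1"?>' . "\n"
        . "<root><pound>\xA3</pound></root>\n";

echo to_utf8_xml($latin1);
```

Writing the pound sign as the byte "\xA3" keeps the example independent of the encoding the PHP source file itself happens to be saved in.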

robbyc Posted February 8, 2007 (Author)

Thanks, that works really well now.

robbyc Posted February 8, 2007 (Author)

I slightly modified the code so that files already encoded as UTF-8 don't get encoded again.

if (!preg_match('/(<\?xml.+?encoding=")UTF-8(".*\?>)/', $contents)) {
    // Convert to UTF-8.
    $contents = utf8_encode($contents);

    // Change the literal encoding.
    $contents = preg_replace('/(<\?xml.+?encoding=")[^"]+(".*\?>)/', '\1UTF-8\2', $contents);
}
return $contents;

The problem is that preg_match('/(<\?xml.+?encoding=")UTF-8(".*\?>)/', $contents) only matches uppercase UTF-8. How do I change the regular expression to also match utf-8?

Thanks

effigy Posted February 8, 2007

Make it case insensitive by adding the "i" modifier: /pattern/i
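
Applied to the check from the previous post, the "i" modifier could be used roughly as follows. This is only a sketch: the function name ensure_utf8 is hypothetical, mb_convert_encoding() is swapped in for utf8_encode() (which is deprecated as of PHP 8.2), and any input not declared as UTF-8 is assumed to be ISO-8859-1.

```php
<?php
// Hypothetical helper (not from the thread): convert to UTF-8 only when the
// XML declaration does not already say UTF-8, in any letter case.
// Assumes the mbstring extension is available and that any other input
// is ISO-8859-1.
function ensure_utf8(string $contents): string
{
    // The trailing "i" makes the match case insensitive, so both
    // encoding="UTF-8" and encoding="utf-8" count as already converted.
    if (preg_match('/<\?xml.+?encoding="UTF-8".*\?>/i', $contents)) {
        return $contents;
    }

    // Convert to UTF-8.
    $contents = mb_convert_encoding($contents, 'UTF-8', 'ISO-8859-1');

    // Change the literal encoding (first declaration only). Fall back to
    // the converted contents if preg_replace() returns null on error.
    return preg_replace(
        '/(<\?xml.+?encoding=")[^"]+(".*\?>)/',
        '\1UTF-8\2',
        $contents,
        1
    ) ?? $contents;
}
```

Note that a declaration with no encoding attribute is treated as ISO-8859-1 by this sketch, even though XML defaults to UTF-8 in that case, so feeds without an encoding attribute would need separate handling.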
This topic is now archived and is closed to further replies.