limex, posted December 21, 2009

Hi,

I want to strip the characters that are invalid in XML, based on the specs from the W3 homepage:

    function strip_invalid_xml_chars( $in ) {
        $out = "";
        $length = strlen($in);
        for ( $i = 0; $i < $length; $i++) {
            $current = ord($in{$i});
            if ( ($current == 0x9) || ($current == 0xA) || ($current == 0xD)
                || (($current >= 0x20) && ($current <= 0x7E))
                || (($current >= 0xA0) && ($current <= 0xD7FF))
                || (($current >= 0xE000) && ($current <= 0xFFFD))
                || (($current >= 0x10000) && ($current <= 0x10FFFF))) {
                $out .= chr($current);
            } else {
                $out .= " ";
            }
        }
        return $out;
    }

But the performance is not the best, so I decided to use a regex:

    $input_sting = "abcdefg ™ ´ ®";
    $clean_string=preg_replace('/[^\x9\xA\xD\x20-\x7E\xA0-\{xD7FF}\x{E000}-x{FFFD}\x{10000}-\x{10FFFF}]/u','',$input_sting);

But I get a warning and an empty $clean_string:

    Compilation failed: range out of order in character class at offset 26

Could someone fix this? Thanks a lot.
ChemicalBliss, posted December 21, 2009

Hmm, just to let you know: there is a specific regex help sub-forum in this forum. I would suggest moving this topic there, as they are the regex gurus.

-CB-
cags, posted December 21, 2009

I think you (or whoever wrote it) made a typo. I believe \{xD7FF} should be \x{D7FF}.
salathe, posted December 21, 2009

There are a number of silly mistakes in your pattern; for example the hex character typos \{xD7FF} and x{FFFD}, and it nukes the range \x20-\x7E, which are perfectly normal, printable, safe, happy-in-XML characters. Can you link us to where precisely you got these ranges of characters from?
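For reference, the two typos pointed out above can be fixed as in the sketch below. It is only an illustration, not necessarily the code limex ended up with: the function name `strip_invalid_xml_chars_regex` and the `$replacement` parameter are made up for this example, and the character ranges are copied from the original post rather than from the XML specification (whose `Char` production allows the whole range 0x20 to 0xD7FF, so it also admits 0x7F to 0x9F).

```php
<?php
// Hypothetical helper name; the character ranges are copied from the original post.
function strip_invalid_xml_chars_regex($in, $replacement = '')
{
    // Negated class: match every character that is NOT in the allowed ranges.
    // All code points use the \x{...} form, which requires the /u modifier.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}'
             . '\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    // With /u, preg_replace() returns null when $in is not valid UTF-8.
    return preg_replace($pattern, $replacement, $in);
}

$input_sting = "abcdefg ™ ´ ®";
echo strip_invalid_xml_chars_regex($input_sting), "\n";
```

Assuming the script is saved as UTF-8, the sample string should pass through unchanged, since ™, ´ and ® all fall inside \x{A0}-\x{D7FF}. Passing " " as the second argument would mimic the loop version, which replaces invalid characters with a space instead of dropping them.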
limex (Author), posted February 22, 2010

Thanks for your help, and sorry for the late reply. The code works now because of your help. The ranges of characters are from the W3 homepage, where they list the specs for XML, but I can't find the link any more.