
Issues with regex in preg_replace


limex


Hi,

I want to strip the characters that are invalid in XML, based on the spec from the W3C homepage:

function strip_invalid_xml_chars( $in ) {
    $out = "";
    $length = strlen($in);
    for ( $i = 0; $i < $length; $i++ ) {
        // ord() returns a single byte (0-255), so the ranges above 0xFF never match here.
        $current = ord($in[$i]);
        if ( ($current == 0x9) || ($current == 0xA) || ($current == 0xD) || (($current >= 0x20) && ($current <= 0x7E)) || (($current >= 0xA0) && ($current <= 0xD7FF)) || (($current >= 0xE000) && ($current <= 0xFFFD)) || (($current >= 0x10000) && ($current <= 0x10FFFF))) {
            $out .= chr($current);
        } else {
            $out .= " ";
        }
    }
    return $out;
}

But the performance is not great, so I decided to use a regex instead:

$input_sting = "abcdefg ™ ´ ®";
$clean_string=preg_replace('/[^\x9\xA\xD\x20-\x7E\xA0-\{xD7FF}\x{E000}-x{FFFD}\x{10000}-\x{10FFFF}]/u','',$input_sting); 

 

But I get a warning and an empty $clean_string:

Compilation failed: range out of order in character class at offset 26

Could someone fix this? Thanks a lot

https://forums.phpfreaks.com/topic/185897-issues-with-regex-in-preg_replace/

There are a number of silly mistakes in your pattern; for example the hex character typos \{xD7FF} and x{FFFD}, and it nukes the range \x20-\x7E, which are perfectly normal, printable, safe, happy-in-XML characters. Can you link us to precisely where you got these character ranges from?
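For what it's worth, here is a rough sketch of what the pattern might look like with the two escape typos repaired. The reported offset 26 falls on `\xA0-\{`, which PCRE reads as a range from 0xA0 down to a literal `{` (0x7B), hence "range out of order". Because the class starts with `^`, the listed ranges are the ones that are kept. The function name `strip_invalid_xml_chars_regex` and the `$replacement` parameter are made up for illustration, the ranges are the ones from the original post, and the input is assumed to be valid UTF-8.

```php
<?php
// Sketch only: the function name and $replacement parameter are hypothetical.
// Assumes $in is valid UTF-8 and keeps the character ranges from the original post.
function strip_invalid_xml_chars_regex(string $in, string $replacement = ''): string
{
    // Negated class: code points listed here are kept, all others are replaced.
    // Every escape uses the \x{...} form, which the /u modifier supports.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}'
             . '\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    $out = preg_replace($pattern, $replacement, $in);

    // preg_replace() returns null on failure, e.g. malformed UTF-8 with /u.
    if ($out === null) {
        throw new InvalidArgumentException('Input could not be processed as UTF-8');
    }
    return $out;
}

$input_string = "abcdefg ™ ´ ®";
$clean_string = strip_invalid_xml_chars_regex($input_string);
```

Passing `' '` as the second argument would mimic the loop version, which substitutes a space instead of dropping the character.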
