Jump to content

Non Utf-8 Characters & Special Characters - No Errors But No Output


phoenixx

Recommended Posts

I am trying to extract between the <title> </title> tags from several different websites. The code below works perfectly unless there are non ascii characters in the result. But the weird thing is that while it doesn't produce an output if the string has odd characters - it doesn't produce an error either.

 

The characters (and they won't always be the same characters) in the example that breaks are ➹➹➹➹➹ which produces the code below. I want the regex to pickup the characters (in this case ➹) and everything between the title tag.

➹➹➹➹➹ Carnival Cruise Departure Dates

 

 

 

preg_match('/<title">([^"]*)<\/title>/isu',$var2,$title);
$title=$title[1];
 if (preg_last_error() == PREG_NO_ERROR) {
 echo "----------Title: ".$title."<br>";
 }
 else if (preg_last_error() == PREG_INTERNAL_ERROR) {
 echo "----------Title: There is an internal error!";
 }
 else if (preg_last_error() == PREG_BACKTRACK_LIMIT_ERROR) {
 echo "----------Title: Backtrack limit was exhausted!";
 }
 else if (preg_last_error() == PREG_RECURSION_LIMIT_ERROR) {
 echo "----------Title: Recursion limit was exhausted!";
 }
 else if (preg_last_error() == PREG_BAD_UTF8_ERROR) {
 echo "----------Title: Bad UTF8 error!";
 }
 else if (preg_last_error() == PREG_BAD_UTF8_ERROR) {
 echo "----------Title: Bad UTF8 offset error!";
 }

 

Thanks in advance.

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.