
Special Characters Problem (weird!)


kurbsdude


OK, here's my problem.

I need to read text directly from a website (say, the contents of a certain div). The problem is that this value sometimes contains Unicode characters (different languages) and sometimes umlauts.

How do I display these characters correctly? If I use htmlentities(), the Unicode characters display fine but the umlauts turn into ����. If I use utf8_encode(), the umlauts display fine but the Unicode characters turn into ����.

Any suggestions?

Cheers!

 


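function getCharSetFromMetaTags($strBody) {
    // Collapse newlines, tabs and runs of whitespace into single spaces
    // so the pattern below can match a <meta> tag that spans several lines.
    $strBody = preg_replace('/\s+/', ' ', $strBody);

    // Capture the charset=... value inside the content attribute of a <meta> tag
    // (U = ungreedy, i = case-insensitive).
    preg_match("/<meta\s?[^>]*content\s?=\s?\".*charset\s?=\s?(.*)\"\s?\/?>/Ui", $strBody, $strCharSet);

    // Return the charset name in upper case, or false if no charset was found.
    return (isset($strCharSet[1])) ? strtoupper($strCharSet[1]) : false;
}	//	function getCharSetFromMetaTags()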

 

Read $content from the remote website, then:

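$charset = getCharSetFromMetaTags($content);
if ($charset === false) {
   // no idea what charset the remote site is using
} elseif ($charset != 'UTF-8') {
   // getCharSetFromMetaTags() returns the name in upper case;
   // iconv() returns the converted string, so assign it back
   $content = iconv($charset, 'UTF-8', $content);
}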

$content is now UTF-8 (or its charset is unknown)

Do with it as thou wilt
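
For the display side of the question, here is a minimal sketch of one way to handle the "unknown" case and then escape the result for HTML. The helper name toUtf8() and the ISO-8859-1 fallback are assumptions made for illustration, not something the remote site guarantees, and mb_check_encoding() needs the mbstring extension. The detected charset is passed in as a parameter so the snippet runs on its own.

```php
<?php
// Hypothetical helper: return $content as UTF-8.
// $charset is the detected charset name, or false when detection failed.
// $fallback is an assumed default for pages that declare no charset.
function toUtf8($content, $charset, $fallback = 'ISO-8859-1') {
    if ($charset === false) {
        // No declared charset: keep the bytes if they are already valid UTF-8,
        // otherwise guess the fallback encoding.
        if (mb_check_encoding($content, 'UTF-8')) {
            return $content;
        }
        $charset = $fallback;
    }
    if (strtoupper($charset) === 'UTF-8') {
        return $content; // nothing to convert
    }
    // iconv() returns false on failure; keep the original bytes in that case.
    $converted = iconv($charset, 'UTF-8', $content);
    return ($converted === false) ? $content : $converted;
}

// "Gr", u-umlaut, sharp s, "e" as ISO-8859-1 bytes, the way a Latin-1 page would send them
$latin1 = "Gr\xFC\xDFe";
$utf8 = toUtf8($latin1, 'ISO-8859-1');

// Escape for HTML output and tell PHP explicitly that the string is UTF-8
echo htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8'), "\n";
```

Converting once and only when needed matters here: utf8_encode() always treats its input as ISO-8859-1, so running it on text that is already UTF-8 encodes it a second time. The page that prints the result also has to declare UTF-8 itself, for example with a Content-Type header containing charset=UTF-8.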
