
Please help with changing encoding from ISO 8859-2 to UTF-8 in curl_exec()


mahroch


Hi,

I have been trying to solve this for hours without success :( Maybe one of you knows the solution.

My scripts and everything else are encoded in UTF-8. I use the code below to fetch a page that is encoded in ISO 8859-2 (the page is in Czech, which uses letters with diacritics):

$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_COOKIEFILE, "cookiefile");
curl_setopt($curl, CURLOPT_COOKIEJAR, "cookiefile"); # same cookie file for reading and writing
curl_setopt($curl, CURLOPT_URL, $search_url); # first connection; GET-based authorization in my case, a POST login would need small changes
$content = curl_exec($curl);

After this I try to extract some words from the content. Because the page uses a different encoding than my scripts, I get results like this: pam�tihodnost, dl��d�n�, ...
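Before converting anything, it may help to confirm which charset the server actually declares. The sketch below assumes the declaration arrives in the Content-Type response header, which curl_getinfo() can report through CURLINFO_CONTENT_TYPE. The helper name charset_from_content_type() is made up for illustration, and the sample header value is an assumption rather than output from the real page.

```php
<?php
// Hypothetical helper: pull the charset out of a Content-Type value such as
// "text/html; charset=ISO-8859-2". Returns null when no charset is declared.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([\w.:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// With a live handle, the header value would come from:
//   curl_getinfo($curl, CURLINFO_CONTENT_TYPE)
// Here an assumed sample value is used so the snippet runs on its own.
$declared = charset_from_content_type('text/html; charset=iso-8859-2');
$missing  = charset_from_content_type('text/html');
```

If no charset shows up in the header, the page may still declare one in a meta tag inside the HTML, so a null result here is not proof that the page is UTF-8.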

 

So I tried to convert the content from the original page encoding (ISO 8859-2) to mine (UTF-8). I used several methods: iconv, libiconv, and different user functions from the internet (iso88592_2utf8(), convert_charset, ...), but nothing helps. The result is even worse.
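For reference, a minimal sketch of a single iconv() conversion from ISO 8859-2 to UTF-8. The helper name latin2_to_utf8() is hypothetical, and the sample bytes are an assumption standing in for the real page content (0xEC is "ě" in ISO 8859-2).

```php
<?php
// Hypothetical helper: convert ISO-8859-2 bytes to UTF-8, or throw on failure.
function latin2_to_utf8(string $bytes): string
{
    $utf8 = iconv('ISO-8859-2', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('iconv could not convert the input');
    }
    return $utf8;
}

// Assumed sample: "pamětihodnost" as ISO-8859-2 bytes.
$raw = "pam\xECtihodnost";
$converted = latin2_to_utf8($raw);

// "ě" takes two bytes in UTF-8 (0xC4 0x9B), so the result is one byte longer.
```

The conversion should run exactly once, on the raw bytes returned by curl_exec(). One possible cause of "even worse" output is converting a string that has already been converted, or that was UTF-8 to begin with, but that is only a guess without seeing the real data.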

 

I don't know how to solve it.

One strange thing confuses me. If I use mb_detect_encoding:

$content = curl_exec($curl);
$enc = mb_detect_encoding($content);

the $enc variable shows UTF-8. So the string is probably being converted incorrectly inside curl_exec().
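As far as I know, curl_exec() hands back the response bytes unchanged, and mb_detect_encoding() only guesses from a list of candidate encodings (its strict parameter defaults to false), so its answer can be misleading. A stricter test is to ask whether the bytes are valid UTF-8 at all. The sketch below uses an assumed sample string in place of the real $content.

```php
<?php
// Assumed sample: "pamětihodnost" as ISO-8859-2 bytes (0xEC is "ě").
$raw = "pam\xECtihodnost";

// 0xEC followed by a plain ASCII letter is not a valid UTF-8 sequence,
// so this check fails for the raw bytes.
$isValidBefore = mb_check_encoding($raw, 'UTF-8');

// Convert once from the known source encoding, then validate again.
$fixed = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-2');
$isValidAfter = mb_check_encoding($fixed, 'UTF-8');
```

If mb_check_encoding() reports the raw content as invalid UTF-8, the content was most likely never converted, whatever mb_detect_encoding() says.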

 

Any ideas how to solve it?

Thanks

Maros

Archived

This topic is now archived and is closed to further replies.
