mahroch Posted January 19, 2009

Hi,

I have been trying to solve this for hours without success; maybe one of you knows the solution.

My scripts and everything else are encoded in UTF-8. I use this code to fetch the content of a page encoded in ISO-8859-2 (the page is in Czech, with characters that carry diacritics, ...):

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($curl, CURLOPT_HEADER, 0);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_COOKIEFILE, "cookiefile");
    curl_setopt($curl, CURLOPT_COOKIEJAR, "cookiefile"); # same file as CURLOPT_COOKIEFILE
    curl_setopt($curl, CURLOPT_URL, $search_url); # first connection: authorization via GET in my case; with POST you need to edit the code a bit
    $content = curl_exec($curl);

After this I try to extract some words from the content. Because the page uses a different encoding, I get results like this: pam�tihodnost, dl��d�n�, ...

So I tried to convert the content from the original page encoding (ISO-8859-2) to mine (UTF-8). I used different methods: iconv, libiconv, and various user functions from the internet (iso88592_2utf8(), convert_charset, ...), but nothing helps; the result is even worse. I don't know what else to try.

One strange thing confuses me. If I use mb_detect_encoding:

    $content = curl_exec($curl);
    $enc = mb_detect_encoding($content);

the $enc variable shows UTF-8. So I suspect the string is being converted incorrectly somewhere inside curl_exec().

Any ideas how to solve this?

Thanks,
Maros
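A minimal sketch of one way to get the body into UTF-8, assuming the mbstring and iconv extensions are available. The names charset_from_content_type(), body_to_utf8() and fetch_as_utf8() are hypothetical helpers, not part of PHP or cURL. The sketch reads the charset from the Content-Type response header (curl_getinfo() with CURLINFO_CONTENT_TYPE), falls back to ISO-8859-2 when the header has none, and checks the bytes with mb_check_encoding() first so text that is already valid UTF-8 is not converted a second time (it assumes real ISO-8859-2 Czech text will almost never happen to be valid UTF-8). On the detection puzzle: per the PHP manual, mb_detect_encoding() is non-strict by default and then returns the closest match from its detection order, which may explain why it reports UTF-8 for bytes that are not valid UTF-8.

```php
<?php
// Sketch only: the three function names below are hypothetical helpers,
// not part of PHP or the cURL extension.

// Extract the charset from a Content-Type header value, for example
// "text/html; charset=ISO-8859-2" -> "ISO-8859-2".
// Returns $fallback when no charset parameter is present.
function charset_from_content_type(?string $contentType, string $fallback): string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $contentType, $m) === 1) {
        return strtoupper($m[1]);
    }
    return $fallback;
}

// Convert raw bytes to UTF-8 exactly once.
// Bytes that already form valid UTF-8 are returned unchanged, so a string
// that was converted before is not converted a second time.
function body_to_utf8(string $raw, string $charset): string
{
    if (strtoupper($charset) === 'UTF-8' || mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    $converted = iconv($charset, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("iconv() could not convert from $charset to UTF-8");
    }
    return $converted;
}

// Fetch with an already configured handle (assumes CURLOPT_RETURNTRANSFER
// is true, as in the code above) and return the body as UTF-8.
// ISO-8859-2 is assumed when the server sends no charset in Content-Type.
function fetch_as_utf8($curl): string
{
    $raw = curl_exec($curl);
    if ($raw === false) {
        throw new RuntimeException('cURL error: ' . curl_error($curl));
    }
    $contentType = curl_getinfo($curl, CURLINFO_CONTENT_TYPE);
    $charset = charset_from_content_type(is_string($contentType) ? $contentType : null, 'ISO-8859-2');
    return body_to_utf8($raw, $charset);
}
```

For example, $content = fetch_as_utf8($curl); could take the place of the plain curl_exec() call before extracting words. A charset declared only in an HTML meta tag is not looked at; in that case the fallback value is used.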