dpacmittal, posted July 19, 2009:
I am grabbing some Japanese text using cURL, but it is garbled when displayed. I came across a thread on PHPFreaks (http://www.phpfreaks.com/forums/index.php?topic=150735), but it doesn't provide a solution. Is there any way to solve this without using fopen?
Thread: https://forums.phpfreaks.com/topic/166505-japanese-characters-pulled-using-curl-dont-display-properly/
ignace, posted July 19, 2009:
Have you tried changing the content type of both the HTML and the PHP output to a Japanese character encoding? http://en.wikipedia.org/wiki/Japanese_language_and_computers
dpacmittal, posted July 19, 2009:
No, there are just a few words in Japanese; the rest is in English.
ignace, posted July 19, 2009:
If you mix languages within one document, the W3C HTML 4.01 specification (http://www.w3.org/TR/html401/struct/dirlang.html) says you should add a lang attribute specifying the language of each element's contents and, if necessary, a dir attribute as well. More specifically: http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1.2
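The lang attribute tells the browser (and tools such as screen readers or font selection) which language a passage is in; it does not change how the page's bytes are decoded, which is governed by the charset. A minimal sketch of marking the Japanese fragments in an otherwise English page, assuming the page itself is served as UTF-8; ja_span() is a hypothetical helper, not part of any library:

```php
<?php
// Hypothetical helper: wrap a Japanese fragment in a span with lang="ja".
// Assumes $text is already valid UTF-8; lang does not fix mis-decoded bytes.
function ja_span(string $text): string
{
    // Escape markup characters only; the UTF-8 text itself is left unchanged.
    return '<span lang="ja">' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</span>';
}

// Mostly English paragraph with one Japanese word marked up.
echo '<p lang="en">The word for Japan is ', ja_span('日本'), '.</p>', "\n";
```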
dpacmittal, posted July 19, 2009:
^ But that won't help me, will it?
thebadbad, posted July 19, 2009:
Doing a simple test, grabbing the contents of http://en.wikipedia.org/wiki/Japanese_language gives me properly encoded characters. But that might be because Wikipedia specifies the lang attribute on elements containing Japanese characters and sets the content charset to UTF-8. I'm using this code:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://en.wikipedia.org/wiki/Japanese_language');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FORBID_REUSE, true);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; da; rv:1.9.1) Gecko/20090624 Firefox/3.5');
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;
?>

If that doesn't work with your source, try setting the header

header('Content-type: text/html; charset=utf-8');

and/or the content-type HTTP header used in the cURL session

curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: text/html; charset=utf-8'));
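If the source page is not UTF-8 (Japanese sites often serve Shift_JIS or EUC-JP), sending a UTF-8 header alone will not fix it; the fetched bytes also have to be converted. A minimal sketch, assuming the server declares its charset in the Content-Type response header and that the mbstring extension is available. The function names are hypothetical, and pages that declare their charset only in a <meta> tag are not handled:

```php
<?php
// Pull the charset out of a Content-Type value such as
// "text/html; charset=Shift_JIS". Returns null if none is declared.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([\w.:-]+)"?/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

// Convert a response body to UTF-8 based on its declared charset.
// Assumption: a body with no declared charset is already UTF-8.
// Unknown charset names make mb_convert_encoding() fail (ValueError in PHP 8).
function to_utf8(string $body, ?string $contentType): string
{
    $charset = charset_from_content_type($contentType);
    if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
        return $body;
    }
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

// Fetch a URL with cURL and return its body as UTF-8.
function fetch_as_utf8(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    // Content-Type the server sent; null if it sent no valid header.
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    if ($body === false) {
        return ''; // request failed
    }
    return to_utf8($body, is_string($contentType) ? $contentType : null);
}
```

Usage would then be header('Content-type: text/html; charset=utf-8'); followed by echo fetch_as_utf8($url);, so the declared output charset matches the bytes actually sent.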
dpacmittal, posted July 19, 2009:
Thanks, that was helpful. Half my problem is solved: the characters display fine when I run my script. The other part of my script posts the text to WordPress using XMLRPC. Check how the characters display here: http://www.timepass247.com/mylife/

The characters are fine in the script, which means the problem happens when posting through XMLRPC. I used this to encode the request as UTF-8:

$request = xmlrpc_encode_request('metaWeblog.newPost', $params, array('encoding' => 'utf-8'));

Am I wrong somewhere? This encoding business really baffles me.
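One plausible cause, which is an assumption and was not confirmed in this thread: the 'encoding' output option of xmlrpc_encode_request() only sets the encoding named in the XML declaration, while the extension's default escaping also applies "non-ascii" escaping, which turns each byte above 127 into its own numeric character reference. A multibyte UTF-8 character then arrives as several Latin-1 characters. A commonly cited workaround is to pass array('encoding' => 'utf-8', 'escaping' => 'markup') as the third argument so that only markup characters are escaped. Note that the xmlrpc extension was moved out of PHP core into PECL in PHP 8.0. The sketch below does not call the extension; it reproduces the two escaping styles in plain PHP (plus mbstring) so the difference is visible. Both function names are hypothetical:

```php
<?php
// Escape every byte >= 0x80 separately (assumed to mimic byte-wise
// "non-ascii" escaping). One UTF-8 character becomes several references.
function escape_bytes(string $s): string
{
    return preg_replace_callback(
        '/[\x80-\xFF]/', // no /u flag, so this matches single bytes
        fn(array $m): string => '&#' . ord($m[0]) . ';',
        $s
    );
}

// Escape whole code points: one character becomes one reference.
function escape_code_points(string $utf8): string
{
    // convmap: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF
    return mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

echo escape_bytes('日'), "\n";       // &#230;&#151;&#165;
echo escape_code_points('日'), "\n"; // &#26085;
```

A receiver that reads &#230;&#151;&#165; gets three unrelated characters instead of one, which matches the kind of garbling described above.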
thebadbad, posted July 19, 2009:
I'm not familiar with XMLRPC, so I don't know. Have you tried not encoding $request? Otherwise, the answer should lie in the source code of the XMLRPC class/functions.
dpacmittal, posted July 19, 2009:
Yeah, I've tried that too, but it didn't work.
thebadbad, posted July 19, 2009:
Google found this: http://forums.b2evolution.net/viewtopic.php?t=14326. No idea if it'll work.
dpacmittal, posted July 19, 2009:
That's specific to b2evolution. Thanks for the help anyway. I'll try to sort out the problem myself.