manuel2, posted July 27, 2007:
This is driving me mad! I have tried cURL and the well-known HTTPRequest class (which uses fsockopen) to scrape translate.google.com/translate_t, and I always get bogus UTF-8 output. Any clue? I have scraped many UTF-8 pages before and never ran into this. Help! The code is here: http://www.phpfreaks.com/forums/index.php/topic,138145.0.html
btherl, posted July 27, 2007:
Can you give more detail, please? How do you know the UTF-8 is bogus?
manuel2, posted July 27, 2007:
Quoting btherl: "Can you give more detail please? How do you know the utf-8 is bogus?"
Hello, and thanks for your comment. I get ������� instead of readable UTF-8 text, even though I set charset=utf-8 in the meta tags of the page I use to display the result.
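One way to answer the "how do you know it is bogus" question is to test whether the fetched bytes are valid UTF-8 at all. The following is only a minimal sketch, assuming the mbstring extension is installed; `looks_like_utf8` is a hypothetical helper name and the sample strings are made up for illustration, not taken from the thread.

```php
<?php
// Hypothetical helper, not part of the original posts.
// Assumes the mbstring extension is enabled.
function looks_like_utf8(string $body): bool
{
    // mb_check_encoding() returns true only if the whole string is valid UTF-8.
    return mb_check_encoding($body, 'UTF-8');
}

// Sample data (assumption): an Arabic word encoded two different ways.
$utf8   = "\xD9\x85\xD8\xB1\xD8\xAD\xD8\xA8\xD8\xA7"; // encoded as UTF-8
$legacy = "\xE3\xD1\xCD\xC8\xC7";                     // encoded in a single-byte Arabic code page

var_dump(looks_like_utf8($utf8));   // the UTF-8 bytes pass the check
var_dump(looks_like_utf8($legacy)); // the single-byte encoding does not
```

If the check fails on the response body, the page was most likely sent in some other encoding, and declaring charset=utf-8 in the display page will not fix it.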
btherl, posted July 27, 2007:
Please post your complete code.
manuel2, posted July 27, 2007:
Quoting btherl: "Please post your complete code."

    $lang = "ar";                  // example target language
    $text = "Hello world";         // placeholder: $text was not defined in the original post
    $useragent = "Mozilla/5.0";    // placeholder: $useragent was not defined in the original post
    $url = "http://translate.google.com/translate_t";

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);   // boolean flag; the original passed 4
    // URL-encode the text so spaces and non-ASCII characters survive the POST body.
    $postdata = "hl=en&ie=UTF8&langpair=en|" . $lang . "&text=" . urlencode($text);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
    $result = curl_exec($ch);
    curl_close($ch);
    echo $result;
manuel2, posted July 27, 2007:
Isn't it strange? Thanks in advance for any help. Regards

manuel2, posted July 27, 2007:
Any help? Thanks. This is really weird. Just run the code above and check for yourself... Is Google sending pages in a strange format?!

manuel2, posted July 27, 2007:
Some help, please! I have used several methods besides cURL to get the page and still can't get a decent UTF-8 page...
per1os, posted July 27, 2007:
Maybe the page isn't UTF-8 encoded?
manuel2, posted July 27, 2007:
Quoting per1os: "Maybe the page isn't UTF-8 Encoded ???"
Damn. Is this a trick by Google to protect itself from scrapers and automated translation scripts? Indeed, I don't see a UTF-8 meta tag on http://translate.google.com/translate_t. How can I figure out how the page is encoded? By sniffing the HTTP headers?
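Reading the Content-Type response header is indeed one common way to find the encoding. The sketch below is an illustration under assumptions, not the fix the poster eventually found: it assumes the server declares a charset in Content-Type and that the mbstring extension is installed. `charset_from_content_type` and `to_utf8` are hypothetical helper names.

```php
<?php
// Hypothetical helpers for illustration; neither appears in the original thread.

// Pull the charset out of a Content-Type value such as "text/html; charset=ISO-8859-1".
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null; // no charset was declared
}

// Convert a response body to UTF-8, given the charset the server declared.
// Assumes mbstring is enabled and supports the declared charset.
function to_utf8(string $body, ?string $charset): string
{
    if ($charset === null || $charset === 'UTF-8') {
        return $body; // already UTF-8, or nothing known to convert from
    }
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

// Usage with cURL (call curl_getinfo() after curl_exec() and before curl_close()):
//   $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
//   $html = to_utf8($result, charset_from_content_type(is_string($type) ? $type : null));

// Small offline demonstration with made-up data:
echo to_utf8("caf\xE9", charset_from_content_type('text/html; charset=ISO-8859-1')), "\n";
```

If the header carries no charset, the helper returns the body unchanged, and the next place to look would be a meta tag in the HTML itself.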
manuel2, posted July 28, 2007:
I solved it, I solved it! It is indeed a Google problem. Forget about Accept-Charset: utf-8, it will never work... The solution is rather tricky, lol. I wasted hours trying everything.