Google translate and utf8


manuel2

This is driving me mad!

I have tried cURL and the well-known HTTPRequest class (which uses fsockopen) to scrape translate.google.com/translate_t, and I always get bogus UTF-8 files.

Any clue? I have scraped many UTF-8 pages before and never run into this. HELP!

The code is here:

http://www.phpfreaks.com/forums/index.php/topic,138145.0.html


Please post your complete code.

 

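$lang = "ar"; // target language (example)
$text = "Hello world"; // text to translate (example value)
$useragent = "Mozilla/5.0"; // example user agent string

$url = "http://translate.google.com/translate_t";

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);

// URL-encode the values so spaces, "|" and non-ASCII text survive the POST
$postdata = "hl=en&ie=UTF8&langpair=" . urlencode("en|" . $lang) . "&text=" . urlencode($text);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);

$result = curl_exec($ch);
curl_close($ch);

echo $result;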

Maybe the page isn't UTF-8 encoded?
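
One quick way to test that guess, as a minimal sketch: the mbstring function `mb_check_encoding` reports whether a byte string is well-formed UTF-8. The helper name `looks_like_utf8` and the sample strings are made up for illustration; with the script above you would pass `$result` instead. It assumes PHP 7 or later with the mbstring extension loaded.

```php
<?php
// Hypothetical helper: true if $bytes is a well-formed UTF-8 string.
// Requires the mbstring extension.
function looks_like_utf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

// Sample data: "é" encoded as UTF-8 (two bytes) versus ISO-8859-1 (one byte).
$utf8   = "caf\xC3\xA9";
$latin1 = "caf\xE9";

var_dump(looks_like_utf8($utf8));   // bool(true)
var_dump(looks_like_utf8($latin1)); // bool(false)
```

A `false` result means the bytes are definitely not UTF-8. A `true` result is weaker evidence, since plain ASCII also passes the check.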

 

Damn. Is this a trick by Google to protect itself from scrapers and automated translation scripts?

Indeed, I don't see a UTF-8 meta tag on http://translate.google.com/translate_t

How can I figure out how the page is encoded? By sniffing the HTTP headers?
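
Reading the HTTP headers is one reasonable approach. The sketch below pulls the `charset` parameter out of a `Content-Type` header value and converts the body to UTF-8 with `mb_convert_encoding`. The function names `charset_from_content_type` and `to_utf8` are hypothetical, and the sample header and body are made up; they are not what Google actually sends. It assumes PHP 7.1 or later with the mbstring extension.

```php
<?php
// Hypothetical helper: extract the charset from a Content-Type header value,
// e.g. "text/html; charset=ISO-8859-1". Returns null if none is declared.
function charset_from_content_type(string $contentType): ?string
{
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Hypothetical helper: convert $body to UTF-8 from the declared charset.
// If no charset is known, or it is already UTF-8, the body is returned unchanged.
function to_utf8(string $body, ?string $charset): string
{
    if ($charset === null || $charset === 'UTF-8') {
        return $body;
    }
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

// Made-up sample data: "é" as a single ISO-8859-1 byte.
$header = 'text/html; charset=ISO-8859-1';
$body   = "caf\xE9";

$charset = charset_from_content_type($header);
echo $charset, "\n";                     // ISO-8859-1
echo bin2hex(to_utf8($body, $charset));  // 636166c3a9
```

With cURL, the header value should be available from `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)`, called after `curl_exec()` and before `curl_close()`. mbstring may not recognise every charset name a server can send; `iconv()` is a possible alternative in that case.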

