Mcod Posted February 10, 2012 Hi there, My big question of the day is: do you know of any tool or script to detect a web site's language? I have a list of 1 million URLs (Alexa Top 1 Million by rank) and would like to batch check them to find out which sites are written in English and filter out the ones that are not. I have the data in a MySQL database, but checking that many URLs one by one (downloading the site, analyzing the text and guessing what language it is) takes too many server resources (I tried). That said, do you know of a tool or API that can "guess" the language when fed a list of URLs? Thanks
Pikachu2000 Posted February 10, 2012 Maybe Google Translate has an API?
scootstah Posted February 10, 2012 "Maybe Google Translate has an API?" They might not like it if you loop it 1 million times, though.
Pikachu2000 Posted February 10, 2012 If they have that functionality, I'm sure they charge for API usage over x calls/day, so maybe they actually would like it.
Philip Posted February 11, 2012 Yar, it is indeed $$. Figuring out the language (by grabbing a string) and the cost
scootstah Posted February 11, 2012 "Yar, it is indeed $$. Figuring out the language (by grabbing a string) and the cost" Eek.
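One cheaper alternative to a paid API, offered only as a rough sketch and not something anyone in this thread tested: many sites declare their language themselves, either in the `lang` attribute of the `<html>` tag, in a `<meta http-equiv="Content-Language">` tag, or in the `Content-Language` HTTP response header. Reading only the first few kilobytes of each page and checking those declarations avoids downloading and analyzing the full text. The function names below (`declared_language`, `looks_english`, `fetch_head`) and the 4096-byte cutoff are assumptions made for illustration, and the approach only works for sites that declare a language and declare it correctly.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple
from urllib.request import urlopen


class _LangParser(HTMLParser):
    """Records the first language declaration found in the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.lang is not None:
            return  # keep the first declaration only
        attr = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "html" and attr.get("lang"):
            self.lang = attr["lang"]
        elif (tag == "meta"
              and (attr.get("http-equiv") or "").lower() == "content-language"
              and attr.get("content")):
            self.lang = attr["content"]


def declared_language(html_head: str,
                      content_language: Optional[str] = None) -> Optional[str]:
    """Return the primary language subtag a page declares, or None.

    html_head is the (possibly truncated) start of the document;
    content_language is the Content-Language header value, if any.
    """
    parser = _LangParser()
    parser.feed(html_head)
    raw = parser.lang or content_language
    if not raw:
        return None
    first = raw.split(",")[0].strip()          # "en-US, fr" -> "en-US"
    return first.replace("_", "-").split("-")[0].lower() or None


def looks_english(html_head: str,
                  content_language: Optional[str] = None) -> bool:
    """True only when the page explicitly declares English."""
    return declared_language(html_head, content_language) == "en"


def fetch_head(url: str, max_bytes: int = 4096) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: read only the first max_bytes of a page."""
    with urlopen(url, timeout=10) as resp:
        header = resp.headers.get("Content-Language")
        body = resp.read(max_bytes).decode("utf-8", errors="replace")
    return body, header
```

Pages where `declared_language` returns `None` would still need a text-based guess or an API call, but that should be a smaller set than the full million.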