Rahul Dev Posted January 25, 2011

Hello guys, I have a problem when I screen scrape a piece of text from a URL and save it to my DB. The text is in French and contains special characters like é, but when I scrape it I receive it as an HTML entity, &eacute;. For example, the website shows the word région, but after scraping it becomes r&eacute;gion.

I want to store the text as it is displayed because I need to perform some operations on it after saving it in the DB. Is there any way to store the scraped text in the form in which it is displayed, or to convert it to the form I want (like this: région)?

My code is as follows:

    $html = file_get_dom('http://www.defimedia.info/news/8425/Grosses-averses-%3A-les-pompiers-inond%C3%A9s-d%E2%80%99appels-');
    foreach($html->find('div[class=PostContent]') as $element) {
        $tags = array('<div class="PostContent">', '<!-- The Adsense will automatically be inserted half way through the content. Applies for both Side and Middle options. -->', '<font face="Georgia">', '<font size="2">', '');
        $new_element = str_replace($tags, "", $element);
        $sql1 = "UPDATE articles SET original_text = '" . mysql_real_escape_string($new_element) . "' WHERE article_id = '$item_id'";
        $result1 = mysql_query($sql1) or die('Query failed: ' . mysql_error());
    }
AbraCadaver Posted January 25, 2011

It is &eacute; in the HTML source of the page you are scraping (check it out). In order to display in a browser it will need to be &eacute;, so why do you want to translate it? If you must, then try html_entity_decode().
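As a minimal sketch of the html_entity_decode() suggestion, the snippet below decodes entities in a scraped string before it would be escaped and stored. The sample string and the variable names `$scraped` and `$decoded` are hypothetical stand-ins for the `$new_element` value in the question, and the explicit 'UTF-8' argument assumes the target table and database connection also use UTF-8.

```php
<?php
// Hypothetical sample standing in for the scraped markup ($new_element in the question).
$scraped = 'La r&eacute;gion est inond&eacute;e';

// Convert named HTML entities back to real characters.
// The third argument sets the output encoding; UTF-8 is an assumption here.
$decoded = html_entity_decode($scraped, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $decoded, PHP_EOL; // prints "La région est inondée"
```

In the original loop, the decoded string would be what gets passed to mysql_real_escape_string(). If the stored text later shows up garbled, the connection or column character set is a likely place to look, since the decoded characters are multi-byte in UTF-8.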