
Screen-scraping special characters from a URL


Rahul Dev


Hello guys, I have a problem when I screen-scrape a piece of text from a URL and save it to my DB. The text is in French and contains special characters like é, but when I scrape it I receive them as HTML entities such as &eacute;. For example, the website shows the word région, but after scraping it becomes r&eacute;gion. I want to store the text as it is displayed because I need to perform some operations on it after saving it in the DB.

Is there any way to store the scraped text in the form it is displayed, or to convert it to the form I want (like this: région)?

My code is as follows:

require_once 'simple_html_dom.php'; // assumed: PHP Simple HTML DOM Parser provides file_get_dom()/find()

// Fetch and parse the article page
$html = file_get_dom('http://www.defimedia.info/news/8425/Grosses-averses-%3A-les-pompiers-inond%C3%A9s-d%E2%80%99appels-');

// Wrapper markup to strip from the stored text
$tags = array('<div class="PostContent">', '<!-- The Adsense will automatically be inserted half way through the content. Applies for both Side and Middle options. -->', '<font face="Georgia">', '<font size="2">', '');

foreach ($html->find('div[class=PostContent]') as $element)
{
    // $element is converted to its outer HTML string here
    $new_element = str_replace($tags, "", $element);
    // $item_id is assumed to be set earlier in the script
    $sql1 = "UPDATE articles SET original_text = '" . mysql_real_escape_string($new_element) . "' WHERE article_id = '$item_id'";
    $result1 = mysql_query($sql1) or die('Query failed: ' . mysql_error());
}
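A sketch of how the scraped fragment could be cleaned before the UPDATE, using PHP's built-in html_entity_decode() to turn entities like &eacute; back into characters. It assumes PHP 5.4+ and a UTF-8 `articles.original_text` column; the function name `clean_post_content` is hypothetical, and its tag list is an assumed subset of `$tags` plus the matching closing tags.

```php
<?php
// Hypothetical helper: strips wrapper markup (as the question's str_replace()
// does), then decodes HTML entities into UTF-8 characters.
function clean_post_content($html)
{
    // Assumed subset of the question's $tags, plus matching closing tags.
    $tags = array(
        '<div class="PostContent">',
        '<font face="Georgia">',
        '<font size="2">',
        '</font>',
        '</div>',
    );
    $text = str_replace($tags, '', $html);

    // ENT_QUOTES also decodes &quot; and &#039;; 'UTF-8' is assumed to
    // match the encoding of the database column.
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

// Usage with an illustrative fragment:
echo clean_post_content('<div class="PostContent"><font size="2">La r&eacute;gion</font></div>'), "\n";
```

Inside the loop it would be called as `clean_post_content((string) $element)`, and the result passed to `mysql_real_escape_string()` as before.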

In the HTML source of the page you are scraping it is &eacute; (check it out), and a browser renders that entity as é, so why do you want to translate it? If you must, try html_entity_decode().
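A minimal sketch of the html_entity_decode() suggestion, showing why the charset argument matters: before PHP 5.4 the default charset was ISO-8859-1, which turns &eacute; into the single byte 0xE9 instead of the two-byte UTF-8 sequence. The sample strings are illustrative; UTF-8 is assumed to be the encoding the database expects.

```php
<?php
// Illustrative scraped fragment, entities as they appear in the page source.
$scraped = 'r&eacute;gion';

// Decode to UTF-8: é becomes the two bytes 0xC3 0xA9.
$utf8 = html_entity_decode($scraped, ENT_QUOTES, 'UTF-8');

// Decode to ISO-8859-1: é becomes the single byte 0xE9, which shows up
// as a broken character if stored in a UTF-8 column.
$latin1 = html_entity_decode($scraped, ENT_QUOTES, 'ISO-8859-1');

echo strlen($utf8), ' ', strlen($latin1), "\n"; // 7 6
echo bin2hex($latin1), "\n";                    // 72e967696f6e

// Typographic apostrophes from the same article decode as well.
$apostrophe = html_entity_decode('d&rsquo;appels', ENT_QUOTES, 'UTF-8');
echo bin2hex($apostrophe), "\n";                // 64e2809961707065 6c73 without the space
```

Passing the charset explicitly keeps the result the same regardless of PHP version or `default_charset` setting.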
