
accented characters


1zeus1


I need to extract all the <p> tags from an Italian-language site.

 

////////////////////////////////////////////////////////////////////////

 

 

 

<?php
header('Content-Type: text/html; charset=iso-8859-1');

function curl_file_get_contents($url)
{
    $curl = curl_init();
    $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

    curl_setopt($curl, CURLOPT_URL, $url); // The URL to fetch. This can also be set when initializing a session with curl_init().
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE); // TRUE to return the transfer as the return value of curl_exec() instead of outputting it directly.
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10); // The number of seconds to wait while trying to connect.
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent); // The contents of the "User-Agent: " header to be used in an HTTP request.
    curl_setopt($curl, CURLOPT_FAILONERROR, TRUE); // Fail (curl_exec() returns FALSE) if the HTTP code returned is greater than or equal to 400.
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE); // Follow any "Location: " header that the server sends as part of the HTTP header.
    curl_setopt($curl, CURLOPT_AUTOREFERER, TRUE); // Automatically set the Referer: field in requests where it follows a Location: redirect.
    curl_setopt($curl, CURLOPT_TIMEOUT, 5); // The maximum number of seconds to allow cURL functions to execute (this also caps the 10 s connect timeout above).

    $contents = curl_exec($curl); // FALSE on failure
    curl_close($curl);

    return $contents;
}

function getTextBetweenTags($tag, $html, $strict = 0)
{
    /*** a new DOM object ***/
    $dom = new DOMDocument();

    /*** discard white space (must be set before loading to take effect) ***/
    $dom->preserveWhiteSpace = false;

    /*** load the HTML (or XML, if $strict == 1) into the object ***/
    if ($strict == 1) {
        $dom->loadXML($html);
    } else {
        $dom->loadHTML($html);
    }

    /*** get the elements by their tag name ***/
    $content = $dom->getElementsByTagName($tag);

    /*** the array to return ***/
    $out = array();
    foreach ($content as $item) {
        /*** add the node value to the out array ***/
        $out[] = $item->nodeValue;
    }

    /*** return the results ***/
    return $out;
}

// $url must hold the address of the page to scrape (not shown in the post)
$html = curl_file_get_contents($url);
if ($html === false) {
    exit('Could not fetch the page.');
}

$content = getTextBetweenTags('p', $html);

foreach ($content as $item) {
    echo $item . '.';
}

 

/////////////////////////////////////////////////////////////////////////////////

My problem is that it does not handle accented characters correctly, for example in this text:

 

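"con un certo miglioramento del testo e, cosa più importante, con le firme dei sottoscrittori. scusate la ripetizione. che però, dicevano gli antichi, iuvat. un caro saluto. "

(In English: "with some improvement to the text and, more importantly, with the signatures of the signatories. Sorry for the repetition, which, however, as the ancients used to say, iuvat ('helps'). Warm regards.")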
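One common way this shows up (an assumption, since the post does not show the actual output) is mojibake: the UTF-8 bytes of a word like "più" get read as ISO-8859-1, so each accented letter turns into two characters. The opposite mistake, ISO-8859-1 bytes shown as UTF-8, typically produces replacement characters instead. A minimal sketch with mbstring's mb_convert_encoding(), assuming ext-mbstring and a source file saved as UTF-8:

```php
<?php
// Sketch of one likely symptom. Requires ext-mbstring; this file is assumed
// to be saved as UTF-8.
$original = "più";

// "ù" is stored in UTF-8 as the two bytes C3 B9.
echo bin2hex($original), "\n"; // 7069c3b9

// Read those bytes as if they were ISO-8859-1: each byte becomes its own
// character, so "ù" turns into "Ã" (C3) followed by "¹" (B9).
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');
echo $garbled, "\n"; // piÃ¹

// Undoing the wrong conversion restores the original bytes.
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $original); // bool(true)
```

If the output looks like "piÃ¹", the text is UTF-8 being displayed as ISO-8859-1.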

 

I need your help.

Regards, Cristian

https://forums.phpfreaks.com/topic/260594-accented-characters/

This is an encoding problem. Make sure that your PHP script and the document you load use the same encoding.
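To know which encoding the fetched page uses, one place to look is the charset parameter of the HTTP Content-Type header that cURL received. A sketch, assuming PHP 7.1+ with ext-curl; charsetFromContentType() and curl_fetch_with_charset() are hypothetical names, and the regular expression is a simple approximation rather than a full header parser:

```php
<?php
// Hypothetical helper: extract the charset from a Content-Type value such as
// "text/html; charset=ISO-8859-1". Returns it upper-cased, or null if absent.
function charsetFromContentType(?string $contentType): ?string
{
    if ($contentType === null) {
        return null;
    }
    if (preg_match('/charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Hypothetical variant of curl_file_get_contents() that also returns the
// charset the server declared in its HTTP Content-Type header (or null).
function curl_fetch_with_charset(string $url): array
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_FAILONERROR, true);

    $body = curl_exec($curl); // false on failure

    // Content-Type of the last response received; null when the server sent
    // no valid header, false on failure.
    $type = curl_getinfo($curl, CURLINFO_CONTENT_TYPE);

    // The handle is released when $curl goes out of scope.
    return [$body, charsetFromContentType(is_string($type) ? $type : null)];
}

echo charsetFromContentType('text/html; charset=iso-8859-1'), "\n"; // ISO-8859-1
```

A null result does not mean the page is UTF-8: many pages declare their charset only in a <meta> tag inside the HTML.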

 

The encoding your PHP script declares for its output is set here:

header('Content-Type: text/html; charset=iso-8859-1');
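Note that DOMDocument returns nodeValue text as UTF-8 whatever encoding the page used, so echoing it under an ISO-8859-1 header mixes two encodings. A sketch of the two ways to line them up, assuming ext-mbstring; toLatin1() is a hypothetical helper name:

```php
<?php
// DOMDocument returns nodeValue text as UTF-8, whatever the page used.
// Option (a): declare UTF-8 for the output instead:
//     header('Content-Type: text/html; charset=utf-8');
// Option (b), sketched here: keep the ISO-8859-1 header and convert every
// string before echoing it. Requires ext-mbstring.

// Hypothetical helper. Characters that ISO-8859-1 cannot represent
// (for example "€") are replaced by mbstring's substitute character.
function toLatin1(string $utf8Text): string
{
    return mb_convert_encoding($utf8Text, 'ISO-8859-1', 'UTF-8');
}

$paragraph = "cosa più importante"; // UTF-8, as nodeValue would return it
$latin1 = toLatin1($paragraph);

// "ù" shrinks from two bytes (C3 B9) to one (F9).
echo strlen($paragraph), ' ', strlen($latin1), "\n"; // 20 19
```

Option (a) keeps every character, including those ISO-8859-1 lacks; option (b) only makes sense if the rest of the output must stay ISO-8859-1.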

 

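The encoding of the loaded document is declared at its top: in the <?xml ... encoding="..."?> declaration of an XML file, or in a <meta charset="..."> tag of an HTML page.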
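When a UTF-8 page does not declare its encoding, DOMDocument::loadHTML() may read the bytes as ISO-8859-1. One workaround (among several) is to turn every non-ASCII character into a numeric entity such as &#249; before loading, because entities decode to the same character whatever encoding the parser assumes. A sketch, assuming the fetched HTML is UTF-8 and that ext-dom and ext-mbstring are available; paragraphsFromUtf8Html() is a hypothetical name:

```php
<?php
// Extract the text of every <p> element from a UTF-8 HTML string.
function paragraphsFromUtf8Html(string $html): array
{
    // Replace every code point from U+0080 upward with &#NNN; so the parser's
    // encoding guess no longer matters.
    $html = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real pages often trigger parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = [];
    foreach ($dom->getElementsByTagName('p') as $p) {
        $out[] = $p->nodeValue; // DOM text is always UTF-8
    }
    return $out;
}

$page = '<html><body><p>cosa più importante</p><p>però</p></body></html>';
print_r(paragraphsFromUtf8Html($page));
```

If the page is in another encoding (ISO-8859-1, say), it can first be converted with mb_convert_encoding($html, 'UTF-8', 'ISO-8859-1'); the output should then be sent with a UTF-8 Content-Type header.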

 

 

 


