1zeus1 Posted April 9, 2012

I need to extract all <p> tags from a site in Italian.

////////////////////////////////////////////////////////////////////////
<?php
header('Content-Type: text/html; charset=iso-8859-1');

function curl_file_get_contents($url)
{
    $curl = curl_init();
    $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

    curl_setopt($curl, CURLOPT_URL, $url);                // The URL to fetch. This can also be set when initializing a session with curl_init().
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);     // TRUE to return the transfer as a string from curl_exec() instead of outputting it directly.
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);       // The number of seconds to wait while trying to connect.
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);    // The contents of the "User-Agent: " header to be used in an HTTP request.
    curl_setopt($curl, CURLOPT_FAILONERROR, TRUE);        // Fail silently if the HTTP code returned is greater than or equal to 400.
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);     // Follow any "Location: " header that the server sends as part of the HTTP header.
    curl_setopt($curl, CURLOPT_AUTOREFERER, TRUE);        // Automatically set the Referer: field in requests where it follows a Location: redirect.
    curl_setopt($curl, CURLOPT_TIMEOUT, 5);               // The maximum number of seconds to allow cURL functions to execute.

    $contents = curl_exec($curl);
    curl_close($curl);
    return $contents;
}

// $url is the address of the page to fetch; it is not defined in this snippet.
$get = curl_file_get_contents($url);

function getTextBetweenTags($tag, $html, $strict = 0)
{
    /*** a new dom object ***/
    $dom = new DOMDocument;

    /*** discard white space (must be set before loading to have any effect) ***/
    $dom->preserveWhiteSpace = false;

    /*** load the html into the object ***/
    if ($strict == 1) {
        $dom->loadXML($html);
    } else {
        $dom->loadHTML($html);
    }

    /*** the tag by its tag name ***/
    $content = $dom->getElementsByTagName($tag);

    /*** the array to return ***/
    $out = array();
    foreach ($content as $item) {
        /*** add node value to the out array ***/
        $out[] = $item->nodeValue;
    }

    /*** return the results ***/
    return $out;
}

// Pass the fetched page ($get) to the function defined above.
$content = getTextBetweenTags('p', $get);
foreach ($content as $item) {
    echo $item . '.';
}
?>
////////////////////////////////////////////////////////////////////////

My problem is that it does not recognize accented characters. This is a sample of the text I am trying to extract:

"con un certo miglioramento del testo e, cosa più importante, con le firme dei sottoscrittori. scusate la ripetizione. che però, dicevano gli antichi, iuvat. un caro saluto. "

I need your help.

Regards,
cristian
kazymjir Posted April 9, 2012

This is an encoding problem. Make sure that your PHP script and the XML file use the same encoding.

The encoding of the PHP output is declared here:

header('Content-Type: text/html; charset=iso-8859-1');

The encoding of the XML file should be declared at the top of that file.
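To make the "same encoding" advice concrete, here is a minimal sketch of one way to line the encodings up. It rests on assumptions that are not confirmed anywhere in the thread: that the fetched page is UTF-8 (check the site's Content-Type header or its meta charset tag), and that the mbstring extension is available. The helper name getParagraphTextUtf8 and the sample $html string are made up for illustration. The idea is that DOMDocument::loadHTML() falls back to ISO-8859-1 when the markup does not declare a charset, while nodeValue always comes back as UTF-8, so the sketch turns every non-ASCII character into a numeric entity before parsing and then converts the result to whatever charset the header() call announces.

```php
<?php
// Sketch only, PHP 7.4+. Assumes the source page is UTF-8 unless told otherwise.

/**
 * Hypothetical helper (not from the thread): returns the text of every
 * <$tag> element as UTF-8 strings.
 */
function getParagraphTextUtf8(string $tag, string $html, string $sourceCharset = 'UTF-8'): array
{
    // Rewrite every character above 0x7F as a numeric entity (&#249; etc.),
    // so the parser no longer has to guess the byte encoding of the input.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], $sourceCharset);

    $dom = new DOMDocument();
    // Collect warnings about sloppy real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = [];
    foreach ($dom->getElementsByTagName($tag) as $node) {
        $out[] = $node->nodeValue; // DOM strings are always UTF-8
    }
    return $out;
}

// Made-up sample input; \u{F9} is "ù" and \u{F2} is "ò" encoded as UTF-8.
$html = "<html><body><p>cosa pi\u{F9} importante</p><p>che per\u{F2} iuvat</p></body></html>";
$paragraphs = getParagraphTextUtf8('p', $html);

// Option A: keep UTF-8 and announce it to the browser instead of iso-8859-1:
//   header('Content-Type: text/html; charset=utf-8');
// Option B: keep the iso-8859-1 header from the thread and convert the text to match it.
$latin1 = array_map(
    fn(string $text): string => mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8'),
    $paragraphs
);
```

If the site actually serves ISO-8859-1, passing 'ISO-8859-1' as the third argument should be enough. Either way, the charset in the header() call has to match the bytes that are finally echoed.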