tlawless Posted November 4, 2007

Hi,

I'm trying to open an RSS feed. I tried fopen, ran into problems, then found that the solution was a security risk. This had to do with changing PHP defaults on opening files. fsockopen() seemed to be the answer. I was able to get a test case to work. But I'm having problems with the one I really want to use. I'm using an RSS parser, LastRSS, and found a version of it that uses fsockopen instead of fopen.

Test feed: http://www.freshfolder.com/rss.php works fine
Desired feed: http://www.currentinventory.net/search/category/28/Dump_Trucks.rss doesn't

The error returned is "php_network_getaddresses: getaddrinfo failed: Name or service not known in"

Any help would be greatly appreciated.

Thanks,
Tim

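A note for anyone hitting the same message: "getaddrinfo failed: Name or service not known" means the host name handed to fsockopen() could not be resolved to an IP address, so the failure happens before any HTTP request is sent. Below is a minimal sketch for checking that step in isolation, assuming PHP 7 or later. host_resolves() is a hypothetical helper written for this illustration and is not part of lastRSS.

```php
<?php
// Hypothetical helper (not part of lastRSS): report whether the host
// part of a URL can be resolved to an IPv4 address.
function host_resolves(string $url): bool
{
    // Pull out only the host component, e.g. "www.freshfolder.com".
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host component, so there is nothing to resolve
    }

    // gethostbyname() returns the unmodified host name when the lookup
    // fails, so a result that is not an IP address means "not resolved".
    $ip = gethostbyname($host);

    return filter_var($ip, FILTER_VALIDATE_IP) !== false;
}
```

If this returns false for the feed URL when run on the web server, the likely suspects are the server's DNS setup or the string that ends up being passed as the host, rather than the RSS parsing itself.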
tlawless (Author) Posted November 7, 2007

OK ... I guess this was a dumb question, too hard, or not in the right forum. If someone can at least point me in the right direction I would really appreciate it.

Thanks,
Tim

adam291086 Posted November 7, 2007

Post some code and then we can help.

tlawless (Author) Posted November 7, 2007

I was repackaging the code to present for help and somehow solved the problem.

Thanks,
Tim

tbare Posted November 7, 2007

Will you post the answer in case someone else is having the same issue? (I'm not yet, but I'm thinking about implementing this in the future, and if I have an idea of something to look out for, it could save me some frustration later...)

Thanks

tlawless (Author) Posted November 7, 2007

Here is the code.

lastRSS.php is a parser available at http://lastrss.oslab.net/

The modified version of lastRSS.php uses fsockopen instead of fopen. I wanted to use it because of the security warnings against the settings needed to make fopen work on my site. Also, I had to comment out some character encoding conversion which I didn't understand and which was causing another error.

RSSDump uses lastRSS_sock to read the file into an object and prints it out.

RSSDump.php

<?php
/*
 ======================================================================
 lastRSS usage DEMO 1
 ----------------------------------------------------------------------
 This example shows how to
    - create lastRSS object
    - set transparent cache
    - get RSS file from URL
    - show result in array structure
 ======================================================================
*/

// include lastRSS
//include "./lastRSS.php";
include "./lastRSS_sock.php";

// Create lastRSS object
$rss = new lastRSS;

// Cache is disabled here (empty cache dir, cache time 0)
// (to enable it, don't forget to chmod the cache dir to 777 to allow writing)
$rss->cache_dir = '';
$rss->cache_time = 0;
$rss->cp = 'US-ASCII';
$rss->date_format = 'l';

// Try to load and parse the RSS feed
// $rssurl = 'http://www.freshfolder.com/rss.php';
// $rssurl = './DumpTrucks.rss';
$rssurl = 'http://www.currentinventory.net/search/category/28/Dump_Trucks.rss';
if ($rs = $rss->get($rssurl)) {
//if ($rs = $rss->get('http://www.freshfolder.com/rss.php')) {
    echo '<pre>';
    print_r($rs);
    echo '</pre>';
}
else {
    echo "Error: It's not possible to get $rssurl...";
}
?>

lastRSS_sock.php

<?php
/*
 ======================================================================
 lastRSS 0.9.1

 Simple yet powerful PHP class to parse RSS files.

 by Vojtech Semecky, webmaster @ webdot . cz

 Latest version, features, manual and examples:
     http://lastrss.webdot.cz/

 WARNING: this is a modified version of lastRSS 0.9.1 and is not an
 official modification. It uses fsockopen instead of fopen in the PARSE
 method to get the contents of the RSS feed - this is done because, for
 my needs the file to open is always a remote file identifiable by a URI.

 ----------------------------------------------------------------------
 LICENSE

 This program is free software; you can redistribute it and/or
 modify it under the terms of the GNU General Public License (GPL)
 as published by the Free Software Foundation; either version 2
 of the License, or (at your option) any later version.

 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 GNU General Public License for more details.

 To read the license please visit http://www.gnu.org/copyleft/gpl.html
 ======================================================================
*/

/**
* lastRSS
* Simple yet powerful PHP class to parse RSS files.
*/
class lastRSS {
    // -------------------------------------------------------------------
    // Public properties
    // -------------------------------------------------------------------
    var $default_cp = 'UTF-8';
    var $CDATA = 'nochange';
    var $cp = '';
    var $items_limit = 0;
    var $stripHTML = False;
    var $date_format = '';

    // -------------------------------------------------------------------
    // Private variables
    // -------------------------------------------------------------------
    var $channeltags = array ('title', 'link', 'description', 'language', 'copyright', 'managingEditor', 'webMaster', 'lastBuildDate', 'rating', 'docs');
    var $itemtags = array('title', 'link', 'description', 'author', 'category', 'comments', 'enclosure', 'guid', 'pubDate', 'source');
    var $imagetags = array('title', 'url', 'link', 'width', 'height');
    var $textinputtags = array('title', 'description', 'name', 'link');

    // -------------------------------------------------------------------
    // Parse RSS file and returns associative array.
    // -------------------------------------------------------------------
    function Get ($rss_url) {
        // If CACHE ENABLED
        if ($this->cache_dir != '') {
            $cache_file = $this->cache_dir . '/rsscache_' . md5($rss_url);
            $timedif = @(time() - filemtime($cache_file));
            if ($timedif < $this->cache_time) {
                // cached file is fresh enough, return cached array
                $result = unserialize(join('', file($cache_file)));
                // set 'cached' to 1 only if cached file is correct
                if ($result) $result['cached'] = 1;
            } else {
                // cached file is too old, create new
                $result = $this->Parse($rss_url);
                $serialized = serialize($result);
                if ($f = @fopen($cache_file, 'w')) {
                    fwrite ($f, $serialized, strlen($serialized));
                    fclose($f);
                }
                if ($result) $result['cached'] = 0;
            }
        }
        // If CACHE DISABLED >> load and parse the file directly
        else {
            $result = $this->Parse($rss_url);
            if ($result) $result['cached'] = 0;
        }
        // return result
        return $result;
    }

    // -------------------------------------------------------------------
    // Modification of preg_match(); return trimmed field with index 1
    // from 'classic' preg_match() array output
    // -------------------------------------------------------------------
    function my_preg_match ($pattern, $subject) {
        // start regular expression
        preg_match($pattern, $subject, $out);

        // if there is some result... process it and return it
        if(isset($out[1])) {
            // Process CDATA (if present)
            if ($this->CDATA == 'content') { // Get CDATA content (without CDATA tag)
                $out[1] = strtr($out[1], array('<![CDATA['=>'', ']]>'=>''));
            } elseif ($this->CDATA == 'strip') { // Strip CDATA
                $out[1] = strtr($out[1], array('<![CDATA['=>'', ']]>'=>''));
            }

            // If code page is set convert character encoding to required
            // if ($this->cp != '')
            if (false)
                //$out[1] = $this->MyConvertEncoding($this->rsscp, $this->cp, $out[1]);
                $out[1] = iconv($this->rsscp, $this->cp.'//TRANSLIT', $out[1]);
            // Return result
            return trim($out[1]);
        } else {
            // if there is NO result, return empty string
            return '';
        }
    }

    // -------------------------------------------------------------------
    // Replace HTML entities &something; by real characters
    // -------------------------------------------------------------------
    function unhtmlentities ($string) {
        // Get HTML entities table
        $trans_tbl = get_html_translation_table (HTML_ENTITIES, ENT_QUOTES);
        // Flip keys<==>values
        $trans_tbl = array_flip ($trans_tbl);
        // Add support for &apos; entity (missing in HTML_ENTITIES)
        $trans_tbl += array('&apos;' => "'");
        // Replace entities by values
        return strtr ($string, $trans_tbl);
    }

    // -------------------------------------------------------------------
    // Parse() is private method used by Get() to load and parse RSS file.
    // Don't use Parse() in your scripts - use Get($rss_file) instead.
    // -------------------------------------------------------------------
    function Parse ($rss_url) {
        // Open and load RSS file
        $urlParts = parse_url($rss_url);
        $host = $urlParts['host'];
        $uri = $urlParts['path'];
        if (strcmp($urlParts['query'], '') != 0) {
            $uri .= '?' . $urlParts['query'];
        }
        if(strcmp($urlParts['fragment'],'') !=0){
            $fragment = $urlParts['fragment'];
            $fragment = substr($fragment,4,strlen($fragment)-3);
            $uri = $uri . $fragment;
        }

        if ($f = fsockopen($host, 80, $errno, $errstr, $this->connection_time)) {
            $rss_content = '';
            fputs($f, "GET $uri HTTP/1.0\r\nHost: $host\r\n\r\n");
            while (!feof($f)) {
                $rss_content .= fgets($f, 128);
            }
            fclose ($f);

            // Parse document encoding
            $result['encoding'] = $this->my_preg_match("'encoding=[\'\"](.*?)[\'\"]'si", $rss_content);
            // if document codepage is specified, use it
            if ($result['encoding'] != '')
                { $this->rsscp = $result['encoding']; } // This is used in my_preg_match()
            // otherwise use the default codepage
            else
                { $this->rsscp = $this->default_cp; } // This is used in my_preg_match()

            // Parse CHANNEL info
            preg_match("'<channel.*?>(.*?)</channel>'si", $rss_content, $out_channel);
            foreach($this->channeltags as $channeltag)
            {
                $temp = $this->my_preg_match("'<$channeltag.*?>(.*?)</$channeltag>'si", $out_channel[1]);
                if ($temp != '') $result[$channeltag] = $temp; // Set only if not empty
            }
            // If date_format is specified and lastBuildDate is valid
            if ($this->date_format != '' && ($timestamp = strtotime($result['lastBuildDate'])) !==-1) {
                // convert lastBuildDate to specified date format
                $result['lastBuildDate'] = date($this->date_format, $timestamp);
            }

            // Parse TEXTINPUT info
            preg_match("'<textinput(|[^>]*[^/])>(.*?)</textinput>'si", $rss_content, $out_textinfo);
            // This slightly strange regexp means:
            // Look for tag <textinput> with or without any attributes, but skip truncated version <textinput /> (it's not a beginning tag)
            if (isset($out_textinfo[2])) {
                foreach($this->textinputtags as $textinputtag) {
                    $temp = $this->my_preg_match("'<$textinputtag.*?>(.*?)</$textinputtag>'si", $out_textinfo[2]);
                    if ($temp != '') $result['textinput_'.$textinputtag] = $temp; // Set only if not empty
                }
            }

            // Parse IMAGE info
            preg_match("'<image.*?>(.*?)</image>'si", $rss_content, $out_imageinfo);
            if (isset($out_imageinfo[1])) {
                foreach($this->imagetags as $imagetag) {
                    $temp = $this->my_preg_match("'<$imagetag.*?>(.*?)</$imagetag>'si", $out_imageinfo[1]);
                    if ($temp != '') $result['image_'.$imagetag] = $temp; // Set only if not empty
                }
            }

            // Parse ITEMS
            preg_match_all("'<item(| .*?)>(.*?)</item>'si", $rss_content, $items);
            $rss_items = $items[2];
            $i = 0;
            $result['items'] = array(); // create array even if there are no items
            foreach($rss_items as $rss_item) {
                // If number of items is lower than limit: Parse one item
                if ($i < $this->items_limit || $this->items_limit == 0) {
                    foreach($this->itemtags as $itemtag) {
                        $temp = $this->my_preg_match("'<$itemtag.*?>(.*?)</$itemtag>'si", $rss_item);
                        if ($temp != '') $result['items'][$i][$itemtag] = $temp; // Set only if not empty
                    }
                    // Strip HTML tags and other bullshit from DESCRIPTION
                    if ($this->stripHTML && $result['items'][$i]['description'])
                        $result['items'][$i]['description'] = strip_tags($this->unhtmlentities(strip_tags($result['items'][$i]['description'])));
                    // Strip HTML tags and other bullshit from TITLE
                    if ($this->stripHTML && $result['items'][$i]['title'])
                        $result['items'][$i]['title'] = strip_tags($this->unhtmlentities(strip_tags($result['items'][$i]['title'])));
                    // If date_format is specified and pubDate is valid
                    if ($this->date_format != '' && ($timestamp = strtotime($result['items'][$i]['pubDate'])) !==-1) {
                        // convert pubDate to specified date format
                        $result['items'][$i]['pubDate'] = date($this->date_format, $timestamp);
                    }
                    // Item counter
                    $i++;
                }
            }

            $result['items_count'] = $i;
            return $result;
        }
        else
        {
            return False;
            //die("Network error: $errstr ($errno)");
        }
    }
}
?>

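Two things to look out for in the posted Parse(): it reads $urlParts['query'] and $urlParts['fragment'] without checking that parse_url() returned them, and it keeps the whole HTTP response, status line and headers included, in $rss_content. The sketch below shows one way to handle both steps, assuming PHP 7 or later. feed_request_parts() and split_http_response() are hypothetical helpers written for this illustration and are not part of lastRSS. Unlike the posted code, this version drops the URL fragment, since a fragment is not sent to the server.

```php
<?php
// Hypothetical helpers, not part of lastRSS.

/**
 * Break a feed URL into the pieces fsockopen() and the GET line need.
 * Returns null when the URL has no host (for example a local path).
 */
function feed_request_parts(string $url): ?array
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return null;
    }

    // Fall back to "/" when the URL has no path component.
    $target = $parts['path'] ?? '/';
    if (isset($parts['query']) && $parts['query'] !== '') {
        $target .= '?' . $parts['query'];
    }

    // The fragment (#...) is deliberately left out of the request target.
    return [
        'host'   => $parts['host'],          // bare host name for fsockopen()
        'port'   => $parts['port'] ?? 80,    // default HTTP port
        'target' => $target,                 // goes into "GET <target> HTTP/1.0"
    ];
}

/**
 * Split a raw HTTP response into its header block and its body.
 */
function split_http_response(string $raw): array
{
    // Headers end at the first empty line (CRLF CRLF).
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        return ['headers' => $raw, 'body' => ''];
    }

    return [
        'headers' => substr($raw, 0, $pos),
        'body'    => substr($raw, $pos + 4),
    ];
}
```

With helpers like these, the host passed to fsockopen() is always the bare host name, and only the body would be handed to the regular expressions that pull out the channel and items.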