
[SOLVED] Opening RSS Feed


tlawless


Hi,

I'm trying to open an RSS feed. I tried fopen() first and ran into problems; the fix I found meant changing PHP's default settings for opening files, which is a security risk.

 

fsockopen() seemed to be the answer. I was able to get a test case to work, but I'm having problems with the feed I really want to use.

 

I'm using an RSS parser, LastRSS, and found a version of it that uses fsockopen instead of fopen.

 

Test feed: http://www.freshfolder.com/rss.php works fine

Desired feed: http://www.currentinventory.net/search/category/28/Dump_Trucks.rss doesn't

 

The error returned is "php_network_getaddresses: getaddrinfo failed: Name or service not known in"
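
That message means the host name could not be resolved to an address from the machine running the script, so the connection fails before any HTTP request is sent. As a rough sketch for checking this separately from the parser, something like the following could be used. The function name feed_host_resolves() is hypothetical and not part of lastRSS; it relies only on parse_url(), filter_var() and gethostbyname().

```php
<?php
// Hypothetical helper (not part of lastRSS): report whether the host in a
// feed URL can be resolved from the machine running this script.
function feed_host_resolves($url) {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no usable host in the URL
    }
    // An IP literal needs no DNS lookup
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return true;
    }
    // gethostbyname() returns the unmodified host name when the lookup fails
    return gethostbyname($host) !== $host;
}
```

Calling feed_host_resolves() with the feed URL before passing it to the parser would show whether the failure is in name resolution rather than in the parsing code.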

 

Any help would be greatly appreciated.

 

Thanks,

 

Tim


Will you post the answer in case someone else has the same issue? :) (I don't yet, but I'm thinking about implementing this in the future, and if I have an idea of what to look out for, it could save me some frustration later...)

 

Thanks ;)


Here is the code. lastRSS.php is a parser available at http://lastrss.oslab.net/. The modified version, lastRSS_sock.php, uses fsockopen instead of fopen. I wanted to use it because of the security warnings against the settings needed to make fopen work on my site. I also had to comment out some character encoding conversion, which I didn't understand and which was causing another error.
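
The cause of the encoding error is not shown here, so the following is only a sketch of one way to keep the conversion without letting it fail hard: skip it when either charset is missing, and fall back to the original text if iconv() reports failure. The function name convert_or_keep() is hypothetical and not part of lastRSS.

```php
<?php
// Hypothetical helper (not part of lastRSS): convert $text from $from to $to,
// returning the text unchanged when conversion is not possible.
function convert_or_keep($text, $from, $to) {
    // Nothing to do without both charsets, or when they already match
    if ($from == '' || $to == '' || strcasecmp($from, $to) == 0) {
        return $text;
    }
    // iconv() returns false (and raises a notice or warning) on an unknown
    // charset or illegal input; @ suppresses that so we can fall back quietly
    $converted = @iconv($from, $to . '//TRANSLIT', $text);
    return ($converted === false) ? $text : $converted;
}
```

In my_preg_match() this could stand in for the direct iconv() call, for example convert_or_keep($out[1], $this->rsscp, $this->cp), assuming both properties hold charset names.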

 

RSSDump.php uses lastRSS_sock.php to read the feed into an array and prints it out.

 

RSSDump.php

<?php
/* 
======================================================================
lastRSS usage DEMO 1
----------------------------------------------------------------------
This example shows, how to
     - create lastRSS object
    - set transparent cache
    - get RSS file from URL
    - show result in array structure
======================================================================
*/

// include lastRSS
//include "./lastRSS.php";
include "./lastRSS_sock.php";

// Create lastRSS object
$rss = new lastRSS;

// Set cache dir and cache time limit in seconds
// (caching is disabled here: empty cache dir, 0 seconds; if you enable it,
// the cache dir must be writable by the web server)
$rss->cache_dir = '';
$rss->cache_time = 0;
$rss->cp = 'US-ASCII';
$rss->date_format = 'l';

// Try to load and parse the RSS feed
// $rssurl = 'http://www.freshfolder.com/rss.php';   // test feed: works
// $rssurl = './DumpTrucks.rss';
$rssurl = 'http://www.currentinventory.net/search/category/28/Dump_Trucks.rss';   // desired feed: fails

if ($rs = $rss->get($rssurl)) {
    echo '<pre>';
    print_r($rs);
    echo '</pre>';
} else {
    echo "Error: It's not possible to get $rssurl...";
}

?> 

 

lastRSS_sock.php

<?php
/*
======================================================================
lastRSS 0.9.1

Simple yet powerful PHP class to parse RSS files.

by Vojtech Semecky, webmaster @ webdot . cz

Latest version, features, manual and examples:
	http://lastrss.webdot.cz/






WARNING:
this is a modified version of lastRSS 0.9.1 and is not an official
modification.  It uses fsockopen instead of fopen in the PARSE method
to get the contents of the RSS feed - this is done because, for my needs
the file to open is always a remote file identifiable by a URI.

----------------------------------------------------------------------
LICENSE

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License (GPL)
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

To read the license please visit http://www.gnu.org/copyleft/gpl.html
======================================================================
*/

/**
* lastRSS
* Simple yet powerful PHP class to parse RSS files.
*/
class lastRSS {
// -------------------------------------------------------------------
// Public properties
// -------------------------------------------------------------------
var $default_cp = 'UTF-8';
var $CDATA = 'nochange';
var $cp = '';
var $items_limit = 0;
var $stripHTML = False;
var $date_format = '';

// -------------------------------------------------------------------
// Private variables
// -------------------------------------------------------------------
var $channeltags = array ('title', 'link', 'description', 'language', 'copyright', 'managingEditor', 'webMaster', 'lastBuildDate', 'rating', 'docs');
var $itemtags = array('title', 'link', 'description', 'author', 'category', 'comments', 'enclosure', 'guid', 'pubDate', 'source');
var $imagetags = array('title', 'url', 'link', 'width', 'height');
var $textinputtags = array('title', 'description', 'name', 'link');

// -------------------------------------------------------------------
// Parse RSS file and returns associative array.
// -------------------------------------------------------------------
function Get ($rss_url) {
	// If CACHE ENABLED
	if ($this->cache_dir != '') {
		$cache_file = $this->cache_dir . '/rsscache_' . md5($rss_url);
		$timedif = @(time() - filemtime($cache_file));
		if ($timedif < $this->cache_time) {
			// cached file is fresh enough, return cached array
			$result = unserialize(join('', file($cache_file)));
			// set 'cached' to 1 only if cached file is correct
			if ($result) $result['cached'] = 1;
		} else {
			// cached file is too old, create new
			$result = $this->Parse($rss_url);
			$serialized = serialize($result);
			if ($f = @fopen($cache_file, 'w')) {
				fwrite ($f, $serialized, strlen($serialized));
				fclose($f);
			}
			if ($result) $result['cached'] = 0;
		}
	}
	// If CACHE DISABLED >> load and parse the file directly
	else {
		$result = $this->Parse($rss_url);
		if ($result) $result['cached'] = 0;
	}
	// return result
	return $result;
}

// -------------------------------------------------------------------
// Modification of preg_match(); return trimmed field with index 1
// from 'classic' preg_match() array output
// -------------------------------------------------------------------
function my_preg_match ($pattern, $subject) {
	// run the regular expression
	preg_match($pattern, $subject, $out);

	// if there is some result... process it and return it
	if(isset($out[1])) {
		// Process CDATA (if present)
		if ($this->CDATA == 'content') { // Get CDATA content (without CDATA tag)
			$out[1] = strtr($out[1], array('<![CDATA['=>'', ']]>'=>''));
		} elseif ($this->CDATA == 'strip') { // Strip CDATA
			$out[1] = strtr($out[1], array('<![CDATA['=>'', ']]>'=>''));
		}

		// If code page is set, convert character encoding to the required one.
		// Disabled in this modified version; the original condition was:
		//   if ($this->cp != '')
		if (false)
			$out[1] = iconv($this->rsscp, $this->cp.'//TRANSLIT', $out[1]);
		// Return result
		return trim($out[1]);
	} else {
	// if there is NO result, return empty string
		return '';
	}
}

// -------------------------------------------------------------------
// Replace HTML entities &something; by real characters
// -------------------------------------------------------------------
function unhtmlentities ($string) {
	// Get HTML entities table
	$trans_tbl = get_html_translation_table (HTML_ENTITIES, ENT_QUOTES);
	// Flip keys<==>values
	$trans_tbl = array_flip ($trans_tbl);
	// Add support for &apos; entity (missing in HTML_ENTITIES)
	$trans_tbl += array('&apos;' => "'");
	// Replace entities by values
	return strtr ($string, $trans_tbl);
}

// -------------------------------------------------------------------
// Parse() is private method used by Get() to load and parse RSS file.
// Don't use Parse() in your scripts - use Get($rss_file) instead.
// -------------------------------------------------------------------
function Parse ($rss_url) {
	// Open and load RSS file
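	$urlParts = parse_url($rss_url);
	$host = $urlParts['host'];
	$uri  = isset($urlParts['path']) ? $urlParts['path'] : '/';

	// Append the query string, if the URL has one
	if (isset($urlParts['query']) && $urlParts['query'] != '') {
		$uri .= '?' . $urlParts['query'];
	}

	// Append the fragment (minus its first four characters), if present
	if (isset($urlParts['fragment']) && $urlParts['fragment'] != '') {
		$fragment = $urlParts['fragment'];
		$fragment = substr($fragment,4,strlen($fragment)-3);
		$uri = $uri . $fragment;
	}

	// connection_time is not declared on the class; fall back to 30 seconds (an assumed default)
	$timeout = isset($this->connection_time) ? $this->connection_time : 30;
	if ($f = fsockopen($host, 80, $errno, $errstr, $timeout)) {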
		$rss_content = '';
		fputs($f, "GET $uri HTTP/1.0\r\nHost: $host\r\n\r\n");
		while (!feof($f)) {
			$rss_content .= fgets($f, 128);
		}
		fclose ($f);

		// Parse document encoding
		$result['encoding'] = $this->my_preg_match("'encoding=[\'\"](.*?)[\'\"]'si", $rss_content);
		// if document codepage is specified, use it
		if ($result['encoding'] != '')
			{ $this->rsscp = $result['encoding']; } // This is used in my_preg_match()
		// otherwise use the default codepage
		else
			{ $this->rsscp = $this->default_cp; } // This is used in my_preg_match()

		// Parse CHANNEL info
		preg_match("'<channel.*?>(.*?)</channel>'si", $rss_content, $out_channel);
		foreach($this->channeltags as $channeltag)
		{
			$temp = $this->my_preg_match("'<$channeltag.*?>(.*?)</$channeltag>'si", $out_channel[1]);
			if ($temp != '') $result[$channeltag] = $temp; // Set only if not empty
		}
		// If date_format is specified and lastBuildDate is valid
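		// (strtotime() returns false on failure in current PHP, -1 in very old versions)
		if ($this->date_format != '' && isset($result['lastBuildDate']) && ($timestamp = strtotime($result['lastBuildDate'])) !== false && $timestamp !== -1) {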
					// convert lastBuildDate to specified date format
					$result['lastBuildDate'] = date($this->date_format, $timestamp);
		}

		// Parse TEXTINPUT info
		preg_match("'<textinput(|[^>]*[^/])>(.*?)</textinput>'si", $rss_content, $out_textinfo);
			// This slightly strange regexp means:
			// Look for tag <textinput> with or without any attributes, but skip the self-closing version <textinput /> (it's not an opening tag)
		if (isset($out_textinfo[2])) {
			foreach($this->textinputtags as $textinputtag) {
				$temp = $this->my_preg_match("'<$textinputtag.*?>(.*?)</$textinputtag>'si", $out_textinfo[2]);
				if ($temp != '') $result['textinput_'.$textinputtag] = $temp; // Set only if not empty
			}
		}
		// Parse IMAGE info
		preg_match("'<image.*?>(.*?)</image>'si", $rss_content, $out_imageinfo);
		if (isset($out_imageinfo[1])) {
			foreach($this->imagetags as $imagetag) {
				$temp = $this->my_preg_match("'<$imagetag.*?>(.*?)</$imagetag>'si", $out_imageinfo[1]);
				if ($temp != '') $result['image_'.$imagetag] = $temp; // Set only if not empty
			}
		}
		// Parse ITEMS
		preg_match_all("'<item(| .*?)>(.*?)</item>'si", $rss_content, $items);
		$rss_items = $items[2];
		$i = 0;
		$result['items'] = array(); // create array even if there are no items
		foreach($rss_items as $rss_item) {
			// If the number of items is lower than the limit: parse one item
			if ($i < $this->items_limit || $this->items_limit == 0) {
				foreach($this->itemtags as $itemtag) {
					$temp = $this->my_preg_match("'<$itemtag.*?>(.*?)</$itemtag>'si", $rss_item);
					if ($temp != '') $result['items'][$i][$itemtag] = $temp; // Set only if not empty
				}
				// Strip HTML tags and other bullshit from DESCRIPTION
				if ($this->stripHTML && $result['items'][$i]['description'])
					$result['items'][$i]['description'] = strip_tags($this->unhtmlentities(strip_tags($result['items'][$i]['description'])));
				// Strip HTML tags and other bullshit from TITLE
				if ($this->stripHTML && $result['items'][$i]['title'])
					$result['items'][$i]['title'] = strip_tags($this->unhtmlentities(strip_tags($result['items'][$i]['title'])));
				// If date_format is specified and pubDate is valid
				if ($this->date_format != '' && isset($result['items'][$i]['pubDate']) && ($timestamp = strtotime($result['items'][$i]['pubDate'])) !== false && $timestamp !== -1) {
					// convert pubDate to specified date format
					$result['items'][$i]['pubDate'] = date($this->date_format, $timestamp);
				}
				// Item counter
				$i++;
			}
		}

		$result['items_count'] = $i;
		return $result;
	} else {
		return False;
		//die("Network error: $errstr ($errno)");
	}

}
}

?>
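
One thing to note about Parse() above: fsockopen() hands back the raw HTTP response, so $rss_content holds the status line and headers as well as the feed itself, and a 404 or redirect page would be parsed as if it were RSS. A minimal sketch of separating the two is below; split_http_response() is a hypothetical helper, not part of lastRSS, and it assumes a plain HTTP/1.0-style response without chunked encoding.

```php
<?php
// Hypothetical helper (not part of lastRSS): split a raw HTTP response
// into its status code and body.
function split_http_response($raw) {
    // Headers end at the first blank line
    $parts = explode("\r\n\r\n", $raw, 2);
    if (count($parts) < 2) {
        return false; // no header/body separator found
    }
    // The status line looks like "HTTP/1.0 200 OK"
    if (!preg_match('#^HTTP/\d\.\d\s+(\d{3})#', $parts[0], $m)) {
        return false;
    }
    return array('status' => (int) $m[1], 'body' => $parts[1]);
}

// Usage with a canned response (no network access needed)
$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/xml\r\n\r\n<rss></rss>";
$response = split_http_response($raw);
echo $response['status'], "\n"; // 200
echo $response['body'], "\n";   // <rss></rss>
```

With something like this, Parse() could return False unless the status is 200 and run its regular expressions on the body only.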
