mac_gabe
Posted July 12, 2011

I've written a PHP script that takes the contents of an RSS feed and displays an index of items on a web page. It works OK, but I can't explain why it is displaying data that is no longer contained in the feed URLs!

Because the RSS feed is large, I've created an intermediate small_feed.txt file, which is in turn scraped by my display_index.php script. So the structure is a two-stage process:

1. original_rss.xml is stripped of unnecessary data by condense.php and, using file_put_contents(), exported manually to small_feed.txt.
2. small_feed.txt is read by display_index.php using file_get_contents() and then displayed on display_index.php.

I can open small_feed.txt, look at it, and see that all the links end in ".php", as they should. But when I view display_index.php in a web browser, all the links end in ".php#unique_id_983745" (the number varies).

The unique IDs do exist in original_rss.xml, and an early version of condense.php passed them through to small_feed.txt, but I've since removed those #unique_ids from small_feed.txt. So I don't understand how that data is persisting. I can only imagine some caching is being done somewhere, but I don't know where. I've tried different browsers, clearing caches, and deleting all backup copies of the files, and I always get the same result.

Does anyone have an explanation for what's going on?
My display_index.php script starts like this:

    <?php
    $feed = file_get_contents("http://mysite.com/small_feed.txt");
    $feed = explode("<item>", $feed); // makes array
    $y = count($feed) - 1; // counts lines of index, subtracts 1
    sort($feed);
    print '<ul class="index">';
    for ($u = 1; $u < $y; $u++) {
        $search = "@<title>([^<]*)</title><link>([^<]*)</link></item>@s"; // search term
        $titles[$u] = preg_replace($search, '$1', $feed[$u]); // gets titles
        $links[$u] = preg_replace($search, '$2', $feed[$u]); // gets links
        if ($titles[$u][0] == "A") { // prints item lines for A's
            print "<li><a href='" . $links[$u] . "'>" . $titles[$u] . "</a></li>";
        }
    }
    //etc
    ?>
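For comparison, here is a minimal sketch of the same title/link extraction done in one pass with preg_match_all() instead of explode() plus two preg_replace() calls. It is not the original script: the function names parse_items and render_index are hypothetical, it assumes PHP 7 or later, and it assumes each entry in small_feed.txt looks like `<item><title>…</title><link>…</link></item>`, as the search pattern above suggests.

```php
<?php
// Hypothetical helper (not from the original script): pulls every
// title/link pair out of the condensed feed text in a single pass.
function parse_items(string $feed): array
{
    $pattern = '@<item>\s*<title>([^<]*)</title>\s*<link>([^<]*)</link>\s*</item>@s';
    preg_match_all($pattern, $feed, $matches, PREG_SET_ORDER);

    $items = [];
    foreach ($matches as $m) {
        $items[] = ['title' => trim($m[1]), 'link' => trim($m[2])];
    }

    // Sort alphabetically by title (byte-wise, case-sensitive).
    usort($items, function ($a, $b) {
        return strcmp($a['title'], $b['title']);
    });

    return $items;
}

// Hypothetical helper: builds the index list for titles that start
// with the given letter, escaping values before they go into HTML.
function render_index(array $items, string $letter): string
{
    $html = '<ul class="index">';
    foreach ($items as $item) {
        if ($item['title'] !== '' && $item['title'][0] === $letter) {
            $html .= "<li><a href='" . htmlspecialchars($item['link'], ENT_QUOTES) . "'>"
                   . htmlspecialchars($item['title'], ENT_QUOTES) . "</a></li>";
        }
    }
    return $html . '</ul>';
}
```

Calling `print render_index(parse_items($feed), 'A');` with the text fetched from small_feed.txt would print the A entries. Matching whole items up front avoids index arithmetic on the exploded array, so no entry is skipped at either end of the loop.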
mac_gabe
Posted July 12, 2011

Wait! It works now! In the 20 minutes or so it took to write that message, it somehow fixed itself?!
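The thread never confirms what was wrong, but a problem that clears after about 20 minutes is at least consistent with the caching suspected in the first post. Since display_index.php fetches small_feed.txt over HTTP rather than reading it from disk, one way to test that suspicion is to compare a cache-bypassing fetch with the file on disk. The sketch below is only a diagnostic idea under those assumptions: the helper names, the `nocache` query parameter, and the local path in the usage comment are all hypothetical.

```php
<?php
// Hypothetical helper: appends a throwaway query parameter so that an
// intermediate HTTP cache treats the request as a different URL.
function cache_busted_url(string $url): string
{
    $sep = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $sep . 'nocache=' . time();
}

// Hypothetical helper: fetches a URL while asking caches to revalidate.
// Returns the body as a string, or false on failure.
function fetch_fresh(string $url)
{
    $context = stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => "Cache-Control: no-cache\r\nPragma: no-cache\r\n",
        ],
    ]);
    return file_get_contents(cache_busted_url($url), false, $context);
}

// Hypothetical helper: lists every <link> value that still carries
// a #fragment, such as "#unique_id_983745".
function links_with_fragments(string $feed): array
{
    preg_match_all('@<link>([^<]*#[^<]*)</link>@', $feed, $m);
    return $m[1];
}

// Example use (needs network access, so it is left commented out):
// $viaHttp = fetch_fresh('http://mysite.com/small_feed.txt');
// $onDisk  = file_get_contents(__DIR__ . '/small_feed.txt'); // assumed path
// var_dump(links_with_fragments($viaHttp), links_with_fragments($onDisk));
```

If the HTTP copy still listed fragments while the copy on disk did not, that would point to a stale response somewhere between the script and the web server rather than to the browser.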