rotx Posted January 31, 2011
Hi, I've spent some time looking into how to spider a phpBB forum with a PHP script. For example, I'd like to run a search using the cURL functions and read out some of the links in the search results (topics, etc.), then save the links I want into a MySQL database. Does anybody have an idea?
ChemicalBliss Posted January 31, 2011
I would suggest against reinventing the wheel, per se, but basically you would use a "recursive function", i.e. a function that calls itself. This function would take a single argument, a webpage URL, and return true or false (depending on whether there are any more links to follow). The function would grab the URL (page), scan it for links, then loop through each link, calling itself. It would also save whatever data you want to save (with the URL) and the page title in an array, most likely a global array to make things easy. At the end you would have an array something like:
array( "http://www.somedomain.com/somepagephp" => array( "title" => "Some Page!", "keywords" => "Some content from the page...as the penguin dropped the peanut...etc" ) )
I would also use another global array containing a simple list of links already scanned, so it doesn't loop endlessly. Hope this helps.
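The recursive approach described above could look roughly like the following. This is only a minimal sketch under a few assumptions: the names `crawl()`, `fetchWithCurl()`, `$fetch` and `$maxDepth` are made up for illustration, links are pulled out with a simple regular expression rather than a real HTML parser, and only absolute `href="..."` URLs are followed. The two arrays are passed by reference instead of being globals, and the function returns true when a page was actually scanned.

```php
<?php
// Hypothetical sketch: none of these names come from the thread.

/** Fetch a page with cURL; returns the HTML, or null on failure. */
function fetchWithCurl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 15,
    ]);
    $html = curl_exec($ch);
    return $html === false ? null : $html;
}

/**
 * Recursively crawl $url.
 * $fetch  - any callable that maps a URL to HTML (or null), e.g. 'fetchWithCurl'
 * $pages  - results, keyed by URL
 * $seen   - URLs already scanned, so the crawl cannot loop endlessly
 */
function crawl(string $url, callable $fetch, array &$pages, array &$seen,
               int $depth = 0, int $maxDepth = 3): bool
{
    if (isset($seen[$url]) || $depth > $maxDepth) {
        return false; // already scanned, or too deep (assumed safety limit)
    }
    $seen[$url] = true;

    $html = $fetch($url);
    if ($html === null) {
        return false;
    }

    // Store the page title alongside the URL.
    $title = preg_match('~<title>(.*?)</title>~is', $html, $m) ? trim($m[1]) : '';
    $pages[$url] = ['title' => $title];

    // Naive link extraction: double-quoted href values only, fragments ignored.
    preg_match_all('~<a\s[^>]*href="([^"#]+)"~i', $html, $links);
    foreach ($links[1] as $link) {
        crawl($link, $fetch, $pages, $seen, $depth + 1, $maxDepth);
    }
    return true;
}
```

Calling `crawl($startUrl, 'fetchWithCurl', $pages, $seen)` with two empty arrays would fill `$pages` with one entry per page reached. Passing the fetcher in as a callable also makes it possible to try the logic against canned HTML without touching a real server.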
Adam Posted January 31, 2011
I wouldn't use a recursive function. A forum typically has many links, and doing it that way you're going to exhaust the memory limit in no time. Build up a list of URLs, like Google's "index", and process them one at a time. You need to differentiate between internal and external links and index them accordingly. You should also be considerate and limit the number of requests you make to their servers: one every 10 seconds or so at most. If you don't, they're likely to block you anyway.
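A list-based crawl along these lines might be sketched as follows. Again, this is an illustration under assumptions rather than a finished spider: `crawlQueue()`, `$fetch`, `$delaySeconds` and `$maxPages` are hypothetical names, "internal" is taken to mean "same host as the start URL", and only absolute double-quoted `href` values are recognised (relative links have no host and so end up in the external list here).

```php
<?php
// Hypothetical sketch: none of these names come from the thread.

/**
 * Crawl from $startUrl using a queue instead of recursion.
 * $fetch        - callable mapping a URL to HTML (or null on failure)
 * $delaySeconds - pause between requests, to go easy on the server
 * $maxPages     - assumed upper bound on the number of pages visited
 * Returns ['internal' => visited URLs, 'external' => off-site links found].
 */
function crawlQueue(string $startUrl, callable $fetch,
                    int $delaySeconds = 10, int $maxPages = 100): array
{
    $host  = parse_url($startUrl, PHP_URL_HOST);
    $queue = [$startUrl];           // URLs still waiting to be processed
    $seen  = [$startUrl => true];   // every URL ever queued or recorded
    $index = ['internal' => [], 'external' => []];

    while ($queue && count($index['internal']) < $maxPages) {
        $url  = array_shift($queue); // take the oldest URL first
        $html = $fetch($url);
        $index['internal'][] = $url;

        if ($html !== null) {
            preg_match_all('~<a\s[^>]*href="([^"#]+)"~i', $html, $m);
            foreach ($m[1] as $link) {
                if (isset($seen[$link])) {
                    continue; // already queued or recorded
                }
                $seen[$link] = true;
                if (parse_url($link, PHP_URL_HOST) === $host) {
                    $queue[] = $link;             // same host: crawl it later
                } else {
                    $index['external'][] = $link; // other host: record only
                }
            }
        }

        // Rate limit: wait before the next request, if there is one.
        if ($queue && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    return $index;
}
```

Because only the queue and the seen-list grow, memory use depends on the number of distinct URLs rather than on call depth. The returned lists could then be written to a database in a separate step.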