contra10 Posted August 13, 2011
Hello, this is more of a question, since I don't know where to begin. I want to use a spider to search a website, gather information, and then organize the specific information I need from the site into a table. For example, let's say I wanted to search a sports website, get only the details of the teams and their scores, and place them into my database without picking up any of the site's other information?
trq Posted August 13, 2011
You haven't asked a question.
contra10 Posted August 13, 2011 Author
Sorry. I guess my question is: how do I create a spider that searches a website, extracts links and certain keywords, and then organizes them so they can be saved in a database?
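A rough sketch of just the link-following part of a spider in PHP, assuming the DOM extension is available. The function names `extractLinks` and `crawl`, the `$maxPages` limit and the fake `$pages` site are all hypothetical, made up for illustration. The fetcher is passed in as a callable so the loop can be tried offline; in real use it could wrap file_get_contents(). Only absolute links and root-relative links like "/teams" are handled, and the crawl stays on the starting host.

```php
<?php
// Minimal breadth-first spider sketch (hypothetical helper names).
// The fetcher is injected as a callable so the loop can run offline;
// in real use it could wrap file_get_contents().

function extractLinks(string $html, string $pageUrl): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $parts = parse_url($pageUrl);
    $root = ($parts['scheme'] ?? 'http') . '://' . ($parts['host'] ?? '');

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // skip empty links and in-page anchors
        }
        // Naive resolution: only root-relative paths ("/teams") are expanded;
        // anything else that is not http(s) is dropped by the check below.
        if ($href[0] === '/' && substr($href, 0, 2) !== '//') {
            $href = $root . $href;
        }
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

function crawl(string $startUrl, callable $fetch, int $maxPages = 10): array
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    $queue = [$startUrl];
    $visited = [];
    $found = []; // page URL => links found on that page

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if (!is_string($html)) {
            continue; // fetch failed (file_get_contents returns false)
        }
        $found[$url] = extractLinks($html, $url);

        foreach ($found[$url] as $link) {
            // Only follow links that stay on the starting site.
            if (parse_url($link, PHP_URL_HOST) === $host && !isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }
    return $found;
}

// Fake site so the sketch runs without network access.
$pages = [
    'http://example.com/'       => '<a href="/teams">Teams</a> <a href="http://other.org/">Other</a>',
    'http://example.com/teams'  => '<a href="/">Home</a> <a href="/scores">Scores</a>',
    'http://example.com/scores' => '<p>No links here</p>',
];
$result = crawl('http://example.com/', function (string $url) use ($pages) {
    return $pages[$url] ?? false;
});
print_r($result);
```

Keyword matching and saving to a database would hang off the same loop: each fetched page's HTML is available right where `extractLinks()` is called.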
trq Posted August 13, 2011
You will need to fetch the file (you can use file_get_contents for this), parse it using either regular expressions (see preg_match) or the DOM extension (see DOM), then save your results to a database.
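Applied to the sports example from the first post, the DOM route might look roughly like this. It assumes a hypothetical page where the results sit in a `<table class="scores">` with the team name in the first cell and the score in the second; a real site's markup will differ, so the XPath expression is the part you'd adapt. The HTML is inlined so the example runs offline; for a live page `$html` would come from file_get_contents(). `extractScores` is a made-up name.

```php
<?php
// Sketch: pull team names and scores out of a results table with the DOM
// extension + XPath, ignoring everything else on the page.
// The markup below (a table with class "scores") is a hypothetical example.

$html = <<<HTML
<html><body>
  <div id="nav"><a href="/">Home</a></div>
  <table class="scores">
    <tr><th>Team</th><th>Score</th></tr>
    <tr><td>Lions</td><td>21</td></tr>
    <tr><td>Tigers</td><td>17</td></tr>
  </table>
</body></html>
HTML;

function extractScores(string $html): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // tolerate sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Only data rows (rows that contain <td> cells) of the "scores" table,
    // so the <th> header row is skipped.
    foreach ($xpath->query('//table[@class="scores"]//tr[td]') as $tr) {
        $cells = $xpath->query('td', $tr);
        if ($cells->length < 2) {
            continue;
        }
        $rows[] = [
            'team'  => trim($cells->item(0)->textContent),
            'score' => (int) trim($cells->item(1)->textContent),
        ];
    }
    return $rows;
}

$rows = extractScores($html);
print_r($rows);
```

Each entry in `$rows` could then be written to the database, for example with a PDO prepared INSERT statement (PDO::prepare() followed by PDOStatement::execute()); the table and column names would be your own.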
contra10 Posted August 17, 2011 Author
I don't really understand how to use preg_match, although I understand the concept. I tried this as a test:

<?php
$html = file_get_contents('http://www.google.com/');
preg_match('google', "$html", $matches);
preg_match_all("<a href", $html, $match);
foreach($match[1] as $val);
{
echo $val."<br>";
}
?>

I don't really get how to pull the actual href values out of all the links on the page.
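The trial above has a few concrete problems. PCRE patterns need delimiters, and letters can't be used as delimiters, so 'google' is rejected. "<a href" is read with < as the opening delimiter, and since there is no closing >, that pattern fails too. Even with a valid pattern, $match[1] only exists if the pattern has a capture group. Finally, the semicolon after foreach(...) ends the loop, so the block in braces runs just once, after the loop has finished. A possible corrected version is sketched below; it uses an inline HTML string instead of fetching google.com so it runs offline, and the regex is only a rough sketch that handles quoted href values. For anything beyond that, the DOM extension is more robust.

```php
<?php
// The trial above with the bugs fixed. A literal HTML string stands in for
// file_get_contents('http://www.google.com/') so the example runs offline.

$html = '<p>Search with <a href="http://www.google.com/search">Google</a> or read '
      . "<a class='x' href='/intl/en/about.html'>About</a>.</p>";

// Patterns need delimiters (here /.../); the "i" flag makes the match case-insensitive.
if (preg_match('/google/i', $html)) {
    echo "Page mentions Google<br>\n";
}

// Capture the value of each href attribute quoted with " or '.
// Group 1 holds the opening quote, group 2 the URL itself.
preg_match_all('/<a\s[^>]*href\s*=\s*(["\'])(.*?)\1/i', $html, $matches);

// No semicolon after foreach(...), otherwise the loop body is empty.
foreach ($matches[2] as $url) {
    echo htmlspecialchars($url) . "<br>\n";
}
```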
xyph Posted August 17, 2011
This is a very bad starter project. I'd suggest buying a good PHP book and a good regular expressions book. By the time you get through both, scraping a page should be easy enough.