contra10 Posted August 13, 2011
Hello, this is more of a question, since I don't know where to begin. I want to use a spider to crawl a website, gather the specific information I need from it, and organize that information into a table. For example, let's say I wanted to crawl a sports website and pull only the teams and their scores into my database, without picking up any other information from that site.
Link to comment https://forums.phpfreaks.com/topic/244667-get-information-from-a-website-ie-search-engine/
trq Posted August 13, 2011
You haven't asked a question.
contra10 Posted August 13, 2011 Author
Sorry, I guess my question is: how do I create a spider that crawls a website, extracts links and certain keywords, and then organizes them so they can be saved in a database?
trq Posted August 13, 2011
You will need to fetch the file (you can use file_get_contents for this), parse it using either regex (see preg_match) or the dom extension (see dom), then save your results to a database.
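The fetch, parse, and save steps described above could be sketched roughly as follows, using the dom extension rather than regex and storing the results through PDO. The function names extractLinks and saveLinks, the links table, and the in-memory SQLite database are assumptions made for illustration; none of them come from the thread. The HTML is a hard-coded string so the sketch runs offline; in a real run it would come from file_get_contents($url).

```php
<?php
// Sketch only: function names, the `links` table, and the SQLite DSN are
// illustrative assumptions, not something specified in the thread.

/**
 * Parse an HTML string with the DOM extension and return href/label pairs.
 */
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href'); // '' when the attribute is missing
        if ($href !== '') {
            $links[] = ['href' => $href, 'label' => trim($anchor->textContent)];
        }
    }
    return $links;
}

/**
 * Store the extracted links using a prepared statement.
 */
function saveLinks(PDO $pdo, array $links): void
{
    $pdo->exec('CREATE TABLE IF NOT EXISTS links (href TEXT, label TEXT)');
    $stmt = $pdo->prepare('INSERT INTO links (href, label) VALUES (:href, :label)');
    foreach ($links as $link) {
        $stmt->execute([':href' => $link['href'], ':label' => $link['label']]);
    }
}

// Stand-in for: $html = file_get_contents('http://example.com/');
$html = '<html><body><a href="/teams">Teams</a> <a href="/scores">Scores</a></body></html>';

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
saveLinks($pdo, extractLinks($html));
```

A DOM parser tends to cope better than regex with attributes in arbitrary order or unusual quoting, which is one reason to prefer it once the page structure gets more complicated than a handful of links.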
contra10 Posted August 17, 2011 Author
I don't really understand preg_match when I use it, although I understand the concept. I did this as a trial:
<?php
$html = file_get_contents('http://www.google.com/');
preg_match('google', "$html", $matches);
preg_match_all("<a href", $html, $match);
foreach($match[1] as $val);
{
echo $val."<br>";
}
?>
I don't really get how to extract the actual href values from all the links on the page.
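For comparison, here is a minimal sketch of how the trial above might be reworked. Three things stand out in the original: a PCRE pattern needs delimiters, $match[1] is only populated when the pattern contains a capture group, and the semicolon after foreach(...) ends the loop before the braces. The helper name extractHrefs is hypothetical, and the pattern is a deliberate simplification that only handles quoted href values.

```php
<?php
// Sketch only: extractHrefs() is a hypothetical helper, and the pattern is a
// simplification (quoted href values only, no handling of malformed markup).

function extractHrefs(string $html): array
{
    // Delimiters (~...~) wrap the pattern; the parentheses form capture group 1,
    // which holds the href value. The trailing i makes the match case-insensitive.
    $pattern = '~<a\s[^>]*?href\s*=\s*["\']([^"\']*)["\']~i';

    // $matches[0] holds the full matches, $matches[1] the captured href values.
    preg_match_all($pattern, $html, $matches);

    return $matches[1];
}

// Hard-coded sample so the sketch runs offline; a real run would use
// file_get_contents() to fetch the page instead.
$html = '<p><a href="http://example.com/one">One</a> <a class="x" href=\'/two\'>Two</a></p>';

foreach (extractHrefs($html) as $href) { // no semicolon after foreach (...)
    echo $href, "<br>\n";
}
```

The pattern also matches attributes that merely end in href (such as data-href), which is one of the reasons the dom extension is usually the safer choice for anything beyond a quick experiment.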
xyph Posted August 17, 2011
This is a very bad starter project. I'd suggest buying a good PHP and Regular Expression book. By the time you get through both, scraping a page should be easy enough.
This topic is now archived and is closed to further replies.