Get information from a website (i.e. search engine)

contra10 · August 13, 2011

Hello, this is more of a question since I don't know where to begin. I want to use a spider to search a website, gather only the specific information I need from it, and organize that into a table. For example, let's say I wanted to search a sports website and get only the details of the teams and their scores, then place those into my database without getting any other information from that website?

trq · August 13, 2011

You haven't asked a question.

contra10 · August 13, 2011

Sorry, I guess my question is: how do I create a spider that searches a website, extracts links and certain keywords, and then organizes them so that they can be saved in a database?

trq · August 13, 2011

You will need to fetch the file (you can use file_get_contents for this), parse it using either regex (see preg_match) or the dom extension (see dom), then save your results to a database.
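The parse-and-save steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names extractLinks and saveLinks, the links table, and the in-memory SQLite database are all made up for the example (the thread does not say which database is in use), and the HTML is a hard-coded sample standing in for what file_get_contents would return.

```php
<?php
// Hypothetical helper: returns the href value of every <a> element in $html.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Hypothetical helper: stores each link once in an assumed "links" table.
// "INSERT OR IGNORE" is SQLite syntax; other databases spell this differently.
function saveLinks(PDO $db, array $links): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY)');
    $stmt = $db->prepare('INSERT OR IGNORE INTO links (url) VALUES (?)');
    foreach ($links as $url) {
        $stmt->execute([$url]);
    }
}

// In practice: $html = file_get_contents($url); a fixed sample is used here.
$html = '<p><a href="/teams">Teams</a> <a href="/scores">Scores</a> '
      . '<a name="top">no href</a></p>';

$links = extractLinks($html);

$db = new PDO('sqlite::memory:');   // assumption: throwaway in-memory database
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
saveLinks($db, $links);

print_r($links);
```

The DOM extension is used here rather than regex because it copes with attribute order, quoting styles, and sloppy markup without needing a hand-written pattern.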

contra10 · August 17, 2011

I don't really understand how to use preg_match, although I understand the concept.

I did this as a trial:

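<?php
$html = file_get_contents('http://www.google.com/');

// PCRE patterns need delimiters, e.g. /.../
preg_match('/google/', $html, $matches);

// Finds every "<a href" on the page, but only that literal text
preg_match_all('/<a href/', $html, $match);

// There is no capture group in the pattern, so the matches are in $match[0]
foreach ($match[0] as $val) {
    echo htmlspecialchars($val) . "<br>";
}
?>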

I don't really get how to extract the actual href values from all the links on the page.
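One way to get the href values themselves is to put a capture group in the pattern passed to preg_match_all and read that group from the result array. The following is a minimal sketch rather than a robust parser: the helper name hrefsByRegex and the sample HTML are made up for illustration, and it assumes every href value is wrapped in single or double quotes.

```php
<?php
// Hypothetical helper: pulls quoted href values out of <a> tags with a regex.
function hrefsByRegex(string $html): array
{
    // ~...~ are the pattern delimiters; the trailing i makes it case-insensitive.
    // Group 1 captures the opening quote, group 2 the URL, \1 the same quote again.
    $pattern = '~<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1~i';
    preg_match_all($pattern, $html, $match);

    // $match[0] holds the full matches; $match[2] holds only the URLs.
    return $match[2];
}

// Sample markup standing in for a fetched page.
$html = '<a href="http://example.com/a">A</a> '
      . '<A class="x" HREF=\'/b\'>B</A> '
      . '<a name="c">C</a>';

foreach (hrefsByRegex($html) as $val) {
    echo $val, "\n";
}
```

A pattern like this will miss unquoted attributes and can be fooled by attributes whose names merely end in "href", which is one reason the DOM extension is often the safer choice for real pages.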

xyph · August 17, 2011

This is a very bad starter project.

I'd suggest buying a good PHP and Regular Expression book. By the time you get through both, scraping a page should be easy enough.
