PHP web crawler


4 replies to this topic

#1 jasonxxx102

Posted 08 January 2013 - 05:44 PM

I have a basic PHP web crawler script and I need to expand its functionality. The problem is that I'm a total noob at PHP and my knowledge is very basic, so I'm coming here for some help.

My goal is to have a basic user input (text box). When the user types in a phrase, let's say "Red Apples", and hits the Enter button, the script should start crawling the web for that phrase and store the plain-text results, along with the URL they originated from, in a database.

Here is what I've got so far:

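error_reporting( E_ERROR );

// Maximum number of pages to fetch from any single host.
define( "CRAWL_LIMIT_PER_DOMAIN", 50 );

// Number of pages crawled so far, keyed by host name.
$domains = array();

// Every URL that has already been visited.
$urls = array();

function crawl( $url )
{
  global $domains, $urls;

  echo "Crawling $url... ";

  $host = parse_url( $url, PHP_URL_HOST );
  if ( !$host )
  {
    echo "Error.\n";
    return;
  }

  // Initialise the per-host counter before incrementing it.
  $domains[ $host ] = isset( $domains[ $host ] ) ? $domains[ $host ] + 1 : 1;
  $urls[] = $url;

  $content = file_get_contents( $url );
  if ( $content === FALSE )
  {
    echo "Error.\n";
    return;
  }

  // Only look for links from the opening <body> tag onwards;
  // fall back to the whole page if there is no such tag.
  $body = stristr( $content, "<body" );
  if ( $body !== FALSE )
  {
    $content = $body;
  }
  preg_match_all( '/http:\/\/[^ "\']+/', $content, $matches );

  echo 'Found ' . count( $matches[0] ) . " urls.\n";

  foreach( $matches[0] as $crawled_url )
  {
    $crawled_host = parse_url( $crawled_url, PHP_URL_HOST );
    if ( !$crawled_host )
    {
      continue;
    }

    // Compare the counter itself: count() on an integer does not
    // return the number of pages crawled, so the limit never applied.
    $seen = isset( $domains[ $crawled_host ] ) ? $domains[ $crawled_host ] : 0;

    if ( $seen < CRAWL_LIMIT_PER_DOMAIN
        && !in_array( $crawled_url, $urls ) )
    {
      sleep( 1 );
      crawl( $crawled_url );
    }
  }
}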

If anybody could point me in the right direction that would be awesome.

#2 cpd

  • Location: London, UK

Posted 08 January 2013 - 06:49 PM

I see no specific problem and no offer of payment for work, so what exactly are you looking for? You're sure-as-hell not gonna get someone to write the code for you...
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

"One of my most productive days was throwing away 1000 lines of code."

#3 jasonxxx102

Posted 08 January 2013 - 07:03 PM

Did I ask for somebody to write the code for me? I asked for somebody to point me in the right direction. If you're not going to be constructive just save your time and don't post.

Edited by jasonxxx102, 08 January 2013 - 07:07 PM.


#4 haku

Posted 08 January 2013 - 09:28 PM

What are you asking? You've shown us some code, but you didn't tell us what the problem is or what issues you are facing.

#5 gizmola

  • Location: Los Angeles, CA USA

Posted 09 January 2013 - 06:52 AM

Without really looking at the code you have, it's clear that there are two obvious elements to your question:

1. Accept input from a text box

How about an HTML form? Code that up, and have the form post to your crawler script. The phrase will be available in the $_POST superglobal.
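
For illustration, a minimal sketch of that first step might look something like the following. The field name "phrase" and the helper read_phrase() are assumptions made up for this example; nothing in the thread specifies them. The form posts back to the same script, and the phrase is normalised before anything is done with it.

```php
<?php
// Hypothetical helper: pull the search phrase out of a submitted form.
// The field name "phrase" is an assumption, not something from the thread.
function read_phrase( array $post )
{
  if ( !isset( $post['phrase'] ) || !is_string( $post['phrase'] ) )
  {
    return null;
  }

  // Collapse runs of whitespace and trim both ends.
  $phrase = trim( preg_replace( '/\s+/', ' ', $post['phrase'] ) );

  return $phrase === '' ? null : $phrase;
}

$phrase = read_phrase( $_POST );
?>
<form method="post" action="">
  <input type="text" name="phrase">
  <input type="submit" value="Crawl">
</form>
<?php if ( $phrase !== null ): ?>
  <p>Searching for: <?php echo htmlspecialchars( $phrase ); ?></p>
<?php endif; ?>
```

Where the "Searching for" line is printed is presumably where the crawl would be kicked off. Escaping with htmlspecialchars() before echoing the phrase back keeps user input from being interpreted as markup.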

2. Store the results in a database

Pick a database... there are many to choose from, including NoSQL databases like MongoDB. You'll have to design an appropriate schema. It's not clear what the structure should be, or what the purpose of storing the data is in the first place.
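
As one possible starting point, and only as a sketch, here is how the storage step could look with PDO and SQLite. The table name "results", its columns, and the helpers open_store() and store_if_match() are all hypothetical; the thread does not settle on a database or a schema. An in-memory SQLite database is used only so the example runs without a server (it needs the pdo_sqlite extension).

```php
<?php
// Hypothetical schema: one row per (phrase, url) pair, holding the page's plain text.
function open_store( $dsn = 'sqlite::memory:' )
{
  $db = new PDO( $dsn );
  $db->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );
  $db->exec(
    'CREATE TABLE IF NOT EXISTS results (
       id     INTEGER PRIMARY KEY,
       phrase TEXT NOT NULL,
       url    TEXT NOT NULL,
       body   TEXT NOT NULL,
       UNIQUE ( phrase, url )
     )'
  );
  return $db;
}

// Store the page's plain text if it mentions the phrase.
// Returns TRUE only when a new row was actually written.
function store_if_match( PDO $db, $phrase, $url, $html )
{
  // strip_tags() is a crude way to get plain text: the contents of
  // <script> and <style> elements survive it.
  $text = trim( preg_replace( '/\s+/', ' ', strip_tags( $html ) ) );

  // Case-insensitive search for the phrase.
  if ( stripos( $text, $phrase ) === false )
  {
    return false;
  }

  // INSERT OR IGNORE is SQLite syntax; other databases spell this differently.
  $stmt = $db->prepare(
    'INSERT OR IGNORE INTO results ( phrase, url, body ) VALUES ( ?, ?, ? )'
  );
  $stmt->execute( array( $phrase, $url, $text ) );

  return $stmt->rowCount() === 1;
}

$db = open_store();
store_if_match( $db, 'Red Apples', 'http://example.com/', '<p>Fresh red apples today</p>' );
```

A call like store_if_match() could sit in the crawler right after the page has been fetched. The prepared statement keeps the phrase and page text out of the SQL string itself, and the UNIQUE constraint means recrawling a URL for the same phrase does not create duplicate rows.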



