
how to fetch some URLS with cURL - a first approach


dil_bert


Good evening dear php-freaks,

I'm pretty new to PHP programming.

I would like to scrape the web to find a certain string using cURL. I've tried setting different user agents and other options, but I just can't seem to get the URLs of those pages. I believe it has something to do with the query string getting encoded somewhere, but I'm really not sure how to get around that.

Nonetheless, I'm trying to write a very simple script for retrieving data from the web and parsing it. I think this is a great starting point for beginners doing weekend coding, but at the moment I've got troubles.

I need to fetch all the URLs that contain this term:

/participants-database/


    // $url is the same as the link above
    $ch = curl_init();
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($ch, CURLOPT_HEADER, 0);          // do not include response headers in the output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects...
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);      // ...but at most 10 of them
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
    curl_setopt($ch, CURLOPT_TIMEOUT, 120);
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");  // read cookies from this file
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");   // write cookies back to it
    echo curl_exec($ch);

   




Well, this kind of thing is probably trivial in PHP. PHP is a great language.


Love to hear from you,

 

dil-bert


What debugging have you done so far, and what was the output? I'm afraid it's impossible to provide any assistance at this point. To summarize, you need to explain the issue more clearly, because the function itself looks OK and the problem is most likely somewhere else.
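As a starting point for that kind of debugging, here is a minimal sketch that collects what cURL itself reports about a request. The helper name `fetchWithDiagnostics` and the shorter timeouts are assumptions made for illustration, not something from the original script; `curl_error()`, `curl_errno()` and `curl_getinfo()` are the standard cURL extension functions.

```php
<?php
// Hypothetical helper (not part of the original script): performs one
// request and returns the body together with cURL's own diagnostics.
function fetchWithDiagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // assumed values, adjust as needed
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    $body = curl_exec($ch); // false on failure

    return [
        'body'          => $body === false ? null : $body,
        'errno'         => curl_errno($ch),  // 0 when no error occurred
        'error'         => curl_error($ch),  // empty string when no error occurred
        'http_code'     => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'effective_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), // URL after redirects
    ];
}

// Usage (commented out so the sketch does not hit the network by itself):
// $result = fetchWithDiagnostics($url);
// var_dump($result['errno'], $result['error'], $result['http_code'], $result['effective_url']);
```

Posting the values of `error`, `http_code` and `effective_url` usually makes it much easier to see whether the request failed, was redirected, or returned something unexpected.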


Hello dear phpmillion,

many thanks for the quick reply, great to hear from you.

You wrote: "What debugging have you done so far, and what was the output? I'm afraid it's impossible to provide any assistance at this point. To summarize, you need to explain the issue more clearly, because the function itself looks OK and the problem is most likely somewhere else."

 

 

Well, I think I fetch a lot of URLs, and afterwards I have to iterate through the result to find the link elements.

I've tried my parser code on a single cURL request and it works (it returns an array with the URLs).
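For reference, a parser step like the one described could look roughly like the sketch below. It assumes the elements of interest are `<a>` links and that a plain substring match on the `href` is enough; the function name `extractLinksContaining` is made up for this example and is not the parser mentioned above.

```php
<?php
// Hypothetical helper: returns the unique href values of all <a> elements
// in $html whose href contains $needle.
function extractLinksContaining(string $html, string $needle): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '' && strpos($href, $needle) !== false) {
            $matches[] = $href;
        }
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($matches));
}

// Usage, with $html being the string returned by curl_exec():
// $links = extractLinksContaining($html, '/participants-database/');
```

Note that relative links are returned as they appear in the page; resolving them against the page URL would be a separate step.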


First of all I have to develop the cURL code:

$urls = array(
 'http://www.example1.com/foo_bar/1.htm',
 'http://www.example2.com/foo_bar/2.htm',
 'http://www.example3.com/foo_bar/3.htm',
 'http://www.example4.com/foo_bar/4.htm', // this comma was missing (parse error)
 'http://www.example5.com/foo_bar/1.htm',
 'http://www.example6.com/foo_bar/2.htm',
 'http://www.example7.com/foo_bar/3.htm',
 'http://www.example8.com/foo_bar/4.htm'
 );

phpmillion, you're right:

now I have to find a regular expression that picks out the URLs with foo_bar in them. In other words, one that helps to find each URL that contains /foo_bar/.
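One way to do that filtering, as a sketch: build the pattern from the search term with `preg_quote()` and apply it to the whole list with `preg_grep()`. The function name `filterUrlsContaining` is invented for this example; for a fixed substring like this, a simple `strpos()` check would work as well.

```php
<?php
// Hypothetical helper: keeps only the URLs that contain $term.
function filterUrlsContaining(array $urls, string $term): array
{
    // '#' is used as the delimiter so the slashes in the term need no escaping;
    // preg_quote() escapes any regex metacharacters in the term itself.
    $pattern = '#' . preg_quote($term, '#') . '#';

    // preg_grep() preserves the original keys, so reindex the result.
    return array_values(preg_grep($pattern, $urls));
}

// Usage:
// $hits = filterUrlsContaining($urls, '/foo_bar/');
// $hits = filterUrlsContaining($urls, '/participants-database/');
```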

 

 


