dil_bert Posted November 12, 2017

Good evening, dear PHP Freaks,

I'm pretty new to PHP programming. I would like to scrape the web for a certain string using cURL. I've tried setting different user agents and other options, but I just can't seem to get the URLs of those pages. I believe it has something to do with the query string getting encoded somewhere, but I'm really not sure how to get around that. Nonetheless, I'm trying to write a very simple script for retrieving data from the web and parsing it. I think this is a great starting point for beginners doing weekend coding, but at the moment I'm having trouble.

I need to fetch all the URLs that contain this term: /participants-database/

// $url is the same as the link above
$ch = curl_init();
$user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_HEADER, 0);            // do not include response headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);    // follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
echo curl_exec($ch);

This kind of thing is probably trivial in PHP, which is a great language.

I'd love to hear from you.
dil-bert
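For the question above (collecting every URL that contains /participants-database/), one possible approach is to parse the fetched HTML and filter the href values. The following is only a minimal sketch: extractMatchingLinks() is a hypothetical helper name, the sample markup and example.com URLs are made up to stand in for the string returned by curl_exec(), and it assumes PHP's DOM extension is available.

```php
<?php
declare(strict_types=1);

/**
 * Return every distinct href in $html that contains $term.
 * extractMatchingLinks() is a hypothetical helper, not code from the thread.
 */
function extractMatchingLinks(string $html, string $term): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (strpos($href, $term) !== false) {
            $matches[] = $href;
        }
    }

    // Drop duplicates and reindex from 0
    return array_values(array_unique($matches));
}

// Made-up sample markup standing in for the result of curl_exec($ch)
$html = '<html><body>'
      . '<a href="https://example.com/plugins/participants-database/">one</a>'
      . '<a href="https://example.com/about/">two</a>'
      . '<a href="/support/plugin/participants-database/page/2/">three</a>'
      . '</body></html>';

$links = extractMatchingLinks($html, '/participants-database/');
print_r($links);
```

With CURLOPT_RETURNTRANSFER set as in the post, the return value of curl_exec($ch) could be passed in place of the sample $html. Relative links such as the third one are returned as written; resolving them against the page URL is left out of this sketch.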
phpmillion Posted November 12, 2017

What debugging have you done so far, and what was the output? I'm afraid it's impossible to provide any assistance at this point. To summarize, you need to explain the issue more clearly, because the function itself looks OK and the problem is most likely somewhere else.
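As an illustration of the kind of debugging output asked for here, the sketch below checks the return value of curl_exec() and collects the error and transfer details. It is an assumption about what would be useful to post, not something from the thread: fetchWithDiagnostics() is a hypothetical helper, the shorter timeouts are chosen only for the demo, and http://example.invalid/ is a deliberately unresolvable address used to show the failure path.

```php
<?php
declare(strict_types=1);

/**
 * Fetch $url and return the body together with diagnostic details.
 * fetchWithDiagnostics() is a hypothetical helper for illustration.
 */
function fetchWithDiagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // demo values, shorter than in the post
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $body = curl_exec($ch); // false on failure

    // The handle is released automatically when $ch goes out of scope
    return [
        'body'      => $body === false ? null : $body,
        'errno'     => curl_errno($ch),
        'error'     => curl_error($ch),
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
    ];
}

// The .invalid top-level domain is reserved, so this request is expected to fail
$result = fetchWithDiagnostics('http://example.invalid/');

if ($result['body'] === null) {
    echo "cURL error {$result['errno']}: {$result['error']}\n";
} else {
    printf(
        "HTTP %d from %s, %d bytes\n",
        $result['http_code'],
        $result['final_url'],
        strlen($result['body'])
    );
}
```

Posting the error number, the HTTP status code, and the final URL after redirects usually narrows down whether the problem is in the request itself or in the later parsing step.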
dil_bert Posted November 12, 2017

Hello, dear phpmillion,

Many thanks for the quick reply. Great to hear from you.

"What debugging have you done so far, and what was the output? I'm afraid it's impossible to provide any assistance at this point. To summarize, you need to explain the issue more clearly, because the function itself looks OK and the problem is most likely somewhere else."

Well, I think I fetch a lot of URLs. Afterwards I have to iterate through the result to find the elements. I've tried my parser code on a single cURL request and it works (it returns an array with the URLs).

First of all, I have to develop the cURL code:

$urls = array(
    'http://www.example1.com/foo_bar/1.htm',
    'http://www.example2.com/foo_bar/2.htm',
    'http://www.example3.com/foo_bar/3.htm',
    'http://www.example4.com/foo_bar/4.htm',
    'http://www.example5.com/foo_bar/1.htm',
    'http://www.example6.com/foo_bar/2.htm',
    'http://www.example7.com/foo_bar/3.htm',
    'http://www.example8.com/foo_bar/4.htm'
);

phpmillion, you're right: now I have to find a regular expression that filters for foo_bar. In other words, one that helps find each URL that contains foo_bar.
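The filtering step described in this post could be sketched with preg_grep(), which keeps the array entries that match a pattern. This is a minimal sketch, not the poster's code: filterUrlsByPathSegment() is a hypothetical helper, and the non-matching sample URLs are added here only so the effect of the filter is visible.

```php
<?php
declare(strict_types=1);

/**
 * Keep only the URLs whose path contains /$segment/.
 * filterUrlsByPathSegment() is a hypothetical helper for illustration.
 */
function filterUrlsByPathSegment(array $urls, string $segment): array
{
    // preg_quote() escapes regex metacharacters in $segment and the '#' delimiter
    $pattern = '#/' . preg_quote($segment, '#') . '/#';

    // preg_grep() preserves the original keys; array_values() reindexes from 0
    return array_values(preg_grep($pattern, $urls));
}

// Sample list; the "other" and "news" entries are made up to show non-matches
$urls = [
    'http://www.example1.com/foo_bar/1.htm',
    'http://www.example2.com/other/2.htm',
    'http://www.example3.com/foo_bar/3.htm',
    'http://www.example4.com/news/foo_bar.htm',
];

$matching = filterUrlsByPathSegment($urls, 'foo_bar');
print_r($matching);
```

The pattern requires a slash on both sides, so the last sample URL (where foo_bar is a file name rather than a directory) is filtered out. For a plain substring test without regular expressions, strpos() inside array_filter() would be an alternative.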