
Setup of curl-multi: looping over a bunch of sites [how to address the array]


dilbertone


Hello, dear PHP friends,

I am currently working on a little parser project.

I have to find solutions for:

a. the fetching part

b. the parser part

Here we go, the target URLs:

See the overview: http://dms-schule.bildung.hessen.de/index.html

http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html

Search by pressing the "type" button and then selecting all schools with the mouse. This returns 2400 schools.

Here I can provide some more help for getting to the target:

BTW, here are some detail pages from this target server:

http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9009

http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9742

http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9871

Well, as you can see, I have to iterate over the pages with a function (a loop), from

http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=1000 to show_school=10000
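A minimal sketch of that loop, assuming the IDs simply run from 1000 to 10000 as described and that plain string interpolation is enough to build each URL. The helper name `build_school_urls` is hypothetical. Keying the array by ID means every later response can be matched back to its school number.

```php
<?php
// Hypothetical helper: builds one detail-page URL per school ID.
// Assumes the IDs are consecutive integers from $first to $last (inclusive).
function build_school_urls(int $first, int $last): array
{
    $base = 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html';
    $urls = array();
    for ($id = $first; $id <= $last; $id++) {
        // Interpolation and concatenation give the same string here:
        // "{$base}?show_school={$id}" equals $base . '?show_school=' . $id
        $urls[$id] = "{$base}?show_school={$id}";
    }
    return $urls;
}

$urls = build_school_urls(1000, 10000);
echo count($urls), "\n";   // 9001 URLs (10000 - 1000 + 1)
echo $urls[9009], "\n";
```

No sophisticated concatenation is needed: the only varying part of the URL is the number after `show_school=`.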

 

BTW: after fetching each page I have to check which ones are empty; those do not need to be parsed!
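What counts as an "empty" page is not stated here, so the check below is only a sketch under assumptions: a page is treated as empty when the HTTP status is not 200, when it has no visible text, or when a marker string that should appear only on real school pages is missing. The function `is_empty_school_page` and the marker `Schulnummer` are hypothetical placeholders; the real marker has to be chosen by comparing a working page (e.g. `show_school=9009`) with an ID that returns nothing.

```php
<?php
// Hypothetical emptiness check. $marker is a placeholder: pick a string that
// only appears on pages that really describe a school.
function is_empty_school_page(int $httpCode, string $html, string $marker): bool
{
    if ($httpCode !== 200) {
        return true;                              // error or unexpected status: nothing to parse
    }
    if (trim(strip_tags($html)) === '') {
        return true;                              // no visible text at all
    }
    return stripos($html, $marker) === false;     // real pages are assumed to contain the marker
}

// Invented HTML, only to show the calls:
var_dump(is_empty_school_page(200, '<html><body></body></html>', 'Schulnummer'));
var_dump(is_empty_school_page(200, '<p>Schulnummer: 9009</p>', 'Schulnummer'));
```

With curl, the status code would come from `curl_getinfo($ch, CURLINFO_HTTP_CODE)` and the body from `curl_multi_getcontent($ch)`.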

 

 

Well, I want to do this with curl-multi, since it can download many pages in parallel instead of one after another.

I see that I have an array that can be filled, but I have to think about the string concatenation; I guess I have to do some sophisticated string concatenation.

This one does not fit:

// $match[1] is assumed to hold the number of the last page
for ($i = 1; $i <= $match[1]; $i++) {
    $url = "http://www.example.com/page?page={$i}";
}

 

And besides this, I have an array; I can fill the array.

Can you help me run this in a loop with the following script?



<?php
/************************************\
* Multi interface in PHP with curl  *
* Requires PHP 5.0, Apache 2.0 and  *
* Curl                              *
*************************************
* Written By Cyborg 19671897        *
* Bugfixed by Jeremy Ellman         *
\***********************************/

$urls = array(
   "http://www.google.com/",
   "http://www.altavista.com/",
   "http://www.yahoo.com/"
   );

$mh = curl_multi_init();
$conn = array();   // one curl handle per URL, same keys as $urls
$res = array();    // response bodies, same keys as $urls

foreach ($urls as $i => $url) {
       $conn[$i] = curl_init($url);
       curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);  // return data as string
       curl_setopt($conn[$i], CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
       curl_setopt($conn[$i], CURLOPT_MAXREDIRS, 2);       // maximum redirects
       curl_setopt($conn[$i], CURLOPT_CONNECTTIMEOUT, 10); // connection timeout in seconds
       curl_multi_add_handle($mh, $conn[$i]);
}

// Run all transfers in parallel. curl_multi_select() waits for socket
// activity, so the loop does not spin at 100% CPU while downloads run.
do {
       $status = curl_multi_exec($mh, $active);
       if ($active) {
              curl_multi_select($mh);
       }
} while ($active && $status === CURLM_OK);

foreach ($urls as $i => $url) {
       $res[$i] = curl_multi_getcontent($conn[$i]);
       curl_multi_remove_handle($mh, $conn[$i]);
       curl_close($conn[$i]);
}
curl_multi_close($mh);


print_r($res);

?>
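The script above adds every URL to one multi handle. With roughly 9,000 detail pages, that would make libcurl try to run thousands of transfers at once, so a common alternative is to feed curl_multi a fixed-size batch at a time. A minimal sketch, assuming the URL array is keyed by school ID; `fetch_in_batches`, the batch size of 20 and the half-second pause are illustrative choices, not values from the site or this thread.

```php
<?php
// Sketch: fetch many URLs with curl_multi, a fixed number at a time.
// $batchSize and $pauseMicros are assumed values, not recommendations from the site.
function fetch_in_batches(array $urls, int $batchSize = 20, int $pauseMicros = 500000): array
{
    $results = array();
    foreach (array_chunk($urls, $batchSize, true) as $batch) {   // true keeps the IDs as keys
        $mh = curl_multi_init();
        $handles = array();
        foreach ($batch as $key => $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_MAXREDIRS, 2);
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
            curl_setopt($ch, CURLOPT_TIMEOUT, 30);
            curl_multi_add_handle($mh, $ch);
            $handles[$key] = $ch;
        }

        // Drive all transfers of this batch; wait on the sockets instead of spinning.
        do {
            $status = curl_multi_exec($mh, $active);
            if ($active) {
                curl_multi_select($mh, 1.0);
            }
        } while ($active && $status === CURLM_OK);

        // Collect status code and body for each URL, then release the handles.
        foreach ($handles as $key => $ch) {
            $results[$key] = array(
                'code' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
                'body' => (string) curl_multi_getcontent($ch),
            );
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
        usleep($pauseMicros);   // be gentle with the target server between batches
    }
    return $results;
}
```

Each result holds the HTTP status (`code`) and the body (`body`) under the same key as the input URL, so pages that come back empty can be skipped before any parsing happens.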

Question: Do the people running that hessen.de site know you're going to take information from it? Have they specifically told you it's okay?

 

 

Quoting dilbertone:
"Well, as you can see, I have to iterate over the pages with a function (a loop), from
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=1000 to show_school=10000
BTW: after fetching each page I have to check which ones are empty; those do not need to be parsed!"

That is a horrible idea. Probing every ID from 1000 to 10000 means about 9,000 requests to find roughly 2,400 schools. Get a list of schools from the site, one way or another.
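Following that advice, the search page (which lists all 2400 schools) can supply the IDs instead of guessing them. A minimal sketch of that parser step, assuming the result list links to the detail pages with a `show_school=<id>` query parameter like the example URLs in the question; `extract_school_ids` is a hypothetical name and the sample HTML is invented, not taken from the site.

```php
<?php
// Hypothetical helper: returns the unique show_school IDs linked from an HTML page.
// Assumes result links carry the ID as a show_school query parameter.
function extract_school_ids(string $html): array
{
    if (trim($html) === '') {
        return array();                      // DOMDocument::loadHTML rejects empty input
    }

    $previous = libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $ids = array();
    foreach ($doc->getElementsByTagName('a') as $link) {
        $query = parse_url($link->getAttribute('href'), PHP_URL_QUERY);
        if (!is_string($query)) {
            continue;                        // no query string, or unparsable href
        }
        parse_str($query, $params);
        $id = $params['show_school'] ?? null;
        if (is_string($id) && ctype_digit($id)) {
            $ids[(int) $id] = true;          // using keys removes duplicates
        }
    }
    return array_keys($ids);
}

// Invented sample markup, only to show the call.
$sample = '<ul>'
    . '<li><a href="suche_schul_db.html?show_school=9009">School A</a></li>'
    . '<li><a href="suche_schul_db.html?show_school=9742">School B</a></li>'
    . '<li><a href="suche_schul_db.html?show_school=9009">School A again</a></li>'
    . '<li><a href="/index.html">Home</a></li>'
    . '</ul>';
print_r(extract_school_ids($sample));
```

The resulting ID list can then drive the fetching step, so only pages that are known to exist get requested.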

