
file_get_contents or cURL: which one to use for a small parser?


dilbertone


I am currently writing a small parser and harvester that collects the data from this website (see below):

http://www.aktive-buergerschaft.de/buergerstiftungsfinder

I want to collect all foundations that are listed on this page (see the examples below). I think I need to choose between file_get_contents and cURL to fetch the data.

I also need a parsing approach, and I do not know which one I should use here. Can you give me some hints?

 

First, the fetching part, with cURL:

I have never needed to use cURL myself, but the obvious resource is the example on php.net:

 

<?php
// create a new cURL handle
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// fetch the URL; $data holds the body, or false on failure
$data = curl_exec($ch);

// close the cURL handle and free up system resources
curl_close($ch);

// then you can use $data for parsing
?>

 

To be frank:

If cURL is not available, file_get_contents() will work too (it needs allow_url_fopen enabled to read URLs). I think it is a little slower, my guess is about 1-2 seconds, but the call is much simpler!

<?php
$html = file_get_contents('http://www.example.com');

// now all the HTML is in $html (or false if the request failed)
?>
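The two fetching options above can be combined. The following is only a minimal sketch, not a recommendation from the thread: `fetch_html()` is a hypothetical helper name, and the 10-second timeout is an assumed default. It uses cURL when the extension is loaded and otherwise falls back to file_get_contents().

```php
<?php
// Hypothetical helper (name and timeout are assumptions): fetch a URL with
// cURL when the extension is loaded, otherwise with file_get_contents().
// Returns the body as a string, or null on failure.
function fetch_html(string $url, int $timeout = 10): ?string
{
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,      // follow HTTP redirects
            CURLOPT_CONNECTTIMEOUT => $timeout,  // seconds to wait for the connection
            CURLOPT_TIMEOUT        => $timeout,  // seconds for the whole transfer
        ]);
        $body = curl_exec($ch);
        // The handle is released when $ch goes out of scope.
        return is_string($body) ? $body : null;
    }

    // Fallback: http:// URLs need allow_url_fopen=On in php.ini.
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

A call such as `fetch_html('http://www.aktive-buergerschaft.de/buergerstiftungsfinder')` would then return the page HTML or null, so the parsing code has a single failure case to check.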

 

Anyway, I think the much more interesting part is the parsing.

I have to parse the page in order to get the following data (see the site for examples: http://www.aktive-buergerschaft.de/buergerstiftungsfinder):

 

 

Bürgerstiftung Lebensraum Aachen

    rechtsfähige Stiftung des bürgerlichen Rechts

    Ansprechpartner: Hubert Schramm

    Alexanderstr. 69/ 71

    52062 Aachen

    Telefon: 0241 - 4500130

    Telefax: 0241 - 4500131

    Email: info@buergerstiftung-aachen.de

    www.buergerstiftung-aachen.de

    >> Weitere Details zu dieser Stiftung

 

Bürgerstiftung Achim

    rechtsfähige Stiftung des bürgerlichen Rechts

    Ansprechpartner: Helga Kühn

    Rotkehlchenstr. 72

    28832 Achim

    Telefon: 04202-84981

    Telefax: 04202-955210

    Email: info@buergerstiftung-achim.de

    www.buergerstiftung-achim.de

    >> Weitere Details zu dieser Stiftung

 

BürgerStiftung Region Ahrensburg

    rechtsfähige Stiftung des bürgerlichen Rechts

    Ansprechpartner: Dr. Michael Eckstein

    An der Reitbahn 3

    22926 Ahrensburg

    Telefon: 04102 - 67 84 89

    Telefax: 04102 - 82 34 56

    Email: info@buergerstiftung-ahrensburg.de

    www.buergerstiftung-region-ahrensburg.de

    >> Weitere Details zu dieser Stiftung

 

 

Note the link ">> Weitere Details zu dieser Stiftung" in each entry: I also need to grab the data that is "behind" this link!
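One way to collect those detail links is PHP's DOM extension rather than a regex. This is a sketch under stated assumptions: `find_detail_links()` is a hypothetical helper, and the sample markup and the `/stiftung/achim` path are invented for illustration, so the real page source has to be checked first.

```php
<?php
// Hypothetical helper: return the URLs of all <a> elements whose link text
// contains $needle. Relative hrefs are resolved naively against $baseUrl.
function find_detail_links(string $html, string $baseUrl, string $needle): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (strpos($a->textContent, $needle) === false) {
            continue;
        }
        $href = $a->getAttribute('href');
        if ($href === '') {
            continue;
        }
        // Naive resolution: anything that is not absolute is appended to the base.
        if (!preg_match('#^https?://#i', $href)) {
            $href = rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
        }
        $links[] = $href;
    }
    return $links;
}

// Assumed sample markup, NOT copied from the real site.
$sample = '<dl><dt>Achim</dt>'
        . '<dd><a href="/stiftung/achim">&gt;&gt; Weitere Details zu dieser Stiftung</a></dd>'
        . '<dd><a href="http://www.buergerstiftung-achim.de">Homepage</a></dd></dl>';

$links = find_detail_links($sample, 'http://www.aktive-buergerschaft.de', 'Weitere Details');
print_r($links);
```

Each URL returned this way can be fetched in a second pass and parsed in the same manner as the list page.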

 

 

 

 

 

 

Link to comment
Share on other sites


Hello dear xyph,

many, many thanks for the quick reply!

 

Once you have the html data, the easiest way to grab parts is using RegEx

 

Thanks for the hint, I will try it out! With this:

 

// Return the first match of $regex in $text, or "" if there is none.
function do_reg($text, $regex)
{
    if (preg_match($regex, $text, $regs)) {
        $result = $regs[0];
    } else {
        $result = "";
    }
    return $result;
}

 

or this:

// Return all matches of $regex in $text as an array (empty if there are none).
function do_reg($text, $regex)
{
    preg_match_all($regex, $text, $result, PREG_PATTERN_ORDER);
    return $result[0];
}

 

I will try them out and see which regex fits best.
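As a starting point for that experiment, here is a small sketch that pulls the labelled fields out of one record as plain text. `parse_fields()` is a hypothetical helper, and it assumes the record has already been reduced to one "Label: value" pair per line, as in the examples in the first post.

```php
<?php
// Hypothetical helper: extract "Label: value" pairs from one plain-text record.
// Only the four labels seen in the examples are recognised.
function parse_fields(string $record): array
{
    $fields = [];
    // /m lets ^ and $ match at line boundaries, /u treats the input as UTF-8.
    $regex = '/^\s*(Ansprechpartner|Telefon|Telefax|Email):\s*(.+?)\s*$/mu';
    preg_match_all($regex, $record, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $fields[$match[1]] = $match[2];
    }
    return $fields;
}

$record = "Bürgerstiftung Achim\n"
        . "    Ansprechpartner: Helga Kühn\n"
        . "    Telefon: 04202-84981\n"
        . "    Email: info@buergerstiftung-achim.de\n";

print_r(parse_fields($record));
```

Lines without a known label, such as the foundation name or the street address, are skipped here and would need their own rules.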

 

Again, many thanks for any and all help!

 

db1

Link to comment
Share on other sites

The key is in the pattern. Something like

 

preg_match_all(
'%<dt>([^<]++)</dt>\s++
<dd\ class="refo">([^<]++)</dd>\s++
<dd>Ansprechpartner:\s++([^<]++)</dd>
# etc
%x', 
$subject, $result, PREG_SET_ORDER);
print_r($result);

is what you want
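To see what PREG_SET_ORDER gives back, the pattern can be run against a small sample. The markup below is an assumption modelled on the pattern itself (dt/dd elements with a "refo" class), not copied from the real page, and the array keys are made-up names.

```php
<?php
// Assumed sample markup that matches the pattern; the real page may differ.
$subject = <<<HTML
<dt>Bürgerstiftung Achim</dt>
<dd class="refo">rechtsfähige Stiftung des bürgerlichen Rechts</dd>
<dd>Ansprechpartner: Helga Kühn</dd>
HTML;

// With the x modifier, unescaped whitespace in the pattern is ignored,
// which is why the literal space in <dd class=...> is written as "\ ".
$pattern = '%<dt>([^<]++)</dt>\s++
<dd\ class="refo">([^<]++)</dd>\s++
<dd>Ansprechpartner:\s++([^<]++)</dd>
%x';

preg_match_all($pattern, $subject, $result, PREG_SET_ORDER);

// PREG_SET_ORDER yields one array per foundation: [0] is the whole match,
// [1], [2], [3] are the captured groups in pattern order.
$records = [];
foreach ($result as $set) {
    $records[] = [
        'name'       => $set[1],
        'legal_form' => $set[2],
        'contact'    => trim($set[3]),
    ];
}
print_r($records);
```

With PREG_PATTERN_ORDER the same data would come back grouped by capture group instead of by foundation, which is less convenient when building one record per entry.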

Link to comment
Share on other sites

While file_get_contents() takes less code and is quicker to write for many, I prefer cURL. When I write scripts, a big factor is how fast I can get it to grab data from a header. My scripts are optimized for servers, and I mainly write them to automate actions on MySpace and Facebook applications. All of my scripts need to load a header fast, grab data, parse it, and throw the data I need into variables; then I manipulate new headers and send the information to the server. I'm just rambling, my bad :P I don't know anything about preg_match yet; I think I might try to learn that soon.

Link to comment
Share on other sites

 

Hello CueL3SS, many, many thanks!

 


 

Great to read you and your ideas!

I will try everything that is written in this thread! Greetings

;)

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.
