
best way to parse webpage using curl


graham23s


I've never needed to use cURL myself, but the obvious resource, php.net, gives this example:

 

<?php
// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);

// grab URL and pass it to the browser
curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);
?>

 

What more were you hoping to do with regard to parsing?

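OK, from what I can tell, to get the page content into a string you also need to set CURLOPT_RETURNTRANSFER, so that curl_exec() returns the page instead of printing it:

 

<?php
// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// return the page as a string instead of sending it straight to the browser
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// grab URL and store the response body in $data (false on failure)
$data = curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);

// Then you can use $data for parsing
?>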
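A slightly more defensive version of the same idea, as a sketch: it wraps the calls in a hypothetical helper `fetch_html()`, turns on CURLOPT_RETURNTRANSFER and CURLOPT_FOLLOWLOCATION, sets a timeout, and returns null instead of a bogus result when cURL reports an error or the server answers with a 4xx/5xx status. The helper name, the 10-second timeout and the status-code policy are my own assumptions, not something from php.net.

```php
<?php
/**
 * Hypothetical helper: fetch a URL with cURL and return the body as a string,
 * or null if the transfer failed or the server returned an HTTP error status.
 */
function fetch_html(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init();

    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false, // body only, no response headers
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow 301/302 redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);

    $body = curl_exec($ch);

    if ($body === false) {
        // curl_error() describes what went wrong (DNS failure, refused connection, ...)
        error_log('cURL error ' . curl_errno($ch) . ': ' . curl_error($ch));
        curl_close($ch);
        return null;
    }

    // For http(s) URLs this is the final status code; it is 0 for e.g. file:// URLs.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return ($status >= 400) ? null : $body;
}

// Usage:
// $data = fetch_html('http://www.example.com/');
// if ($data !== null) { /* parse $data here */ }
```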

 

Forgive me if I'm not giving you great answers; I'm learning cURL as I go along, lol

If you don't have cURL, a slower alternative is file_get_contents().

 

In my experience it works, just about 1-2 seconds slower, but the call is much simpler. For http:// URLs it needs allow_url_fopen enabled in php.ini.

 

<?php
$html = file_get_contents('http://www.example.com');

// now all the HTML is in $html
?>

 

=)
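If you go the file_get_contents() route, a sketch like the following passes a stream context so the request can't hang forever, and turns the `false` failure value into `null`. The helper name `fetch_with_fopen()`, the 10-second timeout and the user-agent string are placeholders I made up; the `timeout` and `user_agent` keys are standard `http` stream context options. By default a 4xx/5xx response also makes file_get_contents() return false.

```php
<?php
/**
 * Hypothetical helper: fetch a page with file_get_contents() instead of cURL.
 * Requires allow_url_fopen=On in php.ini for http:// URLs.
 */
function fetch_with_fopen(string $url, float $timeoutSeconds = 10.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds, // read timeout in seconds
            'user_agent' => 'Mozilla/5.0 (compatible; MyScraper/1.0)', // placeholder UA
        ],
    ]);

    // file_get_contents() returns false on failure and emits a warning (silenced here).
    $html = @file_get_contents($url, false, $context);

    return ($html === false) ? null : $html;
}

// Usage:
// $html = fetch_with_fopen('http://www.example.com');
```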

Hi Guys,

 

I was wondering what the best way is to parse the HTML of a webpage fetched with cURL, e.g.

 

$go = curl_exec($c);

 

that kind of thing. Any advice would be great.

 

cheers

 

Graham

 

Back to your original question: after you grab the contents, use regular expressions, substring functions, etc. to extract what you need. First you need to work out how the page structures its content. You may also want to read up on screen scraping.
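To make the regex/substring suggestion concrete, here is a minimal sketch with three hypothetical helpers: `text_between()` cuts out whatever sits between two known markers, `page_title()` uses preg_match() to grab the <title>, and `link_hrefs()` swaps in a technique not mentioned above, PHP's built-in DOMDocument and DOMXPath, which copes with attribute order and whitespace better than a regex does. The markers and sample markup are assumptions; you'd choose them after looking at the real page's structure.

```php
<?php
/**
 * Hypothetical helper ("substrings" approach): return the text between two
 * literal markers, or null if either marker is missing.
 */
function text_between(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = strpos($html, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

/**
 * Hypothetical helper ("regex" approach): return the decoded <title> text.
 * The s flag lets the title span several lines; the i flag ignores tag case.
 */
function page_title(string $html): ?string
{
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}

/**
 * Hypothetical helper (DOM approach, swapped in for comparison): return the
 * href of every <a> element that has one.
 */
function link_hrefs(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; keep libxml's warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ((new DOMXPath($doc))->query('//a[@href]') as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}
```

For example, `page_title($data)` on the string returned by curl_exec() gives you the page title, or null if there isn't one.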

  • 2 years later...

Once you get the cURL results, you will need to rewrite a fair amount of the markup to get the page to work if you display it on your own site.

 

Use parse_url() and then a regex or string replacement to fix all relative references. If they stay relative, the links will point to resources on YOUR site instead of THEIR site.
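Here is a minimal sketch of that fix-up, assuming you know the URL you fetched (`$base`). The helper names `absolutize_url()` and `absolutize_links()` are hypothetical. The first uses parse_url() on the base and handles the common cases (absolute, protocol-relative, root-relative, document-relative with `../`, and query-only links), leaving in-page `#anchors` alone; the second rewrites quoted href/src attributes with preg_replace_callback(). It is simplified, not a full RFC 3986 resolver: unquoted attributes and <base> tags, for example, are ignored.

```php
<?php
/**
 * Hypothetical helper: resolve $link against the page URL $base.
 * Simplified, not a complete RFC 3986 resolver.
 */
function absolutize_url(string $base, string $link): string
{
    // Already absolute (http:, https:, mailto:, javascript:, ...): leave it alone.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $link)) {
        return $link;
    }
    // Empty or in-page anchor (#top): still works when you redisplay the page.
    if ($link === '' || $link[0] === '#') {
        return $link;
    }

    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($b['host'] ?? '') . (isset($b['port']) ? ':' . $b['port'] : '');

    // Protocol-relative: //cdn.example.net/x.js
    if (strpos($link, '//') === 0) {
        return $scheme . ':' . $link;
    }

    $basePath = $b['path'] ?? '/';
    if ($link[0] === '/') {
        $path = $link;                 // root-relative: /images/logo.png
    } elseif ($link[0] === '?') {
        $path = $basePath . $link;     // query only: ?page=2
    } else {
        // document-relative: images/logo.png, ../css/site.css
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $link;
    }

    // Keep any ?query or #fragment aside while cleaning up the path.
    $cut = strcspn($path, '?#');
    $suffix = substr($path, $cut);
    $path = substr($path, 0, $cut);

    // Collapse "." and ".." segments, never climbing above the root.
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    if (in_array(end($segments), ['.', '..'], true)) {
        $out[] = ''; // "/a/b/.." should end in a slash: "/a/"
    }

    return $origin . implode('/', $out) . $suffix;
}

/**
 * Hypothetical helper: rewrite every quoted href="..." / src='...' in $html.
 * Unquoted attribute values are not handled.
 */
function absolutize_links(string $html, string $base): string
{
    return preg_replace_callback(
        '~\b(href|src)\s*=\s*(["\'])(.*?)\2~i',
        function (array $m) use ($base): string {
            return $m[1] . '=' . $m[2] . absolutize_url($base, $m[3]) . $m[2];
        },
        $html
    ) ?? $html;
}
```

Resolving against the page's own URL (rather than just prefixing the host) is what keeps `img/a.png` and `../css/s.css` pointing at the right directory on their site.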

Archived

This topic is now archived and is closed to further replies.
