
Best way to parse a webpage using cURL


graham23s


I've never needed to use cURL myself, but the obvious resource, php.net, gives this example:

 

<?php
// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);

// grab URL and pass it to the browser
curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);
?>

 

What more were you hoping to do with regard to parsing?


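OK, from what I can tell, to get the page content into a string you set CURLOPT_RETURNTRANSFER and then capture what curl_exec() returns:

 

<?php
// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// return the page as a string instead of printing it straight to the browser
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// grab URL; curl_exec() now returns the page body (or false on failure)
$data = curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);

// Then you can use $data for parsing
?>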
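Building on the snippet above, a slightly more defensive fetch might look like the sketch below. `fetch_html()` is a hypothetical helper name, and the redirect and timeout values are assumptions you may want to change. Setting `CURLOPT_RETURNTRANSFER` is what makes `curl_exec()` return the body as a string; without it, `curl_exec()` prints the page and returns only `true` or `false`.

```php
<?php
// Hypothetical helper: fetch a URL and return the response body as a string,
// or null on failure. The redirect/timeout values are assumptions.
function fetch_html(string $url): ?string
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, false);        // body only, no response headers
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 3xx redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // but not forever
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // seconds (assumed value)

    $data   = curl_exec($ch);
    $errno  = curl_errno($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // curl_exec() returns false on transport errors (DNS failure, timeout, ...)
    if ($data === false || $errno !== 0) {
        return null;
    }
    // Treat 4xx/5xx responses as failures too
    if ($status >= 400) {
        return null;
    }
    return $data;
}
```

Usage would be something like `$html = fetch_html('http://www.example.com/');`, checking for `null` before you start parsing.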

 

Forgive me if these aren't great answers; I'm learning cURL as I go along, lol


Hi Guys,

 

I was wondering: what's the best way to parse the HTML from a webpage retrieved with cURL?

 

$go = curl_exec($c);

 

That kind of thing. Any advice would be great.

 

cheers

 

Graham

 

Back to your original question: after you grab the contents, just use regex, substring functions, etc. to pull out what you need. First you need to work out how the site structures the content you're after. You may also want to look up screen scraping, which is the general name for this kind of thing.
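To make the regex/substring approach concrete, here is a minimal sketch that works on an HTML string such as the `$data` returned by `curl_exec()`. The sample markup, the patterns and the `between()` helper are made up for illustration; on a real page you'd write patterns against that page's own structure, and regexes tend to break when that structure changes.

```php
<?php
// Sample markup standing in for a fetched page (made up for illustration).
$html = '<html><head><title>Example Page</title></head><body>'
      . '<div class="price">$19.99</div>'
      . '<a href="/about">About</a> <a href="http://other.example/x">Other</a>'
      . '</body></html>';

// 1) Regex: capture the text inside <title>...</title>
//    (case-insensitive, and "s" lets . match newlines).
$title = null;
if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m)) {
    $title = trim($m[1]);
}

// 2) Regex: collect every double-quoted href value on <a> tags.
preg_match_all('~<a\s[^>]*href="([^"]*)"~i', $html, $matches);
$links = $matches[1];

// 3) Substrings: hypothetical helper returning whatever sits between two markers.
function between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}

$price = between($html, '<div class="price">', '</div>');

echo $title, "\n";                // Example Page
echo implode(', ', $links), "\n"; // /about, http://other.example/x
echo $price, "\n";                // $19.99
```

Regex suits small, predictable fragments; the substring helper is handy when the content sits between fixed markers you can rely on.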


2 years later...

Once you get the cURL results, you will need to rewrite a fair bit of the markup if you want the page to work when you display it from your own site.

 

Use parse_url() and then a regex or string replacement to fix all relative references. If they stay relative, the links will point at resources on YOUR site instead of on THEIR site.
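A rough sketch of that fix-up: `parse_url()` splits the page's URL into scheme, host and path, and `preg_replace_callback()` rewrites relative `href`/`src` values into absolute ones. The helper names `make_absolute()` and `absolutize_links()` and the sample URL are hypothetical, and the resolver only handles the common cases (root-relative `/path`, protocol-relative `//host/path`, and plain `path`); it does not normalise `../` segments, honour a `<base>` tag, or touch single-quoted or unquoted attributes.

```php
<?php
// Hypothetical helper: turn a relative reference into an absolute URL,
// using the URL the page was fetched from as the base.
function make_absolute(string $ref, string $pageUrl): string
{
    // Already has a scheme (http:, https:, mailto:, javascript:, ...),
    // is empty, or is a #fragment: leave it alone.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $ref) || $ref === '' || $ref[0] === '#') {
        return $ref;
    }

    $parts  = parse_url($pageUrl);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';

    if (strpos($ref, '//') === 0) {   // protocol-relative: //cdn.example/x.js
        return $scheme . ':' . $ref;
    }
    if ($ref[0] === '/') {            // root-relative: /images/a.png
        return "$scheme://$host$port$ref";
    }

    // Document-relative: resolve against the directory part of the page's path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return "$scheme://$host$port$dir$ref";
}

// Rewrite every double-quoted href/src attribute in the fetched HTML.
function absolutize_links(string $html, string $pageUrl): string
{
    return preg_replace_callback(
        '~\b(href|src)="([^"]*)"~i',
        function (array $m) use ($pageUrl): string {
            return $m[1] . '="' . make_absolute($m[2], $pageUrl) . '"';
        },
        $html
    );
}

$page = 'http://www.example.com/news/index.html';
$html = '<a href="story.html">Story</a> <img src="/img/logo.png">';
echo absolutize_links($html, $page), "\n";
// <a href="http://www.example.com/news/story.html">Story</a> <img src="http://www.example.com/img/logo.png">
```

A callback is used instead of a single `preg_replace()` because each match needs a small decision (already absolute, root-relative, or document-relative) before it can be rewritten.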

