Curl Scraping Script...How to load page dynamically while scanning?


schapel

I have written a script that scrapes a few of our partner websites and displays various sections of their content directly on the page where the script is running.

The script works fine and grabs the content perfectly, but I'd like to optimize it so that the page itself loads right away and shows some 'Loading...' animations while the scraping operations run. I'm assuming I could move the scraping operations into external .php files or something to that effect. Is that the best way of attacking this, or am I overlooking something? The ideal end result would be for the basic HTML of the page to load first, with the PHP operations then running in the background...

Also, I'm wondering if anyone with experience of these types of scripts knows a better way to grab HTML content off various pages. cURL seems to work great; I'm just curious whether there are other options that are faster or more efficient.

You would have to use AJAX to load the remote content asynchronously after the main page has loaded. And I don't think any other option is faster than cURL (at least not significantly).

I use both cURL and file_get_contents(). Mainly cURL for the more advanced cases, e.g. when you need to POST data to a page.
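To illustrate the AJAX approach described above, here is a rough sketch in plain JavaScript. It is not code from the original poster. The function names (`xhrGet`, `loadSection`) and the idea of passing the transport in as a parameter are assumptions made for the example, and the PHP script that does the actual cURL scraping is assumed to sit behind whatever URL you pass in.

```javascript
// Default transport: a plain asynchronous XMLHttpRequest GET (browser only).
function xhrGet(url, onSuccess, onFailure) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true); // third argument true = asynchronous
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return; // 4 = request finished
    if (xhr.status >= 200 && xhr.status < 300) {
      onSuccess(xhr.responseText);
    } else {
      onFailure(xhr.status);
    }
  };
  xhr.send(null);
}

// Show a placeholder in `target`, then swap in the scraped HTML once it arrives.
// `target` is any object with an innerHTML property, normally a DOM element.
// `transport` is optional and defaults to xhrGet; it is a hypothetical hook
// added here so the function can be tried without a browser.
function loadSection(target, url, transport) {
  transport = transport || xhrGet;
  target.innerHTML = 'Loading...';
  transport(url, function (html) {
    target.innerHTML = html;
  }, function (status) {
    target.innerHTML = 'Could not load content (HTTP ' + status + ').';
  });
}
```

In a page you might call something like `loadSection(document.getElementById('partner-news'), 'scrape.php?site=partner1')` from `window.onload`, where both the element id and the script name are made up for illustration. The page's HTML renders first, and each section fills in when its request completes.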

So I guess the solution would be to call the PHP scripts from JavaScript directly on the page. I'll have to read up more on AJAX, as my use of it so far has been limited. I know the theory; I just haven't actually used it very much.

  • 2 weeks later...

Just in case anyone bumps into this post, I thought I'd elaborate on what I ended up doing. The prototype.js library has a nice set of functions that I'm sure most of you know, one in particular called Ajax.Updater. It let me fairly easily post the data from my form via JavaScript in onload(), and then run an external .php script on the posted data. When the script finished, it updated the results in a DIV on the page.

Good stuff.
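For anyone wanting to try the same thing, a minimal sketch of that kind of Ajax.Updater call might look like the following. This is a reconstruction, not the poster's actual code. The wrapper function `startScrape` and the names used in the usage note are hypothetical, and Prototype's `Ajax` object is passed in as an argument only so the sketch can be exercised outside a browser.

```javascript
// Post form data to a PHP script and let Prototype write the response
// into a container element.
//   ajax        - Prototype's global Ajax object (passed in; an assumption
//                 made here for testability)
//   containerId - id of the DIV that receives the response HTML
//   url         - the external .php script that does the scraping
//   formData    - form fields as a plain object or a query string
function startScrape(ajax, containerId, url, formData) {
  return new ajax.Updater(containerId, url, {
    method: 'post',       // send the form data as a POST request
    parameters: formData  // sent as the request parameters
  });
}
```

With prototype.js loaded, the call from the page's onload handler could be something like `startScrape(Ajax, 'results', 'scrape.php', $('scrape-form').serialize(true))`, where `results`, `scrape.php` and `scrape-form` are placeholder names. If the DIV initially contains a 'Loading...' message or animation, it stays visible until the response replaces it.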

Archived

This topic is now archived and is closed to further replies.
