Anidazen Posted January 20, 2007

Hey guys,

I asked about this a couple of days ago and got no response, so I'm re-posting and elaborating. Basically, here's the issue:

- I have a script that fetches several websites while a user waits. Due to the custom nature of each search, this information cannot be cached efficiently - it has to be real time.
- Doing standard cURL requests, fetching and parsing each website in sequence, produces an uncomfortable load time. A high-quality load bar can only buy you so much time!
- A very kind member of these forums (printf) helped with a class that did the job, but that was ages ago and printf no longer visits the boards (I believe). The problem is that the class is simply too unstable, with random timeouts occurring for one reason or another.

So I'm looking for a stable way to download more than one website at once in PHP. I really can't believe that wanting to do this is as rare as it seems to be - I'd have thought it would be mainstream.

Anyway - does anyone have any suggestions on how to do this? I am considering taking an AJAX-style approach, loading each request in individual frames and then passing the information either through the browser (JavaScript) or through the server (MySQL).

One glimmer of hope appears to be the "PECL HTTP" package, from this site: http://pecl.php.net/package/pecl_http. It says it supports parallel requests in PHP 5+. I don't know anything about this, and maybe somebody on this forum can give me some more info. (Does this mean separate, concurrent pages are possible?) There seems to be very, very little community-based information on this package, and the documentation is far from helpful.

Edit: Forgot to mention - is there some other technology that would be more suited to this task than PHP?

I know I've raised a lot of questions in one single post, but if people could give some help or advice on any of it, it would be appreciated.
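For what it's worth, the pecl_http 1.x extension exposes parallel requests through an HttpRequestPool object: every request attached to the pool is sent concurrently when you call send(). A minimal sketch, assuming the 1.x object API and using placeholder URLs, might look like this:

```php
<?php
// Sketch of parallel fetching with pecl_http 1.x (HttpRequestPool).
// The URLs are placeholders; error handling is omitted for brevity.
$pool = new HttpRequestPool(
    new HttpRequest('http://example.com/a', HttpRequest::METH_GET),
    new HttpRequest('http://example.org/b', HttpRequest::METH_GET)
);

$pool->send(); // sends every attached request concurrently

foreach ($pool as $request) {
    // Each finished request exposes its status code and response body.
    echo $request->getUrl(), ' -> ',
         $request->getResponseCode(), ', ',
         strlen($request->getResponseBody()), " bytes\n";
}
```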
ShogunWarrior Posted January 20, 2007

You could have a look at the cURL multi functions.

As for the other-technology question: yes, it could probably be done more efficiently in other languages, even if it's only Perl.
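In PHP that means the curl_multi_* family: you add ordinary cURL handles to a multi handle and drive them all at once, so the total wall-clock time is roughly that of the slowest single request rather than the sum of all of them. A rough sketch, with placeholder URLs and example timeout values:

```php
<?php
// Fetch several pages concurrently with curl_multi_*.
// $urls is a placeholder list; the timeouts are example values.
$urls = array('http://example.com/one', 'http://example.org/two');

$mh      = curl_multi_init();
$handles = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // collect the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // don't wait forever on a dead host
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // hard cap per request
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Run all transfers until none are still active.
$running = 0;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity on any handle instead of busy-looping
} while ($running > 0);

$results = array();
foreach ($handles as $url => $ch) {
    $results[$url] = curl_multi_getcontent($ch); // empty string if the fetch failed
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

// $results now maps each URL to its response body.
```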
printf Posted January 20, 2007

The class has been updated many times - I think it's on version 1.2 now. I know earlier versions had some problems, but without knowing what you're doing with the class it's difficult to figure out how I can make you a custom version for your use. The new version can fetch 1000 pages over 20 concurrent streams in less than 5 seconds. I have people using it with the extended XML class, fetching thousands of documents every hour. For Windows users I even added a service option: the class can listen on a given port and handle SOAP, XML, and HTTP requests. I have it running as a spider and it does around 400,000+ pages an hour, including full indexing with the extended extractor class (page, images, CSS, JavaScript). PM me and I will help you...

printf
Anidazen Posted January 20, 2007 (Author)

printf!

Awesome to see you're still around - I thought you'd left the boards. :)

PM incoming.
wpt394 Posted July 21, 2007

Does anyone know what class is being referred to here? I'm using multi cURL to get information from a bunch of webpages, but my request still takes 20 seconds or so. It would be nice to speed it up a little.
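One thing worth checking before blaming curl_multi itself: without per-handle timeouts, a single slow or dead host can hold the whole batch up until it times out on its own. Setting CURLOPT_CONNECTTIMEOUT / CURLOPT_TIMEOUT (as in the sketch above) and then timing each handle usually shows where the 20 seconds go. A small diagnostic sketch, assuming $handles maps URL => cURL handle as in the earlier example:

```php
<?php
// After the multi loop has finished, inspect each handle to see which
// URL dominated the total time.
foreach ($handles as $url => $ch) {
    $err  = curl_error($ch);                        // empty string if the fetch succeeded
    $took = curl_getinfo($ch, CURLINFO_TOTAL_TIME); // seconds spent on this transfer
    printf("%-40s %6.2fs %s\n", $url, $took, $err);
}
```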