
Archived

This topic is now archived and is closed to further replies.

Anidazen

A way to do the impossible? Would this work?


Hey guys.

A while back I posted a thread about trying to use CURL to fetch several separate pages at the same time, instead of waiting 3 seconds for each to load, one after another.

I was assured this was impossible - but couldn't this be achieved using the following?


1) Instead of loading a script, I load an HTML page with 8 separate frames/iframes/whatever. This triggers 8 *different* scripts on my server simultaneously.

2) When each script concludes, it simply outputs one thing: a couple of lines of JavaScript that pass the information from the page it was mining into the browser. (Probably using <input type='hidden'> in a form to store this info. When all 8 have concluded, the form auto-submits.)

3) In theory, I would then have all 8 pages looked up in 3 seconds (however long the slowest page takes) instead of 3 seconds per page?




I can't see why this wouldn't work, but I am highly inexperienced compared to you guys. The thing is, given PHP's wonderful level of sophistication, I simply can't believe that the most efficient way to do this is to use tacky, inefficient client-side scripting and then use an HTML form as a buffer. Can't this be done entirely server-side somehow?

Well, I guess that would work, but...

If you are using a *nix system you could use [url=http://us2.php.net/manual/en/function.pcntl-fork.php]pcntl_fork[/url].

Now if this is on a Windows system, then you may want to have a look at this tutorial: http://www.phpfreaks.com/tutorials/71/0.php If you go this route, you may have to send the results to a db or write a text file.
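To make the pcntl_fork suggestion concrete, here is a minimal sketch of the pattern: fork one child per job, let each child write its result to a temp file, and have the parent collect everything once all children exit. This requires the pcntl extension (so *nix only); run_parallel() and its $worker callback are made-up names, not part of any library.

```php
<?php
// Sketch: run one forked child per job, collect results via temp files.
function run_parallel(array $jobs, callable $worker)
{
    $files = array();
    foreach ($jobs as $key => $job) {
        $files[$key] = tempnam(sys_get_temp_dir(), 'job');
        $pid = pcntl_fork();
        if ($pid === -1) {
            die("fork failed\n");
        } elseif ($pid === 0) {
            // child process: do the work, save it, and exit
            file_put_contents($files[$key], $worker($job));
            exit(0);
        }
    }
    // parent process: wait for every child to finish
    while (pcntl_wait($status) > 0);
    $results = array();
    foreach ($files as $key => $file) {
        $results[$key] = file_get_contents($file);
        unlink($file);
    }
    return $results;
}

// Shown here with a pure function so it runs anywhere; for page
// fetching you would pass 'file_get_contents' and an array of URLs.
$results = run_parallel(array('a' => 'alpha', 'b' => 'beta'), 'strtoupper');
```

Because each child is a full process, the fetches genuinely overlap: the total time is roughly that of the slowest page, not the sum of all of them.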

Good Luck,
Tom

Use socket streaming; you don't need to fork or do anything else. You can download as many pages at the exact same time as you have memory for, using socket streams. It works the same way on all systems that PHP runs on. If you want a simple example, tell me and I'll write you one! The only thing you need is socket support configured in your install, or if on Windows, the php_sockets.dll extension loaded in your php.ini.
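A rough sketch of what the socket-stream approach looks like (not the class the poster later shares): open one connection per host, switch each to non-blocking, and let stream_select() tell us which socket has data so every download progresses at the same time. Plain HTTP/1.0 only; fetch_all() and build_request() are illustrative names.

```php
<?php
// Build a minimal HTTP/1.0 GET request for the front page of a host.
function build_request($host)
{
    return "GET / HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n";
}

// Download every host's front page concurrently using stream_select().
function fetch_all(array $hosts)
{
    $sockets = array();
    $results = array();
    foreach ($hosts as $host) {
        $fp = stream_socket_client("tcp://$host:80", $errno, $errstr, 5);
        if (!$fp) {
            continue;                    // skip hosts we cannot reach
        }
        stream_set_blocking($fp, false);
        fwrite($fp, build_request($host));
        $sockets[$host] = $fp;
        $results[$host] = '';
    }
    while ($sockets) {
        $read   = array_values($sockets);
        $write  = null;
        $except = null;
        if (stream_select($read, $write, $except, 10) === false) {
            break;
        }
        foreach ($read as $fp) {
            $host = array_search($fp, $sockets, true);
            $results[$host] .= fread($fp, 8192);
            if (feof($fp)) {             // server closed: page is done
                fclose($fp);
                unset($sockets[$host]);
            }
        }
    }
    return $results;  // note: bodies still include the raw HTTP headers
}
```

This is single-process: no forking, and it behaves the same on Windows and *nix, which is the point being made above.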

me!

Wow.

Thanks both of you - seems like very good answers. Both over my head, but very good answers!

I've been doing a lot of research on this, but finding very little. I could find absolutely NO documentation or help explaining pcntl_fork (beyond the manual pages, no examples or anything).

The curl_multi_exec() function again seemed promising, but there was little to no documentation, and what I did find seemed to suggest (although I'm not certain) that it was only for multiple pages from the same site.

I discovered something called the ares and c-ares libraries, which also seem like they might help with this problem - but they seem EXTREMELY advanced (extensions to PHP itself... *gulp*) and I think they might even involve utilising C++ apps in PHP. Again, the lack of documentation is stifling, to the point where I couldn't even find a basic description. :(


If anyone's got any more advice I'd love to hear it, in particular any experiences with any of the above, or further info.

Print - You're a legend, lol. I'd really like to see a simple example of using sockets or what-have-you to retrieve pages simultaneously from multiple sources - if that's what they can do?

Yes, that's what it can do. I'll give you a GET example, but if you want POST / GET / SSL / REDIRECT support, I might be able to make those optional a little later, like tomorrow. I will post the GET example in a little bit: a simple class with a scripting example of its usage!


me!

I am almost done. I figured instead of just giving the GET class handler, I would rewrite it to handle GET / POST and SSL AND COOKIES. I am also making some good usage examples, so you can see how easy it is to use. I've been testing it and it's much faster than I thought it would be. I tested it against a CURL loop: (50) pages took CURL 34 seconds, my class (8) seconds, and that includes writing out the data to file handles! I should have it ready sometime tomorrow!


me!

Hi

I didn't forget you, I've just been busy...

Anyway I will PM you a download link sometime tonight when I have a chance to write some real world examples...

Here is an example of just fetching (3) pages.

The array used in the example...

[code]$file = array (
    array (
        'url' => 'http://www.msn.com'
    ),
    array (
        'url' => 'http://www.adobe.com'
    ),
    array (
        'url' => 'http://www.google.com'
    )
);[/code]

The reason why I am using an array for each request is so that you can supply different request information for each one! Like...

array ( 'url' => 'https://www.site.com', 'method' => 'POST', 'browser' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)' );

If [b]browser[/b] is not supplied, it defaults to [b]mozilla[/b]; if [b]method[/b] is not supplied, it defaults to [b]GET[/b]. The [b]url[/b] tells the class how to make the request (use SSL (https) or don't use SSL (http)); if the [b]scheme[/b] is not supplied, it defaults to [b]http://[/b]!
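As a guess at how the class might fill in those defaults for each request entry before opening a connection (illustrative only, not the actual class code; apply_defaults() is a made-up name):

```php
<?php
// Fill in the documented defaults: GET method, a mozilla user-agent
// string, and an http:// scheme when none is supplied in the url.
function apply_defaults(array $req)
{
    $req += array(
        'method'  => 'GET',
        'browser' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)',
    );
    // no scheme supplied? default to plain http://
    if (!preg_match('#^https?://#i', $req['url'])) {
        $req['url'] = 'http://' . $req['url'];
    }
    return $req;
}

// e.g. array('url' => 'www.site.com') becomes a GET over http://,
// while an explicit https:// url or 'method' => 'POST' is kept as-is.
```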

So my example array lists (msn, adobe, google) in that order, but when the array is sent to the class, the class returns each request in the order it is completed, so the quickest response is returned first. All of them are monitored at the same time!

Quick examples...

// this uses HTTP 1.0 (no chunk or gz handling)

http://www.dinningoutoftown.com/one.php

// this uses HTTP 1.1 (with chunk and gz handling)

http://www.dinningoutoftown.com/two.php


Which one works better will depend on your system!



me!

Print, you really are a legend. You've no idea how much this is gonna help me out.

Guess there are some really genuinely helpful people around. :D

Very much looking forward to the PM.




It occurs to me: I've spent a looooooong time looking, and I couldn't find a way to do this before you released this. You might want to stick it on SourceForge? It solves a real need.

