
A way to do the impossible? Would this work?


Anidazen


Hey guys.

A while back I posted a thread about trying to use CURL to fetch several separate pages at the same time, instead of waiting 3 seconds for each one to load in turn.

I was assured this was impossible - but couldn't this be achieved using the following?


1) Instead of loading a script, I load an HTML page with 8 separate frames/iframes/whatever. This triggers 8 *different* scripts on my server simultaneously.

2) When each script concludes, they simply output one thing - a couple of lines of JavaScript that pass the information from the page they were mining into the browser. (Probably using <input type='hidden'> in a form to store this info. When all 8 have concluded, this form auto submits.)

3) In theory, I would then have all the 8 pages looked up in 3 seconds (whatever the longest page takes) instead of 3 seconds per page?




I can't see why this wouldn't work, but I am highly inexperienced compared to you guys. The thing is, given PHP's wonderful level of sophistication, I simply can't believe that the most efficient way to do this is to use tacky, inefficient client-side scripting and then use an HTML form as a buffer. Can't this be done entirely server-side somehow?


Well, I guess that would work, but...

If you are using a *nix system you could use [url=http://us2.php.net/manual/en/function.pcntl-fork.php]pcntl_fork[/url].

Now, if this is on a Windows system, you may want to have a look at this tutorial (http://www.phpfreaks.com/tutorials/71/0.php). If you go this route, you may have to send the results to a DB or write them to a text file.
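Since worked examples of pcntl_fork() are hard to find, here is a minimal sketch of the forking approach under some stated assumptions: a POSIX system with the pcntl extension (normally CLI only), and the hypothetical names fetch_forked() and $fetch (any callable that takes a URL and returns the page). Each child writes its result to a temporary file, as suggested above, and the parent waits for all of them.

```php
<?php
// Sketch only: one forked child per URL. Needs the pcntl extension on a POSIX
// system (usually CLI). fetch_forked() and $fetch are hypothetical names.
function fetch_forked(array $urls, callable $fetch): array
{
    $children = [];                           // pid => [url, temp file]
    foreach ($urls as $url) {
        $tmp = tempnam(sys_get_temp_dir(), 'fetch');
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException("fork failed for $url");
        }
        if ($pid === 0) {
            // Child: do the slow work, save the result, then exit right away
            // so the child never runs the parent's remaining code.
            file_put_contents($tmp, (string) $fetch($url));
            exit(0);
        }
        $children[$pid] = [$url, $tmp];       // Parent: record the child, keep forking.
    }

    // All children are now running at the same time; wait for each one
    // and collect the file it wrote.
    $results = [];
    foreach ($children as $pid => [$url, $tmp]) {
        pcntl_waitpid($pid, $status);
        $results[$url] = (string) file_get_contents($tmp);
        unlink($tmp);
    }
    return $results;
}
```

With file_get_contents as the callable, something like fetch_forked(['http://www.msn.com', 'http://www.google.com'], 'file_get_contents') should take roughly as long as the slowest page, not the sum of all of them.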

Good Luck,
Tom

Use socket streaming; you don't need to fork or do anything else. Using socket streams, you can download as many pages at the exact same time as you have memory for. It works the same way on every system PHP runs on. If you want a simple example, tell me and I'll write you one! The only thing you need is socket support configured in your install, or, on Windows, php_sockets.dll loaded in your php.ini.
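A rough sketch of the socket-streaming idea, not the class described later in this thread. build_get_request() and fetch_concurrently() are hypothetical names; the sketch only speaks plain http:// and sends HTTP/1.0 requests with Connection: close, so each server marks the end of its response by closing the socket. Connections are opened one after another; it is the waiting and downloading that overlap, driven by stream_select().

```php
<?php
// Sketch only: plain-http GETs multiplexed with stream_select().
// Assumes http:// URLs and HTTP/1.0 + "Connection: close" (server closes when done).
function build_get_request(string $url): array
{
    $p    = parse_url($url);
    $host = $p['host'];
    $port = $p['port'] ?? 80;
    $path = ($p['path'] ?? '/') . (isset($p['query']) ? '?' . $p['query'] : '');
    $hostHeader = isset($p['port']) ? "$host:$port" : $host;
    return ["tcp://$host:$port",
            "GET $path HTTP/1.0\r\nHost: $hostHeader\r\nConnection: close\r\n\r\n"];
}

function fetch_concurrently(array $urls, int $timeout = 30): array
{
    $socks = $buffers = [];
    foreach ($urls as $i => $url) {
        [$address, $request] = build_get_request($url);
        $s = @stream_socket_client($address, $errno, $errstr, $timeout);
        if ($s === false) {
            continue;                               // unreachable host: skip it
        }
        fwrite($s, $request);                       // tiny request, sent while still blocking
        stream_set_blocking($s, false);             // from here on, never wait on one socket
        $socks[$i] = $s;
        $buffers[$i] = '';
    }

    $done = [];                                     // url => raw response, in completion order
    while ($socks) {
        $read = array_values($socks);
        $write = $except = null;
        if (stream_select($read, $write, $except, $timeout) < 1) {
            break;                                  // timed out (or failed): stop waiting
        }
        foreach ($read as $s) {
            $i = array_search($s, $socks, true);
            $buffers[$i] .= (string) fread($s, 8192);
            if (feof($s)) {                         // server closed: response complete
                fclose($s);
                unset($socks[$i]);
                $done[$urls[$i]] = $buffers[$i];
            }
        }
    }
    return $done;
}
```

Each value is the raw response (headers plus body); explode("\r\n\r\n", $raw, 2) separates the two. The array comes back keyed by URL in the order the servers finished, not the order they were requested.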

me!

Wow.

Thanks both of you - seems like very good answers. Both over my head, but very good answers!

I've been doing a lot of research on this, but finding very little - I could find absolutely NO documentation or help explaining pcntl_fork (obviously the manual pages, but no examples or anything).

The curl_multi_exec() function also seemed promising, but again there was little to no documentation, and what I did find seemed to suggest (although I'm not certain) that it only works for multiple pages from the same site.
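On the curl_multi_exec() question: each easy handle added to a multi handle is an independent transfer and can point at a different host, so it is not limited to one site. A minimal sketch, assuming the curl extension is loaded; curl_fetch_all() is a hypothetical name.

```php
<?php
// Sketch only: run several curl transfers at once, each to any URL/host.
function curl_fetch_all(array $urls, int $timeout = 30): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // keep the body as a string
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive every transfer forward until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0);                  // wait until some socket is ready
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

Usage would be along the lines of curl_fetch_all(['http://www.msn.com', 'http://www.adobe.com', 'http://www.google.com']).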

I discovered something called ares and c-ares libraries which also might seem to be able to help me with this problem - but they seem EXTREMELY advanced (extensions to PHP itself... *gulp*) and I think they might even involve utilising C++ apps in PHP. Again the lack of documentation is stifling to the point where I couldn't even find a basic description. :(


If anyone's got any more advice I'd love to hear it, in particular any experience with any of the above or further info.

Print - You're a legend, lol. I'd really like to see a simple example of using sockets or what-have-you to retrieve pages simultaneously from multiple sources, if that's what these can do?

Yes, that is what it can do. I'll give you a GET example; if you want POST / GET / SSL / REDIRECT support, I might be able to add that as an option a little later, like tomorrow. But I will post the GET example in a little bit: a simple class with a scripting example showing its usage!


me!

I am almost done. I figured that instead of just giving you the GET class handler, I would rewrite it to handle GET / POST, SSL AND COOKIES. I am also making some good usage examples, so you can see how easy it is to use. I've been testing it and it's much faster than I thought it would be. I tested it against a CURL loop on 50 pages: CURL took 34 seconds, my class 8 seconds, and that includes writing the data out to file handles! I should have it ready sometime tomorrow!


me!

Hi

I didn't forget you, I've just been busy...

Anyway, I will PM you a download link sometime tonight when I have a chance to write some real-world examples...

Here is an example of fetching just three pages.

The array used in the example...

[code]$file = array (
array (
'url' => 'http://www.msn.com'
),
array (
'url' => 'http://www.adobe.com'
),
array (
'url' => 'http://www.google.com'
)
);[/code]

The reason why I am using an array for each request is so that you can supply different request information for each request! Like...

[code]array ( 'url' => 'https://www.site.com', 'method' => 'POST', 'browser' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)' );[/code]

If [b]browser[/b] is not supplied, it defaults to [b]mozilla[/b]; if [b]method[/b] is not supplied, it defaults to [b]GET[/b]. The [b]url[/b] tells the class how to make the request (use SSL for https, no SSL for http), and if the [b]scheme[/b] is not supplied, it defaults to [b]http://[/b]!
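The defaulting rules above could look roughly like this. normalize_request(), the 'ssl' and 'transport' keys, and the default User-Agent string are hypothetical illustrations, not the class's actual code. ssl:// is PHP's SSL/TLS stream transport and needs OpenSSL support.

```php
<?php
// Hypothetical helper: applies the defaults described above to one request array.
// The default User-Agent below is a placeholder, not the class's real string.
function normalize_request(array $r): array
{
    $url = $r['url'];
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;                       // no scheme given: assume plain http
    }
    $p    = parse_url($url);
    $ssl  = strtolower($p['scheme']) === 'https';      // https => SSL-wrapped socket
    $port = $p['port'] ?? ($ssl ? 443 : 80);
    return [
        'url'       => $url,
        'method'    => strtoupper($r['method'] ?? 'GET'),
        'browser'   => $r['browser'] ?? 'Mozilla/5.0 (compatible)',
        'ssl'       => $ssl,
        'transport' => ($ssl ? 'ssl' : 'tcp') . '://' . $p['host'] . ':' . $port,
    ];
}
```

Normalizing every entry up front means the fetching loop only ever deals with complete request descriptions, whatever each caller chose to leave out.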

So my example array lists msn, adobe and google in that order, but the class returns the requests in the order they complete, so the quickest response comes back first. All of them are monitored at the same time!

Quick examples...

// this uses HTTP 1.0 (no chunk or gz handling)

http://www.dinningoutoftown.com/one.php

// this uses HTTP 1.1 (with chunk and gz handling)

http://www.dinningoutoftown.com/two.php


Which one works better will depend on your system!
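For anyone wondering what "chunk handling" means: an HTTP/1.1 server may send the body with Transfer-Encoding: chunked, as a series of hexadecimal size lines, each followed by that many bytes, ending with a zero-size chunk. HTTP/1.0 responses never use this framing. A minimal decoder sketch; decode_chunked() is a hypothetical name, and chunk extensions and trailers are ignored.

```php
<?php
// Sketch only: undo HTTP/1.1 chunked transfer coding on a raw response body.
function decode_chunked(string $body): string
{
    $out = '';
    $pos = 0;
    while (true) {
        $eol = $pos < strlen($body) ? strpos($body, "\r\n", $pos) : false;
        if ($eol === false) {
            throw new UnexpectedValueException('truncated chunk header');
        }
        // The size line may carry ";extensions" after the hex number; drop them.
        $sizeLine = substr($body, $pos, $eol - $pos);
        $size = hexdec(trim(explode(';', $sizeLine, 2)[0]));
        if ($size === 0) {
            return $out;                    // zero-size chunk: end of body
        }
        $out .= substr($body, $eol + 2, $size);
        $pos = $eol + 2 + $size + 2;        // skip the data and its trailing CRLF
    }
}
```

If the response also carries Content-Encoding: gzip, the de-chunked body can then be passed to PHP's gzdecode().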



me!

Print you really are a legend. You've no idea how much this is gonna help me out.

Guess there are some really genuinely helpful people around. :D

Very much looking forward to the PM.




It occurs to me: I've spent a looooooong time looking, and I couldn't find a way to do this before you released this. You might want to stick it on SourceForge? Solves a real need.
