
A way to do the impossible? Would this work?



#1 Anidazen

Posted 05 October 2006 - 05:32 PM

Hey guys.

A while back I posted a thread about trying to use CURL to fetch several separate pages at the same time, instead of waiting 3 seconds for each to load, one after another.

I was assured this was impossible - but couldn't this be achieved using the following?


1) Instead of loading a script, I load an HTML page with 8 separate frames/iframes/whatever. This triggers 8 *different* scripts on my server simultaneously.

2) When each script concludes, it outputs just one thing - a couple of lines of JavaScript that pass the information from the page it was mining into the browser. (Probably using <input type='hidden'> in a form to store this info. When all 8 have concluded, the form auto-submits.)

3) In theory, I would then have all 8 pages looked up in 3 seconds (however long the slowest page takes) instead of 3 seconds per page?




I can't see why this wouldn't work, but I am highly inexperienced compared to you guys. The thing is, given PHP's wonderful level of sophistication, I simply can't believe that the most efficient way to do this is to use tacky, inefficient client-side scripting and then use an HTML form as a buffer. Can't this be done entirely server-side somehow?



#2 tomfmason

Posted 05 October 2006 - 05:44 PM

Well, I guess that would work, but...

If you are using a *nix system, you could use pcntl_fork().

Now if this is on a Windows system, you may want to have a look at this tutorial: http://www.phpfreaks...orials/71/0.php. If you go this route you may have to send the results to a db or write them to a text file.
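
For the *nix route, here is a minimal pcntl_fork sketch (the URLs and output paths are placeholders, and allow_url_fopen is assumed on) - each child writes its result to its own file, since a forked child can't hand data back to the parent directly:

<?php
// Fork one child per URL; each child fetches its page and saves it to
// a file, because a forked child cannot return data to its parent.
$urls = array('http://www.example.com/a', 'http://www.example.com/b');
$pids = array();

foreach ($urls as $i => $url) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("fork failed\n");
    } elseif ($pid == 0) {
        // Child process: do the slow fetch, write it out, then exit.
        $html = file_get_contents($url); // needs allow_url_fopen
        file_put_contents("/tmp/page_$i.html", $html);
        exit(0);
    }
    $pids[] = $pid; // parent: remember the child's PID
}

// Parent blocks until every child has finished.
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}
echo "All pages fetched.\n";
?>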

Good Luck,
Tom



#3 printf

Posted 05 October 2006 - 07:36 PM

Use socket streaming - you don't need to fork or do anything else. You can download as many pages as you have memory for at the exact same time using socket streams, and it works the same way on all systems that PHP runs on. If you want a simple example, tell me and I'll write you one! The only thing you need is socket support configured in your install, or if on Windows, the php_sockets.dll loaded in your php.ini.
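
To give a flavour of the approach, here is a minimal sketch of parallel GETs over socket streams (hypothetical hosts; it uses plain HTTP/1.0 with blocking connects for simplicity, then polls every response at once with stream_select() so the downloads overlap):

<?php
// Fetch several pages at once over plain HTTP/1.0 socket streams.
// "Connection: close" lets us read each raw response until EOF.
$hosts = array('www.msn.com', 'www.adobe.com', 'www.google.com');
$streams = array();
$results = array();

foreach ($hosts as $host) {
    $fp = stream_socket_client("tcp://$host:80", $errno, $errstr, 5);
    if ($fp === false) {
        continue; // skip hosts we cannot reach
    }
    fwrite($fp, "GET / HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    stream_set_blocking($fp, 0); // reads must not block
    $streams[$host] = $fp;
    $results[$host] = '';
}

while (count($streams) > 0) {
    $read = array_values($streams);
    $write = null;
    $except = null;
    // Sleep until at least one stream has data ready.
    if (stream_select($read, $write, $except, 10) === false) {
        break;
    }
    foreach ($read as $fp) {
        $host = array_search($fp, $streams, true);
        $results[$host] .= fread($fp, 8192);
        if (feof($fp)) {
            fclose($fp); // this response is complete
            unset($streams[$host]);
        }
    }
}

foreach ($results as $host => $body) {
    echo "$host: " . strlen($body) . " bytes (headers included)\n";
}
?>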


#4 Anidazen

Posted 05 October 2006 - 11:51 PM

Wow.

Thanks both of you - seems like very good answers. Both over my head, but very good answers!

I've been doing a lot of research on this, but finding very little - I could find absolutely NO documentation or help explaining pcntl_fork (obviously there are the manual pages, but no examples or anything).

The curl_multi_exec() function again seemed promising, but there was little to no documentation - and what I did find seemed to suggest (although I'm not certain) that it was only for multiple pages from the same site.
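
For reference, a minimal curl_multi sketch (cURL extension assumed, placeholder URLs) - each handle is independent, so the pages need not come from the same site:

<?php
// Run several cURL transfers concurrently with the multi interface.
$urls = array('http://www.example.com/a', 'http://www.example.org/b');
$mh = curl_multi_init();
$handles = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Pump all transfers until none remain active.
$active = 0;
do {
    curl_multi_exec($mh, $active);
    usleep(10000); // don't spin the CPU while waiting
} while ($active > 0);

foreach ($handles as $url => $ch) {
    echo "$url: " . strlen(curl_multi_getcontent($ch)) . " bytes\n";
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>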

I discovered something called the ares and c-ares libraries, which also might be able to help me with this problem - but they seem EXTREMELY advanced (extensions to PHP itself... *gulp*) and I think they might even involve utilising C++ apps in PHP. Again, the lack of documentation is stifling, to the point where I couldn't even find a basic description. :(


If anyone's got any more advice I'd love to hear it - in particular any experiences with any of the above, or further info.

Print - you're a legend, lol. I'd really like to see a simple example of using sockets or what have you to retrieve pages simultaneously from multiple sources - if that's what they can do?

#5 printf

Posted 06 October 2006 - 12:01 AM

Yes, that is what it can do. I'll give you a GET example, but if you want POST / GET / SSL / REDIRECT support, I might be able to make those optional a little later, like tomorrow. But I will post the GET example in a little bit - a simple class with an example of its usage!



#6 Anidazen

Posted 07 October 2006 - 12:57 AM

If you could do that - it'd be much appreciated.

#7 printf

Posted 07 October 2006 - 01:22 AM

I am almost done. I figured instead of giving just the GET class handler, I would rewrite it to handle GET / POST and SSL AND COOKIES. I am also making some good usage examples, so you can see how easy it is to use. I've been testing it, and it's much faster than I thought it would be. I tested it against a CURL loop: (50) pages took CURL 34 seconds, my class (8) seconds, and that includes writing out the data to file handles! I should have it ready tomorrow sometime!



#8 Anidazen

Posted 07 October 2006 - 10:41 AM

Print - you sir are an absolute legend!

#9 Anidazen

Posted 08 October 2006 - 12:05 PM

I await it very eagerly mate.

#10 Anidazen

Posted 09 October 2006 - 11:42 AM

Bump.

#11 printf

Posted 09 October 2006 - 03:17 PM

Hi

I didn't forget you, I've just been busy...

Anyway I will PM you a download link sometime tonight when I have a chance to write some real world examples...

Here is an example of just fetching (3) pages.

The array used in the example...

$file = array(
	array('url' => 'http://www.msn.com'),
	array('url' => 'http://www.adobe.com'),
	array('url' => 'http://www.google.com')
);

The reason why I am using an array for each request is so that you can supply different request information for each one! Like...

array( 'url' => 'https://www.site.com', 'method' => 'POST', 'browser' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)' );

If browser is not supplied, it defaults to Mozilla; if method is not supplied, it defaults to GET. The url tells the class how to make the request (use SSL (https) or don't use SSL (http)); if the scheme is not supplied, it defaults to http://!

So my example array lists (msn, adobe, google) in that order, but when the array is sent to the class, the class returns each request in the order it completes, so the quickest response is returned first. All of them are monitored at the same time!

Quick examples...

// this uses HTTP 1.0 (no chunk or gz handling)

http://www.dinningou...own.com/one.php

// this uses HTTP 1.1 (with chunk and gz handling)

http://www.dinningou...own.com/two.php


Which one works better will depend on your system!




#12 Anidazen

Posted 09 October 2006 - 05:15 PM

Print, you really are a legend. You've no idea how much this is gonna help me out.

Guess there are some really genuinely helpful people around. :D

Very much looking forward to the PM.




It occurs to me: I've spent a looooooong time looking, and I couldn't find a way to do this before you released this. You might want to stick it on SourceForge? It solves a real need.

#13 phporcaffeine

Posted 09 October 2006 - 05:17 PM

It would be intense on resources, but you could probably do it with AJAX.



