enoyhs Posted October 22, 2007

Hi! I would like to know the fastest way to read the contents of a web page, and what possibilities there are. Currently, file_get_contents() reads around 10-13 pages in 30 seconds (the default max execution time). I know I can raise the max execution time so it can read more pages, but I would like to find another way if possible... I would also like to know: is it possible to print out the results while the page is still loading? Maybe it is the browser (my settings), the PHP setup, or the script at fault, so I'm not sure... If code is needed, I will post it when asked.
MadTechie Posted October 22, 2007

Maybe try using stream_get_contents and cycle through the page in X bytes; also use flush to output the currently streamed bytes.
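A minimal sketch of that idea (the URL and the 4096-byte chunk size are placeholders, and fopen/fread is used here to step through the stream in fixed-size pieces):

<?php
// Step through the remote page in fixed-size chunks and flush each
// chunk to the browser as it arrives.
$handle = fopen('http://www.example.com/', 'r');
if ($handle === false) {
    die('Could not open stream');
}
while (!feof($handle)) {
    echo fread($handle, 4096); // read up to 4096 bytes at a time
    flush();                   // push the bytes read so far to the browser
}
fclose($handle);
?>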
Daniel0 Posted October 22, 2007

You can print it out while reading it, but Apache (or any other web server) won't send the data before the script is done executing, so you wouldn't notice. The only way you would notice is if you are running the script from the command line.
enoyhs (Author) Posted October 22, 2007

Hm... I'm currently testing stream_get_contents; I will let you know. After Daniel0's answer, here is my next question: is it possible to execute part of the code, see the result, and then continue with the next step? For example, with this for loop:

<?php
for ($i = 1; $i <= 10; $i++) {
    // Do some stuff.
    // Currently: halt the script to print out the current results.
    // Then continue with the script.
}
?>

Just to clarify: I'm using WAMP on Windows Vista, and I will use Firefox for viewing the output. My idea would be to execute some code, output it, then refresh the page (with headers, or however you could refresh it after a specified time) and continue the script from the last place. Or is there a better way?
Daniel0 Posted October 22, 2007

You could refresh and do stuff like ?step=1 and ?step=2 and such, if that's what you mean?
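A rough sketch of that approach, assuming made-up variable names and a made-up batch size (none of this is code from the thread):

<?php
// Split the work across requests: each request handles one batch of
// pages, then refreshes to the next step. All numbers are placeholders.
$step    = isset($_GET['step']) ? (int) $_GET['step'] : 1;
$perStep = 100;                          // pages checked per request
$total   = 2500;                         // total pages to check
$last    = (int) ceil($total / $perStep);

for ($i = ($step - 1) * $perStep + 1; $i <= min($step * $perStep, $total); $i++) {
    // ... fetch and check page $i, print its result ...
}

if ($step < $last) {
    // Move on to the next batch after one second.
    echo '<meta http-equiv="refresh" content="1;url=?step=' . ($step + 1) . '">';
}
?>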
enoyhs (Author) Posted October 22, 2007

Hm... Not what I meant, but that is a good idea too... With your approach, though, I would need a way to store all previously outputted data so I can see it at the end. What would you suggest for that? I think I'll just change the method to POST and recognize the steps with ifs...
Daniel0 Posted October 22, 2007

Any specific reason why you won't run it all at once?
enoyhs (Author) Posted October 22, 2007

As I mentioned earlier, it will take really long (I have around 2500 pages to check), and I would like to see the results as fast as I can...
The Little Guy Posted October 22, 2007

If you use cURL, you can scan the pages as fast as your server and their server will load them. With code I have, I can scan (if the pages load quickly) about 3,000 pages in 1-2 minutes. It can gather URLs as deep as you want, and read in all the URLs on a page to scan later... or, if you don't want to scan the URLs on the page, you don't have to.
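The poster's code isn't shown, but one common way to reach that kind of throughput with cURL is the curl_multi API, which runs several transfers in parallel. A minimal sketch with placeholder URLs (this is not the code the poster is describing, just one way to do it):

<?php
// Fetch several pages in parallel with curl_multi.
$urls = array(
    'http://www.example.com/result.php?result=1',
    'http://www.example.com/result.php?result=2',
    'http://www.example.com/result.php?result=3',
);

$mh      = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers until every handle is finished.
$running = 0;
do {
    curl_multi_exec($mh, $running);
} while ($running > 0);

foreach ($handles as $ch) {
    $page = curl_multi_getcontent($ch);
    // ... inspect $page here ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>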
enoyhs (Author) Posted October 22, 2007

That sounds great. Could you please link to more info about cURL and/or explain briefly what cURL is?
Daniel0 Posted October 22, 2007

> Could you please link to more info about cURL and/or explain briefly what cURL is?

http://php.net/curl
The Little Guy Posted October 22, 2007

Save these two files:
http://tzfiles.com/users/ryan/LIB_parse.php
http://tzfiles.com/users/ryan/LIB_http.php

Then do this example (returns Google's title tag):

<?php
include 'LIB_http.php';
include 'LIB_parse.php';

$url = http_get($target = 'http://google.com', '');

echo return_between($url['FILE'], '<title>', '</title>', EXCL);
echo '<br />';
echo return_between($url['FILE'], '<title>', '</title>', INCL);
?>

Example: http://secret.publicsize.com/run.php
enoyhs (Author) Posted October 22, 2007

Daniel0: Thanks! I had already found it, but I wasted around half an hour trying to find resources to install it; it turned out it was already built into WAMP and only needed to be activated.

The Little Guy: Thanks for the code!
enoyhs (Author) Posted October 23, 2007

After several tests I got bad results: using cURL only got me around 20 records in 30 seconds. So I changed max_execution_time in php.ini and got a somewhat better result; ~1000 results in 6 minutes was the maximum. Could anyone give some advice on how to speed things up?

I will explain a bit further what I'm trying to do. There is a website with many entries, around 3,000, and they look like this: http://www.website.com/result.php?result=1 So I used a for loop to loop through the various pages (changing the result value).

My goal: to find all valid results. For example, if a result doesn't exist in the database, the site returns a page with the text "Invalid page" (though all the page formatting is kept). But the pages that do exist are mostly very long, so I think this really slows down my gathering. I actually only need the top of each valid page, so is there a way to limit how many bytes I receive, to speed things up? I hope I made myself clear.
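One way to do that (a sketch, not from the thread; the URL and the 2 KB limit are placeholders) is a cURL write callback that aborts the transfer once enough bytes have arrived:

<?php
// Stop a cURL transfer after roughly the first 2 KB: returning a value
// other than the chunk length from the write callback makes cURL abort.
$GLOBALS['page_top'] = '';

function collect_top($ch, $chunk)
{
    $GLOBALS['page_top'] .= $chunk;
    if (strlen($GLOBALS['page_top']) >= 2048) {
        return 0; // abort the transfer; we have enough
    }
    return strlen($chunk);
}

$ch = curl_init('http://www.website.com/result.php?result=1');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'collect_top');
curl_exec($ch); // returns false once we abort; that is expected here
curl_close($ch);

// $GLOBALS['page_top'] now holds roughly the first 2 KB of the page.
?>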
Azu Posted October 23, 2007

> You can print it out while reading it, but Apache (or any other web server) won't send the data before the script is done executing, so you wouldn't notice. The only way you would notice is if you are running the script from the command line.

I think you're wrong, sorry. I'm pretty sure that as long as you don't buffer your output, it should display as soon as you echo it.
Daniel0 Posted October 23, 2007

No. Try to run this

<?php
echo "first string";
sleep(10);
echo "\nsecond string";
?>

from a web server and then from CLI. You'll find that I am correct.
enoyhs (Author) Posted October 23, 2007

Daniel0: I'm not sure about this... For example, take a look at big pages which have a lot of content. When you load such a page, the content shows up while the page is still loading. I'm not sure if it's PHP (it may be another language, so I may be mistaken here), but I'm pretty sure I have seen PHP pages output content while still loading... Or am I missing something here?
Azu Posted October 23, 2007

> No. Try to run this [...] from a web server and then from CLI. You'll find that I am correct.

I'm sorry, but I personally use lots of sites that do; there is no doubt. If you want a really good example: https://www.grc.com/x/ne.dll?rh1dkyd2 and click "full scan".
BlueSkyIS Posted October 23, 2007

Here is a quick example of output buffering. I send 'hello world 1' to the browser, sleep 5 seconds, then send 'hello world 2' to the browser: http://www.blueskyis.com/test.php

The code (I practically never use ob_ anything, so I may have unnecessary or redundant functions here):

<?php
ob_start();
ob_implicit_flush(1);
echo "hello world 1<BR>";
ob_flush();
ob_end_flush();

sleep(5);

ob_start();
ob_implicit_flush(1);
echo "hello world 2";
ob_flush();
ob_end_flush();
exit;
?>
Azu Posted October 23, 2007

Bleh, my link broke :/ This one should work: https://www.grc.com/x/ne.dll?bh0bkyd2
The Little Guy Posted October 23, 2007

> Could anyone give some advice on how to speed things up?

Remember that cURL (or any processing method) can only process the information as fast as the server you are retrieving it from can send it. If you are using this: http://tzfiles.com/users/ryan/LIB_http.php and you scroll down to where you see this:

# Length of time cURL will wait for a response (seconds)
define("CURL_TIMEOUT", 25);

you can change how long cURL waits for a response. By changing 25 to 3, it will wait only 3 seconds before it moves on.
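If you are not using the LIB_http wrapper, the equivalent with raw cURL options looks roughly like this (the URL is a placeholder):

<?php
// Cap both the connect time and the total transfer time so one slow
// server can't stall the whole scan.
$ch = curl_init('http://www.website.com/result.php?result=1');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2); // seconds to establish the connection
curl_setopt($ch, CURLOPT_TIMEOUT, 3);        // seconds for the whole request
$page = curl_exec($ch);
curl_close($ch);
?>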