
Fastest way to read URLs.


enoyhs

Recommended Posts

Hi!

I would like to know: what is the fastest way to read the contents of a web page?

I would also like to know what possibilities there are.

Currently file_get_contents() reads around 10-13 pages in 30 seconds (the default max execution time). I know I can raise the max execution time so it can read more pages, but I would like to find another way if possible...
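Roughly, what I'm doing looks like this (not my actual script, just a sketch with placeholder URLs):

<?php
// Sketch of the current approach: fetch each page in turn with file_get_contents().
// The URL list is only a placeholder.
$urls = array(
    'http://www.example.com/page1.html',
    'http://www.example.com/page2.html',
);

foreach ($urls as $url) {
    $html = file_get_contents($url);
    if ($html === false) {
        echo "Failed to fetch $url<br />";
        continue;
    }
    echo "Fetched " . strlen($html) . " bytes from $url<br />";
}
?>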

 

Also, I would like to know: is it possible to print out the results while the page is still loading? Maybe it is my browser (settings), my PHP setup, or the script's fault, so I'm not sure...

 

If the actual code is needed, I will post it when asked.


Hm...

I'm currently testing stream_get_contents; I will let you know.
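Something along these lines (just a quick sketch; the URL is a placeholder):

<?php
// Rough test of stream_get_contents as an alternative to file_get_contents.
$handle = @fopen('http://www.example.com/', 'r');
if ($handle !== false) {
    $html = stream_get_contents($handle); // read the rest of the stream in one go
    fclose($handle);
    echo strlen($html) . " bytes read<br />";
}
?>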

 

So, after Daniel0's answer, here is my next question:

Is it possible to execute part of the code, see the result, and then continue with the next step?

For example, take this for loop:

<?php
for ($i=1; $i<=10; $i++) {
  // Do some stuff.
  // Currently halt script to print out current results.
  // Continue with script.
}
?>

Just to clarify: I'm using WAMP on Windows Vista and I will use Firefox for viewing the output.

 

My idea would be to execute some code, output it, then refresh the page (with headers, or however else you could refresh it after a specified time) and continue the script from where it left off. Or is there a better way?
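Roughly what I have in mind (just a sketch; the "step" parameter and the limit of 10 steps are made up):

<?php
// One chunk of work per request; the browser is told to reload the page for
// the next step.
$step = isset($_GET['step']) ? (int) $_GET['step'] : 1;

if ($step < 10) {
    // Headers must be sent before any output, so queue the refresh first:
    // reload this script after 1 second, continuing at the next step.
    header('Refresh: 1; url=' . $_SERVER['PHP_SELF'] . '?step=' . ($step + 1));
}

// Do some stuff for this step and show the result straight away.
echo "Finished step $step<br />";
?>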


Hm...

 

Not what I meant, but that is a good idea too...

But with your approach I would need a way to store all the previously output data so I can see it at the end. What would you suggest for that? I think I would just change the method to POST and recognize the steps with ifs...
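Or maybe I could keep the earlier output around between steps with a session instead of re-posting everything (just a sketch; the names are placeholders):

<?php
// Accumulate results across refreshes in the session.
session_start();

if (!isset($_SESSION['results'])) {
    $_SESSION['results'] = array();
}

// Append whatever the current step produced.
$_SESSION['results'][] = 'result of step ' . (count($_SESSION['results']) + 1);

// Show everything gathered so far.
foreach ($_SESSION['results'] as $line) {
    echo $line . '<br />';
}
?>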


If you use cURL, you can scan the pages as fast as your server and their server will allow. With code I have, I can scan (if the pages load quickly) about 3,000 pages in 1-2 minutes.

 

It can gather URLs as deep as you want, and read in all the URLs on each page to scan later... or, if you don't want to scan the URLs on the page, you don't have to.
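The usual way to make cURL fetch many pages at once is the curl_multi functions. Here is a rough sketch (not the actual code I use, and the URLs are just placeholders):

<?php
// Fetch several pages in parallel with curl_multi.
$urls = array(
    'http://www.example.com/page1.html',
    'http://www.example.com/page2.html',
    'http://www.example.com/page3.html',
);

$mh = curl_multi_init();
$handles = array();

foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);           // give up on slow pages after 5 seconds
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// Run all the transfers at the same time.
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of spinning
} while ($running > 0);

foreach ($handles as $i => $ch) {
    $html = curl_multi_getcontent($ch);
    echo $urls[$i] . ': ' . strlen($html) . " bytes<br />";
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>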


Save these two files:

http://tzfiles.com/users/ryan/LIB_parse.php

http://tzfiles.com/users/ryan/LIB_http.php

 

Then do this example (it returns Google's title tag):

<?php
include 'LIB_http.php';
include 'LIB_parse.php';

// Fetch the page; http_get() returns an array whose 'FILE' element holds the page body.
$url = http_get($target='http://google.com','');

// Title text only (EXCL leaves out the <title> tags)...
echo return_between($url['FILE'],'<title>','</title>',EXCL);
echo '<br />';
// ...and the same thing with the tags included (INCL).
echo return_between($url['FILE'],'<title>','</title>',INCL);
?>

 

 

Example:

http://secret.publicsize.com/run.php
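If you'd rather not depend on those two library files, roughly the same thing can be done with plain cURL and a regular expression (just a sketch, not the library code):

<?php
// Fetch the page with plain cURL and pull out the title tag with a regex.
$ch = curl_init('http://google.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

if ($html !== false && preg_match('/<title>(.*?)<\/title>/is', $html, $m)) {
    echo $m[1];                   // title text only (like EXCL)
    echo '<br />';
    echo htmlspecialchars($m[0]); // the whole tag, <title> included (like INCL)
}
?>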


After several tests I got bad results:

Using cURL only got me around 20 records in 30 seconds. So I changed max_execution_time in php.ini and got a slightly better result: ~1,000 results in 6 minutes at best.

 

Could anyone give some advice on how to speed things up?

 

I will explain a bit further what I'm trying to do. I have a website with many entries, around 3,000, and the URLs look like this:

http://www.website.com/result.php?result=1

So I used a for loop to loop through the various pages (changing the result value).

My goal: to find all valid results.

For example, if a result doesn't exist in the database, the site returns a page with the text "Invalid page" (though all the page formatting is kept). But the pages that do exist are mostly very long, so I think this really slows down my gathering.

Actually, I only need the top of each valid page, so is there a way to limit how many bytes I receive, to speed things up?
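Something like this is what I have in mind (just a sketch; the 2048-byte limit is a guess, and I'm assuming the "Invalid page" text shows up near the top):

<?php
// Grab only the start of each page instead of downloading the whole thing.
$maxBytes = 2048;

for ($i = 1; $i <= 3000; $i++) {
    $handle = @fopen('http://www.website.com/result.php?result=' . $i, 'r');
    if ($handle === false) {
        continue;
    }

    $top = fread($handle, $maxBytes); // read only the top of the page...
    fclose($handle);                  // ...then drop the connection

    if (strpos($top, 'Invalid page') === false) {
        echo "Result $i looks valid<br />";
    }
}
?>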

 

I hope I made myself clear.


You can print it out while reading it, but Apache (or any other web server) won't send the data before the script is done executing, so you wouldn't notice. The only way you would notice is if you are running the script from the command line.

I think you're wrong, sorry. I'm pretty sure that as long as you don't buffer your output, it should display as soon as you echo it.

Daniel0: I'm not sure about this...

For example, take a look at big pages which have a lot of content.

When you load such a page, the content shows up while the page is still loading. I'm not sure if it is PHP (maybe it is something else, so I may be mistaken here), but I'm pretty sure I have seen PHP pages output content while still loading...

 

Or am I missing something here?


Here is a quick example of output buffering: I send 'hello world 1' to the browser, sleep 5 seconds, then send 'hello world 2' to the browser:

 

http://www.blueskyis.com/test.php

 

The code. I practically never use ob_ anything, so I may have unnecessary or redundant functions here:

 

<?php
ob_start();
ob_implicit_flush(1);     // flush automatically after each piece of output
echo "hello world 1<BR>";
ob_flush();               // push the buffer contents to the browser
ob_end_flush();           // then stop buffering

sleep(5);                 // wait 5 seconds before the second message

ob_start();
ob_implicit_flush(1);
echo "hello world 2";
ob_flush();
ob_end_flush();
exit;
?>
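For comparison, a shorter variant that should behave the same way, assuming output buffering is turned off in php.ini:

<?php
echo "hello world 1<BR>";
flush(); // push what we have to the browser right away

sleep(5); // wait 5 seconds

echo "hello world 2";
flush();
?>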


Could anyone give some advice on how to speed things up?

 

Remember that cURL (or any processing method) can only process the information as fast as the server you are retrieving it from can send it.

 

If you are using this:

http://tzfiles.com/users/ryan/LIB_http.php

 

If you scroll down to where you see this:

# Length of time cURL will wait for a response (seconds)
define("CURL_TIMEOUT", 25);

 

you can change how long cURL waits for a response; by changing 25 to 3, it will wait only 3 seconds before moving on.
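If you are using plain cURL rather than that library, the equivalent options look something like this (the 3-second values are only an example):

<?php
$ch = curl_init('http://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3); // give up connecting after 3 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 3);        // give up on the whole request after 3 seconds
$html = curl_exec($ch);
curl_close($ch);
?>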

