
Reading HTTP Status Information from URLs


kanth1


Hi Guys,

 

I'm kind of new to PHP and would appreciate it if any of you could shed some light on this topic.

 

I have a MySQL database with a URL field. It has roughly 5000 records.

My goal is to read all the URLs and print their HTTP status code (like 200 OK, 404, etc.) to the screen.

 

I did this using the cURL functions, as in the code below.

 

********** Code *******************************************

// Loop over the URL rows fetched from the database.
foreach ($rows as $Array) {
    $ch = curl_init("http://" . $Array["host"]);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'writeHeader'); // callback name must be quoted
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_exec($ch);
    curl_close($ch);
}

function writeHeader($ch, $str)
{
    echo "<td>" . $str . "</td></tr>";
    return strlen($str); // a header callback must return the number of bytes handled
}

********** End of Code ***************************************

 

The above script works fine but takes around 7 minutes to print all the status information to the screen. Is there anything I can do to tweak the code so that I get the HTTP status information for all 5000 records in a few seconds?
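One tweak that might help, sketched below: since only the status line matters here, CURLOPT_NOBODY makes cURL issue a HEAD request and skip the response body, and CURLINFO_HTTP_CODE reads the status code without needing a header callback. A minimal sketch for a single URL (the $url variable is an assumption):

********** Code *******************************************

// Fetch only the HTTP status code for one URL.
// Assumes $url holds something like "http://example.com".
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request: headers only, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo the (empty) response
curl_setopt($ch, CURLOPT_TIMEOUT, 5);            // give up on unresponsive servers quickly
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // e.g. 200, 404, or 0 on failure
curl_close($ch);
echo $status;

********** End of Code ***************************************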

 

Any suggestions regarding this matter are greatly appreciated.

 

Thanks in advance

Arun


You are using cURL, which in essence makes your server visit the website as if it were a human and retrieve information about it, meaning you have to make a separate remote request for every different URL.

 

Unless you can "surf" all 5000 pages in one go and then retrieve information about them all at once, I don't think this is possible...

 

I'm also surprised that your PHP script has not timed out as it is, because I believe the default maximum execution time for PHP scripts is only 30 seconds; you can change this in your php.ini file, however.
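For reference, the limit can also be lifted from inside the script itself; a minimal sketch, equivalent to setting max_execution_time = 0 in php.ini:

********** Code *******************************************

// Remove the execution time limit for the current script only.
set_time_limit(0);
// Or equivalently:
ini_set('max_execution_time', '0');

********** End of Code ***************************************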

 

 

Maybe you can save time, since that is your goal, with a different method for what you are trying to do as a whole. The question is why you need to retrieve information about the URLs in the first place; figuring out an algorithm that retrieves information about a URL on a need-to-know basis, instead of retrieving information from all of them, may save you time.

 

Tell us more about what you're trying to do and we can brainstorm an algorithm to save you time, if that is what you want...


Thanks a lot for the quick turnaround.

 

You are using cURL, which in essence makes your server visit the website as if it were a human and retrieve information about it, meaning you have to make a separate remote request for every different URL.

 

Unless you can "surf" all 5000 pages in one go and then retrieve information about them all at once, I don't think this is possible...

 

I'm also surprised that your PHP script has not timed out as it is, because I believe the default maximum execution time for PHP scripts is only 30 seconds; you can change this in your php.ini file, however.

 

Actually, the script did time out the first time. Then I changed the script timeout to 0 (no limit), and it took about 7 minutes to print all the HTTP status information.

 

 

Maybe you can save time, since that is your goal, with a different method for what you are trying to do as a whole. The question is why you need to retrieve information about the URLs in the first place; figuring out an algorithm that retrieves information about a URL on a need-to-know basis, instead of retrieving information from all of them, may save you time.

 

Tell us more about what you're trying to do and we can brainstorm an algorithm to save you time, if that is what you want...

 

We are trying to build an application that checks the status of the various servers registered with us.

We need to run a cron job that goes through all the registered servers and reads the HTTP status response from each to make sure the server is responding.

 

If a server is responding, we then need to run a battery of tests based on what type of service it is registered for.

 

So my first step in the process would be to loop through all the registered URLs and find out if they are up or down.

 

Let me know if there is a better way of doing this, and whether my explanation makes sense to you.

 

Note: I tried the get_headers() method too, but it's very slow compared to the cURL method.
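For what it's worth, a minimal get_headers() sketch looks like this; the first array element is the raw status line:

********** Code *******************************************

// get_headers() performs the request and returns the response headers,
// or false if the request fails.
$headers = get_headers("http://example.com");
echo $headers[0]; // e.g. "HTTP/1.1 200 OK"

********** End of Code ***************************************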

Do you suggest using PEAR classes?

 

Thanks

Arun


Well, the question is when you do something based on the status of a URL.

 

If your application acts on as many active URLs as possible at the same time, then you need the largest supply of active URLs, so what you are doing currently is the best you can do.

 

If you act on one URL at a time, for example when a person clicks to request information about a certain URL, then you can simply cURL each URL on a need-to-know basis.

 

If you need the UP or DOWN status of all URLs at once, then I'm not aware of any faster method than what you're doing currently. To deal with timeout issues, you could still cURL each URL with PHP, but drive the loop from JavaScript or AJAX: each PHP request executes quickly because it only cURLs one URL at a time, and the JavaScript and HTML output are what take longer to load, so your PHP script won't time out.
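Another batching option worth mentioning: PHP's curl_multi_* functions can run many cURL handles concurrently from a single script. A minimal sketch, assuming a small $urls array of hosts (the variable name is hypothetical):

********** Code *******************************************

// Check several URLs concurrently with curl_multi.
// $urls is a hypothetical array of hosts, e.g. array("example.com", "example.org").
$mh = curl_multi_init();
$handles = array();

foreach ($urls as $host) {
    $ch = curl_init("http://" . $host);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_multi_add_handle($mh, $ch);
    $handles[$host] = $ch;
}

// Drive all transfers until every handle has finished.
do {
    curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
} while ($running > 0);

foreach ($handles as $host => $ch) {
    echo $host . ": " . curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

********** End of Code ***************************************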

