
Reading HTTP Status Information from URLs


kanth1


Hi Guys,

 

I'm kind of new to PHP and would appreciate it if any of you could shed some light on this topic.

 

I have a MySQL database with a URL field. It has roughly 5000 records.

My goal is to read all the URLs and print their HTTP status code (like 200 OK, 404, etc.) to the screen.

 

I did this using the cURL functions, as in the code below.

 

********** Code *******************************************

// Loop over the URL rows fetched from the database.
foreach ($rows as $Array) {
    $ch = curl_init("http://" . $Array["host"]);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'writeHeader'); // callback name must be quoted
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_exec($ch);
    curl_close($ch);
}

function writeHeader($ch, $str)
{
    echo "<td>" . $str . "</td></tr>";
    return strlen($str); // a header callback must return the number of bytes handled
}

********** End of Code ***************************************

 

The above script works fine but takes around 7 minutes to print all the status information to the screen. Is there anything I can do to tweak the code so that I get the HTTP status information for all 5000 records in a few seconds?
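One tweak that might help, sketched below: since only the status line matters here, CURLOPT_NOBODY makes cURL issue a HEAD request and skip the response body, and CURLINFO_HTTP_CODE reads the status code without needing a header callback. A minimal sketch for a single URL (the $url variable is an assumption):

********** Code *******************************************

// Fetch only the HTTP status code for one URL.
// Assumes $url holds something like "http://example.com".
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request: headers only, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo the (empty) response
curl_setopt($ch, CURLOPT_TIMEOUT, 5);            // give up on unresponsive servers quickly
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // e.g. 200, 404, or 0 on failure
curl_close($ch);
echo $status;

********** End of Code ***************************************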

 

Any suggestions regarding this matter are greatly appreciated.

 

Thanks in advance

Arun


You are using cURL, which in essence makes your server visit the website as if it were a human and retrieve information about it, meaning you have to make a separate remote request for every different URL.

 

Unless you can "surf" all 5000 pages in one go and then retrieve information about them all at once, I don't think this is possible...

 

I'm also surprised that your PHP script has not timed out as it is, because I believe the default maximum execution time for PHP scripts is only 30 seconds; you can change this in your php.ini file, however.
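For reference, the limit can also be lifted from inside the script itself; a minimal sketch, equivalent to setting max_execution_time = 0 in php.ini:

********** Code *******************************************

// Remove the execution time limit for the current script only.
set_time_limit(0);
// Or equivalently:
ini_set('max_execution_time', '0');

********** End of Code ***************************************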

 

 

Maybe you can save time, since that is your goal, with a different method for what you are trying to do as a whole. The question is why you need to retrieve information about the URLs in the first place; figuring out an algorithm that retrieves information about a URL on a need-to-know basis, instead of retrieving information from all of them, may save you time.

 

Tell us more about what you're trying to do and we can brainstorm an algorithm to save you time, if that is what you want...


Thanks a lot for the quick turnaround.

 

You are using cURL, which in essence makes your server visit the website as if it were a human and retrieve information about it, meaning you have to make a separate remote request for every different URL.

 

Unless you can "surf" all 5000 pages in one go and then retrieve information about them all at once, I don't think this is possible...

 

I'm also surprised that your PHP script has not timed out as it is, because I believe the default maximum execution time for PHP scripts is only 30 seconds; you can change this in your php.ini file, however.

 

Actually, the script did time out the first time. Then I changed the script timeout to 0 (no limit), and it took about 7 minutes to print all the HTTP status information.

 

 

Maybe you can save time, since that is your goal, with a different method for what you are trying to do as a whole. The question is why you need to retrieve information about the URLs in the first place; figuring out an algorithm that retrieves information about a URL on a need-to-know basis, instead of retrieving information from all of them, may save you time.

 

Tell us more about what you're trying to do and we can brainstorm an algorithm to save you time, if that is what you want...

 

We are trying to build an application that checks the status of the various servers registered with us.

We need to run a cron job that goes through all the registered servers and reads the HTTP status response from each to make sure the server is responding.

 

If a server is responding, we then need to run a battery of tests based on what type of service it is registered for.

 

So my first step in the process would be to loop through all the registered URLs and find out if they are up or down.

 

Let me know if there is a better way of doing this, and whether my explanation makes sense to you.

 

Note: I tried the get_headers() method too, but it's very slow compared to the cURL method.
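For what it's worth, a minimal get_headers() sketch looks like this; the first array element is the raw status line:

********** Code *******************************************

// get_headers() performs the request and returns the response headers,
// or false if the request fails.
$headers = get_headers("http://example.com");
echo $headers[0]; // e.g. "HTTP/1.1 200 OK"

********** End of Code ***************************************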

Do you suggest using PEAR classes?

 

Thanks

Arun


Well, the question is when you do something based on the status of a URL.

 

If your application acts on as many active URLs as possible at the same time, then you need the largest supply of active URLs, so what you are doing currently is the best you can do.

 

If you act on one URL at a time, for example when a person clicks to request information about a certain URL, then you can simply cURL each URL on a need-to-know basis.

 

If you need the UP or DOWN status of all URLs at once, then I'm not aware of any faster method than what you're doing currently. To deal with timeout issues, you could still cURL each URL with PHP, but drive the loop from JavaScript or AJAX: each PHP request executes quickly because it only cURLs one URL at a time, and the JavaScript and HTML output are what take longer to load, so your PHP script won't time out.
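Another batching option worth mentioning: PHP's curl_multi_* functions can run many cURL handles concurrently from a single script. A minimal sketch, assuming a small $urls array of hosts (the variable name is hypothetical):

********** Code *******************************************

// Check several URLs concurrently with curl_multi.
// $urls is a hypothetical array of hosts, e.g. array("example.com", "example.org").
$mh = curl_multi_init();
$handles = array();

foreach ($urls as $host) {
    $ch = curl_init("http://" . $host);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_multi_add_handle($mh, $ch);
    $handles[$host] = $ch;
}

// Drive all transfers until every handle has finished.
do {
    curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
} while ($running > 0);

foreach ($handles as $host => $ch) {
    echo $host . ": " . curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

********** End of Code ***************************************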

