cURL : Get only headers

champrock · June 17, 2008

Hi,

I have a list of websites and I want to check whether they are working. Is there any way to configure cURL to fetch just the response headers and status code? I don't want to download the full page, as that would consume a lot of bandwidth.

I believe "http_code" does the trick, but only after the full page has been fetched, and I don't want to fetch the full page.

Any suggestions or comments on how this can be done?

Thanks a lot.

champrock · June 17, 2008

(I will try to use the multithreaded option in cURL to process several requests at once to save time.)

champrock · June 17, 2008

Is this even possible?

mikeschroeder · June 17, 2008

function check_link($link) {
    $main = array();
    $ch = curl_init();
    curl_setopt ($ch, CURLOPT_URL, $link);
    curl_setopt ($ch, CURLOPT_HEADER, 1);
    curl_setopt ($ch, CURLOPT_NOBODY, 1);
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt ($ch, CURLOPT_NETRC, 1); // omit if you know no urls are FTP links...
    curl_setopt ($ch, CURLOPT_TIMEOUT, 300);
    curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    
    ob_start();
       curl_exec ($ch);
       $stuff = ob_get_contents();
    ob_end_clean();

    curl_close ($ch);

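    // split() was removed in PHP 7, and the delimiter must be a real newline.
    $parts = explode("\n", $stuff, 2);         // first line is the status line
    $main = explode(" ", trim($parts[0]), 3);  // protocol, status code, reason phrase

    return $main;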

}

$start_time = microtime(true);
$db_starttime = time();
$return = check_link($_GET['url']);
$end_time = microtime(true);

// $return[0] is the protocol information (e.g. HTTP/1.1)
// $return[1] is the HTTP status code (e.g. 200)
// $return[2] is the reason phrase (e.g. OK)

$load_time = round($end_time - $start_time,2);

print_r($return);
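
One caveat with looking only at the first line: with CURLOPT_FOLLOWLOCATION enabled, cURL prints one header block per response in the redirect chain, so the first status line may be a 301 or 302 rather than the final result. Below is a minimal sketch of a parser that picks the last status line instead. It assumes its input is the raw header text that check_link() captures; the function name final_status() and the sample header text are made up for illustration.

```php
<?php
// Hypothetical helper: returns array(protocol, status code, reason phrase)
// for the LAST response in a raw header dump, or null if none is found.
function final_status($rawHeaders) {
    $last = null;
    // Normalise line endings, then scan every line for a status line.
    foreach (explode("\n", str_replace("\r\n", "\n", $rawHeaders)) as $line) {
        if (preg_match('#^(HTTP/\d+(?:\.\d+)?)\s+(\d{3})(?:\s+(.*))?$#', trim($line), $m)) {
            // The reason phrase is optional (HTTP/2 responses omit it).
            $last = array($m[1], (int) $m[2], isset($m[3]) ? $m[3] : '');
        }
    }
    return $last;
}

// Sample input: a redirect followed by the final response.
$raw = "HTTP/1.1 301 Moved Permanently\r\nLocation: http://www.example.com/\r\n\r\n"
     . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";
print_r(final_status($raw));
```

Taking the last match rather than the first means a link that redirects to a working page is reported by its final status.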

champrock · June 17, 2008

Does this only get the headers? I don't want to download the full file because that means a lot of bandwidth usage.

curl_exec will get the full page, right? Or am I missing something here?

mikeschroeder · June 18, 2008

Well, personally I would probably have just made the function return $stuff by commenting out the parts where I break it apart, since $stuff contains the raw output from cURL.

Then I would display it with print_r():


function check_link($link) {
    $main = array();
    $ch = curl_init();
    curl_setopt ($ch, CURLOPT_URL, $link);
    curl_setopt ($ch, CURLOPT_HEADER, 1);
    curl_setopt ($ch, CURLOPT_NOBODY, 1);
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt ($ch, CURLOPT_NETRC, 1); // omit if you know no urls are FTP links...
    curl_setopt ($ch, CURLOPT_TIMEOUT, 300);
    curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    
    ob_start();
       curl_exec ($ch);
       $stuff = ob_get_contents();
    ob_end_clean();

    curl_close ($ch);

    // $parts = explode("\n", $stuff, 2);
    // $main = explode(" ", trim($parts[0]), 3);

    return $stuff;

}


$link = 'http://www.phpfreaks.com';
$returnData = check_link($link);
print_r($returnData);

But regardless, the function ONLY returns the headers just as you had asked for in your original post.

Expect a response similar to this.

HTTP/1.1 200 OK
Date: Wed, 18 Jun 2008 15:47:46 GMT
Server: Apache/1.3.37 (Unix) PHP/5.2.3 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.28 OpenSSL/0.9.7a
X-Powered-By: PHP/5.2.3
Set-Cookie: PHPSESSID=63b59aa92e02bfe3bce4b575e05ce992; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Type: text/html
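
If only the status code is needed, the header text does not have to be parsed at all. Here is a rough sketch, assuming the standard PHP cURL extension; the function name check_status() and the 10 second default timeout are made up for illustration. CURLOPT_NOBODY makes cURL issue a HEAD request, so no body is transferred, and curl_getinfo() then reports the code of the final response.

```php
<?php
// Hypothetical helper: returns the HTTP status code for $link using a HEAD
// request, or 0 if the request itself failed (timeout, DNS error, refused).
function check_status($link, $timeout = 10) {
    $ch = curl_init($link);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo anything
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // report the final response
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $ok = curl_exec($ch);                            // false on transfer failure
    $code = ($ok === false) ? 0 : (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code;
}

echo check_status('http://www.phpfreaks.com'), "\n";
```

Some servers refuse HEAD requests or answer them differently from GET, so a non-200 result may be worth a second look before a site is marked as down.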

champrock · June 18, 2008

I think I need to clarify my problem a bit. I don't want the server to download the entire page either. It's not about outputting only the headers; I want the server to fetch just the headers.

For example, if the file size is 10 MB, then with the cURL code provided above, cURL will download the full file to the server and then output only the headers. My problem is that I don't want the server to download 10 MB! I just need the headers.

mikeschroeder · June 18, 2008

I tested it, just in case you were right, which in this case you are not.

I generated a 100MB .bin file on one of my servers.

I then tested my curl function against it. The result was just the headers being returned, with a total execution time of 0.11 seconds.

   HTTP/1.1 200 OK
   Date: Wed, 18 Jun 2008 17:56:32 GMT
   Server: Apache/2
   Last-Modified: Wed, 18 Jun 2008 17:55:45 GMT
   ETag: "467070-5f5e100-44ff491e28a40"
   Accept-Ranges: bytes
   Content-Length: 100000000
   Vary: Accept-Encoding,User-Agent
   Content-Type: application/octet-stream

0.11 Seconds

So, to clarify again, it is not downloading the entire 100MB file.

CURLOPT_NOBODY makes cURL send a HEAD request, so the server ONLY returns the headers.
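
On the earlier point about processing several requests at once: PHP's curl_multi_* functions can run many of these HEAD requests in parallel. This is only a sketch under the same assumptions as above; the function name check_links(), the timeout, and the example URLs are hypothetical.

```php
<?php
// Hypothetical helper: checks several URLs in parallel with HEAD requests and
// returns an array of url => HTTP status code (0 means the request failed).
function check_links(array $links, $timeout = 10) {
    $mh = curl_multi_init();
    $handles = array();
    foreach (array_unique($links) as $link) {
        $ch = curl_init($link);
        curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo anything
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$link] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    $codes = array();
    foreach ($handles as $link => $ch) {
        $codes[$link] = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $codes;
}

print_r(check_links(array('http://www.phpfreaks.com', 'http://www.example.com')));
```

For a long list it is probably kinder to the remote servers, and to your own, to feed the URLs through in batches rather than opening every connection at once.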
