
Confusion with mail.google.com, cURL and http://validator.w3.org/checklink


mejpark


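Hello.

I am building a basic link checker at work using cURL. My application has a function called getHeaders() that returns an array of information about the HTTP response (status code, content type, redirect count and so on), as reported by curl_getinfo():
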

function getHeaders($url) {
    if (function_exists('curl_init')) {
        // create a new cURL resource
        $ch = curl_init();
        // set the URL and other options
        $options = array(
            CURLOPT_URL            => $url,
            CURLOPT_HEADER         => true,  // include response headers in the output
            CURLOPT_NOBODY         => true,  // send a HEAD request (no body)
            CURLOPT_FOLLOWLOCATION => true,  // follow Location: redirects
            CURLOPT_RETURNTRANSFER => true   // return the output instead of printing it
        );
        curl_setopt_array($ch, $options);
        // perform the request; the returned header text is not used here
        curl_exec($ch);
        // transfer details (http_code, redirect_count, ...) for the last response
        $headers = curl_getinfo($ch);
        // close the cURL resource and free up system resources
        curl_close($ch);
    } else {
        echo "<p>Error: <a href=\"http://uk.php.net/manual/en/book.curl.php\">cURL</a> is not installed on the web server. Unable to continue.</p>";
        return false;
    }
    return $headers;
}

print_r(getHeaders('mail.google.com'));
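The call above passes 'mail.google.com' with no scheme, so cURL has to guess one (it picks http://). A minimal sketch, assuming you want the scheme to be explicit and want failures and redirect destinations to be visible: normalizeUrl() and checkUrl() are hypothetical names, not part of the original code. It keeps the original HEAD request (CURLOPT_NOBODY), reports transport errors via curl_errno()/curl_error(), and records CURLINFO_EFFECTIVE_URL so you can see where any redirects ended up.

```php
<?php
// Hypothetical helper: prepend http:// when the URL has no scheme,
// so the request target is explicit rather than guessed by cURL.
function normalizeUrl($url) {
    $url = trim($url);
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    return $url;
}

// Hypothetical variant of getHeaders(): returns the final status code,
// the URL the transfer ended on and the number of redirects followed,
// or an error message if the request itself failed.
function checkUrl($url) {
    $ch = curl_init(normalizeUrl($url));
    curl_setopt_array($ch, array(
        CURLOPT_NOBODY         => true,   // HEAD request, as in the original
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 10,     // guard against redirect loops
        CURLOPT_RETURNTRANSFER => true,
    ));
    curl_exec($ch);
    if (curl_errno($ch) !== 0) {
        // DNS failure, timeout, too many redirects, ...
        $result = array('error' => curl_error($ch));
    } else {
        $result = array(
            'http_code'      => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'effective_url'  => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
            'redirect_count' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
        );
    }
    curl_close($ch);
    return $result;
}
```

Usage: print_r(checkUrl('mail.google.com')); prints the three fields, or an 'error' entry if cURL could not complete the request.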

Which yields the following results:

Array
(
    [ur1] => http://mail.google.com
    [content_type] => text/html; charset=UTF-8
    [http_code] => 404
    [header_size] => 338
    [request_size] => 55
    [filetime] => -1
    [ssl_verify_result] => 0
    [redirect_count] => 0
    [total_time] => 0.128
    [namelookup_time] => 0.042
    [connect_time] => 0.095
    [pretransfer_time] => 0.097
    [size_upload] => 0
    [size_download] => 0
    [speed_download] => 0
    [speed_upload] => 0
    [download_content_length] => 0
    [upload_content_length] => 0
    [starttransfer_time] => 0.128
    [redirect_time] => 0
)

(In case you're wondering, I changed the 'url' key to 'ur1' to stop the forum interpreting it as BBCode.)

I've tested it with several long links, and the function follows redirects for all of them except, it seems, mail.google.com.

For fun, I passed the same URL (mail.google.com) to the W3C link checker, which produced:

Results

Links

Valid links!

List of redirects

The links below are not broken, but the document does not use the exact URL, and the links were redirected. It may be a good idea to link to the final location, for the sake of speed.

warning Line: 1 http://mail.google.com/mail/ redirected to
https://www.google.com/accounts/ServiceLogin?service=mail&passive=true&rm=false&continue=http%3A%2F%2Fmail.google.com%2Fmail%2F%3Fui%3Dhtml%26zy%3Dl&bsv=zpwhtygjntrz&scc=1&ltmpl=default&ltmplcache=2

Status: 302 -> 200 OK

This is a temporary redirect. Update the link if you believe it makes sense, or leave it as is.

Anchors

Found 0 anchors.

Checked 1 document in 4.50 seconds.

That is correct: the address above is where I am redirected when I enter mail.google.com into my browser.

What cURL options would I need to use to make my function return 200 for mail.google.com?
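One possibility worth ruling out (an assumption; nothing in the output above proves it) is that the server answers a HEAD request, which is what CURLOPT_NOBODY sends, differently from the GET request a browser makes, or behaves differently when no User-Agent header is sent. A minimal sketch that tries HEAD first and retries with a plain GET when HEAD returns a 4xx/5xx code or fails outright; the function names and the User-Agent string are hypothetical:

```php
<?php
// Hypothetical helper: decide whether a HEAD result is worth retrying with GET.
// Some servers reject or mishandle HEAD (e.g. 404, 405, 501) while GET works.
function shouldRetryWithGet($httpCode) {
    return $httpCode === 0 || $httpCode >= 400;
}

// Perform one request and return the final HTTP status code (0 on failure).
function fetchStatus($url, $useHead) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_NOBODY         => $useHead,            // true = HEAD, false = GET
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_RETURNTRANSFER => true,                // return the GET body instead of printing it
        CURLOPT_USERAGENT      => 'MyLinkChecker/0.1', // hypothetical UA string
    ));
    curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code;
}

// Try the cheap HEAD request first, then fall back to GET if needed.
function linkStatus($url) {
    $code = fetchStatus($url, true);
    if (shouldRetryWithGet($code)) {
        $code = fetchStatus($url, false);
    }
    return $code;
}
```

A GET downloads the whole response body, so the fallback is slower; this design only pays that cost when the HEAD result looks like a failure.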

Why does the function above return a 404 status code rather than a 302?
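With CURLOPT_FOLLOWLOCATION enabled, curl_getinfo() reports the status code of the last response received, so a 302 would normally be hidden behind the final code. Here, though, redirect_count is 0, which suggests the server sent no redirect at all and the 404 was its first and only answer to the HEAD request for http://mail.google.com. Note also that the W3C checker reported on http://mail.google.com/mail/, which is a different URL. To see every hop the way the W3C checker does ("302 -> 200"), a sketch that disables CURLOPT_FOLLOWLOCATION and follows the Location target itself; redirectChain() and formatChain() are hypothetical names, and CURLINFO_REDIRECT_URL assumes PHP 5.3.7 or later:

```php
<?php
// Hypothetical helper: follow redirects one hop at a time so every status
// code is visible. Returns a list of array('url' => ..., 'code' => ...).
function redirectChain($url, $maxHops = 10) {
    $chain = array();
    for ($hop = 0; $hop <= $maxHops && $url !== ''; $hop++) {
        $ch = curl_init($url);
        curl_setopt_array($ch, array(
            CURLOPT_NOBODY         => true,
            CURLOPT_FOLLOWLOCATION => false,  // stop at each 3xx response
            CURLOPT_RETURNTRANSFER => true,
        ));
        curl_exec($ch);
        $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        // Absolute target of the Location: header; false (cast to '') if none
        $next = (string) curl_getinfo($ch, CURLINFO_REDIRECT_URL);
        curl_close($ch);
        $chain[] = array('url' => $url, 'code' => $code);
        // Only continue while the server keeps answering with a redirect
        $url = ($code >= 300 && $code < 400) ? $next : '';
    }
    return $chain;
}

// Render the codes in the same style as the W3C checker, e.g. "302 -> 200".
function formatChain(array $chain) {
    $codes = array();
    foreach ($chain as $step) {
        $codes[] = $step['code'];
    }
    return implode(' -> ', $codes);
}
```

Usage: echo formatChain(redirectChain('http://mail.google.com/mail/')); prints the status code of each hop in order.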


TIA
