
Need help expanding a plethora of shortened URLs


wee493

I'm using the Twitter streaming API and receiving a feed of recently tweeted links. I'm trying to create an index of popular links. I would like to expand all the shortened links. I'm using the bit.ly API to expand the bitly links as that accounts for almost exactly 1/3 of the links, but I still have 2/3 of other links that could be expanded. I have over 600,000 links in the database and about 200,000 are bit.ly links.
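
Splitting the links by host first would let the bit.ly links go to the bit.ly API while everything else goes to a header lookup. The following is only a minimal sketch of that idea: the helper name `split_by_host` and the sample URLs are made up for illustration and are not taken from the actual code.

```php
<?php
// Hypothetical helper: split URLs into those on a given shortener host and the rest.
function split_by_host(array $urls, $shortenerHost) {
    $matched = array();
    $others = array();
    foreach ($urls as $u) {
        $host = parse_url($u, PHP_URL_HOST);
        // parse_url() gives null (or false for malformed URLs) when there is no host
        if (is_string($host) && strcasecmp($host, $shortenerHost) === 0) {
            $matched[] = $u;
        } else {
            $others[] = $u;
        }
    }
    return array($matched, $others);
}

list($bitly, $rest) = split_by_host(array(
    'http://bit.ly/abc123',        // made-up sample links
    'http://twitpic.com/xyz',
    'http://tinyurl.com/example',
), 'bit.ly');
// $bitly holds the first link; $rest holds the other two
```

The host comparison is case-insensitive because host names are, while the path of a short link is left untouched.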

 

So I still have 400,000 links to expand, though not all of them can be expanded. For example, twitpic links are usually not shortened. Right now I'm using the function below to expand URLs:

function untinyurl($tinyurl) {
        $url = parse_url($tinyurl);
        // Malformed URLs, or URLs without a host or path, cannot be expanded
        if ($url === false || empty($url['host']) || empty($url['path'])) {
            return '';
        }
        // Keep the original scheme and only add a port if one was given
        $scheme = isset($url['scheme']) ? $url['scheme'] : 'http';
        $port = isset($url['port']) ? ':'.$url['port'] : '';
        $query = isset($url['query']) ? '?'.$url['query'] : '';
        // The fragment is never sent to the server, so it is left out of the request

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, "$scheme://".$url['host'].$port.$url['path'].$query);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // seconds
        curl_setopt($ch, CURLOPT_HEADER, true);      // include headers in the response
        curl_setopt($ch, CURLOPT_NOBODY, true);      // HEAD request, no body needed
        curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
        $response = curl_exec($ch);
        curl_close($ch);
        if ($response === false) {
            // Request failed (timeout, DNS error, ...)
            return '';
        }

        // The first Location header is the expanded URL
        $lines = explode("\r\n", $response);
        foreach ($lines as $line) {
            if (stripos($line, 'Location:') === 0) {
                list(, $location) = explode(':', $line, 2);
                return trim($location);
            }
        }

        if (strpos($response, 'HTTP/1.1 404 Not Found') === 0) {
            return '';
        }
        // No redirect, so the URL was not shortened
        return $tinyurl;
    }

 

The only problem is that each lookup can take a second or more. Is there a quicker way to do this? I know I have to contact the shorteners' servers, because there is no index of shortened URLs, so my speed will depend on the remote servers.
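
Since most of that second is spent waiting on the remote server, one possible speed-up is to run many lookups at once instead of one after another. Below is a rough sketch using PHP's curl_multi functions. The function name `expand_urls_parallel` and its `$timeout` parameter are assumptions for illustration, and as far as I know `CURLINFO_REDIRECT_URL` needs PHP 5.3.7 or later.

```php
<?php
// Sketch: resolve many short URLs concurrently with curl_multi.
// Returns an array mapping each input URL to its redirect target,
// or to itself when there was no redirect or the request failed.
function expand_urls_parallel(array $urls, $timeout = 5) {
    $mh = curl_multi_init();
    $handles = array();
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the first hop
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of spinning
        }
    } while ($running && $status === CURLM_OK);

    $results = array();
    foreach ($handles as $url => $ch) {
        // With FOLLOWLOCATION off, this is the URL the server redirected to (if any)
        $location = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
        $results[$url] = $location ? $location : $url;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

This would be called on a batch at a time, e.g. `expand_urls_parallel($batch)` with a few dozen URLs per batch. The right batch size is something to tune, since opening too many connections at once may get the requests throttled by the remote servers.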

It seems that the get_headers function is what I'm looking for.

 

Here's some demo code for anyone interested:

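function expand_url($url) {
// the second argument makes get_headers() return an array keyed by header name
$h = get_headers($url, 1);

// if the request failed or there is no redirect header then return the original url
if ($h === false || !isset($h['Location'])) {
return $url;
}
// get_headers() follows redirects, so Location is an array when there were several hops
return is_array($h['Location']) ? end($h['Location']) : $h['Location'];
}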
