Parsing URLs


monkeytooth

OK, say a user on my site types a URL into a comment by hand, with no HTML tags of any sort wrapped around it, e.g. http://www.example.com

How can I run through the comment string, catch the URL, and wrap it in HTML myself to make it a link? Is there some type of regex I could use in preg_match() or similar that would catch virtually any form of URL, from one that just ends with a domain extension to one that ends with a line of variables, either the good old-fashioned blah=variable or MVC-style example.com/blah/variable (with or without a trailing slash)?

I've tried Google for a bit, but I guess my search terms are off, because I can't find anything that tells me even where to begin.


This is some old code that I modified and got working. It also handles URLs that start with just www.

 

<?php
function formatUrlsInText($text){
    // Match a URL with a scheme, or a bare host starting with www., plus an optional path/query.
    $reg_exUrl = "/\b(?:(?:http|https|ftp|ftps)\:\/\/|www\.)[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?/i";
    // Replace every match in a single pass so no URL is rewritten twice.
    return preg_replace_callback($reg_exUrl, function($matches){
        $url = $matches[0];
        // A bare www. host needs a scheme before it can be used as a link.
        if(stripos($url, "www.") === 0){
            $url = "http://" . $url;
        }
        $url = htmlspecialchars($url, ENT_QUOTES);
        return "<a href='$url' rel='nofollow'>$url</a>";
    }, $text);
}

$text = "Some sample text with www.google.com http://google.com  and https://google.com";
echo formatUrlsInText($text);
?>

 

 

The result turns this:

Some sample text with www.google.com http://google.com  and https://google.com

into this, with each URL rendered as a clickable link:

Some sample text with http://www.google.com http://google.com  and https://google.com
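
For the URL forms the original question lists (plain domain, query-string variables, MVC-style path with or without a trailing slash), here is a rough sketch for checking what a pattern like this actually catches. The function name findUrls and the sample comment are made up for illustration, and the regex is a loosened variant of the one in the post (longer domain extensions allowed), not a full URL validator.

```php
<?php
// Hypothetical helper: return every URL-looking substring found in $text.
function findUrls($text){
    // scheme://host.tld followed by an optional path or query string
    $pattern = "/\b(?:https?|ftps?):\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(?:\/\S*)?/i";
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$comment = "Plain http://www.example.com then http://example.com/page.php?blah=variable "
         . "then http://example.com/blah/variable and http://example.com/blah/variable/ done";

// Print each URL the pattern found, one per line.
foreach(findUrls($comment) as $url){
    echo $url, "\n";
}
```

Note that the path part is just "everything up to the next whitespace", so punctuation typed directly after a path (a trailing comma, for instance) would be swallowed into the match.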


I thought I would make this a bit more deluxe.

 

I added the ability to check whether each link is alive or dead, and to use the page title as the link text.

It can't handle someone writing just aol.com with no www. or http://. Checking for every top-level and second-level domain would be crazy, and many extensions are also common words: com, in, no, net and so on.
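
One possible way around the bare-domain problem, sketched under assumptions: rather than checking every top-level domain, accept only a short allowlist and require a dot-separated host in front of it. The function name linkBareDomains and the three extensions in $tlds are hypothetical choices for illustration; a real list would be longer, and word-like extensions such as "in" or "no" are left off on purpose.

```php
<?php
// Hypothetical helper: link scheme-less domains such as "aol.com".
function linkBareDomains($text, $tlds = array("com", "net", "org")){
    $tldGroup = implode("|", array_map("preg_quote", $tlds));
    // The lookbehind skips hosts that are already part of a URL or an email address
    // (preceded by "/", ".", "@", "-" or a word character).
    $pattern = "/(?<![\/\w.@\-])(?:[a-z0-9\-]+\.)+(?:$tldGroup)\b(?:\/\S*)?/i";
    return preg_replace_callback($pattern, function($m){
        $href = htmlspecialchars("http://" . $m[0], ENT_QUOTES);
        return "<a href='$href' rel='nofollow'>$href</a>";
    }, $text);
}

echo linkBareDomains("I still use aol.com but not http://example.com");
```

Run this on the raw comment before any other linking step; text that already contains anchor tags would need those handled first.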

 

https secure sites will show as the original URL unless PHP can make secure connections.

I handled already-made hyperlinks as best I could think of at the time; there might be better ways.

It's something to use as is or improve upon.

 

<?php
function titleHyper($text){
    // Reduce already-made hyperlinks to their bare URL so they are processed like the rest.
    $text = preg_replace("/<a\s[^>]*href=(['\"])(.*?)\\1[^>]*>.*?<\/a>/is", '$2', $text);
    // Give bare www. hosts a scheme; hosts that already follow http:// or https:// are left alone.
    $text = preg_replace("/(?<![\/\w.\-])www\./i", "http://www.", $text);
    $reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?/i";
    // Give up on slow sites after 5 seconds.
    $context = stream_context_create(array(
        'http' => array(
            'timeout' => 5
        )
    ));
    $cache = array(); // fetch each distinct URL only once
    return preg_replace_callback($reg_exUrl, function($m) use ($context, &$cache){
        $pattern = $m[0];
        if(!isset($cache[$pattern])){
            $title = $pattern;
            // Neutral default: https pages may not be fetchable, so they are not marked dead.
            $color = "#000000";
            if(substr($pattern, 0, 8) != "https://"){
                $color = "#FF0000"; // assume dead until a page comes back
            }
            $the_contents = @file_get_contents($pattern, false, $context);
            if(!empty($the_contents)){
                $color = "#00FF00";
                // Use the page title as the link text when one is found.
                if(preg_match("/<title>(.*)<\/title>/Umis", $the_contents, $found) && trim($found[1]) != ""){
                    $title = trim($found[1]);
                }
            }
            $title = htmlspecialchars($title, ENT_QUOTES);
            $href = htmlspecialchars($pattern, ENT_QUOTES);
            $cache[$pattern] = "<a style='font-size: 14px; background-color: #FFFFFF; color: $color;' href='$href' rel='nofollow' target='_blank'> $title </a>";
        }
        return $cache[$pattern];
    }, $text);
}

$text = "Some sample text with WWW.AOL.com<br />http://www.youtube.com/watch?v=YaxKiZfQcX8 <br />Anyone use www.myspace.com?  <br />Some people are nuts, look at this stargate link at http://www.youtube.com/watch?v=ZKoUm6z5SzU&feature=grec_index , like aliens exist or something. http://www.youtube.com/watch?v=sfN-7HczmOU&feature=grec_index  and here's a secure site https://familyhistory.hhs.gov, unless you use curl or allow secure connections it will never get a title. <br /> This is a not valid site http://zzzzzzz and this is a dead site http://zwzwzwxzw.com.<br /> Lastly lets try an already made hyperlink and see what it does <a href='http://phpfreaks.com'>phpfreaks</a>";
echo titleHyper($text);
?>
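
On the https point made in the sample text: a rough sketch of fetching a page title with the cURL extension instead of file_get_contents(). It assumes PHP is built with cURL and SSL support; the names extractTitle and fetchPageTitle are made up for this example and are not part of the code above.

```php
<?php
// Hypothetical helper: pull the <title> text out of an HTML string, or null if there is none.
function extractTitle($html){
    if(preg_match("/<title[^>]*>(.*?)<\/title>/is", $html, $m)){
        // Collapse line breaks and repeated spaces inside the title.
        $title = trim(preg_replace("/\s+/", " ", $m[1]));
        return $title !== "" ? $title : null;
    }
    return null;
}

// Hypothetical helper: fetch $url (http or https) and return its title, or null on failure.
function fetchPageTitle($url, $timeout = 5){
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => $timeout,
    ));
    $html = curl_exec($ch);
    return is_string($html) ? extractTitle($html) : null;
}

// Offline demonstration of the title extraction step.
echo extractTitle("<html><head><title> Example Domain </title></head></html>");
```

A null return from fetchPageTitle() can be treated the same way as an empty result from file_get_contents(): keep the URL itself as the link text.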
