
Parsing URLs


monkeytooth


OK, say a user on my site types a URL into a comment by hand, with no HTML tags of any sort wrapped around it, for example:

http://www.example.com

How can I run through the string that makes up their comment, catch the URL, and wrap it in HTML myself to make it a link? Is there some type of regex I could use in preg_match() or something similar that would catch virtually any form of URL, from one that simply ends with a domain extension to one that ends with a string of variables, either the good old-fashioned blah=variable style or the MVC style example.com/blah/variable (with or without a trailing slash)?

I've tried Google for a bit, but if something exists that explains a good way of doing this, my search terms must be off, because I can't find anything that tells me even where to begin.
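One way to start is to try a candidate pattern against the kinds of URLs described above and see what it catches. The sketch below uses PHP's PCRE functions; the pattern and the sample strings are illustrative assumptions, not a complete URL grammar.

```php
<?php
// Illustrative pattern (an assumption, not a full RFC 3986 parser):
// scheme, host containing at least one dot, alphabetic TLD, optional path/query.
$pattern = '~\b(?:https?|ftps?)://[a-z0-9\-.]+\.[a-z]{2,}(?:/\S*)?~i';

// The URL forms from the question: bare domain, query string, MVC-style path.
$samples = array(
    'see http://www.example.com for details',
    'try http://example.com/page.php?blah=variable now',
    'or http://example.com/blah/variable/ with a slash',
    'or http://example.com/blah/variable without one',
);

$found = array();
foreach ($samples as $s) {
    if (preg_match($pattern, $s, $m)) {
        $found[] = $m[0];   // the whole matched URL
        echo $m[0], "\n";
    }
}
```

Because `\S*` stops only at whitespace, a URL with a path at the end of a sentence keeps the trailing period or comma; trimming trailing punctuation from each match is a common refinement.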

https://forums.phpfreaks.com/topic/241028-parsing-urls/

This is some old code I had lying around; I modified it and got it working. It even handles URLs that start with just www.

 

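<?php
function formatUrlsInText($text){
    // Prefix bare "www." hosts with "http://", but leave "http://www." alone
    // (a plain str_ireplace would turn it into "http://http://www.").
    $text = preg_replace('~(?<![/\w.])www\.~i', 'http://$0', $text);
    // Scheme, host containing a dot, alphabetic TLD of any length, optional path/query.
    $reg_exUrl = '~\b(?:https?|ftps?)://[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,}(?:/\S*)?~i';
    // Wrap every match in one pass, so a URL that is a prefix of another
    // URL (http://a.com vs http://a.com/x) is never wrapped twice.
    return preg_replace_callback($reg_exUrl, function ($m) {
        $url = htmlspecialchars($m[0], ENT_QUOTES);
        return "<a href='$url' rel='nofollow'>$url</a>";
    }, $text);
}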

$text = "Some sample text with www.google.com http://google.com  and https://google.com";
echo formatUrlsInText($text);
?>

 

 

Result:

this input

Some sample text with www.google.com http://google.com  and https://google.com

is displayed as the following, with each URL now a clickable link:

Some sample text with http://www.google.com http://google.com  and https://google.com


I thought I would make this a bit more deluxe.

 

I added the ability to check whether each link is alive or dead, and to use the page's title as the link text.
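Pulling the title out with a regex works for most pages. An alternative sketch uses PHP's DOM extension (assumed to be installed), which copes with attributes on the `<title>` tag and with uppercase tag names. The function name `pageTitle` is hypothetical.

```php
<?php
// Hypothetical helper: return the text of the first <title> element, or null.
// Requires the DOM extension (ext-dom), which most PHP builds include.
// Note: without a <meta charset>, libxml assumes ISO-8859-1 for the input.
function pageTitle($html)
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }
    $title = trim($nodes->item(0)->textContent);
    return $title === '' ? null : $title;
}

echo pageTitle("<html><head><TITLE lang='en'>Parsing URLs</TITLE></head><body></body></html>"), "\n";
```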

 

It can't handle someone writing just aol.com with no www. or http://, but checking for every top-level and second-level domain would be crazy, plus many of them are also common words, like com, in, no, net and so on.
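If bare domains like aol.com matter, one compromise is to accept them only for a short, hand-picked list of TLDs and require the domain to stand on its own, so words such as "in" or "no" never match. The TLD list is an assumption to adjust for your own users, and `linkBareDomains` is a hypothetical name.

```php
<?php
// Hypothetical helper: prefix bare domains such as aol.com with "http://",
// but only for a short whitelist of TLDs (an assumption; extend as needed).
function linkBareDomains($text)
{
    $tlds = array('com', 'net', 'org', 'edu', 'gov', 'co\.uk');
    // The lookbehind skips anything preceded by "/", ".", "@", "-" or a word
    // character, so existing URLs, e-mail addresses and partial names are left
    // alone; the lookahead rejects longer suffixes such as example.com.au.
    $pattern = '~(?<![/\w.@-])(?:[a-z0-9-]+\.)+(?:' . implode('|', $tlds) . ')\b(?!\.\w)~i';
    return preg_replace($pattern, 'http://$0', $text);
}

echo linkBareDomains('Anyone use aol.com? Mail bob@aol.com or see http://aol.com'), "\n";
```

Running it before the main URL regex lets the newly prefixed domains be picked up by the same linking code. Expect some false positives (a file named readme.net, say), which is the trade-off the post describes.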

 

Secure https sites will show their original URL instead of a page title.
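Whether `file_get_contents()` can open https:// URLs at all depends on the openssl extension, which registers the `https` stream wrapper. A quick check, sketched with a hypothetical helper name `canFetchHttps`:

```php
<?php
// Hypothetical helper: true when file_get_contents() can open https:// URLs.
// The "https" stream wrapper is only registered when ext/openssl is loaded.
function canFetchHttps()
{
    return in_array('https', stream_get_wrappers(), true);
}

echo canFetchHttps() ? "https titles can be fetched\n" : "enable openssl (or use cURL) for https\n";
```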

 

I handled links that were already hyperlinks as best I could think of at the time; there may be better ways.
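Instead of stripping existing `<a>` markup, another option is to split the text on anchor elements and only link URLs in the pieces outside them, so an existing link keeps its own text. This sketch assumes anchors are not nested and are written as `<a ...>...</a>`; `linkOutsideAnchors` and the `$linker` callback are hypothetical names.

```php
<?php
// Hypothetical helper: apply $linker only to text outside existing <a>...</a>
// elements, leaving links the user already wrote untouched.
function linkOutsideAnchors($text, $linker)
{
    // The capturing group plus PREG_SPLIT_DELIM_CAPTURE keeps each anchor in
    // the result at an odd index; plain text sits at the even indexes.
    $parts = preg_split('~(<a\s[^>]*>.*?</a>)~is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = $linker($part);
        }
    }
    return implode('', $parts);
}

// A simple linker for the demo, similar to the regex used in this thread.
$linker = function ($s) {
    return preg_replace_callback('~\b(?:https?|ftps?)://[a-z0-9\-.]+\.[a-z]{2,}(?:/\S*)?~i', function ($m) {
        $url = htmlspecialchars($m[0], ENT_QUOTES);
        return "<a href='$url' rel='nofollow'>$url</a>";
    }, $s);
};

echo linkOutsideAnchors("See http://google.com and <a href='http://phpfreaks.com'>phpfreaks</a>", $linker), "\n";
```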

Use it as is or improve upon it.

 

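<?php
function titleHyper($text){
    // Turn links that are already hyperlinks back into plain URLs, e.g.
    // <a href='http://phpfreaks.com'>phpfreaks</a> becomes http://phpfreaks.com
    $text = preg_replace('~<a\s[^>]*href=([\'"])(.*?)\1[^>]*>.*?</a>~is', '$2', $text);
    // Prefix bare "www." hosts with "http://" but leave "http://www." alone.
    $text = preg_replace('~(?<![/\w.])www\.~i', 'http://$0', $text);
    $reg_exUrl = '~\b(?:https?|ftps?)://[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,}(?:/\S*)?~i';
    // Give up on slow or dead hosts after 5 seconds.
    $context = stream_context_create(array(
        'http' => array(
            'timeout' => 5
        )
    ));
    $usedPatterns = array(); // URL => finished link, so each URL is fetched once
    // Replace all matches in one pass, so a URL that is a prefix of another
    // URL is never wrapped twice.
    return preg_replace_callback($reg_exUrl, function ($m) use ($context, &$usedPatterns) {
        $pattern = $m[0];
        if (isset($usedPatterns[$pattern])) {
            return $usedPatterns[$pattern];
        }
        $the_contents = @file_get_contents($pattern, false, $context);
        $title = $pattern;
        // Red marks a dead link; https pages that could not be fetched stay
        // black, since that may only mean secure connections are not enabled.
        $color = (substr($pattern, 0, 8) != "https://") ? "#FF0000" : "#000000";
        if (!empty($the_contents)) {
            $color = "#00FF00";
            if (preg_match("/<title>(.*)<\/title>/Umis", $the_contents, $found) && trim($found[1]) !== '') {
                $title = trim($found[1]);
            }
        }
        // Escape before output, so a page title cannot inject HTML.
        $href = htmlspecialchars($pattern, ENT_QUOTES);
        $title = htmlspecialchars($title, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8', false);
        return $usedPatterns[$pattern] = "<a style='font-size: 14px; background-color: #FFFFFF; color: $color;' href='$href' rel='nofollow' target='_blank'> $title </a>";
    }, $text);
}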

$text = "Some sample text with WWW.AOL.com<br />http://www.youtube.com/watch?v=YaxKiZfQcX8 <br />Anyone use www.myspace.com?  <br />Some people are nuts, look at this stargate link at http://www.youtube.com/watch?v=ZKoUm6z5SzU&feature=grec_index , like aliens exist or something. http://www.youtube.com/watch?v=sfN-7HczmOU&feature=grec_index  and here's a secure site https://familyhistory.hhs.gov, unless you use curl or allow secure connections it will never get a title. <br /> This is a not valid site http://zzzzzzz and this is a dead site http://zwzwzwxzw.com.<br /> Lastly lets try an already made hyperlink and see what it does <a href='http://phpfreaks.com'>phpfreaks</a>";
echo titleHyper($text);
?>


Archived

This topic is now archived and is closed to further replies.
