How to convert links to anchor tags?

Manixat · January 12, 2013

Hello,

I'm trying to turn links that users post into clickable anchor tags. What is the best way to do that?

My idea is to explode the article on spaces, look for "http://" or "https://" in each word (each element of the array), and convert the matches to anchor tags. For a really long article, though, I doubt this is the proper way to achieve what I want.

scootstah · January 12, 2013

That is going to be horribly inefficient.

Use Regular Expressions.
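
A minimal sketch of what the regex approach could look like, assuming only http and https links need converting. The function name linkify and the simplified pattern are illustrative assumptions, not a complete URL matcher; for instance, punctuation directly after a link would be swallowed into it.

```php
<?php
// Hypothetical helper: wrap each http(s) URL in the text in an anchor tag.
function linkify(string $text): string
{
    // Escape the text first so user-supplied markup cannot inject HTML.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Simplified pattern: the scheme followed by a run of non-whitespace.
    // $0 in the replacement refers to the whole match.
    return preg_replace('~https?://\S+~i', '<a href="$0">$0</a>', $escaped);
}

echo linkify('See http://link.com/url.php for details'), "\n";
// See <a href="http://link.com/url.php">http://link.com/url.php</a> for details
```

This makes a single pass over the whole article instead of exploding it into words and checking each one.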

Manixat · January 12, 2013

Alrighty, I've come up with a regex, but I need confirmation that it is written correctly before I code it into the website:

"/https?:\/\/.*/i"

It seems to work fine on my test server, but what is bothering me is that .* is supposed to match everything except newlines, right? Why isn't it matching the space and everything after it where the link ends?

Christian F. · January 12, 2013

It does:

php > $string = "This is a string with\nsome text and a http://link.com/url.php link to an article\nwhich contains information you want.";
php > preg_match ("/https?:\/\/.*/i", $string, $matches);
php > var_dump ($matches);
array(1) {
 [0]=>
 string(42) "http://link.com/url.php link to an article"
}

Though why you'd want to match everything after the link as well, I don't know.

Show us your code, not just what you think is the problem, then we can tell you what's wrong.

Manixat · January 12, 2013

I don't want to match everything after the link; that is exactly what I'm concerned about. How do I find the end of the link?

EDIT: Apparently all links in my articles are followed by a newline, which is why I'm not experiencing any issues for now.
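
One common way to find the end of the link is to stop at the first whitespace character instead of running to the end of the line. A rough sketch comparing the two patterns, assuming links never contain spaces (the sample string is made up for illustration):

```php
<?php
$string = "a http://link.com/url.php link to an article\nnext line";

// .* runs to the end of the line, so the words after the URL are captured too.
preg_match('/https?:\/\/.*/i', $string, $greedy);

// \S+ stops at the first whitespace character, which ends the URL here.
preg_match('/https?:\/\/\S+/i', $string, $bounded);

echo $greedy[0], "\n";   // http://link.com/url.php link to an article
echo $bounded[0], "\n";  // http://link.com/url.php
```

Note that \S+ would still include punctuation that directly follows a link, such as a full stop at the end of a sentence.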

Manixat · January 12, 2013

Update:

I found a regex that works quite well, and the whole URL can easily be retrieved using $1:

"|(([A-Za-z]{3,9})://([-;:&=\+\$,\w]+@{1})?([-A-Za-z0-9\.]+)+:?(\d+)?((/[-\+~%/\.\w]+)?\??([-\+=&;%@\.\w]+)?#?([\w]+)?)?)|"

Christian F. · January 12, 2013

I did some looking around the net, to see if I could find a better RegExp than what I was using already, and came across this article:

http://www.devshed.com/c/a/PHP/PHP-URL-Validation-Functions/

It's a bit old (nearly 2 years by now), but it did have a nice list of valid and invalid domains, plus some of the most cited RegExps for URL validation. However, when I looked at the results they were rather depressing: none of them passed perfectly, and none were better than my own.

The first option came close to mine, with an 18.66% failure rate overall: 4 slippages (14.8%) and 11 overjudgements (27.5%).

Anyway, close is not good enough, so I decided to fix it:

$RegExp = '#^(?:(?:(?:f|ht)tps?|dchub|sftp|steam)://)?'.
// Username-password combos.
'(?:\\w+(?::\\w+)?@)?'.

// Domain or IP address
'(?:((?:[\\w\\pL][\\w\\pL-]*(?<!\\-)\\.)+[a-z\\pL]{2,5})(?::\\d{1,5})?'.
'|(?:(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))'.

// URL Path
'((?:(?<!/)/(?:\w(?:%[a-f\\d]{2}|[\\w\\., -])*)*)+(?:(?:\\.\\w{1,6})?'.
// URL param
'(\\?(?:(?:%[a-f\\d]{2}|[\\w\\.-])+=(?:%[a-f\\d]{2}|[\\w\\.-])+)(?:&(?:%[a-f\\d]{2}|[\\w\\.-])+(?:=(?:%[a-f\\d]{2}|[\\w\\.-])+)?)*&?)?'.
')?)?\\z#ui';

This passes with a 0% failure rate: 0 slippage and 0 overjudgements.

To make it extract URLs from a larger text, and not just validate complete URLs, just remove the anchors (^ and \\z).
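
To illustrate the effect of the anchors, here is a small sketch using a much simpler stand-in pattern (an assumption for readability, not the full expression above). With ^ and \z the pattern validates a whole string; without them it can be used with preg_match_all to pull URLs out of text.

```php
<?php
// Simplified stand-in for the URL body; NOT the full expression from the post.
$body = 'https?://[\w.-]+(?:/[\w./-]*)?';

$validate = '#^' . $body . '\z#i';  // anchored: the whole string must be a URL
$extract  = '#' . $body . '#i';     // unanchored: finds URLs inside larger text

$text = 'Read http://link.com/url.php and https://example.org/ today';

var_dump(preg_match($validate, 'http://link.com/url.php')); // int(1)
var_dump(preg_match($validate, $text));                     // int(0)

// Collect every URL found in the text.
preg_match_all($extract, $text, $matches);
print_r($matches[0]); // http://link.com/url.php and https://example.org/
```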

Added:

The one you found was rather abysmal, with 16 overjudgements (40%) and 14 slippages (51.9%), making for an overall failure rate of 45.9%.

sid0972 · February 9, 2013

@christian f

Can you tell me the difference between the URL path and the URL params?

Do URL parameters include GET values?

Christian F. · February 10, 2013

Everything before the ? is the path, as per the definition of "path":

5. computing the directions for reaching a particular file or directory, as traced hierarchically through each of the parent directories usually from the root; the file or directory and all parent directories are separated from one another in the path by slashes

Everything after the ? is the parameters, as per the definition of "parameter":

3. Computers. a variable that must be given a specific value during the execution of a program or of a procedure within a program.

The act of requesting a web page via HTTP being the execution of the procedure, in this case.

http://dictionary.reference.com/browse/path

http://dictionary.reference.com/browse/parameter
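
For a concrete view of the split, PHP's built-in parse_url() and parse_str() can separate the two parts. A short sketch, using a made-up URL for illustration:

```php
<?php
// The URL below is a made-up example.
$parts = parse_url('http://link.com/articles/url.php?id=42&page=2');

echo $parts['path'], "\n";   // /articles/url.php
echo $parts['query'], "\n";  // id=42&page=2

// The query string holds the GET parameters; parse_str() turns it into an array.
parse_str($parts['query'], $params);
print_r($params); // id => 42, page => 2 (both as strings)
```

The values in $params correspond to what a PHP script serving that URL would see in $_GET.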

sid0972 · February 10, 2013

so this is a path

http://www.google.com

and everything after google.com is a param

https://www.google.c...iw=1855&bih=968

right?

Christian F. · February 10, 2013

Hmm... You just pointed out something that I've missed in mine: bookmarks (the part of a URL after the #).

That said, there isn't actually a path in that first URL. In the second, only the slash after "google.com" is a part of the path.

If we take the first link you posted, you have the protocol (http://) and the domain (www.google.com). In the second you have the above, plus the path (/), and then the parameters. (Though Google is using the bookmark identifier, so I assume that the parameters are handled by JS and not the server.)
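
The same breakdown can be checked with parse_url(). In this sketch the second URL is a hypothetical stand-in on example.com, since the posted Google link is truncated; it puts the parameters after a #, as described above.

```php
<?php
// No path at all: only the scheme and host keys are returned.
$bare = parse_url('http://www.google.com');
var_dump(isset($bare['path'])); // bool(false)

// Hypothetical URL with the parameters placed after the bookmark marker (#).
$full = parse_url('https://www.example.com/#q=php&bih=968');

echo $full['path'], "\n";         // /
echo $full['fragment'], "\n";     // q=php&bih=968
var_dump(isset($full['query']));  // bool(false)
```

Browsers do not send the fragment to the server, which is why anything placed after the # has to be handled client-side.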

Zane · February 10, 2013

Here ya go.

http://www.sitepoint.com/forums/showthread.php?530093-Regex-Help&p=3713338#post3713338
