
How to convert links to anchor tags?


Manixat


Hello,

 

I'm currently trying to turn links that users post into anchor tags that actually redirect, and I want to ask you which is the best way of doing that?

 

What I'm thinking of doing is exploding the article on spaces, looking for "http://" or "https://" in each word (each element of the array), and converting every match to an anchor tag. But for a really big article, I suspect this is not the proper way of achieving what I'm after?
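For reference, the explode-on-spaces idea could be sketched roughly like this. The function name linkifyWords() is made up for illustration, and the input is assumed to be already HTML-escaped:

```php
<?php
// Hypothetical helper illustrating the explode-on-spaces idea.
// Assumption: $text has already been HTML-escaped elsewhere.
function linkifyWords(string $text): string
{
    $words = explode(' ', $text);

    foreach ($words as $i => $word) {
        // Only words that START with http:// or https:// are converted.
        if (stripos($word, 'http://') === 0 || stripos($word, 'https://') === 0) {
            $words[$i] = '<a href="' . $word . '">' . $word . '</a>';
        }
    }

    return implode(' ', $words);
}

echo linkifyWords('See http://example.com/page for details');
// See <a href="http://example.com/page">http://example.com/page</a> for details
```

Note that this only splits on single spaces, so a link right after a newline or tab would be missed, and any punctuation glued to the link ends up inside the anchor.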


Alrighty, I've come up with a regex, but I need confirmation that it is written correctly before I code it into the website:

 

"/https?:\/\/.*/i"

 

It seems to work fine on my test server, but what is bothering me is that .* is supposed to match everything except newlines, right? Why isn't it matching the space and everything after it, where the link ends?

It does:

php > $string = "This is a string with\nsome text and a http://link.com/url.php link to an article\nwhich contains information you want.";
php > preg_match ("/https?:\/\/.*/i", $string, $matches);
php > var_dump ($matches);
array(1) {
 [0]=>
 string(42) "http://link.com/url.php link to an article"
}

Though why you'd want to match everything after the link as well, I don't know.
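If the goal is to stop at the end of the link, one option is to swap .* for \S+, which stops at the first whitespace character. A minimal sketch, assuming a made-up helper name linkify() and that escaping the text before inserting anchors suits your setup:

```php
<?php
// Hypothetical helper: escape the text, then wrap each URL in an anchor tag.
function linkify(string $text): string
{
    // Escape first so user-supplied markup cannot end up in the output.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // \S+ matches up to the first whitespace character,
    // whereas .* would run on to the end of the line.
    $result = preg_replace('#https?://\S+#i', '<a href="$0">$0</a>', $escaped);

    // preg_replace() returns null on error; fall back to the escaped text.
    return $result ?? $escaped;
}

$string = "This is a string with\nsome text and a http://link.com/url.php link to an article\nwhich contains information you want.";
echo linkify($string);
// The second line becomes:
// some text and a <a href="http://link.com/url.php">http://link.com/url.php</a> link to an article
```

This is deliberately loose: punctuation directly after a link (a trailing period or comma, for instance) is still swallowed by \S+, so a stricter pattern is needed if that matters.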

 

Show us your code, not just what you think is the problem, then we can tell you what's wrong.

I did some looking around the net, to see if I could find a better RegExp than what I was using already, and came across this article:

http://www.devshed.com/c/a/PHP/PHP-URL-Validation-Functions/

 

It's a bit old (nearly 2 years by now), but it did have a nice list of valid and invalid domains, plus some of the most cited RegExps for URL validation. However, when I looked at the results they were rather depressing: none of them passed perfectly, and none were better than my own.

The first option came close to mine, as it had an 18.66% failure rate overall: 4 slippages (14.8%) and 11 overjudgements (27.5%).

 

Anyway, close is not good enough, so I decided to fix it: ;)

$RegExp = '#^(?:(?:(?:f|ht)tps?|dchub|sftp|steam)://)?'.
// Username-password combos.
'(?:\\w+(?::\\w+)?@)?'.

// Domain or IP address
'(?:((?:[\\w\\pL][\\w\\pL-]*(?<!\\-)\\.)+[a-z\\pL]{2,5})(?::\\d{1,5})?'.
'|(?:(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))'.

// URL Path
'((?:(?<!/)/(?:\w(?:%[a-f\\d]{2}|[\\w\\., -])*)*)+(?:(?:\\.\\w{1,6})?'.
// URL param
'(\\?(?:(?:%[a-f\\d]{2}|[\\w\\.-])+=(?:%[a-f\\d]{2}|[\\w\\.-])+)(?:&(?:%[a-f\\d]{2}|[\\w\\.-])+(?:=(?:%[a-f\\d]{2}|[\\w\\.-])+)?)*&?)?'.
')?)?\\z#ui';
This passes with a 0% failure rate: 0 slippage and 0 overjudgements. :)

 

To make it extract URLs from a larger text, and not just validate complete URLs, just remove the anchors (^ and \\z).
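To illustrate the difference the anchors make, here is a small sketch. It uses a much simpler stand-in pattern, not the full expression posted here, and the sample URLs are made up:

```php
<?php
// Simplified stand-in pattern (an assumption for this example only).
$core = 'https?://[\w.-]+(?:/[\w./%-]*)?';

$validate = '#^' . $core . '\z#i'; // anchored: the whole string must be a URL
$extract  = '#' . $core . '#i';    // unanchored: finds URLs inside a larger text

var_dump(preg_match($validate, 'http://example.com/a.php'));       // int(1)
var_dump(preg_match($validate, 'visit http://example.com/a.php')); // int(0)

$text = 'First http://example.com/a.php then https://example.org/b and done.';
preg_match_all($extract, $text, $matches);
print_r($matches[0]);
// $matches[0] holds 'http://example.com/a.php' and 'https://example.org/b'
```

With the anchors in place the pattern only answers "is this entire string a URL?"; without them, preg_match_all() collects every URL-shaped substring.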

 

Added:

The one you found was rather abysmal, with 16 overjudgements (40%) and 14 slippages (51.9%), making for an overall failure rate of 45.9%.

  • 4 weeks later...

Everything before the ? is the path, as per the definition of "path":

5. computing  the directions for reaching a particular file or directory, as traced hierarchically through each of the parent directories usually from the root; the file or directory and all parent directories are separated from one another in the path by slashes

 

Everything after the ? is the parameters, as per the definition of "parameter":

3. Computers. a variable that must be given a specific value during the execution of a program or of a procedure within a program.

The act of requesting a web page via HTTP being the execution of the procedure, in this case.

 

http://dictionary.reference.com/browse/path

http://dictionary.reference.com/browse/parameter

Hmm.. You just pointed out something that I've missed in mine, bookmarks.

 

That said, there isn't actually a path in that first URL. In the second, only the slash after "google.com" is part of the path.

If we take the first link you posted, you have the protocol (http://) and the domain (www.google.com). In the second you have the above, plus the path (/), and then the parameters. (Though Google is using the bookmark identifier, so I assume that the parameters are handled by JS and not the server.)
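PHP's built-in parse_url() splits a URL along the same lines, which makes the distinction easy to check. A quick sketch with made-up example.com URLs (not the links from the earlier post):

```php
<?php
// Protocol and domain only: no 'path' key is returned at all.
$noPath = parse_url('http://www.example.com');
var_dump(isset($noPath['path'])); // bool(false)

// Path is just "/", and everything after # is the fragment (bookmark),
// which browsers do not send to the server.
$withFragment = parse_url('http://www.example.com/#q=php');
echo $withFragment['path'], "\n";     // /
echo $withFragment['fragment'], "\n"; // q=php

// A full URL with path, parameters and bookmark.
$full = parse_url('http://www.example.com/search/index.php?q=php&page=2#results');
echo $full['scheme'], "\n";   // http
echo $full['host'], "\n";     // www.example.com
echo $full['path'], "\n";     // /search/index.php
echo $full['query'], "\n";    // q=php&page=2
echo $full['fragment'], "\n"; // results
```

Note that parse_url() only splits the string into components; it does not validate that the URL is well formed.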
