Hello,

 

I'm currently trying to turn links that users post into anchor tags that actually redirect. What is the best way of doing that?

 

My idea is to explode the article on spaces, look for "http://" or "https://" in each word (each element of the array), and convert every match to an anchor tag. But for a really big article I suspect this is not the proper way of achieving what I'm after?

Edited by Manixat
https://forums.phpfreaks.com/topic/273066-how-to-convert-links-to-anchor-tags/

Alrighty, I've come up with some regex but I need confirmation it is correctly written before I code it into the website

 

"/https?:\/\/.*/i"

 

It seems to work fine on my test server, but what is bothering me is that .* is supposed to match everything except new lines, right? Why isn't it matching the space and everything after it where the link ends?

It does:

php > $string = "This is a string with\nsome text and a http://link.com/url.php link to an article\nwhich contains information you want.";
php > preg_match ("/https?:\/\/.*/i", $string, $matches);
php > var_dump ($matches);
array(1) {
 [0]=>
 string(42) "http://link.com/url.php link to an article"
}

Though why you'd want to match everything after the link as well, I don't know.

 

Show us your code, not just what you think is the problem, then we can tell you what's wrong.

I don't want to match everything after the link; that is exactly what I'm concerned about. How do I find the end of the link?

 

EDIT: apparently in my articles all links are followed by a new line and that's why I'm not experiencing any issues for now.
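
One way to find the end of the link is to match non-whitespace (\S+) instead of .*, so the match stops at the first space rather than at the end of the line. Below is a minimal sketch of that idea, not a complete URL matcher. The function name linkify() and the list of trailing punctuation to peel off are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not from the thread): turns bare http/https URLs into anchor tags.
function linkify(string $text): string
{
    // Split on URLs and keep them: even indexes are plain text, odd indexes are URLs.
    // \S+ ends the URL at the first whitespace character.
    $parts = preg_split('~(https?://\S+)~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Plain text: escape it so user-supplied markup is not rendered as HTML.
            $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
            continue;
        }
        // Assumption: punctuation at the very end belongs to the sentence, not the URL.
        $url  = rtrim($part, '.,;:!?)"\'');
        $tail = substr($part, strlen($url));
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $out .= '<a href="' . $safe . '">' . $safe . '</a>'
              . htmlspecialchars($tail, ENT_QUOTES, 'UTF-8');
    }
    return $out;
}

echo linkify("See http://link.com/url.php, then read on."), "\n";
```

Because this runs a single regex pass over the whole article, it also avoids exploding the text into words first.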

Edited by Manixat

I did some looking around the net, to see if I could find a better RegExp than what I was using already, and came across this article:

http://www.devshed.com/c/a/PHP/PHP-URL-Validation-Functions/

 

It's a bit old (nearly 2 years by now), but it did have a nice list of valid and invalid domains, plus some of the most cited RegExps for URL validation. However, when I looked at the results they were rather depressing: none of them passed perfectly, and none were better than my own.

The first option came close to mine, as it had an 18.66% failure rate overall: 4 slippages (14.8%) and 11 overjudgements (27.5%).

 

Anyway, close is not good enough, so I decided to fix it: ;)

$RegExp = '#^(?:(?:(?:f|ht)tps?|dchub|sftp|steam)://)?'.
// Username-password combos.
'(?:\\w+(?::\\w+)?@)?'.
// Domain or IP address
'(?:((?:[\\w\\pL][\\w\\pL-]*(?<!\\-)\\.)+[a-z\\pL]{2,5})(?::\\d{1,5})?'.
'|(?:(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))'.
// URL Path
'((?:(?<!/)/(?:\w(?:%[a-f\\d]{2}|[\\w\\., -])*)*)+(?:(?:\\.\\w{1,6})?'.
// URL param
'(\\?(?:(?:%[a-f\\d]{2}|[\\w\\.-])+=(?:%[a-f\\d]{2}|[\\w\\.-])+)(?:&(?:%[a-f\\d]{2}|[\\w\\.-])+(?:=(?:%[a-f\\d]{2}|[\\w\\.-])+)?)*&?)?'.
')?)?\\z#ui';
This passes with a 0% failure rate: 0 slippage and 0 overjudgements. :)

 

To make it extract URLs from a larger text, and not just validate complete URLs, just remove the anchors (^ and \\z).
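
To illustrate the difference the anchors make, here is a small sketch. It uses a deliberately simplified stand-in pattern ($core is an assumption for illustration, not the full expression above), and the example text and URLs are made up.

```php
<?php
// Simplified stand-in pattern; it only shows the effect of the anchors.
$core = 'https?://[\w.-]+(?:/[\w./%-]*)?';

// Anchored with ^ and \z: the whole subject must be a URL (validation).
$validate = '#^' . $core . '\z#i';
// No anchors: the same core may match anywhere in the subject (extraction).
$extract = '#' . $core . '#i';

$text = "Docs at http://example.com/a/b.html and https://example.org/ too";

var_dump(preg_match($validate, $text));                          // int(0)
var_dump(preg_match($validate, 'http://example.com/a/b.html'));  // int(1)

// Collect every URL found in the larger text.
preg_match_all($extract, $text, $matches);
print_r($matches[0]); // http://example.com/a/b.html and https://example.org/
```

Note that once the anchors are gone, punctuation that directly follows a URL can end up inside the match, so extraction usually needs some extra trimming.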

 

Added:

The one you found was rather abysmal, with 16 overjudgements (40%) and 14 slippages (51.9%), making for an overall failure rate of 45.9%.

Edited by Philip
  • 4 weeks later...

Everything before the ? is the path, as per the definition of "path":

5. computing: the directions for reaching a particular file or directory, as traced hierarchically through each of the parent directories usually from the root; the file or directory and all parent directories are separated from one another in the path by slashes

 

Everything after the ? is the parameters, as per the definition of "parameter":

3. Computers. a variable that must be given a specific value during the execution of a program or of a procedure within a program.

The act of requesting a web page via HTTP being the execution of the procedure, in this case.

 

http://dictionary.reference.com/browse/path

http://dictionary.reference.com/browse/parameter

Hmm.. You just pointed out something that I've missed in mine, bookmarks.

 

That said, there isn't actually a path in that first URL. In the second, only the slash after "google.com" is part of the path.

If we take the first link you posted, you have the protocol (http://) and the domain (www.google.com). In the second you have the above, plus the path (/), and then the parameters. (Though Google is using the bookmark identifier, so I assume that the parameters are handled by JS and not the server.)
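
PHP's own parse_url() makes the same split, which is a quick way to check which part is which. It calls the parameters "query" and the bookmark "fragment". The two URLs below are hypothetical stand-ins, not the links that were posted.

```php
<?php
// Hypothetical example URLs, chosen only to show the components.
$bare = parse_url('http://www.google.com');
$full = parse_url('http://www.google.com/?q=php&hl=en#results');

var_dump(isset($bare['path']));  // bool(false): nothing follows the domain, so no path
echo $full['scheme'], "\n";      // http
echo $full['host'], "\n";        // www.google.com
echo $full['path'], "\n";        // /
echo $full['query'], "\n";       // q=php&hl=en
echo $full['fragment'], "\n";    // results

// Break the query string into individual parameters.
parse_str($full['query'], $params);
print_r($params);
```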
