
Match URL unless it's a specific one


Chivalry

I have this regex that works for me to match web page URLs:

((https?://)([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)

Is there an edit I can make to this so that it matches only if the URL does not fit a specific pattern? For example, the pattern above matches "http://mycompany.com/folder/123456.html". All the URLs I want to exclude look like that, following the pattern "http://mycompany.com/folder/\d{4,6}\.html". I need a pattern that matches URLs only if they don't also match this one.

Is this possible?

Thanks,

Chuck
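One way to get "match X unless it also matches Y" from a single pattern is a negative lookahead placed right after the scheme. This is a minimal sketch, not the original author's pattern: it assumes a simplified, non-capturing version of the regex above, it treats any URL that begins with the excluded `mycompany.com/folder/` + 4 to 6 digits + `.html` form as excluded, and the `findUrls()` helper and `URL_UNLESS_EXCLUDED` constant are hypothetical names.

```php
<?php
// Simplified, non-capturing version of the URL pattern, with a negative
// lookahead right after the scheme: (?!...) makes the whole match fail when
// the rest of the URL starts with the excluded mycompany.com form.
const URL_UNLESS_EXCLUDED =
    '#https?://(?!mycompany\.com/folder/\d{4,6}\.html)[-\w.]+(?::\d+)?(?:/[\w/.]*(?:\?\S+)?)?#';

// Hypothetical helper: return every URL in $text except the excluded ones.
function findUrls(string $text): array
{
    preg_match_all(URL_UNLESS_EXCLUDED, $text, $matches);
    return $matches[0];
}

$text = 'See http://mycompany.com/folder/123456.html and http://example.com/a.html';
print_r(findUrls($text)); // lists only http://example.com/a.html
```

A URL on another host, or one with seven digits in the file name, still matches; only the 4 to 6 digit company form is skipped.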

My question is, why all the captures?

You can use ?: inside captures that won't be needed, like so: (?: ... ). As a side note, could you not match a non-specific 'domain format', parse that using parse_url and go from there instead? (That pattern looks flaky.)

(Note to CV: This time I made use of [m] ... [/m] for manual link references lol.) See? No more doing things the hard way ;)
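The parse_url idea can be sketched like this: match URLs with a loose pattern that uses only non-capturing groups, then decide in PHP whether each one is the special form by looking at its host and path. A minimal sketch under assumptions: `isCompanyFolderUrl()` is a hypothetical helper, and "special" is taken to mean host `mycompany.com` with a path of `/folder/` + 4 to 6 digits + `.html`.

```php
<?php
// Hypothetical helper: given a URL that a loose pattern has already matched,
// check its parts with parse_url() instead of encoding the rule in the regex.
function isCompanyFolderUrl(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'], $parts['path'])) {
        return false; // malformed, or no host/path: not the special form
    }
    return strtolower($parts['host']) === 'mycompany.com'
        && preg_match('#^/folder/\d{4,6}\.html$#', $parts['path']) === 1;
}

// Loose URL pattern that uses only non-capturing groups (?:...).
$loose = '#https?://[-\w.]+(?::\d+)?(?:/[\w/.]*(?:\?\S+)?)?#';
$text  = 'A http://mycompany.com/folder/123456.html B http://example.com/x.php?id=1';

preg_match_all($loose, $text, $m);
// Keep every matched URL that is not the special company form.
$wanted = array_values(array_filter($m[0], function ($u) {
    return !isCompanyFolderUrl($u);
}));
print_r($wanted); // lists only http://example.com/x.php?id=1
```

Because the path check runs on `parse_url()` output, a query string on the special URL does not stop it from being recognised.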

I didn't write this pattern, and I understand a little more than half of it, but I've tested it on sample URLs and it seems to work. Most of the captures were inserted by the original author, although I added one or two so that I could better understand what was going on. I'm still not completely clear.

I looked up the documentation on parse_url, and I don't think it's useful to me. Here are some more details about what I'm trying to do, which may help.

I have a block of text, including numerous paragraphs, that may contain multiple URLs. I need to transform those URLs into links. For example, if it finds

http://domain.com/index.php

this could be translated into

<a href="http://domain.com/index.php">http://domain.com/index.php</a>

However, if it is one specific URL, the link target should be something different, so that

http://specialdomain.com/specialpage.html

should be translated into

<a href="http://differentdomain.com/folder/123456.html">http://specialdomain.com/specialpage.html</a>

I hope this is clear. Perhaps I'm tackling this the wrong way, and suggestions for another approach would also be appreciated, but as far as I can tell, I need a pattern that says, "match X unless what is matched also matches Y."

Thanks,

Chuck
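Since the real rule is "link every URL, but use a different target for certain ones", a single `preg_replace_callback()` pass sidesteps the negative-match problem: the regex finds every URL and the callback picks the href. A minimal sketch, assuming a simplified non-capturing URL pattern; `linkify()` and the `$specialLinks` lookup table are hypothetical names, the mapping just reuses the example URLs from this post, and escaping with `htmlspecialchars()` is an added choice.

```php
<?php
// Sketch: one pass over the text with preg_replace_callback(). Every URL is
// matched by the same loose pattern, and the callback decides the href, so
// the "unless it is a specific URL" rule lives in PHP, not in the regex.
function linkify(string $text, array $specialLinks): string
{
    $pattern = '#https?://[-\w.]+(?::\d+)?(?:/[\w/.]*(?:\?\S+)?)?#';

    return preg_replace_callback($pattern, function (array $m) use ($specialLinks) {
        $url = $m[0];
        // Use the mapped target for special URLs, otherwise link to the URL itself.
        $href = $specialLinks[$url] ?? $url;
        return sprintf(
            '<a href="%s">%s</a>',
            htmlspecialchars($href, ENT_QUOTES),
            htmlspecialchars($url, ENT_QUOTES)
        );
    }, $text);
}

// Hypothetical lookup table: special URL => link target.
$specialLinks = [
    'http://specialdomain.com/specialpage.html' => 'http://differentdomain.com/folder/123456.html',
];

echo linkify("See http://domain.com/index.php\nand http://specialdomain.com/specialpage.html", $specialLinks);
```

Like the original pattern, this one will include a trailing full stop when a URL ends a sentence, so the character classes may need tightening for real text.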

My line of thinking is to pass arrays as the search and replace arguments of preg_replace (untested):

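$str = '...'; // Your block of text
$search = array(
    '#((?:https?://(?:www\.)?|www\.)specialdomain\.com[^\s<"]*)#', // specialdomain.com URLs
    '#(?<![">/])((?:https?://|www\.)[^\s<"]+)#' // quick and dirty url detection; the lookbehind skips URLs right after ", > or / (i.e. ones already inside a link)
);
$replace = array('<a href="http://www.whatever.com">$1</a>', '<a href="$1">$1</a>'); // first key is replacement for specialdomain.com, otherwise, replace with second key
$str = preg_replace($search, $replace, $str); // patterns are applied in order, each to the output of the previous one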
