
Match URL unless it's a specific one


Chivalry


I have this regex that works for me to match web page URLs:

 

((https?://)([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)

 

Is there an edit I can make to this so that it matches only when the URL does not fit a specific pattern? For example, the regex above matches "http://mycompany.com/folder/123456.html". All the URLs I want to exclude look like that, following the pattern "http://mycompany.com/folder/\d{4,6}\.html". I need a pattern that matches URLs only if they don't also match this one.
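For reference, here is a minimal sketch of the usual regex answer to "match X unless it is Y", assuming the requirement is exactly as stated: put a negative lookahead `(?!...)` in front of the URL pattern so that no match can start where the excluded pattern matches. `$skip`, `$url` and the sample text are illustrative only; `$url` is the pattern above with its groups made non-capturing.

```php
<?php
// Excluded URLs, written as a regex (pattern taken from the question).
$skip = 'http://mycompany\.com/folder/\d{4,6}\.html';

// The URL regex from the question, with non-capturing groups and the nested
// quantifier ([-\w\.]+)+ reduced to [-\w.]+ (same whole matches, less backtracking).
$url = 'https?://[-\w.]+(?::\d+)?(?:/[\w/.]*(?:\?\S+)?)?';

// (?!...) is a negative lookahead: a match may only start at a position
// where the excluded pattern does NOT match.
$pattern = '~(?!' . $skip . ')' . $url . '~';

$text = 'See http://mycompany.com/folder/123456.html and http://example.com/page.html';
preg_match_all($pattern, $text, $m);
echo implode("\n", $m[0]), "\n"; // prints: http://example.com/page.html
```

Because `https?://` has to match literally, the engine cannot begin a shorter match partway into an excluded URL, so the whole excluded URL is skipped rather than partially matched.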

 

Is this possible?

 

Thanks,

Chuck

https://forums.phpfreaks.com/topic/144825-match-url-unless-its-a-specific-one/

My question is, why all the captures?

You can use ?: inside captures that won't be needed, like so: (?: ... ). As a side note, could you not match a non-specific 'domain format', parse that using parse_url and go from there instead? (That pattern looks flaky.)
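A minimal sketch of the parse_url route, assuming a loose `https?://\S+` detector is good enough for the text in question: match broadly, then check the host and path of each hit in plain PHP. `isSpecialUrl()` and the mycompany.com rule are hypothetical, lifted from the original question.

```php
<?php
// Hypothetical helper: is $url one of the excluded/special URLs?
// The check runs on the parsed pieces rather than inside one big regex.
function isSpecialUrl(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return false; // seriously malformed URL, or no host at all
    }
    $path = $parts['path'] ?? '';
    return strtolower($parts['host']) === 'mycompany.com'
        && preg_match('~^/folder/\d{4,6}\.html$~', $path) === 1;
}

// Loose detection: a scheme followed by any run of non-whitespace characters.
$text = 'Links: http://mycompany.com/folder/123456.html http://example.com/a.php';
preg_match_all('~https?://\S+~', $text, $m);

foreach ($m[0] as $found) {
    echo $found, ' => ', (isSpecialUrl($found) ? 'special' : 'ordinary'), "\n";
}
// prints:
// http://mycompany.com/folder/123456.html => special
// http://example.com/a.php => ordinary
```

The loose `\S+` will also swallow trailing punctuation such as a full stop at the end of a sentence, so real text may need a little trimming before the check.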

 

(Note to CV: This time I made use of [m] ... [/m] for manual link references lol). See? no more doing things the hard way ;)

I didn't write this pattern, and I understand a little more than half of it, but I've tested it on sample URLs and it seems to work. Most of the captures were put there by the original author, although I added one or two so that I could better understand what was going on. I'm still not completely clear on how it works.
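As a reading aid, here is the same pattern laid out with the x (extended) modifier, which ignores whitespace and allows # comments. This is a sketch, not the original author's version: the groups are made non-capturing, and the nested `([-\w\.]+)+` is reduced to a single `[-\w.]+`, which matches the same text without the extra backtracking.

```php
<?php
// The regex from the question, rewritten with the x (extended) modifier so
// whitespace is ignored and each piece can carry a # comment.
// Groups are non-capturing (?:...) because only the whole match ($m[0]) is used.
$pattern = '~
    https?://            # scheme: http:// or https://
    [-\w.]+              # host: letters, digits, _, - and dots
                         # (the original ([-\w\.]+)+ nests two + quantifiers,
                         #  which can backtrack badly; one + is enough)
    (?::\d+)?            # optional port, e.g. :8080
    (?:                  # optional path part:
        /[\w/.]*         #   a slash, then path characters (\w already covers _)
        (?:\?\S+)?       #   optional query string: ? then non-space characters
    )?
~x';

$sample = 'Go to http://domain.com:8080/dir/index.php?id=5 now';
preg_match($pattern, $sample, $m);
echo $m[0], "\n"; // prints: http://domain.com:8080/dir/index.php?id=5
```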

 

I looked up the documentation on parse_url, and I don't think it's useful to me. Here are some more details about what I'm trying to do, which may help.

 

I have a block of text, including numerous paragraphs, that may contain multiple URLs. I need to transform those URLs into links. For example, if it finds

http://domain.com/index.php

this could be translated into

<a href="http://domain.com/index.php">http://domain.com/index.php</a>

However, if the URL is a specific one, the translation should be different, so that

http://specialdomain.com/specialpage.html

should be translated into

<a href="http://differentdomain.com/folder/123456.html">http://specialdomain.com/specialpage.html</a>

 

I hope this is clear. Perhaps I'm tackling this the wrong way, and suggestions for another approach would also be appreciated. But as far as I can tell, I need a pattern that says, "match X unless what is matched also matches Y."
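One alternative path, sketched under assumptions: instead of a single "X unless Y" regex, match every URL once with preg_replace_callback and choose the href inside the callback. The `$specialLinks` table and `linkify()` are hypothetical; the single table entry just mirrors the example above.

```php
<?php
// Hypothetical lookup table: URL as written in the text => where it should point.
// The entry mirrors the example in the post above.
$specialLinks = [
    'http://specialdomain.com/specialpage.html' => 'http://differentdomain.com/folder/123456.html',
];

function linkify(string $text, array $specialLinks): string
{
    // Find every URL once; the callback decides the href, so "match X unless
    // it is also Y" becomes an ordinary lookup instead of a single regex.
    return preg_replace_callback(
        '~https?://[-\w.]+(?::\d+)?(?:/[\w/.]*(?:\?\S+)?)?~',
        function (array $m) use ($specialLinks): string {
            $url  = $m[0];
            $href = $specialLinks[$url] ?? $url; // special target, or the URL itself
            return '<a href="' . htmlspecialchars($href) . '">' . htmlspecialchars($url) . '</a>';
        },
        $text
    );
}

echo linkify('See http://domain.com/index.php and http://specialdomain.com/specialpage.html today', $specialLinks), "\n";
```

Because each URL is visited exactly once, the generated `<a href=...>` markup is never re-scanned, which is the usual pitfall when chaining several preg_replace patterns over the same string. As with the original pattern, a full stop directly after a URL would be treated as part of it.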

 

Thanks,

Chuck

So my line of thinking is to pass arrays as the search and replace arguments to preg_replace (untested):

 

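$str = '...'; // Your block of text
$search = array(
    '#((?:https?://(?:www\.)?|www\.)specialdomain\.com[^\s<"]*)#', // quick and dirty: the special domain
    '#(?<![\w/".>])((?:https?://|www\.)[^\s<"]+)#' // any other URL; the lookbehind skips URLs already inside a link made by the first pattern
);
$replace = array('<a href="http://www.whatever.com">$1</a>', '<a href="$1">$1</a>'); // first key is the replacement for specialdomain.com; everything else gets the second key
$str = preg_replace($search, $replace, $str); // the patterns run in order, each over the whole string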

Archived

This topic is now archived and is closed to further replies.
