Archived

This topic is now archived and is closed to further replies.

grum

Regular Expression problem

Hi guys,

I hope everyone is well. Regex. Not my strong point. Any pointers appreciated.

[code]$temp = preg_replace("/[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+/ie", "'<a href=\"\\0\" target=\"_blank\">'.urlreduce('\\0').'</a><div class=\"new_window\"> (Opens in new window) </div>\n'", $temp);[/code]

The above code works fine, for an autolinking feature. But, I want it to ignore any URLs preceded by

[code]image=[/code]

I thought I could simply add a condition to the start of my match case, like so:

[code]$temp = preg_replace("/[^image=][a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+/ie", "'<a href=\"\\0\" target=\"_blank\">'.urlreduce('\\0').'</a><div class=\"new_window\"> (Opens in new window) </div>\n'", $temp);[/code]

It doesn't work though - still matches on URLs preceded by image=.

Cheers for your help,
G.


The [^image=] part won't work because it isn't read as the string "image=". Square brackets define a character class, and the leading ^ negates it, so [^image=] matches any single character that is not i, m, a, g, e or =.
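To make the difference concrete, here is a small sketch. The sample string and the shortened patterns are made up for illustration and are not the poster's full expression. It shows the negated class consuming the "h" of "http", which is why the URL still matches, while a lookbehind only inspects the text to the left.

```php
<?php
// Illustrative input, not taken from the thread.
$text = 'image=http://example.com/a.png';

// [^image=] is a negated character class: it consumes ONE character
// that is not "i", "m", "a", "g", "e" or "=". Here it consumes the "h",
// and [a-zA-Z]+ then matches "ttp".
preg_match('/[^image=][a-zA-Z]+:/', $text, $class);

// (?<!image=) consumes nothing; it only checks what precedes the position.
$found = preg_match('/(?<!image=)\bhttp:/', $text, $look);

echo $class[0], "\n"; // http:
echo $found, "\n";    // 0 (no match, because "image=" precedes the URL)
```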

You need to use a negative lookbehind, "(?<!...)", which checks the text before the match without consuming it. Try this expression:

'/(?<!image=)\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i'
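A rough sketch of how that expression might be dropped into the autolinker. The /e modifier used in the original post was removed in PHP 7, so this version assumes preg_replace_callback instead. urlreduce() is the poster's own function and isn't shown in the thread, so the one below is a hypothetical stand-in, and autolink() is just an illustrative wrapper name. The "(Opens in new window)" markup is left out for brevity.

```php
<?php
// Hypothetical stand-in for the poster's urlreduce(), which is not shown
// in the thread: shortens long URLs for display.
function urlreduce($url) {
    return strlen($url) > 30 ? substr($url, 0, 27) . '...' : $url;
}

// Illustrative wrapper: links every URL not directly preceded by "image=".
function autolink($text) {
    $pattern = '/(?<!image=)\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

    return preg_replace_callback($pattern, function ($m) {
        // $m[0] is the whole matched URL; escape it before building HTML.
        $href  = htmlspecialchars($m[0], ENT_QUOTES);
        $label = htmlspecialchars(urlreduce($m[0]), ENT_QUOTES);
        return '<a href="' . $href . '" target="_blank">' . $label . '</a>';
    }, $text);
}

echo autolink('See http://example.com/page and image=http://example.com/a.png'), "\n";
// See <a href="http://example.com/page" target="_blank">http://example.com/page</a> and image=http://example.com/a.png
```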

That did the trick, cheers.

Do you mind if I ask more about it?

'/(?<!image=)\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/ie'

In pseudo code, is that..
any number of these characters [-A-Z0-9+&@#\/%?=~_|!:,.;] followed by one of any of these characters [-A-Z0-9+&@#\/%=~_|]?

If so, why?

And I don't really get the word boundary thing. For my own curiosity, I tried your lookbehind statement and the word boundary at the start of my first code snippet. Why does it not work in that scenario?

Thanks again!
G.

I use RegexBuddy to help with formatting regex/PREG. It's well worth the $30 USD cost. See: http://www.regexbuddy.com

Here's documentation on that regex for you to put in your code for future reference:
[code]
// (?<!image=)\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]
//
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!image=)»
//    Match the characters "image=" literally «image=»
// Assert position at a word boundary «\b»
// Match the regular expression below and capture its match into backreference number 1 «(https?|ftp|file)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «https?»
//       Match the characters "http" literally «http»
//       Match the character "s" literally «s?»
//          Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
//    Or match regular expression number 2 below (attempting the next alternative only if this one fails) «ftp»
//       Match the characters "ftp" literally «ftp»
//    Or match regular expression number 3 below (the entire group fails if this one fails to match) «file»
//       Match the characters "file" literally «file»
// Match the characters "://" literally «://»
// Match a single character present in the list below «[-A-Z0-9+&@#/%?=~_|!:,.;]*»
//    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    The character "-" «-»
//    A character in the range between "A" and "Z" «A-Z»
//    A character in the range between "0" and "9" «0-9»
//    One of the characters "+&@#/%?=~_|!:,.;" «+&@#/%?=~_|!:,.;»
// Match a single character present in the list below «[-A-Z0-9+&@#/%=~_|]»
//    The character "-" «-»
//    A character in the range between "A" and "Z" «A-Z»
//    A character in the range between "0" and "9" «0-9»
//    One of the characters "+&@#/%=~_|" «+&@#/%=~_|»
[/code]
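As for why there are two character classes: the second one is the first minus ? ! : , . and ; so a likely reason is to stop a URL from swallowing sentence punctuation that follows it. A quick sketch to check that reading (the sample sentence is made up for illustration):

```php
<?php
$pattern = '/(?<!image=)\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

// Illustrative input: each URL is followed by punctuation.
$text = 'Try http://example.com/a?b=1, or ftp://example.org/file.txt!';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches. The "?" inside the first URL is kept,
// but the trailing "," and "!" are not, because the last character of a
// match has to come from the second, stricter class.
print_r($matches[0]); // http://example.com/a?b=1 and ftp://example.org/file.txt

// $matches[1] holds backreference 1, the scheme.
print_r($matches[1]); // http and ftp
```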
