
Regular Expression problem


grum


Hi guys,

I hope everyone is well. Regex. Not my strong point. Any pointers appreciated.

[code]$temp = preg_replace("/[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+/ie", "'<a href=\"\\0\" target=\"_blank\">'.urlreduce('\\0').'</a><div class=\"new_window\"> (Opens in new window) </div>\n'", $temp);[/code]

The above code works fine for an autolinking feature, but I want it to ignore any URLs preceded by

[code]image=[/code]

I thought I could simply add a condition to the start of my match case, like so:

[code]$temp = preg_replace("/[^image=][a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+/ie", "'<a href=\"\\0\" target=\"_blank\">'.urlreduce('\\0').'</a><div class=\"new_window\"> (Opens in new window) </div>\n'", $temp);[/code]

It doesn't work, though: it still matches URLs preceded by image=.

Cheers for your help,
G.



The [^image=] part won't work because it isn't read as the string "image=". Square brackets create a character class and the ^ negates it, so [^image=] matches exactly one character that is anything other than i, m, a, g, e or =.
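A quick check in PHP (the sample URL is made up for illustration): the negated class simply consumes the "h" of "http", so the attempted pattern still matches the whole URL.

```php
<?php
// [^image=] is a negated character class: it matches exactly ONE character
// that is not i, m, a, g, e or "=". It has no notion of the string "image=".
var_dump(preg_match('/^[^image=]$/', 'x'));      // int(1): one char, not in the set
var_dump(preg_match('/^[^image=]$/', 'm'));      // int(0): "m" is in the set
var_dump(preg_match('/^[^image=]$/', 'image=')); // int(0): six chars, not one

// The pattern from the question with the negated class in front.
// The /e modifier is dropped: it only ever meant something to preg_replace().
$attempt = '/[^image=][a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i';

preg_match($attempt, 'image=http://www.example.com/pic.jpg', $m);
echo $m[0], "\n"; // http://www.example.com/pic.jpg (the class ate the "h")
```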

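What you need is a negative lookbehind, (?<!...), which asserts that the text immediately before the current position does not match what's inside it. Try this expression: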

'/(?<!image=)\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i'
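Side note for anyone reusing this today: the e modifier in the original code was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, with preg_replace_callback() as the replacement. Below is a minimal sketch of the same autolink built on the expression above. autolink() is a made-up name, and urlreduce() is a hypothetical stub, since the asker's real function isn't shown in the thread.

```php
<?php
// Hypothetical stand-in for the asker's urlreduce(); its real body is not
// shown in the thread. This version just shortens long URLs for display.
function urlreduce($url)
{
    return strlen($url) > 40 ? substr($url, 0, 37) . '...' : $url;
}

// Autolink sketch: instead of an eval'd /e replacement string, the
// replacement is built in a callback.
function autolink($text)
{
    // The answer's pattern: URLs directly preceded by "image=" are skipped.
    $pattern = '/(?<!image=)\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

    return preg_replace_callback($pattern, function ($m) {
        // $m[0] is the whole matched URL, the text \0 referred to with /e.
        return '<a href="' . $m[0] . '" target="_blank">' . urlreduce($m[0])
            . '</a><div class="new_window"> (Opens in new window) </div>' . "\n";
    }, $text);
}

echo autolink('image=http://example.com/a.jpg and http://example.com/page.');
```

The image= URL is left alone, and the trailing full stop after the second URL stays outside the link.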


That did the trick, cheers.

Do you mind if I ask more about it?

'/(?<!image=)\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/ie'

In pseudocode, is that:
any number of these characters [-A-Z0-9+&@#\/%?=~_|!:,.;] followed by exactly one of these characters [-A-Z0-9+&@#\/%=~_|]?

If so, why?
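One way to see the answer to that "why" is to compare the two-class pattern with a single repeated class on a couple of made-up sentences. The second class leaves out ? ! : , . and ;, so punctuation that ends a sentence right after a URL stays out of the match.

```php
<?php
// One class repeated, versus the answer's "any number of these, then one of
// those" structure. The final class omits ? ! : , . ; so sentence
// punctuation directly after a URL is not swallowed into the link.
$oneClass = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]+/i';
$twoClass = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

$samples = array(
    'Have you seen http://example.com/page?',
    'It lives at http://example.com/docs, I think.',
);

foreach ($samples as $text) {
    preg_match($oneClass, $text, $a);
    preg_match($twoClass, $text, $b);
    echo $a[0], '  vs  ', $b[0], "\n";
}
// http://example.com/page?  vs  http://example.com/page
// http://example.com/docs,  vs  http://example.com/docs
```

The trade-off is that a URL which genuinely ends in one of those characters loses it from the link.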

And I don't really get the word boundary (\b) thing. Out of curiosity, I tried adding your lookbehind and the word boundary to the start of the pattern in my first code snippet. Why doesn't it work in that scenario?

Thanks again!
G.
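A sketch of both points, using made-up sample strings and a hypothetical helper firstMatch(). Without \b, the answer's pattern can start matching partway into a word (here, the ftp inside sftp). In the first snippet's pattern, \b is not enough because [:\/\/]+ is also a character class (the same issue as [^image=]): it accepts any run of : and / characters rather than the literal "://", so the engine can start a new match further into the same URL, where the lookbehind no longer sees image=.

```php
<?php
// Hypothetical helper: first match of $pattern in $subject, or null.
function firstMatch($pattern, $subject)
{
    return preg_match($pattern, $subject, $m) ? $m[0] : null;
}

// 1) What \b buys you: without it, a match can start in the middle of a word.
$noBoundary   = '/(?<!image=)(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';
$withBoundary = '/(?<!image=)\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

var_dump(firstMatch($noBoundary, 'image=sftp://host/file'));   // string(15) "ftp://host/file"
var_dump(firstMatch($withBoundary, 'image=sftp://host/file')); // NULL

// 2) The first snippet's pattern with the lookbehind and \b added.
// The "http" start is blocked, but "com/pic.jpg" later in the URL
// satisfies [a-zA-Z]+ [:\/]+ ... and is not preceded by "image=".
$question = '/(?<!image=)\b[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i';

var_dump(firstMatch($question, 'image=http://www.example.com/pic.jpg')); // string(11) "com/pic.jpg"
```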

I use RegexBuddy to help with formatting regex/PREG. It's well worth the $30 USD cost. See: http://www.regexbuddy.com

Here's documentation on that regex for you to put in your code for future reference:
[code]
// (?<!image=)\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]
//
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!image=)»
//    Match the characters "image=" literally «image=»
// Assert position at a word boundary «\b»
// Match the regular expression below and capture its match into backreference number 1 «(https?|ftp|file)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «https?»
//       Match the characters "http" literally «http»
//       Match the character "s" literally «s?»
//          Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
//    Or match regular expression number 2 below (attempting the next alternative only if this one fails) «ftp»
//       Match the characters "ftp" literally «ftp»
//    Or match regular expression number 3 below (the entire group fails if this one fails to match) «file»
//       Match the characters "file" literally «file»
// Match the characters "://" literally «://»
// Match a single character present in the list below «[-A-Z0-9+&@#/%?=~_|!:,.;]*»
//    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    The character "-" «-»
//    A character in the range between "A" and "Z" «A-Z»
//    A character in the range between "0" and "9" «0-9»
//    One of the characters "+&@#/%?=~_|!:,.;" «+&@#/%?=~_|!:,.;»
// Match a single character present in the list below «[-A-Z0-9+&@#/%=~_|]»
//    The character "-" «-»
//    A character in the range between "A" and "Z" «A-Z»
//    A character in the range between "0" and "9" «0-9»
//    One of the characters "+&@#/%=~_|" «+&@#/%=~_|»
[/code]
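If you'd rather keep notes like these right next to the pattern, PCRE's x (extended) modifier is one option. A sketch, assuming PHP 5.3 or later for the nowdoc syntax; $urlPattern is just an illustrative name.

```php
<?php
// The answer's pattern written with the x (extended) modifier: whitespace
// outside character classes is ignored and "#" starts a comment that runs
// to the end of the line. Keep the "/" delimiter out of the comments,
// because PHP looks for the closing delimiter before PCRE sees the pattern.
$urlPattern = <<<'REGEX'
/
(?<!image=)                     # not directly preceded by "image="
\b                              # start at a word boundary
(https?|ftp|file)               # scheme, captured as backreference 1
:\/\/                           # the literal colon and two slashes
[-A-Z0-9+&@\#\/%?=~_|!:,.;]*    # any number of URL characters
[-A-Z0-9+&@\#\/%=~_|]           # one final character, no trailing . , ; : ! ?
/ix
REGEX;

var_dump(preg_match($urlPattern, 'image=http://example.com/a.jpg')); // int(0)
preg_match($urlPattern, 'Go to http://example.com/page.', $m);
echo $m[0], "\n"; // http://example.com/page
```

The # inside the character classes is escaped only for readability; PCRE treats # inside a class as a literal character even in extended mode.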
