Regular Expression problem


grum

Hi guys,

I hope everyone is well. Regex. Not my strong point. Any pointers appreciated.

[code]$temp = preg_replace("/[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+/ie", "'<a href=\"\\0\" target=\"_blank\">'.urlreduce('\\0').'</a><div class=\"new_window\"> (Opens in new window) </div>\n'", $temp);[/code]

The above code works fine, for an autolinking feature. But, I want it to ignore any URLs preceded by

[code]image=[/code]

I thought I could simply add a condition to the start of my match case, like so:

[code]$temp = preg_replace("/[^image=][a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+/ie", "'<a href=\"\\0\" target=\"_blank\">'.urlreduce('\\0').'</a><div class=\"new_window\"> (Opens in new window) </div>\n'", $temp);[/code]

It doesn't work though - still matches on URLs preceded by image=.

Cheers for your help,
G.


https://forums.phpfreaks.com/topic/5859-regular-expression-problem/

The [^image=] part won't work because it isn't interpreted as the string "image=". It's a negated character class: it matches any single character that is not one of i, m, a, g, e or =.
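To make that concrete, here is a minimal sketch. The sample strings are made up for illustration and are not from the original post, and the pattern is a cut-down stand-in for the autolink regex rather than the full thing.

```php
<?php
// Hypothetical sample input, not taken from the original post.
$subject = 'image=http://example.com';

// [^image=] is a negated character class: it matches ONE character that is
// not i, m, a, g, e or =. It does not mean "not preceded by image=".
$classOnImage = preg_match('/[^image=]/', 'image='); // 0: every character is in the excluded set
$classOnH     = preg_match('/[^image=]/', 'h');      // 1: "h" is outside the set

// In front of a URL-like pattern the class simply consumes the "h" and
// [a-z]+ picks up "ttp", so the URL after image= is still matched.
preg_match('/[^image=][a-z]+:\/\//', $subject, $m);
$matched = $m[0];

var_dump($classOnImage, $classOnH, $matched);
```

So the class only forces one extra character onto the front of the match, which is why URLs after image= still get linked.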

You need to use a negative lookbehind, "(?<!...)". Try this expression:

'/(?<!image=)\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i'
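For anyone trying this on a current PHP version: the /e modifier used in the original snippet was deprecated in PHP 5.5 and removed in PHP 7.0, so a rough sketch with preg_replace_callback() might look like the following. urlreduce_stub() is a hypothetical stand-in, since the poster's urlreduce() isn't shown in the thread, and autolink() is just an illustrative name.

```php
<?php
// Pattern from the reply above, minus the /e modifier.
$pattern = '/(?<!image=)\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

// Hypothetical stand-in for the poster's urlreduce(): shortens long link text.
function urlreduce_stub(string $url): string
{
    return strlen($url) > 30 ? substr($url, 0, 27) . '...' : $url;
}

// Illustrative helper: wraps every match in an anchor tag.
function autolink(string $text, string $pattern): string
{
    return preg_replace_callback($pattern, function (array $m): string {
        $href  = htmlspecialchars($m[0], ENT_QUOTES);
        $label = htmlspecialchars(urlreduce_stub($m[0]), ENT_QUOTES);
        return '<a href="' . $href . '" target="_blank">' . $label . '</a>';
    }, $text);
}

$linked  = autolink('see http://example.com/page', $pattern);       // URL gets wrapped
$skipped = autolink('image=http://example.com/pic.jpg', $pattern);  // left untouched

var_dump($linked, $skipped);
```

Because the scheme is spelled out as (https?|ftp|file), the match can't slide one character to the right to dodge the lookbehind, so the image= URL is skipped entirely.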

That did the trick, cheers.

Do you mind if I ask more about it?

'/(?<!image=)\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/ie'

In pseudocode, is that: any number of these characters [-A-Z0-9+&@#\/%?=~_|!:,.;] followed by exactly one of these characters [-A-Z0-9+&@#\/%=~_|]?

If so, why?

And I don't really get the word boundary thing. For my own curiosity, I tried your lookbehind statement and the word boundary at the start of my first code snippet. Why does it not work in that scenario?

Thanks again!
G.
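On the last question, one likely explanation (a guess, since the thread doesn't show the exact input that was tested) is that the first pattern is loose enough to start matching somewhere else. [a-zA-Z]+ accepts any run of letters as the "scheme", so with only the lookbehind the engine just begins one character later, and with \b added it can still begin at a later word inside the URL. A sketch with a made-up input:

```php
<?php
// Hypothetical input; the thread does not show the text that was tested.
$subject = 'image=http://example.com/pic.jpg';

// The poster's first pattern with only the lookbehind added.
$lookbehindOnly = '/(?<!image=)[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i';
// The same pattern with the lookbehind and the word boundary added.
$withBoundary = '/(?<!image=)\b[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i';

preg_match($lookbehindOnly, $subject, $a);
preg_match($withBoundary, $subject, $b);

var_dump($a[0]); // the match starts one character late, at "ttp://..."
var_dump($b[0]); // \b blocks that start, but "com/pic.jpg" still fits the loose pattern
```

\b only matches between a word character and a non-word character, so it rules out starting in the middle of "http", but "com" followed by "/pic.jpg" looks like letters, a slash, a name, a dot and more characters, which is all the first pattern asks for.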
I use RegexBuddy to help with formatting regex/PREG. It's well worth the $30 USD cost. See: http://www.regexbuddy.com

Here's documentation on that regex for you to put in your code for future reference:
[code]
// (?<!image=)\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]
//
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!image=)»
//    Match the characters "image=" literally «image=»
// Assert position at a word boundary «\b»
// Match the regular expression below and capture its match into backreference number 1 «(https?|ftp|file)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «https?»
//       Match the characters "http" literally «http»
//       Match the character "s" literally «s?»
//          Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
//    Or match regular expression number 2 below (attempting the next alternative only if this one fails) «ftp»
//       Match the characters "ftp" literally «ftp»
//    Or match regular expression number 3 below (the entire group fails if this one fails to match) «file»
//       Match the characters "file" literally «file»
// Match the characters "://" literally «://»
// Match a single character present in the list below «[-A-Z0-9+&@#/%?=~_|!:,.;]*»
//    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    The character "-" «-»
//    A character in the range between "A" and "Z" «A-Z»
//    A character in the range between "0" and "9" «0-9»
//    One of the characters "+&@#/%?=~_|!:,.;" «+&@#/%?=~_|!:,.;»
// Match a single character present in the list below «[-A-Z0-9+&@#/%=~_|]»
//    The character "-" «-»
//    A character in the range between "A" and "Z" «A-Z»
//    A character in the range between "0" and "9" «0-9»
//    One of the characters "+&@#/%=~_|" «+&@#/%=~_|»
[/code]
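As for why the pattern ends with a second, smaller character class: the final class is the same list without ? ! : , . and ;, so the match has to end on a character outside that punctuation set. A small sketch of the effect, using a made-up sentence:

```php
<?php
// Pattern from the thread, without the /e modifier.
$pattern = '/(?<!image=)\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

// Hypothetical sentence with punctuation directly after each URL.
$text = 'Docs are at http://example.com/page?id=1. Is http://example.com/faq; better?';

preg_match_all($pattern, $text, $m);
$urls = $m[0];

// The greedy first class swallows the trailing "." and ";", then backtracks
// until the last character is one the final class allows.
var_dump($urls);
```

The "?" inside the first URL is kept because it sits in the middle of the match, while the sentence punctuation at the end is left out of the link.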
