Regular Expression problem


3 replies to this topic

#1 grum

grum
  • New Members
  • Pip
  • Newbie
  • 2 posts
  • Location: Edinburgh, Scotland

Posted 26 March 2006 - 05:18 PM

Hi guys,

I hope everyone is well. Regex is not my strong point, so any pointers are appreciated.

$temp = preg_replace("/[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+/ie", "'<a href=\"\\0\" target=\"_blank\">'.urlreduce('\\0').'</a><div class=\"new_window\"> (Opens in new window) </div>\n'", $temp);

The above code works fine as an autolinking feature, but I want it to ignore any URL preceded by

image=

I thought I could simply add a condition to the start of my pattern, like so:

$temp = preg_replace("/[^image=][a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+/ie", "'<a href=\"\\0\" target=\"_blank\">'.urlreduce('\\0').'</a><div class=\"new_window\"> (Opens in new window) </div>\n'", $temp);

It doesn't work, though: it still matches URLs preceded by image=.

Cheers for your help,
G.




#2 toplay

toplay
  • Staff Alumni
  • Advanced Member
  • 973 posts

Posted 26 March 2006 - 06:03 PM

The [^image=] part won't work because it is not interpreted as the string "image=". It is a negated character class: it matches any single character that is not one of i, m, a, g, e, or =.
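
To make the difference concrete, here is a minimal sketch. The sample string and the shortened patterns are made up for illustration and are not the poster's full expression. It shows the negated class consuming the "h" of "http" and matching anyway, while a lookbehind rejects the same position:

```php
<?php
// Hypothetical sample input; not taken from the original post.
$text = 'image=http://example.com/a.png';

// [^image=] consumes ONE character that is not i, m, a, g, e, or "=".
// Here it consumes the "h", and [a-z]+ then matches "ttp".
preg_match('/[^image=][a-z]+:\/\//', $text, $m1);

// (?<!image=) consumes nothing. It only asserts that the six characters
// just before the current position are not "image=".
$found = preg_match('/(?<!image=)\bhttp:\/\//', $text, $m2);

var_dump($m1[0]); // string(7) "http://"
var_dump($found); // int(0)
```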

You need to use a negative lookbehind, "(?<!...)", instead. Try this expression:

'/(?<!image=)\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i'
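
As a rough sketch of how this expression could be wired into the autolinker: the /e modifier used in the original snippet was deprecated in PHP 5.5 and removed in PHP 7.0, so the example below uses preg_replace_callback() instead. The names autolink() and urlreduce_stub() are hypothetical. The real urlreduce() is not shown in this thread, so the stub simply truncates long URLs, and the "Opens in new window" div is left out for brevity.

```php
<?php
// Hypothetical stand-in for the poster's urlreduce(); the real one is not shown.
function urlreduce_stub(string $url): string
{
    return strlen($url) > 30 ? substr($url, 0, 27) . '...' : $url;
}

// Hypothetical wrapper: links every URL that is not preceded by "image=".
function autolink(string $text): string
{
    $pattern = '/(?<!image=)\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[0] is the whole matched URL; escape it before placing it in HTML.
        $href  = htmlspecialchars($m[0], ENT_QUOTES);
        $label = htmlspecialchars(urlreduce_stub($m[0]), ENT_QUOTES);
        return '<a href="' . $href . '" target="_blank">' . $label . '</a>';
    }, $text);
}

echo autolink('See http://example.com/page and image=http://example.com/pic.png'), "\n";
```

Only the first URL is wrapped in a link; the one after image= is left untouched.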



#3 grum

Posted 26 March 2006 - 06:44 PM

That did the trick, cheers.

Do you mind if I ask more about it?

'/(?<!image=)\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/ie'

In pseudocode, is that:
any number of these characters [-A-Z0-9+&@#\/%?=~_|!:,.;] followed by exactly one of these characters [-A-Z0-9+&@#\/%=~_|]?

If so, why?

And I don't really get the word boundary thing. For my own curiosity, I tried adding your lookbehind and the word boundary to the start of my first code snippet. Why does it not work in that scenario?

Thanks again!
G.

#4 toplay

Posted 26 March 2006 - 07:07 PM

I use RegexBuddy to help with formatting regex/PREG. It's well worth the $30 USD cost. See: http://www.regexbuddy.com

Here's documentation on that regex for you to put in your code for future reference:
// (?<!image=)\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]
// 
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!image=)»
//    Match the characters "image=" literally «image=»
// Assert position at a word boundary «\b»
// Match the regular expression below and capture its match into backreference number 1 «(https?|ftp|file)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «https?»
//       Match the characters "http" literally «http»
//       Match the character "s" literally «s?»
//          Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
//    Or match regular expression number 2 below (attempting the next alternative only if this one fails) «ftp»
//       Match the characters "ftp" literally «ftp»
//    Or match regular expression number 3 below (the entire group fails if this one fails to match) «file»
//       Match the characters "file" literally «file»
// Match the characters "://" literally «://»
// Match a single character present in the list below «[-A-Z0-9+&@#/%?=~_|!:,.;]*»
//    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    The character "-" «-»
//    A character in the range between "A" and "Z" «A-Z»
//    A character in the range between "0" and "9" «0-9»
//    One of the characters "+&@#/%?=~_|!:,.;" «+&@#/%?=~_|!:,.;»
// Match a single character present in the list below «[-A-Z0-9+&@#/%=~_|]»
//    The character "-" «-»
//    A character in the range between "A" and "Z" «A-Z»
//    A character in the range between "0" and "9" «0-9»
//    One of the characters "+&@#/%=~_|" «+&@#/%=~_|»
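
Two details of that breakdown can be checked with a small sketch. The sample strings and the $loose and $bounded patterns below are made up for illustration; they are not the poster's exact first snippet. First, the final character class leaves out "?!:,.;", so a match cannot end on sentence punctuation. Second, if the scheme is a loose letter run such as [a-z]+ and there is no \b, the lookbehind alone only rejects the "h" position, and the engine can still start one character later.

```php
<?php
$strict = '/(?<!image=)\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

// 1. The greedy class first takes the trailing ".", then gives it back
//    because the final class does not allow "." as the last character.
preg_match($strict, 'Visit http://example.com/faq?id=3. Thanks!', $m);
var_dump($m[0]); // string(27) "http://example.com/faq?id=3"

// 2. Loose scheme, lookbehind only: the match starts at "t" instead of "h",
//    because "mage=h" (the text before "t") is not "image=".
$loose = '/(?<!image=)[a-z]+:\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';
preg_match($loose, 'image=http://example.com/pic.png', $n);
var_dump($n[0]); // string(25) "ttp://example.com/pic.png"

// 3. Adding \b closes that gap: there is no word boundary between "h" and "t".
$bounded = preg_match(
    '/(?<!image=)\b[a-z]+:\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i',
    'image=http://example.com/pic.png'
);
var_dump($bounded); // int(0)
```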




