random_ Posted January 25, 2014

Hi guys. Is it possible to exclude a character, word, number, pattern, or anything else from a regex? For example, I have [a-z] but want to exclude, let's say, the character "g". My guess is that it can be done like this: [a-f][h-z], but it doesn't seem to work... Any opinion is very much appreciated. Thanks.
.josh Posted January 25, 2014

Close. A character class matches exactly one character (or, if negated, one character that is not in the list). So your attempt, [a-f][h-z], matches two characters in a row. All you have to do is combine the ranges into a single class: [a-fh-z]

Alternatively, you can do this: (?!g)[a-z]. This uses a negative lookahead to make sure "g" isn't the next character, then matches any character a-z. So while the character class by itself would match "g", the negative lookahead ensures "g" isn't actually there to match. This *might* be more readable to you.
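A minimal sketch of the difference between the three patterns above. It uses Python's `re` module rather than PHP's preg functions (an assumption made only for illustration; these particular patterns behave the same way in both), and the variable names are hypothetical.

```python
import re

# Two classes in a row consume two characters: one from a-f, then one from h-z.
two_classes = re.compile(r"[a-f][h-z]")
# One combined class consumes a single character that is a-z but not "g".
combined = re.compile(r"[a-fh-z]")
# Same idea with a negative lookahead: refuse "g", then take any a-z.
lookahead = re.compile(r"(?!g)[a-z]")

candidates = "afghz"

# Single characters accepted by each one-character pattern.
accepted_by_class = [c for c in candidates if combined.fullmatch(c)]
accepted_by_lookahead = [c for c in candidates if lookahead.fullmatch(c)]

print(two_classes.fullmatch("a"))          # no match: the pattern needs two characters
print(bool(two_classes.fullmatch("ah")))   # a two-character string can match
print(accepted_by_class)
print(accepted_by_lookahead)
```

Both one-character patterns accept the same letters here; which one to prefer is mostly a readability question, as the post says.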
random_ Posted January 26, 2014

Thanks .josh. What you are suggesting matches any single character, but what if I want to match only words that don't contain "g"? For example, given the words house, laguna, and window, I want to match only house and window but not laguna. Why doesn't (?!g)[\w] work here?
.josh Posted January 26, 2014

You do it by establishing word boundaries and quantifying whatever you are matching. For example, to match a word that does not contain a "g", you can use \b[a-fh-z]+\b

The \b is a word boundary assertion. Wherever the regex pointer is in the string, it looks at the character behind it and the character in front of it. If there is a switch from a "word" character to a "non-word" character or vice versa, the word boundary matches (the start and end of the string count as "non-word" for this purpose). Then you have the original character class that matches any one character except "g", and then + is a quantifier, meaning match one or more of the preceding character (or character class or group).

Here is the same principle using the negative lookahead: \b((?!g)[a-z])+\b

Sidenote: I see that you found \w. Note that this is not the same as [a-z]. \w is shorthand for [a-zA-Z0-9_], which matches any letter (upper or lower case), digit, or underscore. [a-z] only matches lowercase a through z (unless you add a case-insensitive modifier), with no digits or underscores. For the purposes of this example either one will match a "word" that does not contain a "g", but note that \w would also consider "abc_123_EFG" a "word".

Also note that the \b boundary logic uses the same definition of a "word" character as \w. For example, using the regex \b[a-z]+\b on "123foobar456" would fail, because the only thing [a-z]+ can match is "foobar", but since \b considers digits to be "word" characters, there is no switch from "word" to "non-word" between "3f" or between "r4".
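The word-boundary behaviour described above can be checked with a short sketch. Again this is Python's `re` module rather than PHP (an assumption for illustration), and the variable names are made up. One Python-specific detail: `\w` there is Unicode-aware by default, so `re.ASCII` is passed to get the [a-zA-Z0-9_] meaning described in the post.

```python
import re

text = "house laguna window"

# The two patterns from the post: whole words made only of non-"g" letters.
no_g_class = re.compile(r"\b[a-fh-z]+\b")
no_g_lookahead = re.compile(r"\b((?!g)[a-z])+\b")

words_class = no_g_class.findall(text)
# The lookahead pattern has a capture group, so findall() would return only
# the last repetition of that group; group(0) gives the whole match instead.
words_lookahead = [m.group(0) for m in no_g_lookahead.finditer(text)]

# \b treats digits as word characters, so "foobar" has no boundary around it
# when it is glued to digits, but it does when separated by spaces.
glued = re.findall(r"\b[a-z]+\b", "123foobar456")
spaced = re.findall(r"\b[a-z]+\b", "123 foobar 456")

# With re.ASCII, \w means [a-zA-Z0-9_], so this whole string is one "word".
w_word = re.fullmatch(r"\w+", "abc_123_EFG", re.ASCII)

print(words_class, words_lookahead, glued, spaced, bool(w_word))
```

In "laguna" no sub-match survives, because every candidate stretch of non-"g" letters ends or starts next to the "g", which is itself a word character, so there is no boundary there.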
random_ Posted January 26, 2014 (edited)

Very much appreciate the explanation. Thanks bro. Just to add something I stumbled upon: a class defined like this, [^g], will match any character but "g", just like [a-fh-z]. I don't quite get it, because I know ^ marks the beginning of a line. Or maybe I'm wrong, and ^ acts differently inside a class...

Edited January 26, 2014 by random_
.josh Posted January 26, 2014

No, [^g] will not match the same thing as [a-fh-z]. [^g] will match any character that is not a "g". So it will match whitespace, numbers, non-alphanumeric chars, etc. But in general, a caret at the beginning of a character class signifies a negated character class. It means to match the opposite of what's listed. So for example [a-z] will match any one lowercase letter, whereas [^a-z] will match any one character that is not a lowercase letter.

Outside of a character class, ^ does mean "beginning of line" as you said. Well, it actually means "beginning of string". If you use the "m" (multi-line mode) modifier, then it becomes "beginning of line or string".
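A small sketch of the two meanings of the caret, assuming Python's `re` module in place of PHP (names are illustrative only; `re.MULTILINE` plays the role of the "m" modifier mentioned above).

```python
import re

chars = ["a", "g", "G", "7", " ", "!"]

# Inside a class, a leading ^ negates it: anything except "g".
not_g = re.compile(r"[^g]")
# The explicit ranges only ever match lowercase letters other than "g".
lower_not_g = re.compile(r"[a-fh-z]")

matched_not_g = [c for c in chars if not_g.fullmatch(c)]
matched_lower = [c for c in chars if lower_not_g.fullmatch(c)]

# Outside a class, ^ is an anchor. By default it matches only at the start
# of the string; in multi-line mode it also matches after each newline.
lines = "foo\nfoo"
starts_default = [m.start() for m in re.finditer(r"^foo", lines)]
starts_multiline = [m.start() for m in re.finditer(r"^foo", lines, re.MULTILINE)]

print(matched_not_g, matched_lower, starts_default, starts_multiline)
```

The first two lists differ because [^g] accepts uppercase letters, digits, spaces, and punctuation, while [a-fh-z] accepts none of those.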