regex exclude pattern

random_ · January 25, 2014

Hi guys. Is it possible to exclude a character, word, number, pattern, or anything else from a regex?

e.g. [a-z], but I want to exclude, let's say, the character "g". My guess is it can be done like this: [a-f][h-z], but it doesn't seem to work...

Any opinion is very much appreciated. Thanks.

.josh · January 25, 2014

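Close. A character class matches exactly one character (or, if negated, one character that is not listed). So your attempt will match 2 characters total: one from a-f followed by one from h-z. All you have to do is combine the ranges into a single class: [a-fh-z]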

Alternatively, you can do this: (?!g)[a-z]. This uses a negative lookahead to make sure "g" isn't the next character, then matches any character a-z. So while the character class on its own would match "g", the negative lookahead ensures "g" isn't actually there to match. This *might* be more readable to you.
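To make the difference concrete, here is a minimal sketch in Python (the thread does not name a language, so the `re` module is an assumption). It checks each lowercase letter against the combined class and the lookahead version, and shows that the two-class attempt needs two characters.

```python
import re

# Two classes in a row need two characters: one from a-f, then one from h-z.
two_classes = re.compile(r"[a-f][h-z]")
# One combined class matches a single lowercase letter other than "g".
combined = re.compile(r"[a-fh-z]")
# Negative lookahead: refuse "g" as the next character, then take any a-z.
lookahead = re.compile(r"(?!g)[a-z]")

letters = "abcdefghijklmnopqrstuvwxyz"
combined_hits = [c for c in letters if combined.fullmatch(c)]
lookahead_hits = [c for c in letters if lookahead.fullmatch(c)]

print("g" in combined_hits)             # is "g" accepted by the combined class?
print(combined_hits == lookahead_hits)  # do both approaches accept the same letters?
print(two_classes.fullmatch("a"))       # a single letter against [a-f][h-z]
print(two_classes.fullmatch("ah"))      # two letters against [a-f][h-z]
```

Both single-character patterns accept the same 25 letters, so which one to use is mostly a matter of taste.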

random_ · January 26, 2014

Thanks .josh. What you are suggesting matches any single character, but what if I want to match only words that don't contain "g"?

e.g. words:

house

laguna

window

and I want to match only house and window but not laguna. Why doesn't (?!g)[\w] work here?

.josh · January 26, 2014

You do it by establishing word boundaries and quantifying whatever you are matching for.

For example, to match a word that does not contain a "g", you use

\b[a-fh-z]+\b

The \b is a word boundary assertion. Wherever the regex pointer is in the string, it looks at the character behind it and the character in front of it. If there is a switch from a "word" character to a "non-word" character or vice versa (the start and end of the string count as "non-word"), the word boundary will match. Then you have the original character class that matches any 1 character except "g", and then + is a quantifier, meaning match 1 or more of the preceding character (or character class or group).

Here is the same principle using the negative lookahead:

\b((?!g)[a-z])+\b
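As a quick check, here is a sketch in Python (an assumed language, since the thread doesn't name one) that runs both patterns over the three example words. It also shows why (?!g)[\w] on its own doesn't do the job: it matches one character at a time, so it just skips over the "g" and keeps matching the letters around it.

```python
import re

text = "house laguna window"

char_class = re.compile(r"\b[a-fh-z]+\b")
lookahead = re.compile(r"\b((?!g)[a-z])+\b")

# finditer + group(0) returns the whole match even when the pattern has a group.
class_matches = [m.group(0) for m in char_class.finditer(text)]
lookahead_matches = [m.group(0) for m in lookahead.finditer(text)]

# Without boundaries or a quantifier, each match is a single non-"g" character.
single_chars = re.findall(r"(?!g)[\w]", "laguna")

print(class_matches)      # ['house', 'window']
print(lookahead_matches)  # ['house', 'window']
print(single_chars)       # ['l', 'a', 'u', 'n', 'a']
```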

Sidenote: I see that you found \w. Note that this is not the same as [a-z].

\w is shorthand and is the equivalent of [a-zA-Z0-9_], which matches any letter (upper or lower case), digit, or underscore. (In some engines or Unicode modes it also matches non-ASCII letters and digits.)

[a-z] only matches lowercase a through z (unless you were to add a case-insensitive modifier somewhere else), no numbers or underscores.

For the purpose of this example it will match a "word" that does not contain a "g", but just note that \w would consider "abc_123_EFG" a "word".

Also note that the \b boundary logic works the same way as \w's "word" logic. For example, using this regex: \b[a-z]+\b on "123foobar456" would fail, because the only thing [a-z]+ will match is "foobar", but since \b considers digits to be a "word" character, there is no switch from "word" to "non-word" between "3f" and "r4".
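The \w and \b points can be checked with a short sketch, again assuming Python's `re` module. Python's \w is Unicode-aware by default for str patterns, so the `re.ASCII` flag is used here to get the [a-zA-Z0-9_] behaviour described above.

```python
import re

# \w treats letters, digits and underscore as "word" characters.
w_match = re.fullmatch(r"\w+", "abc_123_EFG", flags=re.ASCII)
# [a-z] accepts lowercase letters only, so the same string fails.
az_match = re.fullmatch(r"[a-z]+", "abc_123_EFG")

# No word boundary exists between "3" and "f" or between "r" and "4".
glued = re.search(r"\b[a-z]+\b", "123foobar456")
# With spaces around it, "foobar" has a boundary on both sides.
spaced = re.search(r"\b[a-z]+\b", "123 foobar 456")
# Dropping \b lets [a-z]+ pick "foobar" out of the glued string.
no_boundary = re.search(r"[a-z]+", "123foobar456")

print(w_match is not None)   # True
print(az_match)              # None
print(glued)                 # None
print(spaced.group(0))       # foobar
print(no_boundary.group(0))  # foobar
```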

random_ · January 26, 2014

Very much appreciate the explanation. Thanks bro.

Just to add something I stumbled upon: a class defined like this, [^g], will match any character but "g", just like [a-fh-z]. I don't quite get it, because I know ^ marks the beginning of a line. Or maybe I'm wrong and ^ acts differently inside a class...

.josh · January 26, 2014

No, [^g] will not match the same thing as [a-fh-z].

[^g] will match any character that is not a "g". So it will match whitespace, numbers, non-alphanumeric chars, etc.

But in general, a caret at the beginning of a character class signifies a negated character class. It means to match the opposite of what's listed. So for example [a-z] will match any one lowercase letter, whereas [^a-z] will match any one character that is not a lowercase letter.

Outside of the character class, ^ does mean "beginning of line" as you said. Well it actually means "beginning of string". If you use the "m" (multi-line mode) modifier then it becomes "beginning of line or string".
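Here is a small sketch of both roles of ^, assuming Python's `re` module (where the "m" modifier is spelled `re.MULTILINE`). The sample characters and the two-line string are made up for illustration.

```python
import re

chars = ["a", "g", "G", "7", " ", "_"]

# [^g] accepts anything that is not a lowercase "g", including non-letters.
negated = [c for c in chars if re.fullmatch(r"[^g]", c)]
# [a-fh-z] accepts only lowercase letters other than "g".
ranged = [c for c in chars if re.fullmatch(r"[a-fh-z]", c)]

text = "one\ntwo"
# Outside a class, ^ anchors at the start of the string...
string_start = re.findall(r"^\w+", text)
# ...and with multi-line mode, also at the start of every line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(negated)       # ['a', 'G', '7', ' ', '_']
print(ranged)        # ['a']
print(string_start)  # ['one']
print(line_starts)   # ['one', 'two']
```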
