regex exclude pattern


random_

Close. A character class matches exactly one character (or, if negated, one character that is not in the list). So your attempt will match 2 characters total. All you have to do is combine the ranges into a single class: [a-fh-z]

 

Alternatively, you can do this: (?!g)[a-z]. This uses a negative lookahead to make sure "g" isn't the next character, then matches any character a-z. So while the character class on its own would match "g", the negative lookahead ensures "g" isn't actually there to match. This *might* be more readable to you.
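
If you want to convince yourself that the two patterns accept the same single characters, here is a small sketch. It assumes Python's re module (the thread doesn't name a language), and the variable names are made up for illustration.

```python
import re
import string

# The two single-character patterns discussed above.
char_class = re.compile(r"[a-fh-z]")
lookahead = re.compile(r"(?!g)[a-z]")

# Collect the lowercase letters each pattern accepts as a complete match.
by_class = [c for c in string.ascii_lowercase if char_class.fullmatch(c)]
by_lookahead = [c for c in string.ascii_lowercase if lookahead.fullmatch(c)]

print(by_class == by_lookahead)  # the two lists should be identical
print("g" in by_class)           # and "g" should be in neither
```

Both approaches reject only "g" out of the 26 lowercase letters, so which one you pick is mostly a readability question.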

You do it by establishing word boundaries and quantifying whatever you are matching.

 

For example, to match a word that does not contain a "g", you can use:

 

\b[a-fh-z]+\b
The \b is a word boundary assertion. Wherever the regex pointer is in the string, it looks at the character behind it and the character in front of it. If there is a switch from a "word" character to a "non-word" character or vice versa, the word boundary will match. Then you have the original character class that matches any 1 character except "g", and then + is a quantifier, meaning match 1 or more of the preceding character (or character class or group).
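
To see the boundary version at work, here is a minimal sketch, again assuming Python's re module; the sample sentence is invented for the example.

```python
import re

# Whole words made up only of the letters a-f and h-z.
no_g_word = re.compile(r"\b[a-fh-z]+\b")

text = "the dog ran along a big fence"
words = no_g_word.findall(text)
print(words)  # ['the', 'ran', 'a', 'fence']
```

"dog", "along" and "big" are skipped entirely. The class can match "do", but there is no word boundary between "o" and "g", so the trailing \b fails and the whole word is rejected rather than partially matched.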

 

Here is the same principle using the negative lookahead:

 

\b((?!g)[a-z])+\b
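
A rough sketch of the lookahead version next to the character class version, assuming Python's re module. One thing to watch in Python specifically: the pattern above contains a capturing group, so findall() would return only the group (the last letter matched), not the whole word. The sketch uses finditer() and group(0) to get the full match.

```python
import re

lookahead_word = re.compile(r"\b((?!g)[a-z])+\b")
class_word = re.compile(r"\b[a-fh-z]+\b")

text = "green eggs and ham"

# group(0) is the entire match, regardless of capture groups.
via_lookahead = [m.group(0) for m in lookahead_word.finditer(text)]
via_class = [m.group(0) for m in class_word.finditer(text)]

print(via_lookahead)  # ['and', 'ham']
print(via_class)      # ['and', 'ham']
```

If you don't need the capture, writing the group as (?:...) avoids the findall() surprise.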
Sidenote: I see that you found \w. Note that this is not the same as [a-z].

 

\w is shorthand for [a-zA-Z0-9_], which matches any letter (upper or lower case), digit, or underscore.

 

[a-z] only matches lowercase a through z (unless you were to add a case-insensitive modifier somewhere else), no numbers or underscores.

 

For the purpose of this example it will match a "word" that does not contain a "g", but just note that \w would consider "abc_123_EFG" a "word".

 

Also note that the \b boundary logic works the same way as \w's "word" logic. For example, using this regex: \b[a-z]+\b on "123foobar456" would fail, because the only thing [a-z]+ will match is "foobar", but since \b considers digits to be "word" characters, there is no switch from "word" to "non-word" between "3" and "f" or between "r" and "4".
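
Here is a short sketch of both points, assuming Python's re module. Note one Python-specific assumption: for ordinary strings, Python 3's \w also matches non-ASCII word characters unless you pass re.ASCII, so the flag is used here to get the [a-zA-Z0-9_] behaviour described above.

```python
import re

# re.ASCII limits \w to [a-zA-Z0-9_] in Python 3.
word_chars = re.compile(r"\w+", re.ASCII)
lower_only = re.compile(r"[a-z]+")

sample = "abc_123_EFG"
whole_is_word = word_chars.fullmatch(sample) is not None
lower_runs = lower_only.findall(sample)
print(whole_is_word)  # True: \w+ treats the whole string as one "word"
print(lower_runs)     # ['abc']: [a-z]+ stops at the underscore

# \b sees digits as word characters, so there is no boundary inside "123foobar456".
bounded = re.compile(r"\b[a-z]+\b")
glued = bounded.search("123foobar456")
spaced = bounded.search("123 foobar 456")
print(glued)            # None
print(spaced.group(0))  # foobar
```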

Very much appreciate the explanation. Thanks bro.

Just to add something I stumbled upon. A class defined like this [^g] will match any character but "g", just like [a-fh-z]. I don't quite get it, because I know ^ defines the beginning of a line. Or maybe I'm wrong, and ^ acts differently inside a class...

No, [^g] will not match the same thing as [a-fh-z].

 

[^g] will match any character that is not a "g". So it will match whitespace, numbers, non-alphanumeric chars, etc.

 

 

But in general, a caret at the beginning of a character class signifies a negated character class. It means to match the opposite of what's listed. So for example [a-z] will match any one lowercase letter, whereas [^a-z] will match any one character that is not a lowercase letter.
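
The difference between [^g] and [a-fh-z] is easy to see by testing a handful of characters. A minimal sketch, assuming Python's re module, with an arbitrary list of sample characters:

```python
import re

negated = re.compile(r"[^g]")       # anything except "g"
explicit = re.compile(r"[a-fh-z]")  # lowercase letters except "g"

samples = ["a", "g", "G", "7", " ", "_", "\n"]

negated_hits = [ch for ch in samples if negated.fullmatch(ch)]
explicit_hits = [ch for ch in samples if explicit.fullmatch(ch)]

print(negated_hits)   # ['a', 'G', '7', ' ', '_', '\n']
print(explicit_hits)  # ['a']
```

The negated class accepts the uppercase letter, the digit, the space, the underscore and even the newline; the explicit class accepts only the lowercase letter.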

 

Outside of the character class, ^ does mean "beginning of line" as you said. Well it actually means "beginning of string". If you use the "m" (multi-line mode) modifier then it becomes "beginning of line or string".
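
A quick sketch of the anchor behaviour, assuming Python's re module, where the "m" modifier is spelled re.MULTILINE:

```python
import re

text = "first line\nsecond line"

# Without flags, ^ anchors only at the very start of the string.
start_only = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also anchors right after each newline.
each_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(start_only)  # ['first']
print(each_line)   # ['first', 'second']
```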

Archived

This topic is now archived and is closed to further replies.
