
regex exclude pattern


random_


Hi guys. Is it possible to exclude a character, word, number, pattern, or anything else from a regex?

For example, [a-z], but I want to exclude, let's say, the character "g". My guess was that it could be done like this: [a-f][h-z], but it doesn't seem to work...

Any opinion is very much appreciated. Thanks.


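Close. A character class matches exactly one character (one that is listed in the class, or, if the class is negated, one that is not). So your attempt, [a-f][h-z], matches two characters in a row. All you have to do is combine the ranges into a single class: [a-fh-z]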
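A quick way to see the difference is to try both patterns with Python's `re` module (Python is just a convenient host language here; the thread doesn't name one). `fullmatch` requires the pattern to match the whole test string:

```python
import re

# Two character classes in a row: needs TWO characters,
# one from a-f followed by one from h-z.
two_classes = re.compile(r"[a-f][h-z]")

# One class containing both ranges: ONE character, a-z minus "g".
one_class = re.compile(r"[a-fh-z]")

print(two_classes.fullmatch("a") is not None)   # False: only one character given
print(two_classes.fullmatch("ah") is not None)  # True: "a" then "h"
print(one_class.fullmatch("a") is not None)     # True
print(one_class.fullmatch("g") is not None)     # False: "g" is excluded
```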

 

Alternatively, you can do this: (?!g)[a-z]. This uses a negative lookahead to make sure "g" isn't the next character, then matches any one character from a to z. So although the character class by itself would match "g", the negative lookahead fails at that position before the class gets a chance to match it. This *might* be more readable to you.
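As a sanity check, here is a small Python `re` sketch (host language chosen only for illustration) that runs both versions against every lowercase letter and confirms they accept exactly the same set:

```python
import re
import string

lookahead = re.compile(r"(?!g)[a-z]")  # negative lookahead, then any a-z
combined = re.compile(r"[a-fh-z]")     # single class with "g" left out

# Collect the lowercase letters each pattern accepts as a whole string.
accepted_lookahead = [c for c in string.ascii_lowercase if lookahead.fullmatch(c)]
accepted_combined = [c for c in string.ascii_lowercase if combined.fullmatch(c)]

print(accepted_lookahead == accepted_combined)  # True
print("g" in accepted_lookahead)                # False
```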


Thanks .josh. What you are suggesting matches any single character, but what if I want to match only words that don't contain "g"?

e.g. words:

house

laguna

window
 

and I want to match only house and window, but not laguna. Why doesn't (?!g)[\w] work here?


You do it by establishing word boundaries and quantifying whatever you are matching.

For example, to match a word that does not contain a "g", you would use:

\b[a-fh-z]+\b
\b is a word boundary assertion. Wherever the regex pointer is in the string, it looks at the character behind it and the character in front of it. If there is a switch from a "word" character to a "non-word" character or vice versa, the word boundary matches (the start and end of the string count as "non-word"). Then you have the original character class that matches any one character except "g", and then + is a quantifier, meaning "match one or more of the preceding character (or character class or group)".
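Here is that pattern applied to the example words with Python's `re` module (an illustrative choice of language). The second call uses the pattern from the question and shows why it falls short: with no quantifier and no boundaries it only ever matches one character at a time, so "laguna" still produces matches.

```python
import re

text = "house laguna window"

# \b...\b anchors the match to whole words; + allows one or more
# characters from the class a-z minus "g".
words_without_g = re.findall(r"\b[a-fh-z]+\b", text)
print(words_without_g)  # ['house', 'window']

# The pattern from the question matches single characters only,
# so every letter of "laguna" except "g" is still found.
print(re.findall(r"(?!g)\w", "laguna"))  # ['l', 'a', 'u', 'n', 'a']
```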

 

Here is the same principle using the negative lookahead:

 

\b((?!g)[a-z])+\b
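A minimal Python sketch of the lookahead version. One caveat specific to Python's `re` (other engines' "find all" helpers may behave differently): because this pattern has a capturing group, `re.findall()` returns only the group's last repetition rather than the whole word, so the sketch reads the full match from `finditer()` instead. Writing the group as non-capturing, (?:...), would avoid the issue too.

```python
import re

text = "house laguna window"
pattern = re.compile(r"\b((?!g)[a-z])+\b")

# group(0) is the entire match, i.e. the whole word.
words = [m.group(0) for m in pattern.finditer(text)]
print(words)  # ['house', 'window']

# findall() reports the capturing group's last repetition instead.
print(pattern.findall(text))  # ['e', 'w']
```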
Sidenote: I see that you found \w. Note that this is not the same as [a-z].

\w is shorthand for [a-zA-Z0-9_], which matches any letter (upper- or lowercase), digit, or underscore. (That is the classic ASCII definition; some engines widen it to other letters in Unicode mode.)

[a-z] only matches lowercase a through z (unless you add a case-insensitive modifier such as i), and no digits or underscores.

For the purposes of this example, \w would also match a "word" that does not contain a "g", but note that \w would consider "abc_123_EFG" a "word" too.
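The difference is easy to see in Python's `re` (used here only for illustration). Note that in Python 3, \w in a str pattern is Unicode-aware unless you pass `re.ASCII`; for this all-ASCII sample string that makes no difference.

```python
import re

s = "abc_123_EFG"

# \w covers letters of either case, digits and underscore,
# so the whole string counts as one "word".
print(re.fullmatch(r"\w+", s) is not None)       # True

# [a-z] covers lowercase letters only, so it stops at the underscore.
print(re.findall(r"[a-z]+", s))                  # ['abc']

# With the case-insensitive flag, [a-z] also matches uppercase letters.
print(re.findall(r"[a-z]+", s, re.IGNORECASE))   # ['abc', 'EFG']
```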

 

Also note that the \b boundary logic works the same way as \w's "word" logic. For example, using this regex: \b[a-z]+\b on "123foobar456" would fail, because the only thing [a-z]+ will match is "foobar", but since \b considers digits to be a "word" character, there is no switch from "word" to "non-word" between "3f" and "r4".
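The same "123foobar456" case in Python's `re` (an illustrative choice of language), plus a variant with spaces around "foobar" so the boundaries can actually match:

```python
import re

# Digits are "word" characters, so there is no boundary between
# "3" and "f" or between "r" and "4": the search fails.
print(re.search(r"\b[a-z]+\b", "123foobar456"))              # None

# Without the boundaries, [a-z]+ finds the letters in the middle.
print(re.search(r"[a-z]+", "123foobar456").group())          # foobar

# Spaces are non-word characters, so both boundaries are satisfied.
print(re.search(r"\b[a-z]+\b", "123 foobar 456").group())    # foobar
```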


Very much appreciate the explanation. Thanks bro.

Just to add something I stumbled upon: a class defined like this, [^g], will match any character but "g", just like [a-fh-z]. I don't quite get it, because I know ^ marks the beginning of a line, or maybe I'm wrong; maybe ^ acts somehow differently inside a class...


No, [^g] will not match the same thing as [a-fh-z].

 

[^g] will match any character that is not a "g". So it will match whitespace, numbers, non-alphanumeric chars, etc.
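A small Python `re` sketch (host language chosen only for illustration) that tests a handful of characters against both classes makes the difference concrete: [^g] accepts the space, the digit, the hyphen and the uppercase "G", while [a-fh-z] rejects all of them.

```python
import re

# A few characters that are not lowercase letters, plus "a" and "g" for comparison.
samples = [" ", "5", "-", "G", "a", "g"]

# For each character: (matches [^g]?, matches [a-fh-z]?)
results = {
    ch: (
        re.fullmatch(r"[^g]", ch) is not None,      # anything except "g"
        re.fullmatch(r"[a-fh-z]", ch) is not None,  # lowercase letters except "g"
    )
    for ch in samples
}

for ch, (negated, ranges) in results.items():
    print(repr(ch), "[^g]:", negated, "[a-fh-z]:", ranges)
```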

 

 

More generally, a caret at the beginning of a character class makes it a negated character class: it matches any one character that is not listed. For example, [a-z] matches any one lowercase letter, whereas [^a-z] matches any one character that is not a lowercase letter. (Anywhere else inside the class, ^ is just a literal character.)
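In Python's `re` (used here only as an example engine), a normal class, its negated form, and a caret that is not the first character of a class (where it is treated as a literal "^") behave like this:

```python
import re

text = "abC1 d"

# Any one lowercase letter.
print(re.findall(r"[a-z]", text))    # ['a', 'b', 'd']

# Any one character that is NOT a lowercase letter.
print(re.findall(r"[^a-z]", text))   # ['C', '1', ' ']

# A caret that isn't first in the class is just a literal "^".
print(re.findall(r"[a^]", "a^b"))    # ['a', '^']
```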

 

Outside a character class, ^ does mean "beginning of line", as you said. Strictly speaking, it means "beginning of string"; if you use the "m" (multi-line) modifier, it matches at the beginning of the string and at the beginning of every line.
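In Python's `re` (an illustrative choice), the counterpart of the "m" modifier is the `re.MULTILINE` flag. The sketch shows ^ anchoring only at the start of the string by default, and at the start of each line once the flag is set:

```python
import re

text = "foo\nbar"

# Default: ^ only matches at the beginning of the string.
print(re.findall(r"^\w+", text))                # ['foo']

# Multi-line mode: ^ also matches right after each newline.
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['foo', 'bar']
```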

