simple regex..appreciate any help

Anti-Moronic · March 14, 2009

This must be simple, but regex isn't my strong point.

I've come up with this so far:

$title = "two by three is one";

$title = preg_replace("#by\s\w{1,100}\S#", "", $title);

OUTPUT: "two is one"

$title2 = "one is two by three";

$title2 = preg_replace("#by\s\w{1,100}\S#", "", $title2);

OUTPUT: "one is two"

...

That seems fairly simple: it is doing what I told it to do, but I'm *trying* to tell it to only remove "by *" IF it is at the end, and not anywhere else in the string.

Can anyone help please? Thanks for any help.

Anti-Moronic · March 14, 2009

Oh yeah, and is there a way to replace the limit {1,100} with a wildcard?

.josh · March 14, 2009

preg_replace("~\s?by\s?\w+$~i", "", $title);
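
For reference, here is a quick sketch of that pattern run against both sample strings from the first post. The helper name strip_trailing_by is made up for illustration and is not part of the thread; the pattern itself is the one posted above.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the thread):
// removes a trailing "by <word>" only when it ends the string.
function strip_trailing_by(string $title): string
{
    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace('~\s?by\s?\w+$~i', '', $title) ?? $title;
}

// "by three" sits in the middle here, so nothing is removed.
echo strip_trailing_by("two by three is one"), "\n"; // two by three is one

// "by three" ends the string, so it goes, along with the space before it.
echo strip_trailing_by("one is two by three"), "\n"; // one is two
```

Single quotes around the pattern are used here only so PHP does not try to interpret anything inside the string; the double-quoted version above behaves the same for this pattern.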

Anti-Moronic · March 14, 2009

Thank you very much, that worked perfectly. Of course, I could simply replace \S with +$ in mine. Is there a reason for the ~ delimiters and the altered expression?

I wouldn't normally ask, but I want to learn this stuff and I'm always up for listening to the best.

Thanks anyway.

.josh · March 14, 2009

~\s?by\s?\w+$~i

1) no reason for ~ over # except that's my delimiter of choice.

2) The \s at the beginning accounts for the space before "by" in your string. Without it, an extra space is left behind: it would return, for example, "one is two ". The ? is more of a precaution than anything: it makes that space optional. Without the ?, data that somehow ends up looking like "one is twoby three" would not match, because the pattern would expect the preceding space. If you know for sure that won't happen, you can remove that first ?.

3) The same can be said for the 2nd ? as well. Without it, the pattern expects a space between the y and the next word character, and if for some reason it's not there, the pattern won't match. If you know that this will never come up, you can remove the 2nd ? as well.

4) \w+ The + replaces the {1,100}; it is the wildcard you asked for. It means "one or more of the previous thing". You can use * instead of + to match zero or more word characters. If you want to limit it to a specific range, then you need to stick with {x1,x2}.

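5) You used \S, which means anything that is not a whitespace character. You don't need that. \w{1,100} or \w+ already matches every non-whitespace character in your sample strings, because they are made up only of a-z, A-Z, 0-9 and _, which is what \w matches. Keep in mind that \w does not match punctuation such as a hyphen or an apostrophe.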

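6) $ tells the engine that after however many \w's match, the end of the string must occur. So, when the pattern matches the "by blah" in the middle of the string, the pattern fails in the end, because more text comes after it instead of the end of the string. (Without the m modifier, $ matches only at the very end of the string, or just before a single trailing newline.)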

7) the ~....~i (the "i") means make it case insensitive. That is for in case "by" is spelled "BY" or "By" or "bY". If you do not expect that to happen, you can remove the "i".
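
The points above can be checked with a short sketch like the one below. The test strings other than the two from the first post are made up for illustration, and the "bypass" case is a side effect worth knowing about rather than something raised in the thread.

```php
<?php
$pattern = '~\s?by\s?\w+$~i';

// Points 2 and 3: both spaces are optional, so squashed input still matches.
echo preg_replace($pattern, '', "one is twoby three"), "\n";  // one is two
echo preg_replace($pattern, '', "one is two bythree"), "\n";  // one is two

// Side effect of the optional spaces: a final word that merely starts
// with "by" is removed too.
echo preg_replace($pattern, '', "take the bypass"), "\n";     // take the

// Point 7: the i modifier lets "BY" match as well.
echo preg_replace($pattern, '', "one is two BY three"), "\n"; // one is two

// Point 6: without the m modifier, $ only matches at the end of the whole
// string, so the "by b" on the first line is left alone.
$multi = preg_replace($pattern, '', "a by b\nc by d");
echo $multi, "\n"; // prints "a by b" and then "c" on the next line
```

If the "bypass" behaviour is unwanted, dropping the second ? (so that a space is required after "by") avoids it, at the cost of no longer matching "bythree".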

Anti-Moronic · March 15, 2009

Thank you very very much for that explanation. I really appreciate your time. Lot of food for thought there.

Thanks.
