Simple regex... appreciate any help


Anti-Moronic

This must be simple, but regex isn't my strong point.

I've come up with this so far:

$title = "two by three is one";
$title = preg_replace("#by\s\w{1,100}\S#", "", $title);

OUTPUT: "two  is one"

$title2 = "one is two by three";
$title2 = preg_replace("#by\s\w{1,100}\S#", "", $title2);

OUTPUT: "one is two "

...

That seems fairly simple, and it is doing what I told it to do, but I'm *trying* to tell it to remove "by *" only IF it is at the end of the string, and nowhere else.

Can anyone help please? Thanks for any help.
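
To see exactly what the original pattern removes, here is a minimal PHP sketch that runs it on both titles. Single quotes are used so PHP passes the backslashes through untouched (the original double-quoted strings behave the same here, since \s, \w and \S are not PHP escape sequences), and var_dump is used because it prints the string length, which makes leftover spaces visible.

```php
<?php
// The original pattern: "by", one whitespace char, 1-100 word chars,
// then one non-whitespace char.
$pattern = '#by\s\w{1,100}\S#';

// Nothing ties the match to the end of the string, so "by three" is removed
// wherever it appears. The spaces on either side of it are left behind.
$title  = preg_replace($pattern, '', 'two by three is one');
$title2 = preg_replace($pattern, '', 'one is two by three');

var_dump($title);  // string(11) "two  is one"  (double space)
var_dump($title2); // string(11) "one is two "  (trailing space)
```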

preg_replace("~\s?by\s?\w+$~i", "", $title);
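
A minimal sketch of the suggested pattern applied to both original titles. The third, upper-case title is an extra example added here, not one from the thread.

```php
<?php
$pattern = '~\s?by\s?\w+$~i';

$title  = preg_replace($pattern, '', 'two by three is one'); // "by three" is not at the end: unchanged
$title2 = preg_replace($pattern, '', 'one is two by three'); // " by three" removed, including the space
$title3 = preg_replace($pattern, '', 'one is two BY three'); // i modifier: upper-case "BY" also matches

echo $title, "\n";  // two by three is one
echo $title2, "\n"; // one is two
echo $title3, "\n"; // one is two
```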

Thank you very much, that worked perfectly. Of course, I simply replaced \S with +$ in mine. Is there a reason for the ~ delimiters and the altered expression?

I wouldn't normally ask, but I want to learn this stuff, and I am always listening to the best ;)

Thanks anyway.

~\s?by\s?\w+$~i

1) No reason for ~ over # except that's my delimiter of choice.

2) The \s at the beginning accounts for the space before "by" in your string. Without it, that space is left behind and you get, for example, "one is two " (with a trailing space). The ? after it is mostly a precaution: if your data somehow ends up looking like "one is twoby three", a pattern without that ? would not match, because it requires the preceding space. If you know for sure that won't happen, you can remove the first ?.

3) The same goes for the second ?. Without it, the pattern requires a space between "by" and the next word character, and if for some reason that space isn't there, the pattern won't match. If you know this will never come up, you can remove the second ? as well.

4) \w+: the + replaces your {1,100}; it is the wildcard you asked for. It means "one or more of the previous thing", here one or more word characters. You can use * instead of + to match zero or more word characters. If you want to limit it to a specific range, you need to stick with {min,max}.

5) You used \S, which means any character that is not whitespace. You don't need it: \w only matches a-z, A-Z, 0-9 and _, none of which are whitespace, so \w{1,100} or \w+ already covers the word after "by". (In your pattern, the \S simply ended up consuming the last letter of that word.)

6) $ tells the engine that after however many \w's match, the end of the string must occur. So when the pattern matches "by blah" in the middle of the string, it fails at the $, because more text follows instead of the end of the string. (Strictly, without the m modifier, $ matches at the very end of the string or just before a final newline.)

7) The i in ~....~i makes the match case-insensitive. That is in case "by" is spelled "BY", "By" or "bY". If you don't expect that to happen, you can remove the i.

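Points 2, 3, 6 and 7 can be checked by writing the same pattern with the x (extended) modifier, which ignores whitespace inside the pattern and treats # as the start of a comment, so each piece can carry a label. The test strings are made-up variations on the original title; this is a sketch, not the only way to lay the pattern out.

```php
<?php
// Same pattern as ~\s?by\s?\w+$~i, written with the x modifier so it can be
// commented. In x mode, whitespace is ignored and # starts a comment.
$pattern = '~
    \s?   # point 2: optional whitespace before "by"
    by    # the literal letters "by"
    \s?   # point 3: optional whitespace after "by"
    \w+   # point 4: one or more word characters
    $     # point 6: end of the string
~ix';     // i = case-insensitive (point 7), x = extended syntax

$variants = [
    'one is two by three',  // the normal case
    'one is twoby three',   // no space before "by": allowed by the first ?
    'one is two bythree',   // no space after "by": allowed by the second ?
    'one is two By three',  // mixed case: allowed by the i modifier
];

$results = [];
foreach ($variants as $v) {
    $results[] = preg_replace($pattern, '', $v);
}
print_r($results); // every entry is "one is two"

// Side effect of the first ?: "by" can also start inside a word.
$sideEffect = preg_replace($pattern, '', 'on standby now'); // "on stand"
```

Because the leading space is optional, "by" may also begin inside a word such as "standby". If that matters for your titles, dropping the first ? (or putting \b before "by") avoids it, at the cost of no longer handling "one is twoby three".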

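Points 4 and 5 in practice: a sketch comparing +, {1,100} and * after \w, and \w against \S. The example strings are hypothetical; the punctuation case is included because \w stops at characters like "." just as it stops at spaces.

```php
<?php
$subject = 'one is two by three';

// Point 4: three ways to write "word characters after by".
$plus  = preg_replace('~\s?by\s?\w+$~i', '', $subject);       // one or more
$range = preg_replace('~\s?by\s?\w{1,100}$~i', '', $subject); // between 1 and 100
$star  = preg_replace('~\s?by\s?\w*$~i', '', $subject);       // zero or more

// They differ only at the edges: \w* also strips a bare "by" at the end,
// which \w+ (needing at least one word character) leaves alone.
$barePlus = preg_replace('~\s?by\s?\w+$~i', '', 'one is two by'); // unchanged
$bareStar = preg_replace('~\s?by\s?\w*$~i', '', 'one is two by'); // "one is two"

// Point 5: \w never matches whitespace, so \S is not needed to stop at a
// space. Note that \w also excludes punctuation, so a trailing "." blocks \w+$.
$dotWord  = preg_replace('~\s?by\s?\w+$~i', '', 'one is two by three.'); // unchanged
$dotNonWs = preg_replace('~\s?by\s?\S+$~i', '', 'one is two by three.'); // "one is two"
```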

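Points 1 and 6: a sketch showing that the delimiter makes no difference to the result, and how $ behaves when the string ends with a newline. The D modifier in the last call is standard PCRE syntax accepted by PHP's preg_* functions; it is used here only to make the "end of string" rule explicit.

```php
<?php
$subject = 'one is two by three';

// Point 1: the delimiter only marks where the pattern starts and ends.
// ~ and # give identical results here.
$tilde = preg_replace('~\s?by\s?\w+$~i', '', $subject);
$hash  = preg_replace('#\s?by\s?\w+$#i', '', $subject);

// Point 6: $ (without the m modifier) matches at the very end of the string,
// or just before a single newline at the very end.
$midString  = preg_replace('~\s?by\s?\w+$~i', '', 'two by three is one');   // unchanged
$trailingNl = preg_replace('~\s?by\s?\w+$~i', '', "one is two by three\n"); // "one is two\n"

// The D modifier makes $ match only at the absolute end, so the trailing
// newline now blocks the match.
$dollarEndOnly = preg_replace('~\s?by\s?\w+$~iD', '', "one is two by three\n"); // unchanged
```

A delimiter only has to be escaped when it also appears inside the pattern, so any character the pattern does not use, such as ~ or #, works equally well.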

Thank you very, very much for that explanation. I really appreciate your time. Lots of food for thought there.

Thanks.
