Anti-Moronic, posted March 14, 2009

This must be simple, but regex isn't my strong point. I've come up with this so far:

$title = "two by three is one";
$title = preg_replace("#by\s\w{1,100}\S#", "", $title);
OUTPUT: "two is one"

$title2 = "one is two by three";
$title2 = preg_replace("#by\s\w{1,100}\S#", "", $title2);
OUTPUT: "one is two"

That seems fairly simple, and it is doing what I told it to do, but I'm *trying* to tell it to only remove "by *" IF it is at the end of the string, and not anywhere else. Can anyone help, please? Thanks for any help.
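Running the question's pattern exactly as written makes the problem concrete. This is a minimal sketch using the two strings from the post. It uses single-quoted pattern strings, which hold the same regex as the double-quoted ones above because \s, \w and \S are not PHP string escapes. Run this way, the spaces around "by three" are also left behind: a double space in the first result and a trailing space in the second.

```php
<?php
// The pattern from the question: no anchor, so it matches anywhere.
$pattern = '#by\s\w{1,100}\S#';

// \w{1,100} first grabs all of "three", then gives back the final "e"
// so that \S (any non-whitespace character) has something to match.
// The "by three" in the middle of the string is therefore removed.
$title = preg_replace($pattern, '', 'two by three is one');

// At the end of the string it is removed as well.
$title2 = preg_replace($pattern, '', 'one is two by three');

var_dump($title);  // string(11) "two  is one"
var_dump($title2); // string(11) "one is two "
```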
Anti-Moronic (author), posted March 14, 2009

Oh yeah, and is there a way to replace the limit {1,100} with a wildcard?
.josh, posted March 14, 2009

$title = preg_replace("~\s?by\s?\w+$~i", "", $title);
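As a quick check, here is a minimal sketch applying the suggested pattern to the two strings from the question. Only the pattern comes from the thread; the variable names just mirror the original post.

```php
<?php
// The suggested pattern: optional space, "by", optional space, a word, end of string.
$pattern = '~\s?by\s?\w+$~i';

// "by three" is not at the end here, so $ fails and nothing is removed.
$title = preg_replace($pattern, '', 'two by three is one');

// "by three" ends the string, so it is removed along with the space before it.
$title2 = preg_replace($pattern, '', 'one is two by three');

echo $title, PHP_EOL;  // two by three is one
echo $title2, PHP_EOL; // one is two
```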
Anti-Moronic (author), posted March 14, 2009

Thank you very much, that worked perfectly. Of course, I could simply replace \S with +$ in mine. Is there a reason for the ~ delimiters and the altered expression? I wouldn't normally ask, but I want to learn this stuff, and I am always listening to the best. Thanks anyway.
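The post describes the edit only in words, so this sketch assumes it turns the question's pattern into #by\s\w{1,100}+$#. One subtlety worth knowing: in PCRE, {1,100}+ is a possessive quantifier, not "one or more". Here it behaves the same as plain {1,100}, because $ has to follow either way. The edited pattern no longer touches a "by ..." in the middle, but it still leaves the space before "by" behind.

```php
<?php
// Assumed form of the edited pattern: "\S" replaced by "+$".
// {1,100}+ is possessive in PCRE (no backtracking); with $ right after it,
// the result is the same as with plain {1,100}.
$pattern = '#by\s\w{1,100}+$#';

// Anchored with $, so the "by three" in the middle is no longer removed.
$title = preg_replace($pattern, '', 'two by three is one');

// At the end it is removed, but the space before "by" stays behind.
$title2 = preg_replace($pattern, '', 'one is two by three');

var_dump($title);  // string(19) "two by three is one"
var_dump($title2); // string(11) "one is two "
```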
.josh, posted March 14, 2009

~\s?by\s?\w+$~i

1) There's no reason for ~ over #; it's just my delimiter of choice.

2) The \s at the beginning accounts for the space before "by" in your string. Otherwise an extra space is left over; for example, it would return "one is two " (with a trailing space). The ? after it is more of a precaution than anything. If your data somehow ends up looking like "one is twoby three", a pattern without that ? will not match, because it requires the preceding space. If you know for sure that won't happen, you can remove that first ?.

3) The same goes for the second ?. Without it, the pattern requires a space between the "y" and the next word character, and if that space is missing, the pattern won't match. If you know that will never come up, you can remove the second ? as well.

4) \w+: the + replaces the {1,100}; it is the wildcard you asked for. It means "one or more of the previous thing". You can use * instead of + to match zero or more word characters. If you want to limit it to a specific range, then you need to stick with {x1,x2}.

5) You used \S, which means any character that is not whitespace. You don't need it: \w{1,100} or \w+ already matches the whole word and never matches whitespace, because \w only matches a-z, A-Z, 0-9 and _.

6) $ is an anchor: after however many \w's match, the string must end right there (PCRE's $ also allows a single trailing newline). So when the pattern finds the "by blah" in the middle of the string, the match fails at the end, because more text follows it.

7) The "i" in ~....~i makes the match case-insensitive, in case "by" is spelled "BY", "By" or "bY". If you do not expect that to happen, you can remove the "i".
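The numbered points can be checked against a few sample titles. This is a minimal sketch; the sample strings and the extra patterns ($strict, $limited) are made up for illustration and only vary the pattern from the thread. The last case is a caution that follows from point 2: because the leading \s is optional, "by" inside a longer word at the end of the string can match too.

```php
<?php
$pattern = '~\s?by\s?\w+$~i';

// Points 2 and 3: the spaces around "by" are optional, so these still match.
$noSpaceBefore = preg_replace($pattern, '', 'one is twoby three'); // "one is two"
$noSpaceAfter  = preg_replace($pattern, '', 'one is two bythree'); // "one is two"

// Hypothetical strict variant without the ?s: the space is required,
// so this title is left unchanged.
$strict = '~\sby\s\w+$~i';
$strictResult = preg_replace($strict, '', 'one is twoby three'); // unchanged

// Point 4: {x1,x2} limits the length of the final word (here 1 to 3 characters).
$limited = '~\s?by\s?\w{1,3}$~i';
$shortWord = preg_replace($limited, '', 'one is two by six');   // "one is two"
$longWord  = preg_replace($limited, '', 'one is two by three'); // unchanged

// Point 6: without the m modifier, $ matches at the very end of the string
// or just before a single trailing newline, which is kept.
$trailingNewline = preg_replace($pattern, '', "one is two by three\n"); // "one is two\n"

// Point 7: the i modifier also catches "BY", "By" and "bY".
$upper = preg_replace($pattern, '', 'one is two BY three'); // "one is two"

// Caution: with the leading \s optional, the "by" in "lobby" matches as well.
$lobby = preg_replace($pattern, '', 'the lobby hall'); // "the lob"

$results = compact('noSpaceBefore', 'noSpaceAfter', 'strictResult', 'shortWord',
    'longWord', 'trailingNewline', 'upper', 'lobby');
foreach ($results as $name => $value) {
    echo $name, ': ', var_export($value, true), PHP_EOL;
}
```

If titles like "the lobby hall" can occur in your data, keeping the leading \s required (dropping the first ?) avoids that, at the cost of the "twoby" case from point 2.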
Anti-Moronic (author), posted March 15, 2009

Thank you very, very much for that explanation. I really appreciate your time. A lot of food for thought there. Thanks.