elitegosu Posted June 13, 2010

Hello everyone,

I'm trying to match everything but null using a regular expression. For example:

Text: name="value1" name="" name="value2"

I want to be able to match just value1 and value2 from the text above. I tried using !name=\"(.*?)\"! but the * matches 0 or more times, so it matches the name="" as well. I tried replacing the star with a +, which is supposed to match any character (.) 1 or more times, but that's not working; in fact it behaves rather oddly when I do that. Check this out.

This is what I get when using !name=\"(.*?)\"!

Array (
    [0] => Array ( [0] => value="val1" [1] => value="" [2] => value="val2" )
    [1] => Array ( [0] => val1 [1] => [2] => val2 )
)

And this is what I get when using !name=\"(.+?)\"! (using a plus instead of a star):

Array (
    [0] => Array ( [0] => value="val1" [1] => value="" value=" )
    [1] => Array ( [0] => val1 [1] => " value= )
)

So it seems that .+? doesn't work like .*? at all. Is there a bug in PHP or am I missing something? What is the best way to match anything but a null value?

I would really appreciate any help. I'm currently reading the book "Mastering Regular Expressions", so hopefully I will be able to figure this out once I finish it.
elitegosu (Author) Posted June 13, 2010

Just used value=\"([^\"]+)\" as suggested in another thread and it worked! I still don't understand why (.+?) behaves like that, though.
Psycho Posted June 14, 2010

Well, as you stated, * matches 0 or more occurrences. The plus symbol matches one or more occurrences - so there must be at least one match of the pattern for a result to be returned. Does that make sense?
elitegosu (Author) Posted June 14, 2010

Quoting Psycho: "so there must be at least one match of the pattern for a result to be returned. Does that make sense?"

Right, that makes sense. However, why does it match value="" value="? I would think that if there is no value between \" \" then there is no match and it shouldn't match anything. Instead it looks like it becomes greedy and goes on to match text beyond the closing \".
Psycho Posted June 14, 2010

You are not looking at the ENTIRE pattern. If this is your pattern

!name=\"(.*?)\"!

then you are telling the system to find text that begins with name=", then has 0 or more characters, and then ends with ". You are thinking too much like a human, because you are assuming that it shouldn't match zero characters. But that is a perfectly valid match. What if you wanted to know the value of EVERY "name" parameter, even the ones that are empty? If (.*?) didn't do that, what would?
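To make the point about (.*?) concrete, here is a minimal PHP sketch. It assumes the sample text from the first post and uses preg_match_all; the variable names $text and $matches are only illustrative.

```php
<?php
// Sample text taken from the first post in this thread.
$text = 'name="value1" name="" name="value2"';

// Lazy ".*?" is allowed to match zero characters, so name="" is a
// valid match whose captured group is simply the empty string.
preg_match_all('!name="(.*?)"!', $text, $matches);

// $matches[0] holds the full matches, $matches[1] the captured values.
print_r($matches[1]);
```

The captured list should contain three entries, the middle one being an empty string, which is exactly the "every name parameter, even the empty ones" behaviour described above.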
ZachMEdwards Posted June 15, 2010

$pattern = '/name="(.+)"/';
Psycho Posted June 15, 2010

Quoting ZachMEdwards: $pattern = '/name="(.+)"/';

@Zach, did you even bother reading the thread? If so, you would have seen that 1) the issue the OP had was already solved and the discussion was only continuing to explain a concept with regex patterns, and 2) the pattern you supplied would NOT solve the OP's problem because it is "greedy".
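As a rough illustration of why the greedy version does not help, here is a short PHP sketch that runs that pattern against the sample text from the first post (an assumption; the thread does not show this run). The variable names are illustrative.

```php
<?php
// Sample text taken from the first post in this thread.
$text = 'name="value1" name="" name="value2"';

// Greedy ".+" grabs as much as it can, then backtracks only as far as
// the LAST double quote in the string, so everything collapses into
// a single match.
preg_match_all('/name="(.+)"/', $text, $greedy);

print_r($greedy[1]);
```

On this input the pattern should produce one capture running from value1 through value2, with the quotes and the empty name="" swallowed in between.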
elitegosu (Author) Posted June 15, 2010

Quoting Psycho: "You are not looking at the ENTIRE pattern. If this is your pattern !name=\"(.*?)\"! Then you are telling the system to find text that begins with name=", then has 0 or more characters, and then ends with ". You are thinking too much like a human because you are assuming that it shouldn't match no characters. But, that is a perfectly valid match."

Sorry if I wasn't clear enough on this. What I'm saying is that this pattern, !name=\"(.*?)\"!, is matching everything the way it's supposed to. I know it should match the empty value in name="" as well; it's working fine.

My question was about this pattern: !name=\"(.+?)\"!

I expect it to match just the value between the quotes in name="val1", because (.+?) is supposed to match 1 or more times. It shouldn't match name="" because there is nothing in between the double quotes. So why is it still matching this: " value=

Here is the result for !name=\"(.+?)\"! again:

Array (
    [0] => Array ( [0] => value="val1" [1] => value="" value=" )
    [1] => Array ( [0] => val1 [1] => " value= )
)
cags Posted June 15, 2010

Because this is the problem with using generalised catch-all patterns such as .+. If we are matching 0 or more characters, then the lazy modifier will cause it to stop as soon as it meets the character we are matching afterwards, i.e. in this case the ".

Now think about the logic behind .+? in your pattern when it is given the string name="" something". The pattern will attempt to match the shortest string possible that meets the requirements. First it must match one character, so it will match what you would consider to be the closing double quote. It will then keep matching until it finds the " character required by your pattern, so the sub-pattern captures " something and the full match is name="" something".

What you really want to match is one or more characters that aren't the ", followed by the ". So you should use a pattern such as:

'#name="([^"]+)"#'

Since this will only match characters that aren't the ", it will fail when it meets name="". The more specific you can make your patterns, the better.
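The difference described above can be checked side by side. This is a minimal sketch, assuming the sample text from the first post (which uses name= throughout) and preg_match_all; $lazy and $negated are illustrative names.

```php
<?php
// Sample text taken from the first post in this thread.
$text = 'name="value1" name="" name="value2"';

// Lazy ".+?" must still consume at least one character. At name=""
// that first character is the closing quote, so the match runs on
// until the next double quote it can find.
preg_match_all('!name="(.+?)"!', $text, $lazy);

// The negated class [^"] can never consume a quote, so name=""
// fails outright and the engine moves on to the next name=".
preg_match_all('!name="([^"]+)"!', $text, $negated);

print_r($lazy[1]);
print_r($negated[1]);
```

With the lazy pattern, the second capture should come out as `" name=` (the same shape as the odd `" value=` result in the question), and value2 is lost because its opening quote was consumed as the closing delimiter. The negated class should return just value1 and value2.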
elitegosu (Author) Posted June 15, 2010

Thanks cags, that explained it perfectly.