king.oslo, posted August 1, 2009

Hello, this is my string:

$string = '10000|one|1|24|mens***10001|two|48|tiss***';

I would like to find all occurrences of 1 to 4 characters from [a-z0-9] that follow a '|', except where the text is one, two, three or four. Something like:

/\|\w[^one|two|three|four]{1,4}/

What is the simplest way to do this?

Thanks :)
Marius

Link: https://forums.phpfreaks.com/topic/168431-solved-simplest-way-to-do-this/
.josh, posted August 1, 2009

~\|(?!one|two|three|four)[a-z0-9]{1,4}~i
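For anyone who wants to try this, here is a minimal sketch that runs the pattern above against the string from the first post. The variable names $pattern and $matches are only for illustration and are not from the thread.

```php
<?php
// The string from the original question.
$string = '10000|one|1|24|mens***10001|two|48|tiss***';

// A pipe, then a negative lookahead that rejects the listed words,
// then 1 to 4 alphanumeric characters (case-insensitive via the i modifier).
$pattern = '~\|(?!one|two|three|four)[a-z0-9]{1,4}~i';

// $matches[0] receives every full match found in $string.
preg_match_all($pattern, $string, $matches);

print_r($matches[0]); // |1, |24, |mens, |48, |tiss
```

Note that each match includes the leading pipe. If only the text after the pipe is wanted, one option is to wrap [a-z0-9]{1,4} in a capturing group and read $matches[1] instead.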
king.oslo (author), posted August 1, 2009

Amazing! What does the ~ do?

And why doesn't /\|\w+(?<!(one|two))/ work?

Thanks,
Marius
.josh, posted August 1, 2009

~ ... ~ does the same thing as your / ... / does: they are delimiters. PHP accepts any character that is not alphanumeric, a backslash, or whitespace as the delimiter. I usually use ~ because it rarely comes up inside a pattern, so I don't have to worry about escaping it. / shows up in patterns a lot because regex is popular for scraping webpages and / is a common symbol in webpage code, so when you use / ... / as the delimiter you end up having to escape things a lot.

"And why doesn't /\|\w+(?<!(one|two))/ work?"

Because you are telling it to first match a literal |, then 1 or more word characters (as defined by your locale, not necessarily just [a-zA-Z0-9_]), and then you have a negative lookbehind. So once \w+ can't match anything more, the engine looks back 3 characters to make sure whatever \w+ matched doesn't end in 'one' or 'two'. When that check fails, the engine simply backtracks \w+ by one character and tries again, so '|one' still gets matched as '|on'. In other words, you were doing it backwards. What you really wanted was a negative lookahead:

\|(?!one|two|three|four)[a-z0-9]{1,4}

which says: Okay, first let's match a |. After that, we look ahead and make sure the next thing in the string after that | is not "one", "two", "three" or "four". If it's not, then let's see if we can match 1 to 4 a-z0-9 chars (case insensitive because of the i modifier) after that pipe.

The main thing to grasp here is that lookarounds are zero-width assertions. That means the engine checks for the (non)match but doesn't actually move its position in the string forward or backward. This is the main thing that trips people up with lookarounds.
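To see the difference between the two approaches, here is a small sketch that runs both patterns from this thread on the original string. The variable names $behind and $ahead are made up for this example.

```php
<?php
$string = '10000|one|1|24|mens***10001|two|48|tiss***';

// Lookbehind version from the question: \w+ first grabs "one", the
// lookbehind rejects it, so the engine backtracks \w+ to "on" and succeeds.
preg_match_all('/\|\w+(?<!(one|two))/', $string, $behind);

// Lookahead version: the check happens right after the pipe, before
// anything else is consumed, so "|one" and "|two" are rejected outright.
preg_match_all('~\|(?!one|two|three|four)[a-z0-9]{1,4}~i', $string, $ahead);

print_r($behind[0]); // |on, |1, |24, |mens, |tw, |48, |tiss
print_r($ahead[0]);  // |1, |24, |mens, |48, |tiss
```

The lookbehind pattern does not fail outright; it returns the truncated matches |on and |tw, which is why it appears not to work.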
king.oslo (author), posted August 1, 2009

Great! Thanks! Yes, I see the convenience of using ~, and why I was doing it backwards. Great to learn this.

Thank you! Wish you a pleasant evening!
Marius