
[SOLVED] simplest way to do this?


king.oslo


Hello, this is my string:

 

$string = '10000|one|1|24|mens***10001|two|48|tiss***';

 

I would like to find all occurrences of between 1 and 4 characters from [a-z0-9] that follow a '|', except where the text is one, two, three or four. Something like:

 

/\|\w[^one|two|three|four]{1,4}/

 

What is the simplest way to do this?

 

Thanks :) :)

 

Marius

https://forums.phpfreaks.com/topic/168431-solved-simplest-way-to-do-this/

~ ... ~ does the same thing as your / ... / does: both are delimiters.  PHP accepts many characters as the delimiter.  I usually use ~ because it rarely comes up inside a pattern, so I don't have to worry about escaping it.  / turns up in patterns a lot because regex is popular for scraping webpages and / is a common symbol in webpage code, so when you use / ... / as the delimiter, you end up having to escape it a lot.
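To make the delimiter point concrete, here is a small sketch. The $html value and the URL in it are made up purely for illustration; only preg_match() itself is real.

```php
<?php
// Hypothetical input, invented for this example.
$html = '<a href="http://example.com/a/b">link</a>';

// With / as the delimiter, every literal / in the pattern must be escaped.
preg_match('/http:\/\/([^\/"]+)\//', $html, $withSlash);

// With ~ as the delimiter, the same pattern needs no escaping at all.
preg_match('~http://([^/"]+)/~', $html, $withTilde);

// Both patterns capture the same host name in group 1.
var_dump($withSlash[1] === $withTilde[1]); // bool(true)
```

Both calls do the same job; the second one is just easier to read.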

 

And why doesn't /\|\w+(?<!(one|two))/ work?

 

Because you are telling it to first match a literal |, then 1 or more word characters (as defined by your locale, not necessarily just [a-zA-Z0-9_]), and then you have a negative lookbehind.  So once \w+ can't match anything else, the engine looks back 3 characters to make sure whatever \w+ matched doesn't end in 'one' or 'two'.  If it does, \w+ simply gives back a character and the check passes on the shorter match.  So in other words, you were doing it backwards.  What you really wanted was a negative lookahead:
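As a rough illustration of what goes wrong, this sketch runs the lookbehind attempt against the string from the first post. The variable names are just for the example.

```php
<?php
$string = '10000|one|1|24|mens***10001|two|48|tiss***';

// The lookbehind version: after \w+ has matched, check that the
// three characters just consumed are not "one" or "two".
preg_match_all('/\|\w+(?<!(one|two))/', $string, $matches);

// \w+ backtracks by one character when the check fails, so "|one"
// and "|two" are not skipped, they are only shortened.
print_r($matches[0]);
// |on, |1, |24, |mens, |tw, |48, |tiss
```

So the unwanted fields still show up, just cut down to "|on" and "|tw".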

 

\|(?!one|two|three|four)[a-z0-9]{1,4}

 

which says:

 

Okay, first let's match a |. After that, we look ahead and make sure the next thing in the string after that | is not "one", "two", "three" or "four".  If it's not, then let's see if we can match 1 to 4 a-z0-9 chars after that pipe (case insensitively, provided the i modifier is added after the closing delimiter).
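Here is a minimal sketch of that lookahead pattern applied to the string from the first post. The ~ delimiters and the i modifier are added here as an assumption, since the pattern above is shown without them.

```php
<?php
$string = '10000|one|1|24|mens***10001|two|48|tiss***';

// Match a pipe, refuse to continue if "one", "two", "three" or "four"
// comes next, otherwise take 1 to 4 characters from a-z0-9.
$pattern = '~\|(?!one|two|three|four)[a-z0-9]{1,4}~i';

preg_match_all($pattern, $string, $matches);

print_r($matches[0]);
// |1, |24, |mens, |48, |tiss
```

Each match still includes the leading pipe; wrapping the character class in a capture group would give you the values without it.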

 

The main thing to grasp here is that lookarounds are zero-width assertions.  This means the engine checks for the (non)match but doesn't actually move its position in the string forward or backward.  This is the main thing that trips people up with lookarounds.
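A quick way to see the zero-width behaviour for yourself. The test string 'abc' is arbitrary and picked only for this sketch.

```php
<?php
// The lookahead requires three word characters ahead, but consumes none,
// so the \w that follows still matches the very first character.
preg_match('~(?=\w{3})\w~', 'abc', $m, PREG_OFFSET_CAPTURE);

// $m[0] holds the matched text and the offset where it starts.
print_r($m[0]); // "a" at offset 0

// The overall match is one character long, not three.
var_dump(strlen($m[0][0])); // int(1)
```

The lookahead inspected "abc" but the match itself is only "a", which is the position-not-moving part in action.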

 

Great! Thanks! Yes, I see the convenience of using ~, and why I was doing it backwards. Great to learn this :)

 

Thank you! Wish you a pleasant evening! :)

 

Marius

