Hello, this is my string:

 

$string = '10000|one|1|24|mens***10001|two|48|tiss***';

 

I would like to find all occurrences of between 1 and 4 characters from [a-z0-9] that follow a '|', except when the text is one, two, three or four. Something like:

 

/\|\w[^one|two|three|four]{1,4}/

 

What is the simplest way to do this?

 

Thanks :) :)

 

Marius

https://forums.phpfreaks.com/topic/168431-solved-simplest-way-to-do-this/

~ ... ~ does the same thing as your / ... / does: they are delimiters.  PHP accepts lots of characters as the delimiter.  I usually use ~ because it rarely comes up inside a pattern, so I don't have to worry about escaping it.  / shows up in patterns a lot because regex is popular for scraping webpages and / is a common symbol in webpage code, so when you use / ... / as the delimiter, you end up having to escape it a lot.
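
To make the delimiter point concrete, here is a minimal sketch. The $html sample and the URL pattern are made up for illustration (they are not from the thread); the only thing it tries to show is that the same pattern needs escaped slashes with / ... / and none with ~ ... ~.

```php
<?php
// Hypothetical sample input, chosen only because it contains slashes.
$html = '<a href="http://example.com/a/b">link</a>';

// With / as the delimiter, every literal / inside the pattern must be escaped.
preg_match('/http:\/\/[^"]+\/b/', $html, $withSlash);

// With ~ as the delimiter, the same pattern needs no escaping at all.
preg_match('~http://[^"]+/b~', $html, $withTilde);

// Both delimiters produce the same match.
var_dump($withSlash[0] === $withTilde[0]);
```

If the pattern itself ever contains a ~, that character would need escaping instead, so the choice of delimiter is mostly about what appears least often in your patterns.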

 

And why doesn't /\|\w+(?<!(one|two))/ work?

 

Because you are telling it to first match a literal |, then 1 or more word characters (as defined by your locale, not necessarily just [a-zA-Z0-9_]), and then you have a negative lookbehind.  So once \w+ can't match anything else, the engine looks back 3 characters to make sure whatever was matched by \w+ doesn't end in 'one' or 'two'.  If it does, \w+ just gives back a character and the check then passes, so |one still matches as |on.  In other words, you were doing it backwards.  What you really wanted was a negative lookahead:

 

\|(?!one|two|three|four)[a-z0-9]{1,4}

 

which says:

 

Okay, first let's match a |. After that, we look ahead and make sure the next thing in the string after that | is not "one", "two", "three" or "four".  If it's not, then let's see if we can match 1 to 4 a-z0-9 chars (case insensitive when the pattern is run with the i modifier) after that pipe.
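
As a quick check of that walkthrough, here is a sketch that runs the lookahead pattern against the string from the first post. Wrapping it in ~ ... ~ with the i modifier and using preg_match_all are my assumptions about how you would call it; the variable name $matches is arbitrary.

```php
<?php
$string = '10000|one|1|24|mens***10001|two|48|tiss***';

// Pipe, then a lookahead that rejects one/two/three/four,
// then 1 to 4 characters from a-z0-9 (case insensitive via the i modifier).
$pattern = '~\|(?!one|two|three|four)[a-z0-9]{1,4}~i';

preg_match_all($pattern, $string, $matches);

// Prints |1, |24, |mens, |48 and |tiss; |one and |two are skipped.
print_r($matches[0]);
```

Note that each match still includes the leading pipe. If you only want the text after it, one option would be to put [a-z0-9]{1,4} in a capturing group and read $matches[1] instead.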

 

The main thing to grasp here is that lookarounds are zero-width assertions.  What this means is that the engine checks for the (non)match, but doesn't actually move its position in the string forward or backward.  This is the main thing that trips people up with lookarounds.
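
Here is a small sketch of what zero width means in practice. The '|five' input is made up for illustration, and the lookbehind pattern is a simplified version of the one asked about (without the capturing group), so treat it as a demonstration rather than the exact original.

```php
<?php
// Lookahead: the characters it inspects are not consumed, so the
// character class that follows still starts matching right after the pipe.
$lookahead = '~\|(?!one|two|three|four)[a-z0-9]{1,4}~i';
preg_match($lookahead, '|five', $m);
echo $m[0], "\n"; // |five

// Lookbehind placed after \w+: the check runs at the end of what \w+ matched.
// When it fails, \w+ backtracks by one character and the check then passes.
$string = '10000|one|1|24|mens***10001|two|48|tiss***';
preg_match_all('/\|\w+(?<!one|two)/', $string, $backwards);

// Prints |on, |1, |24, |mens, |tw, |48 and |tiss.
print_r($backwards[0]);
```

The |on and |tw results are the "doing it backwards" symptom: the lookbehind never prevents a match at that pipe, it only shortens it.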

 

Great! Thanks! Yes, I see the convenience of using ~, and why I was doing it backwards. Great to learn this :)

 

Thank you! Wish you a pleasant evening! :)

 

Marius
