Need help with regex


elitegosu

Hello everyone,

I'm trying to match everything except empty (null) values using a regular expression. For example:

Text: name="value1" name="" name="value2"

I want to match just value1 and value2 from the text above. I tried using !name=\"(.*?)\"!, but the * matches 0 or more times, so it matches name="" as well. I tried replacing the star with a +, which is supposed to match any character (.) 1 or more times, but that doesn't work; in fact, it behaves rather oddly when I do that. Check this out:

This is what I get when using !name=\"(.*?)\"!:

Array
(
    [0] => Array
        (
            [0] => value="val1"
            [1] => value=""
            [2] => value="val2"
        )

    [1] => Array
        (
            [0] => val1
            [1] => 
            [2] => val2
        )

)

And this is what I get when using !name=\"(.+?)\"! (a plus instead of a star):

Array
(
    [0] => Array
        (
            [0] => value="val1"
            [1] => value="" value="
        )

    [1] => Array
        (
            [0] => val1
            [1] => " value=
        )

)

So it seems that .+? doesn't work like .*? at all. Is there a bug in PHP, or am I missing something? What is the best way to match anything but an empty value? I would really appreciate any help. I'm currently reading the book "Mastering Regular Expressions", so hopefully I will be able to figure this out once I finish it.
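
A minimal PHP sketch that reproduces both results, assuming the sample text from the question (the dumps above show value= rather than name=, presumably from a slightly different test string, but the behaviour is the same). The variable names $star and $plus are illustrative.

```php
<?php
// Sample text from the question.
$text = 'name="value1" name="" name="value2"';

// Lazy "zero or more": an empty value is a legitimate, zero-length capture.
preg_match_all('!name="(.*?)"!', $text, $star);

// Lazy "one or more": at name="" the group must take at least one
// character, so it takes the closing quote and runs on to the next ".
preg_match_all('!name="(.+?)"!', $text, $plus);

echo implode(' | ', $star[1]), "\n"; // value1 |  | value2
echo implode(' | ', $plus[1]), "\n"; // value1 | " name=
```

The second full match with .+? is name="" name=", which ends on the opening quote of name="value2"; matching resumes after that quote, so value2 is never captured at all.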

so there must be at least one match of the pattern for a result to be returned. Does that make sense?

Right, that makes sense. But why does it match value="" value="? I would think that if there is no value between \" \", then there is no match and it shouldn't match anything. Instead, it looks like it becomes greedy and goes on to match text beyond the closing quote.

You are not looking at the ENTIRE pattern. If this is your pattern:

!name=\"(.*?)\"!

Then you are telling the system to find text that begins with name=", then has 0 or more characters, and then ends with ". You are thinking too much like a human, because you are assuming that it shouldn't match zero characters. But that is a perfectly valid match.

What if you wanted to know the value of EVERY "name" parameter, even the empty ones? If (.*?) didn't do that, what would?
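
To make that concrete, a small PHP sketch with illustrative strings: against a lone name="" the lazy-star pattern succeeds, and what it captures is the empty string, which is exactly what you want when collecting every name value, blanks included.

```php
<?php
// A lone empty attribute: zero characters between the quotes.
$found = preg_match('!name="(.*?)"!', 'name=""', $m);

var_dump($found); // int(1): a match was found
var_dump($m[1]);  // string(0) "": the captured value is empty

// Collect every value, empties included, then count the blank ones.
preg_match_all('!name="(.*?)"!', 'name="a" name="" name="b" name=""', $all);
$blank = count(array_filter($all[1], fn ($v) => $v === ''));

echo $blank, ' of ', count($all[1]), " values are empty\n"; // 2 of 4 values are empty
```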

$pattern = '/name="(.+)"/';

@Zach,

Did you even bother reading the post? If you had, you would have seen that 1) the OP's issue was already solved and the discussion was only continuing to explain a concept about regex patterns, and 2) the pattern you supplied would NOT solve the OP's problem, because it is "greedy".
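
To see what "greedy" means here, a PHP sketch running the suggested pattern over the sample text from the question: (.+) first runs to the end of the string and then backs off only as far as the last ", so the whole line collapses into a single match.

```php
<?php
$text = 'name="value1" name="" name="value2"';

// Greedy (.+): consume as much as possible, then backtrack to the LAST quote.
$count = preg_match_all('/name="(.+)"/', $text, $greedy);

echo $count, "\n";        // 1 (one match in total)
echo $greedy[1][0], "\n"; // value1" name="" name="value2
```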

Sorry if I wasn't clear enough on this. What I'm saying is that this pattern, !name=\"(.*?)\"!, is matching everything the way it's supposed to. I know it should match the empty value in name="" as well; it's working fine.

My question was about the pattern !name=\"(.+?)\"!. I expect it to match just the value in name="val1", because (.+?) is supposed to match 1 or more characters, so it shouldn't match name="" since there is nothing between the double quotes.

So why is it still matching this: " value=

Here is the match of !name=\"(.+?)\"! again:

Array
(
    [0] => Array
        (
            [0] => value="val1"
            [1] => value="" value="
        )

    [1] => Array
        (
            [0] => val1
            [1] => " value=
        )

)
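
A quick PHP check that separates the two situations (the strings are illustrative): on a bare name="" the lazy-plus pattern really does fail, as expected, but once another " appears later in the input, the one-character minimum lets the group take the closing quote and run on to that later quote.

```php
<?php
$pattern = '!name="(.+?)"!';

// Nothing after the empty value: .+? takes the closing quote as its one
// required character, then finds no further " to finish on, so no match.
$alone = preg_match($pattern, 'name=""', $m1);

// Another quote follows: the group absorbs the closing quote and everything
// up to that later quote, which the pattern's own final " then matches.
$later = preg_match($pattern, 'name="" name="val2"', $m2);

echo $alone, "\n"; // 0
echo $m2[1], "\n"; // " name=
```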

This is the problem with relying on a generalised catch-all like .+ in your pattern. When matching 0 or more characters, the lazy modifier makes the engine stop as soon as it meets the character that comes next in the pattern, in this case the ". Now think about the logic behind .+? in your pattern when it is given the string name="" something". The engine will attempt to match the shortest string possible that meets the requirements. First it must match one character, so it matches what you would consider to be the closing double quote; it then keeps matching until it finds a " to satisfy the rest of your pattern, so the sub-pattern captures " something (the final " is consumed by the closing quote in the pattern). What you really want to match is one or more characters that aren't a ", followed by the ". So you should use a pattern such as:

'#name="([^"]+)"#'

Since [^"]+ only matches characters that aren't a ", it fails when it meets name="": there is no non-quote character for the + to consume, so the engine moves on to the next name=. The more specific you can make your patterns, the better.
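
A short PHP sketch applying the suggested pattern to the sample text from the question. The [^"]* line is an extra, shown only for comparison: it is the zero-or-more form of the same negated class, which still cannot run past a quote but keeps the empty value instead of skipping it.

```php
<?php
$text = 'name="value1" name="" name="value2"';

// [^"]+ : one or more characters that are NOT a double quote.
// It can never step over a quote, so name="" simply fails to match.
preg_match_all('#name="([^"]+)"#', $text, $m);
echo implode(' | ', $m[1]), "\n"; // value1 | value2

// Zero-or-more counterpart, for comparison: still stops at the first
// quote, but accepts an empty value.
preg_match_all('#name="([^"]*)"#', $text, $all);
echo implode(' | ', $all[1]), "\n"; // value1 |  | value2
```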
