Regex not matching when I think it should


joshabts

Hello all,

 

Hopefully someone can point out where I am hitting a snag with this expression.

 

This is the input line:

Sun Jan 11 22:45:45 2009 :: A young shopkeeper reports: 'User just purchased 'an item' for 1,000 gp.'

 

This is my matching line:

preg_match_all("/(.*) :: A young shopkeeper reports: '(.*) just purchased '(.*)' for (.*) gp\.'/", $_POST['theLog'], $matchedItemsSold);

 

The input line is being passed with other input lines in a textarea named "theLog" and when I run this, $matchedItemsSold is just an empty array.  So for some reason that line isn't matching.

 

Any ideas?  Thanks in advance!


I am going to assume that your variable $_POST['theLog'] contains multiple entries of your sample string, because you are using preg_match_all.

I am also going to assume from your pattern that you want to capture the user's name and the item in question, along with its price?

 

 

$str = 'Sun Jan 11 22:45:45 2009 :: A young shopkeeper reports: \'Frankie just purchased \'Box of Cornflakes\' for 1,000 gp.\' ...whatever... Sun Jan 11 22:45:45 2009 :: A young shopkeeper reports: \'Arnold just purchased \'Dumbells\' for 12,000 gp.\'';
preg_match_all("#:: A young shopkeeper reports: '([^ ]+) just purchased '([^']+)' for ([^ ]+) gp\.'#", $str, $matchedItemsSold, PREG_SET_ORDER);
echo "<pre>".print_r($matchedItemsSold, true);

 

Output (via pre / print_r):

Array
(
    [0] => Array
        (
            [0] => :: A young shopkeeper reports: 'Frankie just purchased 'Box of Cornflakes' for 1,000 gp.'
            [1] => Frankie
            [2] => Box of Cornflakes
            [3] => 1,000
        )

    [1] => Array
        (
            [0] => :: A young shopkeeper reports: 'Arnold just purchased 'Dumbells' for 12,000 gp.'
            [1] => Arnold
            [2] => Dumbells
            [3] => 12,000
        )

)

 

So $matchedItemsSold[x][0] always holds the complete text matched by the pattern.

$matchedItemsSold[x][1] is the user, $matchedItemsSold[x][2] is the item, and $matchedItemsSold[x][3] is the amount spent.
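
To see why the indexes read as [match][group], it can help to compare PREG_SET_ORDER with the default PREG_PATTERN_ORDER. This is only a minimal sketch on a shortened sample string; $log, $bySet and $byGroup are illustrative names, not anything from your script:

```php
<?php
// Shortened sample input; the names and items are placeholders.
$log = "'Frankie just purchased 'Box of Cornflakes' for 1,000 gp.' "
     . "'Arnold just purchased 'Dumbells' for 12,000 gp.'";
$pattern = "#'([^ ]+) just purchased '([^']+)' for ([^ ]+) gp\.'#";

// PREG_SET_ORDER: one sub-array per match, indexed [match][group].
preg_match_all($pattern, $log, $bySet, PREG_SET_ORDER);
echo $bySet[1][1], "\n";   // user of the second match

// PREG_PATTERN_ORDER (the default): one sub-array per group, indexed [group][match].
preg_match_all($pattern, $log, $byGroup, PREG_PATTERN_ORDER);
echo $byGroup[1][1], "\n"; // the same user, reached through the group first
```

Both calls find the same matches; only the nesting of the result array differs, so pick whichever is easier to loop over.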


Sorry for not clarifying. Yes, the POST variable could contain additional lines similar to that, but it will also contain other kinds of lines, such as the following:

 

Tue Jan 13 22:04:05 2009 :: Clan 120 (User): Deposit of 102,964 gold pieces.

 

I have, however, got the capture to work for the above line with another preg_match_all; it's just the line I originally posted that won't match.

 

And I would also like to capture the timestamp at the beginning of the line.  Not sure if it would make a difference but the item name could also contain a ' and ".  That was why I was thinking the (.*) would capture those fields, but apparently I wasn't accounting for something right.

 

Thanks for the help!


Sorry for not clarifying. Yes, the POST variable could contain additional lines similar to that, but it will also contain other kinds of lines, such as the following:

 

Tue Jan 13 22:04:05 2009 :: Clan 120 (User): Deposit of 102,964 gold pieces.

This is why, if there are going to be variations in the input, you should show them. We can only help with what you have shown us. Showing all the possibilities up front means fewer rounds of back-and-forth. Please keep that in mind for next time.

 

...I was thinking the (.*) would capture those fields, but apparently I wasn't accounting for something right.

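Using .* is not recommended: it is greedy, so it grabs as much as it can and only then backtracks, which is slow and can capture more than you intended. Read this thread. Note posts #11 and #14 to understand why.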
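
A small sketch of the difference, assuming a made-up line that holds two purchases ($line, $greedy and $negated are illustrative names only):

```php
<?php
// Two purchases on one line; the data is made up for illustration.
$line = "'Frankie just purchased 'Box' for 1,000 gp.' 'Arnold just purchased 'Dumbells' for 12,000 gp.'";

// Greedy: (.*) runs to the end of the line, then backtracks to the LAST "' for".
preg_match("#just purchased '(.*)' for#", $line, $greedy);

// Negated class: ([^']+) cannot cross a quote, so it stops at the first one.
preg_match("#just purchased '([^']+)' for#", $line, $negated);

echo $greedy[1], "\n";  // spans both purchases
echo $negated[1], "\n"; // just the first item
```

The negated class does assume the item name never contains an apostrophe. If it can, a lazy (.+?) followed by enough literal text is one alternative.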

 

Here is what I came up with:

 

$str = 'Sun Jan 11 22:45:45 2009 :: A young shopkeeper reports: \'Frankie just purchased \'Box of Cornflakes\' for 1,000 gp.\' ...whatever... Tue Jan 13 22:04:05 2009 :: Clan 120 Troll: Deposit of 102,964 gold pieces.';
preg_match_all("#((?:Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) :: (?:Clan \d+ ([^:]+): Deposit of ([\d,]+) gold pieces|A young shopkeeper reports: '(.+?) just purchased '([^']+)' for ([^ ]+) gp)#", $str, $matchedItemsSold, PREG_SET_ORDER);

foreach ($matchedItemsSold as $a => $match) {
    unset($match[0]); // get rid of the full pattern match (array element zero)

    // Only the shopkeeper alternative fills capture groups 4 to 6.
    $isShopkeeper = isset($match[4]) && $match[4] !== '';

    // Kill off the empty entries left by the alternative that did not match,
    // then renumber the keys from 0 (unset and array_filter leave gaps).
    $matchedItemsSold[$a] = array_values(array_filter($match, 'strlen'));

    if ($isShopkeeper) { // young shopkeeper reports format detected
       // do whatever with shopkeep format...
    } else { // Clan format detected
       // do whatever with Clan format...
    }
}
echo "<pre>".print_r($matchedItemsSold, true);

 

Output:

Array
(
    [0] => Array
        (
            [0] => Sun Jan 11 22:45:45 2009
            [1] => Frankie
            [2] => Box of Cornflakes
            [3] => 1,000
        )

    [1] => Array
        (
            [0] => Tue Jan 13 22:04:05 2009
            [1] => Troll
            [2] => 102,964
        )
)
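
One possible variation, which is not what the code above does: PCRE has a branch-reset group, (?|...), that restarts the group numbering in each alternative, so neither format leaves empty groups behind to clean up. A rough sketch on the same sample string, with $pattern and $rows as illustrative names:

```php
<?php
$str = 'Sun Jan 11 22:45:45 2009 :: A young shopkeeper reports: \'Frankie just purchased \'Box of Cornflakes\' for 1,000 gp.\' ...whatever... Tue Jan 13 22:04:05 2009 :: Clan 120 Troll: Deposit of 102,964 gold pieces.';

// (?| ... | ... ) is a branch-reset group: each alternative starts numbering at group 2.
$pattern = "#((?:Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) :: "
         . "(?|Clan \d+ ([^:]+): Deposit of ([\d,]+) gold pieces"
         . "|A young shopkeeper reports: '(.+?) just purchased '([^']+)' for ([^ ]+) gp)#";

preg_match_all($pattern, $str, $rows, PREG_SET_ORDER);

foreach ($rows as $i => $row) {
    array_shift($row); // drop the full match; the keys are renumbered from 0
    $rows[$i] = $row;
}
print_r($rows);
```

With this layout the timestamp is always element 0 and the user element 1. The two formats can then be told apart by, for example, checking whether element 3 (the price) is present.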

 

I supplied the meat and potatoes, you supply the gravy.


So I think this will solve my problem. But if I want to capture these two lines at two different times, can I take your pattern above, remove the alternation (?: <stuff here> | <other here>), and just use one branch or the other?

 

I am trying to capture just the item sold line with the time, user, item, and value, the other line can just be ignored for now.


My pattern is partly an alternation, so it doesn't matter whether the input has one format or the other: the pattern will find either one, or both. For the time, user, item and value, you just tap into the array elements you need.

 

So array[0][0] is time, array[0][1] is user, array[0][2] is item and array[0][3] is value (you can see this in the output in my last post). If there are multiple entries using this format in the variable you are checking, then the next set will be array[1][0], array[1][1], array[1][2], etc..
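
If you really do want only the shopkeeper lines, a single-branch version could look something like the sketch below. It assumes one log entry per line in the textarea, and that item names may contain ' or " as you mentioned. The sample data and the names $log, $pattern and $sales are made up for illustration:

```php
<?php
// Sample lines; the last item name contains an apostrophe and a double quote.
$log = "Sun Jan 11 22:45:45 2009 :: A young shopkeeper reports: 'Frankie just purchased 'Box of Cornflakes' for 1,000 gp.'\n"
     . "Tue Jan 13 22:04:05 2009 :: Clan 120 Troll: Deposit of 102,964 gold pieces.\n"
     . "Wed Jan 14 09:10:11 2009 :: A young shopkeeper reports: 'Arnold just purchased 'a giant's 6\" blade' for 12,000 gp.'\n";

// The m modifier makes ^ and $ match at each line; \r? tolerates Windows line endings.
// The lazy (.+?) for the item stops only where "' for <digits> gp.'" follows,
// so quotes inside the item name are allowed.
$pattern = "#^(.+?) :: A young shopkeeper reports: '(\S+) just purchased '(.+?)' for ([\d,]+) gp\.'\r?$#m";

preg_match_all($pattern, $log, $sales, PREG_SET_ORDER);

foreach ($sales as $sale) {
    // [1] = time, [2] = user, [3] = item, [4] = value
    echo $sale[1], " | ", $sale[2], " | ", $sale[3], " | ", $sale[4], "\n";
}
```

Here element 0 (the full match) is left in place, so the fields start at index 1 rather than 0. The Clan line is simply skipped because nothing in the pattern matches it.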
