[SOLVED] Parsing a complex string with preg_match_all


thegodfaza


So basically I'm developing a PHP RCon tool for use in the CoD series of games. I am limited to working with what comes out of the server; for example, $data is the response packet(s) from the query. For this problem I have supplied the data. The amount of whitespace between entries is subject to change, though there will always be at least one space. The problem I'm having is capturing player names that contain spaces. A name could contain several spaces, including several in a row.

 

Regex code for people who don't want to hunt for it:

/^\s*(\d+)\s*(\d+)\s*(\d+)\s*([0-9a-f]*)\s*(.*?\S*\s*)\s*(\d*)\s*(\d*)$/

 

PHP File:

<?php
$data = "map: mp_showdown
num score ping guid                             name            lastmsg address               qport rate
--- ----- ---- -------------------------------- --------------- ------- --------------------- ----- -----
10     7   95  b8a09b5924fa2e5b79a56b9ae61c0954 Player 1        0       192.168.1.1:28960     2550  25000
11    25   37  0a6945dacb246b04d2a63daf3c50a877 Player 2        5       192.168.0.1:28960     25417 25000
12    15   80  00770f2f8fac5810cf62b6f2e4f0233a Player 3        0       192.168.0.2:28960     2280  25000
13    15   74  5396333c32a8ff26713da7ff6c0bcb73 Player 4        0       192.168.0.3:-10470    7773  25000
14    32   63  eebb4bbbdb04d2fc91e7d1e78310607d Player 5        0       192.168.0.4:28960     20599 25000
15    45   92  0f931649e253088674c305315ed13409 Player 6        0       192.168.0.5:28960     22608 25000
0     20   85  1e9e1b01dc19152154566816a2188e60 Player 7        0       192.168.0.6:28960     20643 25000
1     50   55  6fdabb920b571946229c06564be0024c Player 8        15      192.168.0.7:28960     3307  25000
2      0   71  7743eb260f7a3752890ae24678874e5d Player 9        0       192.168.0.8:-15259    -23125 18000
3      0   70  d09930d1d4eb4f7f120fc44b66a56b15 Player 10       0       192.168.0.9:28960     -23970 25000
4     20   72  7ae8f7c708dc91bb935bc03e84ba6975 Player 11       0       192.168.0.10:28960    -1276 25000
5      5  157  fc3b5031ec6e420e8a35c3b1acf0cd53 Player 12       0       192.168.0.11:28960    -31404 25000
6      5   40  79b311d031f9bfaca213e40a97037d03 Player 13       30      192.168.0.12:28960    -31819 25000
7     20   80  36bc7791fbe203c89bd058449691e912 Player 14       0       192.168.0.13:28960    -613  25000
8     10  162  b898156fbc3ed5d7b21518a7bf4a80c0 Player 15       0       192.168.0.14:28960    -15579 25000
9     30  194  dddb1af2422537f200dd61665cd54265 Player 16       10      192.168.0.15:52       -23523 25000";
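$players = explode ("\n", $data );

// Drop the three header lines (map name, column titles, separator row).
array_shift($players);
array_shift($players);
array_shift($players);

$table = ''; // initialise so the .= below does not raise an undefined-variable notice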

foreach( $players as $input ) {
//                     ID     Score   Ping      GUID          Name     Last Mess   IP
preg_match_all("/^\s*(\d+)\s*(\d+)\s*(\d+)\s*([0-9a-f]*)\s*(.*?\S*\s*)\s*(\d*)\s*(\d*)$/",$input,$output);
$table .= "<tr><td>";
$table .= $output[1][0];
$table .= "</td><td>";
$table .= $output[2][0];
$table .= "</td><td>";
$table .= $output[3][0];
$table .= "</td><td>";
$table .= $output[4][0];
$table .= "</td><td>";
$table .= $output[5][0];
$table .= "</td><td>";
$table .= $output[6][0];
$table .= "</td><td>";
$table .= $output[7][0];
$table .= "</td></tr>";
}
echo "<table border=\"1\"><thead><tr><td>Player Number</td><td>Score</td><td>Ping</td><td>GUID</td><td>Name</td><td>Last Message</td><td>IP Address</td></tr></thead><tbody>".$table."</tbody></table>";
?>

 

I want to get:

$output[0]
   [0] => "10 7 95 b8a09b5924fa2e5b79a56b9ae61c0954 Player 1 0 192.168.1.1:28960"
   [1] => "10"
   [2] => "7"
   [3] => "95"
   [4] => "b8a09b5924fa2e5b79a56b9ae61c0954"
   [5] => "Player 1"
   [6] => "0"
   [7] => "192.168.1.1:28960"

 

What I am getting:

$output[0]
   [0] => "10 7 95 b8a09b5924fa2e5b79a56b9ae61c0954 Player 1 0 192.168.1.1:28960 2550 25000"
   [1] => "10"
   [2] => "7"
   [3] => "95"
   [4] => "b8a09b5924fa2e5b79a56b9ae61c0954"
   [5] => "Player 1 0 192.168.1.1:28960 "
   [6] => "2550"
   [7] => "25000"


One possible solution could be:

 

$data = "map: mp_showdown
num score ping guid                             name            lastmsg address               qport rate
--- ----- ---- -------------------------------- --------------- ------- --------------------- ----- -----
10     7   95  b8a09b5924fa2e5b79a56b9ae61c0954 Player 1        0       192.168.1.1:28960     2550  25000
11    25   37  0a6945dacb246b04d2a63daf3c50a877 Player 2        5       192.168.0.1:28960     25417 25000
12    15   80  00770f2f8fac5810cf62b6f2e4f0233a Player 3        0       192.168.0.2:28960     2280  25000
13    15   74  5396333c32a8ff26713da7ff6c0bcb73 Player 4        0       192.168.0.3:-10470    7773  25000
14    32   63  eebb4bbbdb04d2fc91e7d1e78310607d Player 5        0       192.168.0.4:28960     20599 25000
15    45   92  0f931649e253088674c305315ed13409 Player 6        0       192.168.0.5:28960     22608 25000
0     20   85  1e9e1b01dc19152154566816a2188e60 Player 7        0       192.168.0.6:28960     20643 25000
1     50   55  6fdabb920b571946229c06564be0024c Player 8        15      192.168.0.7:28960     3307  25000
2      0   71  7743eb260f7a3752890ae24678874e5d Player 9        0       192.168.0.8:-15259    -23125 18000
3      0   70  d09930d1d4eb4f7f120fc44b66a56b15 Player 10       0       192.168.0.9:28960     -23970 25000
4     20   72  7ae8f7c708dc91bb935bc03e84ba6975 Player 11       0       192.168.0.10:28960    -1276 25000
5      5  157  fc3b5031ec6e420e8a35c3b1acf0cd53 Player 12       0       192.168.0.11:28960    -31404 25000
6      5   40  79b311d031f9bfaca213e40a97037d03 Player 13       30      192.168.0.12:28960    -31819 25000
7     20   80  36bc7791fbe203c89bd058449691e912 Player 14       0       192.168.0.13:28960    -613  25000
8     10  162  b898156fbc3ed5d7b21518a7bf4a80c0 Player 15       0       192.168.0.14:28960    -15579 25000
9     30  194  dddb1af2422537f200dd61665cd54265 Player 16       10      192.168.0.15:52       -23523 25000";

preg_match_all('#^(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-f]+)\s+((?!\s{2,}).+?)\s{2,}(\d+)\s+([^\s]+)#m', $data, $matches, PREG_SET_ORDER);
$count = count($matches);
for ($a = 0 ; $a < $count ; $a++) {
    $matches[$a][0] = preg_replace('#\s{2,}#', ' ', $matches[$a][0]);
}
echo '<pre>'.print_r($matches, true);

 

Sample output:

Array
(
    [0] => Array
        (
            [0] => 10 7 95 b8a09b5924fa2e5b79a56b9ae61c0954 Player 1 0 192.168.1.1:28960
            [1] => 10
            [2] => 7
            [3] => 95
            [4] => b8a09b5924fa2e5b79a56b9ae61c0954
            [5] => Player 1
            [6] => 0
            [7] => 192.168.1.1:28960
        )

    [1] => Array
        (
            [0] => 11 25 37 0a6945dacb246b04d2a63daf3c50a877 Player 2 5 192.168.0.1:28960
            [1] => 11
            [2] => 25
            [3] => 37
            [4] => 0a6945dacb246b04d2a63daf3c50a877
            [5] => Player 2
            [6] => 5
            [7] => 192.168.0.1:28960
        )
..... etc ...
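
To get from this array to the HTML table in the original question, something along these lines might work. It is only a sketch: the function name renderPlayerTable and the two-row sample are made up for illustration, and htmlspecialchars is added on the assumption that player names can contain characters such as < or &.

```php
<?php
// Hypothetical helper: turns PREG_SET_ORDER matches into table rows.
function renderPlayerTable(array $matches): string
{
    $rows = '';
    foreach ($matches as $m) {
        // $m[0] is the whole match; groups 1..7 are the columns we want.
        $cells = array_map('htmlspecialchars', array_slice($m, 1, 7));
        $rows .= '<tr><td>' . implode('</td><td>', $cells) . '</td></tr>';
    }
    return $rows;
}

// Made-up sample in the same layout as the server response.
$data = "10     7   95  b8a09b5924fa2e5b79a56b9ae61c0954 Player 1        0       192.168.1.1:28960     2550  25000\n"
      . "11    25   37  0a6945dacb246b04d2a63daf3c50a877 A <b> C         5       192.168.0.1:28960     25417 25000";

$pattern = '#^(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-f]+)\s+((?!\s{2,}).+?)\s{2,}(\d+)\s+([^\s]+)#m';
preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

$rows = renderPlayerTable($matches);
echo '<table border="1"><tbody>' . $rows . '</tbody></table>';
```

Escaping the cells means a name like "A <b> C" shows up as text in the table instead of being treated as markup.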


@nrg_alpha

Your solution worked well, though I had to modify it slightly.

#^\s*(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-f]{32})\s+((?!\s{2,}).{1,15})\s{2,}(\d+)\s+([^\s]+)#m

 

Probably like most people, I still don't understand regular expressions that well. The parts I don't understand are:

((?!\s{2,}).{1,15})
([^\s]+)#m

 



Well, allow me to explain ;)

 

((?!\s{2,}).{1,15}) The outermost parentheses are a capturing group, meaning that whatever is matched inside them is stored so you can retrieve it later. The (?!\s{2,}) part is a negative lookahead assertion. It makes the regex engine check, without consuming anything, that what comes next is not two or more consecutive whitespace characters (\s is a shorthand character class for whitespace: space, tab, carriage return, newline, etc.). Note that the assertion is only tested once, at the position where it appears, so here it only guarantees that the name does not start with a run of whitespace. After that, .{1,15} matches any character (other than newline, as this is what the dot does) 1 to 15 consecutive times; because \s{2,} follows it in the pattern, the engine backs off until two or more whitespace characters come next. My pattern handles this with ((?!\s{2,}).+?)\s{2,} instead, which lazily matches up to the starting point of 2 (or more) consecutive whitespace characters.
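
Here is a small sketch of the above, using a made-up line with a multi-word name. The variable names and sample strings are invented for the example and are not from the server output. It compares the lazy .+? version with a plain \S+, and shows the one case the lookahead rejects.

```php
<?php
// Made-up row fragment: name, padding of two or more spaces, then lastmsg.
$line = "Some Player Name    15";

// Lazy .+? grows one character at a time until \s{2,} can match,
// so single spaces inside the name are kept.
preg_match('#^((?!\s{2,}).+?)\s{2,}(\d+)#', $line, $lazy);

// \S+ cannot cross a space, so this pattern fails on a multi-word name.
$plain = preg_match('#^(\S+)\s{2,}(\d+)#', $line, $unused);

// The lookahead is tested once, at the start of the name: it rejects
// a subject that begins with two or more whitespace characters.
$leading = preg_match('#^((?!\s{2,}).+?)\s{2,}(\d+)#', "   Name    15", $unused2);

var_dump($lazy[1], $lazy[2], $plain, $leading);
// $lazy[1] is "Some Player Name", $lazy[2] is "15", $plain and $leading are 0
```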

 

Finally, the explanation for ([^\s]+)#m

Again, the parentheses are a capturing group that stores what it matches. The [^\s]+ is a negated character class that says: match anything that is NOT a whitespace character (the ^ at the start of the [...] character class makes it negative) one or more consecutive times. It is equivalent to \S+.

The # is the closing delimiter (you'll notice that the entire regex pattern starts and ends with #). You can have a look at the PCRE documentation to read up on regex in general, along with our resources page, which has a thread that discusses delimiters.

 

The m after the closing delimiter is the multi-line modifier. The entire pattern starts with ^, which by default matches only at the start of the entire subject. I want to check at the beginning of each line, and the m modifier does just that by making ^ match at the start of every line.
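
To see the effect of the m modifier in isolation, here is a minimal sketch with a made-up two-line subject (the variable names are just for this example):

```php
<?php
// Two made-up lines; only the line starts matter here.
$text = "10 first\n11 second";

// Without m, ^ matches only at the very start of the subject.
$withoutM = preg_match_all('#^(\d+)#', $text, $a);

// With m, ^ also matches right after each newline.
$withM = preg_match_all('#^(\d+)#m', $text, $b);

var_dump($withoutM, $withM, $b[1]);
// $withoutM is 1, $withM is 2, $b[1] holds "10" and "11"
```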

 

Hope that all makes sense.

Glad you got it worked out though.


I added the {1,15} since user names must be at least one character long but cannot exceed 15 characters.

I also need many checks since the responses arrive over UDP, which has a tendency to return the packets out of order. The parser needs to be able to salvage good entries but ignore corrupt ones.
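
One way the "salvage good entries, ignore corrupt ones" idea could look is sketched below. It assumes one entry per line and checks each line on its own with preg_match, so the m modifier is not needed. The function name parseStatusLines, the array keys, and the sample data (including the deliberately truncated second row) are invented for illustration.

```php
<?php
// Hypothetical helper: keeps lines that match the full pattern, skips the rest.
function parseStatusLines(string $data): array
{
    $pattern = '#^\s*(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-f]{32})\s+((?!\s{2,}).{1,15})\s{2,}(\d+)\s+([^\s]+)#';
    $players = [];
    foreach (preg_split('/\R/', $data) as $line) {
        if (preg_match($pattern, $line, $m) !== 1) {
            continue; // header, separator, or damaged entry
        }
        $players[] = [
            'num'     => (int) $m[1],
            'score'   => (int) $m[2],
            'ping'    => (int) $m[3],
            'guid'    => $m[4],
            // greedy .{1,15} can swallow padding spaces, so trim them off
            'name'    => rtrim($m[5]),
            'lastmsg' => (int) $m[6],
            'address' => $m[7],
        ];
    }
    return $players;
}

// Made-up response: a header line, a good row, a truncated row, a good row.
$data = "map: mp_showdown\n"
      . "10     7   95  b8a09b5924fa2e5b79a56b9ae61c0954 Player 1        0       192.168.1.1:28960     2550  25000\n"
      . "11    25   37  0a6945dacb246b04d2a6\n"
      . "12    15   80  00770f2f8fac5810cf62b6f2e4f0233a Two Words       0       192.168.0.2:28960     2280  25000";

$players = parseStatusLines($data);
print_r($players);
```

The {32} on the GUID is what rejects the truncated row here. The rtrim is there because the greedy .{1,15} takes as much of the column padding as it can while still leaving two whitespace characters for \s{2,}.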


Since you know that there will be a number after 'Player', why not let it search for digits?

 

Try this:

 

/^\s*(\d+)\s*(\d+)\s*(\d+)\s*([0-9a-f]*)\s*(.*?\d*)\s*(\d*)\s*(\d*)$/

 

Thought I should reply to this. The name and IP columns have been altered so that I'm not giving out people's names with their GUIDs and IPs. The name could be anything.



Ohh.. I misunderstood.

Anyway, you got your answer.
