[SOLVED] Regex Engine Question


shane18

Can someone explain to me in great detail how:

 

<?php
$CC = ".Username Reason is blah blah blah";
preg_match("/^([\.!#@\+^\$-_~]{1})(.+?)(.+?)$/", $CC, $CCM);
echo "<pre>";
print_r($CCM);
echo "</pre>";
?>

 

Makes this:

 

Array
(
    [0] => .Username Reason is blah blah blah
    [1] => .
    [2] => U
    [3] => sername Reason is blah blah blah
)

 

the outcome.

 

I know how to make this work the way I want it to, but that is not the question, because I am trying to learn how the engine works inside and out. This is my last piece of the puzzle.


Ok, well, [0] will always be the entire string matched

 

[1] is what was matched by the first bracket (the first capturing group), which happened to be ([\.!#@\+^\$-_~]{1})

 

Now, ([\.!#@\+^\$-_~]{1}) means one character (the {1} means one and only one) from the class: . ! # @ + ^ ~ or anything in the range from $ to _ (the - sitting between \$ and _ makes a range, not a literal hyphen). In this case it was .

 

[2] is what was matched by the second bracket, which was (.+?). That means match any character (except a newline), one or more times, but as few times as the rest of the pattern allows.

 

[3] is the third bracket, which is the same as above

 

Now, [2] matched only one character because .+? is lazy: it takes the one character it is required to take and then hands over to the next part of the pattern, as if to say "Well, it's your job to match now, I'm gonna sit down and have a cup of coffee". It would only take more if the rest of the pattern could not match otherwise. [3] is lazy too, but it is followed by $, which only matches at the end of the string, so the engine keeps extending [3] one character at a time until $ finally succeeds. That is why [3] ends up with the rest of the string.
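
To see the hand-off described above, here is a small sketch that runs three variations against the same subject string. The first group is simplified to (.) purely for illustration, and the variable names are mine, not from the original post.

```php
<?php
// Same subject string as in the original post.
$subject = ".Username Reason is blah blah blah";

// Lazy followed by lazy: group 2 takes the minimum (one character),
// and group 3 is stretched to the end because of the $ anchor.
preg_match('/^(.)(.+?)(.+?)$/', $subject, $lazyLazy);

// Greedy followed by lazy: group 2 grabs everything first, then gives
// back a single character so group 3 can satisfy its "one or more".
preg_match('/^(.)(.+)(.+?)$/', $subject, $greedyLazy);

// Lazy followed by a literal space: group 2 now has to grow until
// a space can match right after it.
preg_match('/^(.)(.+?) (.+)$/', $subject, $split);

print_r([
    'lazy/lazy'   => [$lazyLazy[2], $lazyLazy[3]],
    'greedy/lazy' => [$greedyLazy[2], $greedyLazy[3]],
    'lazy/space'  => [$split[2], $split[3]],
]);
```

The third pattern is probably closer to what was intended (a name, then a reason), but that is a guess on my part about the goal.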


Shane18,

 

To expand on the explanation, I advise you to have a look at this thread, which covers .+ and .+? (in particular, read posts #11 and #14).

 

Also note that in your pattern, you used {1} (called an interval) after the character class (character class = [...] notation). This is not necessary, as a character class on its own already matches exactly one character. So [abc] will check for an a, b or c at the current location in the source string, just as [abc]{1} will.

Intervals are more useful for things like {1,} (one or more, the same as the + quantifier) or {2,7} (minimum 2, maximum 7). Using {1} is redundant, as whatever precedes it already matches exactly once when no quantifier is given. So the pattern #sle{1}pt# is the same as simply using #slept#; in both cases, a single 'e' is understood automatically.
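
A quick sketch to check the interval behaviour described above. The test strings and variable names are made up for illustration.

```php
<?php
// {1} adds nothing: both patterns match 'slept'.
$withInterval    = preg_match('#sle{1}pt#', 'slept');
$withoutInterval = preg_match('#slept#', 'slept');

// {2,7}: at least 2 and at most 7 of the preceding element.
$tooFew  = preg_match('#^a{2,7}$#', 'a');         // one 'a'
$inRange = preg_match('#^a{2,7}$#', 'aaaa');      // four 'a's
$tooMany = preg_match('#^a{2,7}$#', 'aaaaaaaa');  // eight 'a's

// {1,} behaves like +.
$openInterval = preg_match('#^a{1,}$#', 'aaa');
$plus         = preg_match('#^a+$#', 'aaa');

var_dump($withInterval, $withoutInterval, $tooFew, $inRange, $tooMany, $openInterval, $plus);
```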

 

As well, with regards to character classes, it is important to understand that most meta characters lose their special meaning within a character class. (Meta characters are characters that have special meanings; the dot, for example, matches any single character other than a newline by default.) Some meta characters do retain a special meaning depending on where they sit in the class, but the dot is not one of them: for a literal dot in a character class, you don't need to escape it, and its position in the class doesn't matter.
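
A minimal sketch of the point about meta characters inside a class. The subjects are single characters chosen for illustration, and the caret cases are my own addition as an example of a position-dependent meta character.

```php
<?php
// Outside a class, the dot matches any single character (except newline).
$dotOutside = preg_match('/^.$/', 'a');

// Inside a class, an unescaped dot is just a literal dot.
$dotInsideVsLetter = preg_match('/^[.]$/', 'a');
$dotInsideVsDot    = preg_match('/^[.]$/', '.');

// The same goes for +, which needs no backslash inside a class.
$plusInside = preg_match('/^[+]$/', '+');

// The caret is position-dependent: first in the class it negates,
// anywhere else it is a literal ^.
$caretFirst = preg_match('/^[^a]$/', 'a');
$caretLater = preg_match('/^[a^]$/', '^');

var_dump($dotOutside, $dotInsideVsLetter, $dotInsideVsDot, $plusInside, $caretFirst, $caretLater);
```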

 

Notice however the location of your hyphen (-) in the class (this is where location in the character class becomes crucial). If you want to look for a literal hyphen, list it as the very first or very last character in the character class (or escape it as \-), otherwise you are creating a range instead. So in your case, you have \$-_ which creates a range from the dollar sign to the underscore (much like [a-z] is a range from a all the way to z). In ASCII that range covers the digits, the uppercase letters and a good deal of punctuation, so the class accepts far more than the handful of symbols you listed. Relocate the hyphen to the start or end; with no character on one side of it, the regex engine cannot read it as a range and treats it as a literal.
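
Here is a rough sketch of the difference the hyphen position makes. The classes are cut down to just the $, - and _ characters for illustration, and the variable names are hypothetical.

```php
<?php
// Hyphen between $ and _ : a range from '$' (0x24) to '_' (0x5F).
$rangeClass = '/^[$-_]$/';

// Hyphen moved to the end: three literal characters $, _ and -.
$literalClass = '/^[$_-]$/';

// Escaping the hyphen in place works as well.
$escapedClass = '/^[$\-_]$/';

$rangeMatchesUpper   = preg_match($rangeClass, 'U');
$rangeMatchesDigit   = preg_match($rangeClass, '5');
$literalMatchesUpper = preg_match($literalClass, 'U');
$literalMatchesDash  = preg_match($literalClass, '-');
$escapedMatchesUpper = preg_match($escapedClass, 'U');
$escapedMatchesDash  = preg_match($escapedClass, '-');

var_dump(
    $rangeMatchesUpper,
    $rangeMatchesDigit,
    $literalMatchesUpper,
    $literalMatchesDash,
    $escapedMatchesUpper,
    $escapedMatchesDash
);
```

With the range version, a line such as "Username ..." with no prefix symbol at all would still get its first letter captured as the prefix, which is probably not what was wanted.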
