
[SOLVED] Regex Engine Question


shane18


Can someone explain to me in high detail how:

 

<?php
$CC = ".Username Reason is blah blah blah";
preg_match("/^([\.!#@\+^\$-_~]{1})(.+?)(.+?)$/", $CC, $CCM);
echo "<pre>";
print_r($CCM);
echo "</pre>";
?>

 

Makes this:

 

Array
(
    [0] => .Username Reason is blah blah blah
    [1] => .
    [2] => U
    [3] => sername Reason is blah blah blah
)

 

the outcome.

 

I know how to make this work the way I want it to, but that is not the question; I am trying to learn how the engine works inside and out. This is my last piece of the puzzle.


Ok, well, [0] will always be the entire string matched

 

[1] is what was matched by the first bracket, which happened to be ([\.!#@\+^\$-_~]{1})

 

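Now, ([\.!#@\+^\$-_~]{1}) means one character (the {1} means one and only one), which has to be a . ! # @ + ^ or ~, or any character in the range from $ to _ (the hyphen between those two creates a range, so it is not a literal - here). In this case it was .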

 

[2] is what was matched by the second bracket, which was (.+?). That means match any character (other than a newline), one or more times, but as few times as the rest of the pattern allows.

 

[3] is the third bracket, which is the same as above

 

Now, [2] matched only one character because there was another (.+?) after it to take over. Because it's lazy, it said "Well, it's your job to match now, I'm gonna sit down and have a cup of coffee" and passed the job on as soon as it could. [3] is just as lazy, but nothing comes after it except the $ anchor, which requires the match to reach the end of the string, so [3] had to match the rest. It could do that because it is .+, which means anything, once or more.
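
To see how much of that is down to the $ anchor, here is a small sketch that runs the pattern from the first post with and without it. The variable names $subject, $anchored and $unanchored are just mine for illustration, and the pattern is written in single quotes so the $ inside the class needs no backslash.

```php
<?php
$subject = ".Username Reason is blah blah blah";

// Pattern from the first post: the trailing $ forces the last lazy group
// to keep expanding until the end of the string is reached.
preg_match('/^([\.!#@\+^$-_~]{1})(.+?)(.+?)$/', $subject, $anchored);

// Same pattern without the $ anchor: nothing forces either lazy group
// to take more than the single character it is required to match.
preg_match('/^([\.!#@\+^$-_~]{1})(.+?)(.+?)/', $subject, $unanchored);

print_r($anchored);    // [2] => U, [3] => sername Reason is blah blah blah
print_r($unanchored);  // [0] => .Us, [2] => U, [3] => s
```

Without the anchor the whole match is only three characters long, which shows that both lazy groups really do stop at the first opportunity.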

Shane18,

 

To expand on the explanation, I advise you to have a look at this thread, which explains .+ and .+? (in particular, read posts #11 and #14).

 

Also note that in your pattern you used {1} (called an interval) after the character class (character class = [...] notation). This is not necessary, as a character class already matches exactly one character. So [abc] will check for either an a, b or c at the current location in the source string, just as [abc]{1} will.

Intervals are more useful for things like {1,} (minimum one, with no upper limit, equivalent to the + quantifier) or {2,7} (minimum 2, maximum 7). Simply using {1} is redundant, as whatever element of the pattern precedes it is already matched exactly once. So the pattern #sle{1}pt# is the same as #slept#; in both cases, a single 'e' is understood automatically.
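
If you want to check the interval behaviour for yourself, a quick sketch along these lines should do it. The test strings and variable names are made up for the example; preg_match() returns 1 on a match and 0 otherwise.

```php
<?php
// {1} adds nothing: both patterns find the same match.
$withInterval    = preg_match('#sle{1}pt#', 'I slept well');  // 1
$withoutInterval = preg_match('#slept#', 'I slept well');     // 1

// {2,7} needs between two and seven of the preceding element.
$tooFew  = preg_match('#^a{2,7}$#', 'a');         // 0
$inRange = preg_match('#^a{2,7}$#', 'aaaa');      // 1
$tooMany = preg_match('#^a{2,7}$#', 'aaaaaaaa');  // 0 (eight a's)

// {1,} behaves like the + quantifier.
$intervalForm = preg_match('#^a{1,}$#', 'aaa');   // 1
$plusForm     = preg_match('#^a+$#', 'aaa');      // 1

var_dump($withInterval, $withoutInterval, $tooFew, $inRange, $tooMany, $intervalForm, $plusForm);
```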

 

As well, with regards to character classes, it is important to understand that most meta characters lose their special meaning within a character class. (Meta characters are characters that have special meanings; an example is the dot, which by default matches any single character other than a newline.) Some meta characters retain their special meaning depending on their location within the character class, but the dot is not one of them: for a literal dot in the character class, you don't need to escape it, and its position in the class doesn't matter.
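
A short sketch of the dot inside and outside a character class, using throwaway test strings of my own:

```php
<?php
// Outside a class, the dot matches any single character except a newline.
$dotOutside = preg_match('/^a.c$/', 'abc');         // 1

// Inside a class the dot is literal, whether or not it is escaped.
$unescapedInClass = preg_match('/^a[.]c$/', 'abc');   // 0
$escapedInClass   = preg_match('/^a[\.]c$/', 'abc');  // 0
$literalDot       = preg_match('/^a[.]c$/', 'a.c');   // 1

var_dump($dotOutside, $unescapedInClass, $escapedInClass, $literalDot);
```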

 

Notice, however, the location of your hyphen (-) in the class; this is where location in the character class becomes crucial. If you want a literal hyphen, list it as the very first or very last character in the character class (or escape it); otherwise you are creating a range. In your case, you have \$-_, which creates a range from the dollar sign to the underscore (ASCII 36 to 95), so the class also accepts digits, uppercase letters and a good deal of punctuation, much like [a-z] matches everything from a to z. Relocate the hyphen to the start or end; with no character on one side of it, the regex engine cannot read it as a range and treats it as a literal.
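
Here is a rough comparison of the class as posted against one with the hyphen moved to the end. The $fixed pattern is only my guess at what was intended (a literal hyphen plus the other listed characters), and the sample strings are invented for the test.

```php
<?php
$original = '/^[\.!#@\+^$-_~]/';  // $-_ is a range covering ASCII 36 to 95
$fixed    = '/^[.!#@+^$_~-]/';    // hyphen listed last, so it is a literal

$samples = ['.Username', '5Username', 'AUsername', '-Username', 'aUsername'];
$results = [];

foreach ($samples as $s) {
    // Record whether the first character is accepted by each class.
    $results[$s] = [preg_match($original, $s), preg_match($fixed, $s)];
    printf("%-10s original=%d fixed=%d\n", $s, $results[$s][0], $results[$s][1]);
}
// .Username  original=1 fixed=1
// 5Username  original=1 fixed=0
// AUsername  original=1 fixed=0
// -Username  original=1 fixed=1
// aUsername  original=0 fixed=0
```

The digit and the uppercase letter get through the original class only because they sit between $ and _ in the ASCII table.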

