[SOLVED] How does this one work?


$string = <<<EOF
<img src="http://www.example.com/path/image.gif" height="500" width="500">
<img src="http://www.example.com/path/image2.gif" width="500" 
<img src="http://www.example.com/path/image3.gif" >
<img     height="500"     width="500" 
src="http://www.example.com/path/image4.gif"     >
EOF;
$pattern = '/<img\s.*?src="(.*?)".*?>/s';
$repl = '<img src="createthumb.php?src=$1&w=100">';
echo preg_replace($pattern, $repl, $string );


Saw this example, didn't have an explanation o_o It makes literally no sense to me. (I know basics of regex) Can someone break it down? :-D

OK i hope this helps...


//Set String
$string = '
<img src="http://www.example.com/path/image.gif" height="500" width="500">
<img src="http://www.example.com/path/image2.gif" width="500" 
<img src="http://www.example.com/path/image3.gif" >
<img     height="500"     width="500" 
src="http://www.example.com/path/image4.gif"     >
';

//The Find
$pattern = '/<img\s.*?src="(.*?)".*?>/s';
//The Replace
$repl = '<img src="createthumb.php?src=$1&w=100">';
//Do the Find and Replace and print results
echo preg_replace($pattern, $repl, $string );



The Find

My sample text is this:

hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world

1. Match the characters '<img' literally

Now we have

<img


2. Match a single "whitespace character" (space, tab, line break, etc.), that's the \s

Now we have

<img 

(note the space at the end)


3. Match any character, as few times as possible (the ? after the * makes it lazy), that's the .*?
(normally . doesn't match a line break, but the /s at the end of the pattern lets it)

Now we have

<img 

To explain a little more: a lazy .*? only grows until whatever comes next in the pattern can match. Here the next part is 'src="' and it is already right there, so .*? matches nothing and the result is no different..


4. Match the characters 'src="' literally, that's the src="

Now we have

<img src="


5. Match everything up to the next " (see step 6) and store it as backreference number 1, that's the (.*?)

Now we have

<img src="http://www.example.com/path/image.gif

and backreference 1 = http://www.example.com/path/image.gif

The reason it gets stored is that it's in brackets (). I'm English, we call them brackets ;)


6. Match the character " literally

Now we have

<img src="http://www.example.com/path/image.gif"


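7. Match any character, as few times as possible, up to the first > and then the > itself, that's the .*?>

The /s after the closing / is not part of the match. It is a modifier that lets . match line breaks too, so a tag split over several lines is still found.

Now we have

<img src="http://www.example.com/path/image.gif" height="500" width="500">


Okay that probably confused you, but I expect other members to help, or if you're confused, highlight which part.. but I'll continue..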


Now remember the sample text was

hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world


The find found

<img src="http://www.example.com/path/image.gif" height="500" width="500">

and stored

http://www.example.com/path/image.gif

in backreference number 1
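
If you want to see this for yourself, here is a minimal sketch using preg_match() on the same sample text. The variable names $text and $m are just mine for illustration, they are not from the original example.

```php
<?php
// Sample text from the walkthrough above.
$text = 'hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world';
$pattern = '/<img\s.*?src="(.*?)".*?>/s';

// preg_match() fills $m: index 0 is the whole match, index 1 is backreference 1.
if (preg_match($pattern, $text, $m)) {
    echo $m[0], "\n"; // <img src="http://www.example.com/path/image.gif" height="500" width="500">
    echo $m[1], "\n"; // http://www.example.com/path/image.gif
}
```

Note that "hello" and "world" are not part of $m[0], the match starts at <img and ends at the first >.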


so we move on..


the Replace...

<img src="createthumb.php?src=$1&w=100">

What this does is replace what we found

<img src="http://www.example.com/path/image.gif" height="500" width="500"> 

with

<img src="createthumb.php?src=$1&w=100">


Note the src=$1

the $1 means backreference number 1

so in fact we replace

<img src="http://www.example.com/path/image.gif" height="500" width="500"> 

with

<img src="createthumb.php?src=http://www.example.com/path/image.gif&w=100">
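
Putting the find and the replace together on the one-line sample text, roughly like this (the names $text and $out are only for this sketch):

```php
<?php
$text = 'hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world';
$pattern = '/<img\s.*?src="(.*?)".*?>/s';
$repl = '<img src="createthumb.php?src=$1&w=100">';

// Only the matched tag is swapped out; the text around it is left alone.
$out = preg_replace($pattern, $repl, $text);
echo $out, "\n";
// hello <img src="createthumb.php?src=http://www.example.com/path/image.gif&w=100"> world
```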



i'll await your questions

I'm a bit lost on two parts, the laziness, and the backreferences (I've never heard of those.. ever)





I know the "." means to match any non-whitespace character, and the "*" means 0 or more times, but what the heck is going on with the question mark? Or why does it even make it "lazy" for that matter?



( [stuff] )


So how do you know if it's backreference2 or backreference# (unless it's based on first=first, second=second, and so on?) And you're able to extract any part of text you want using a backreference? Does BBcode use backreferences to make links and all that?



[url=someplace.com]Go to Some Place[/url]


and then the pattern would be:

$pattern = '/[url=(.*?)](.*?)[/url]/';


(I just tried to copy-paste from what was below and change it around o_o)


Just noticed the "/s" at the end as well, that's supposed to mean ignore cases?

Your example didn't escape the [ and ] characters

this would work


$data = "[url=test.com]test[/url]";
$result = preg_replace('%\[url=(.*?)\](.*?)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);


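1. Backreferences are numbered in order: the first pair of brackets is $1, the second is $2, and so on (count the opening brackets from left to right)

2. Laziness

from the example above

\[url=(.*?)\]

Now if the expression was

\[url=(.*)\]

the .* would be greedy: it grabs as much as it can and only gives characters back when the rest of the pattern fails, so with more than one tag in the text it can run straight past the first ] to a later one. That's no good..

So we use laziness, .*?, which takes as few characters as it can and stops at the first ] that lets the rest of the pattern match..


I hope this makes sense..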
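
Here is a small sketch of greedy against lazy. The two-tag $data string and the names $lazy and $greedy are made up for this illustration, the patterns are the ones discussed above.

```php
<?php
$data = '[url=a.com]A[/url] and [url=b.com]B[/url]';

// Lazy: each (.*?) stops at the first place where the rest of the pattern can match.
preg_match('%\[url=(.*?)\](.*?)\[/url\]%i', $data, $lazy);
echo $lazy[1], "\n"; // a.com
echo $lazy[2], "\n"; // A

// Greedy: the first (.*) grabs as much as it can and only backs off as far as it must.
preg_match('%\[url=(.*)\](.*?)\[/url\]%i', $data, $greedy);
echo $greedy[1], "\n"; // a.com]A[/url] and [url=b.com
echo $greedy[2], "\n"; // B
```

In both cases $1 is the first pair of brackets and $2 is the second, counted from the left.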

as a side note

i would use this code

$result = preg_replace('%\[url=([^\]]*)\]([^[]*)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);
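
A quick sketch of the negated-class idea on a line with two tags. The two-tag $data and the space-separated replacement are made up for illustration, and I have written the second class in its escaped form [^\[] here.

```php
<?php
$data = '[url=a.com]A[/url] and [url=b.com]B[/url]';

// [^\]]* = any run of characters that are not ], [^\[]* = any run that are not [
$pattern = '%\[url=([^\]]*)\]([^\[]*)\[/url\]%i';

$result = preg_replace($pattern, "URL=\$1 Text=\$2", $data);
echo $result, "\n"; // URL=a.com Text=A and URL=b.com Text=B
```

Because the classes can never cross a bracket, there is no need for lazy quantifiers here.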


This thread could go on for weeks, a great book is Mastering Regular Expressions (O'Reilly)


EDIT: the /s means that the . (dot) also matches a new line (it has nothing to do with ignoring case, that's /i)
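
A minimal sketch of what the s modifier changes, using a tag split over two lines like image4 in the original sample (the $tag string and the names $without, $with and $m are just for this illustration):

```php
<?php
// A tag whose attributes are split over two lines.
$tag = "<img height=\"500\"\nsrc=\"http://www.example.com/path/image4.gif\">";

// Without /s the dot stops at the line break, so the pattern never reaches src=".
$without = preg_match('/<img\s.*?src="(.*?)".*?>/', $tag);
var_dump($without); // int(0)

// With /s the dot also matches the line break and the tag is found.
$with = preg_match('/<img\s.*?src="(.*?)".*?>/s', $tag, $m);
var_dump($with);    // int(1)
echo $m[1], "\n";   // http://www.example.com/path/image4.gif
```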

What you put here:

$data = "[url=test.com]test[/url]";
$result = preg_replace('%\[url=(.*?)\](.*?)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);


I can see that you escaped all the brackets, but what the heck is with your modulus operators (% signs)? Also, it doesn't really make sense to escape the $ in "$1" part of the url, seeing how it's not really treating it as a special character of the regular expression, but rather as the literal variable. (If that made sense? xD)
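
On the \$1 part, a small sketch of what PHP itself does with those strings. This is my own illustration, not something from the posts above: PHP variable names cannot start with a digit, so "$1" is not interpolated, and the backslash in a double-quoted string is harmless.

```php
<?php
// All three spellings produce the same two-character string: $1
$same = ("\$1" === '$1') && ("$1" === '$1');
var_dump($same); // bool(true)

// So a single-quoted replacement works without any escaping.
$data = '[url=test.com]test[/url]';
$out = preg_replace('%\[url=(.*?)\](.*?)\[/url\]%i', 'URL=$1 Text=$2', $data);
echo $out, "\n"; // URL=test.com Text=test
```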



$result = preg_replace('%\[url=([^\]]*)\]([^[]*)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);


Let me see if I can break it up:



No clue w/ %, but it escapes the bracket, treats it as a literal so it looks for "[url=" in the string.



The backreference1 is going to contain the following:

As many of anything except the "]" (hence the ^ part of it) and then a "]" at the end.



The backreference2 is going to contain the following:

What the heck is going on here? Everything except everything? Never saw a "[]" in RegExp before.



It's going to look for "[/url]", but I don't know what the %i at the end means o_o

What the heck is going on here? Everything except everything? Never saw a "[]" in RegExp before.

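LOL, my bad.. that was hard to read, it should have been ([^\[]*), which means anything except a [ (PCRE does accept a bare [ inside a character class, but escaping it makes the intent clear),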


The % just tells preg_replace where the expression starts and ends, so the %i means end of the expression plus the modifier i (ignore case). The fact I used % doesn't make any difference, you could use @ or | or #, anything you don't use in the expression itself..
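
To show that the delimiter really doesn't matter, here is a sketch with three different ones. The '$2 ($1)' replacement and the names $a, $b, $c are made up for this illustration.

```php
<?php
$data = '[url=test.com]test[/url]';

// Same expression with % and # as delimiters.
$a = preg_replace('%\[url=(.*?)\](.*?)\[/url\]%i', '$2 ($1)', $data);
$b = preg_replace('#\[url=(.*?)\](.*?)\[/url\]#i', '$2 ($1)', $data);

// With / as the delimiter, the / inside [/url] has to be escaped as \/
$c = preg_replace('/\[url=(.*?)\](.*?)\[\/url\]/i', '$2 ($1)', $data);

var_dump($a === $b && $b === $c); // bool(true)
echo $a, "\n"; // test (test.com)
```

That is the practical reason to pick something like % here: the expression contains a /, so using / as the delimiter means one more thing to escape.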


\[ = escaped [


