
[SOLVED] How does this one work?


kratsg


<?php
$string = <<<EOF
<img src="http://www.example.com/path/image.gif" height="500" width="500">
<img src="http://www.example.com/path/image2.gif" width="500" 
height="500">
<img src="http://www.example.com/path/image3.gif" >
<img     height="500"     width="500" 
src="http://www.example.com/path/image4.gif"     >
EOF;
$pattern = '/<img\s.*?src="(.*?)".*?>/s';
$repl = '<img src="createthumb.php?src=$1&w=100">';
echo preg_replace($pattern, $repl, $string );
?> 

 

Saw this example, didn't have an explanation o_o It makes literally no sense to me. (I know basics of regex) Can someone break it down? :-D


OK i hope this helps...

 

<?php
//Set String
$string = '
<img src="http://www.example.com/path/image.gif" height="500" width="500">
<img src="http://www.example.com/path/image2.gif" width="500" 
height="500">
<img src="http://www.example.com/path/image3.gif" >
<img     height="500"     width="500" 
src="http://www.example.com/path/image4.gif"     >
';

//The Find
$pattern = '/<img\s.*?src="(.*?)".*?>/s';
//The Replace
$repl = '<img src="createthumb.php?src=$1&w=100">';
//Do the Find and Replace and print results
echo preg_replace($pattern, $repl, $string );
?> 

 

Okay..

The Find

my sample Text is this

hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world

1. Match the characters

<img

Now We have

<img

 

2. Match a single "whitespace character" (space, tab, line break, etc.), that's the \s

Now We have

<img 

(note the space at the end)

 

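3. Match any character, as few times as possible, that's the .*? (the ? after the * makes it lazy, and because of the s modifier at the end of the pattern the dot matches line breaks too)

Now We have

<img 

To explain a little more: a lazy .*? starts by matching nothing and only grows, one character at a time, while the rest of the pattern can't match yet. The next thing the pattern wants is 'src="', and in this sample that is exactly what comes next, so .*? adds nothing and the result is no different.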

 

4. Match the characters src=" literally

Now we have

<img src="

 

5. Match everything until the " from step 6 is found, and store it as backreference number 1, that's the (.*?)

Now we have

<img src="http://www.example.com/path/image.gif

and backreference1 = http://www.example.com/path/image.gif

the reason it gets stored is that it's in brackets (), i'm english we call them brackets ;) (others call them parentheses)

 

6. Match the character "

Now we have

<img src="http://www.example.com/path/image.gif"

 

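7. Match any characters, as few as possible, until the first > is found, that's the .*?>

Now we have

<img src="http://www.example.com/path/image.gif" height="500" width="500">

The /s on the end is not matched against the text: the / closes the pattern and the s is a modifier that lets the . (dot) match line breaks as well.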

 

Okay, that probably confused you, but I expect other members to help, or if you're confused, point out which part.. but i'll continue..

 

now remember the sample text was

hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world

 

the find found

<img src="http://www.example.com/path/image.gif" height="500" width="500"> 

and stored

http://www.example.com/path/image.gif

in backreference number 1
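To see the find step on its own, here is a minimal sketch using preg_match() on the same sample text. The variable names ($text, $m) are only for illustration; $m[0] holds the whole match and $m[1] holds backreference number 1.

```php
<?php
// Sample text from the walkthrough above.
$text = 'hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world';

// Same pattern as in the original example.
$pattern = '/<img\s.*?src="(.*?)".*?>/s';

if (preg_match($pattern, $text, $m)) {
    // $m[0] is everything the pattern matched.
    echo "Whole match: {$m[0]}\n";
    // $m[1] is whatever the first ( ) captured.
    echo "Group 1:     {$m[1]}\n";
}
```

Note that "hello " and " world" are not part of $m[0], so a replace leaves them alone.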

 

so we move on..

 

the Replace...

<img src="createthumb.php?src=$1&w=100">

what this does is replace what we found

<img src="http://www.example.com/path/image.gif" height="500" width="500"> 

with

<img src="createthumb.php?src=$1&w=100">

BUT..

Note the src=$1

the $1 means backreference number 1

so in fact we replace

<img src="http://www.example.com/path/image.gif" height="500" width="500"> 

with

<img src="createthumb.php?src=http://www.example.com/path/image.gif&w=100">
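The replace step can be checked the same way. This is a small sketch on the one-tag sample text; $text and $result are illustration-only names, and createthumb.php is just the script name from the original example.

```php
<?php
$text = 'hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world';

$pattern = '/<img\s.*?src="(.*?)".*?>/s';
// $1 is swapped for whatever group 1 captured.
$repl = '<img src="createthumb.php?src=$1&w=100">';

$result = preg_replace($pattern, $repl, $text);
echo $result, "\n";
// hello <img src="createthumb.php?src=http://www.example.com/path/image.gif&w=100"> world
```

If a digit has to follow the reference directly, ${1} can be written instead of $1 so PHP knows where the number ends.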

 

 

i'll await your questions

I'm a bit lost on two parts, the laziness, and the backreferences (I've never heard of those.. ever)

 

Laziness:

.*?

 

I know the "." means to match any character (apart from a line break), and the "*" means 0 or more times, but what the heck is going on with the question mark? Or why does it even make it "lazy" for that matter?

 

Backreference:

( [stuff] )

 

So how do you know if it's backreference2 or backreference# (unless it's based on first=first, second=second, and so on?) And you're able to extract any part of text you want using a backreference? Does BBcode use backreferences to make links and all that?

 

IE:

[url=someplace.com]Go to Some Place[/url]

 

and then the pattern would be:

$pattern = '/[url=(.*?)](.*?)[/url]/';

 

(I just tried to copy-paste from what was below and change it around o_o)

 

Just noticed the "/s" at the end as well, is that supposed to mean ignore case?

your example didn't escape the [ and ] characters (and the / in [/url] clashes with the / you used as the delimiter)

this would work

 

$data = "[url=test.com]test[/url]";
$result = preg_replace('%\[url=(.*?)\](.*?)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);

 

1. Backreferences are numbered in order: the first ( ) is $1, the second ( ) is $2, and so on

2. laziness

from the example above

(.*?)\]

Now if the Expression was

(.*)\]

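the .* would be greedy: it grabs as much as it can, so it runs straight past the first ] and only stops at the last ] it can find, that's no good..

so we use laziness: .*? takes as little as it can and only grows while the rest of the pattern (here the ]) doesn't match yet, so it stops at the first ]..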

 

i hope this makes sense..
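The difference is easiest to see when the text has more than one ] in it. A rough sketch, assuming a made-up input with two [url] tags ($data, $greedy and $lazy are illustration-only names):

```php
<?php
// Hypothetical input with two tags, so there are several ] to choose from.
$data = '[url=a.com]A[/url] and [url=b.com]B[/url]';

// Greedy: .* takes everything, then gives back only as much as it must.
preg_match('%\[url=(.*)\]%', $data, $greedy);

// Lazy: .*? takes nothing at first and grows until a ] can follow.
preg_match('%\[url=(.*?)\]%', $data, $lazy);

echo "Greedy: {$greedy[1]}\n"; // a.com]A[/url] and [url=b.com]B[/url
echo "Lazy:   {$lazy[1]}\n";   // a.com
```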

as a side note

i would use this code

$result = preg_replace('%\[url=([^\]]*)\]([^[]*)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);
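As a quick check of the group numbering, here is a sketch that runs this kind of pattern on the [url] sample from the question. The second class is written with the [ escaped purely for readability, and the <a href> output is only an assumed illustration of how a link could be built, not how any particular BBCode parser works.

```php
<?php
// Sample input taken from the question earlier in the thread.
$data = '[url=someplace.com]Go to Some Place[/url]';

// Negated classes instead of lazy dots: "anything but ]" and "anything but [".
$pattern = '%\[url=([^\]]*)\]([^\[]*)\[/url\]%i';

preg_match($pattern, $data, $m);
echo "Group 1: {$m[1]}\n"; // captured by the first ( )
echo "Group 2: {$m[2]}\n"; // captured by the second ( )

// Hypothetical link markup; a real parser would also validate and escape input.
$html = preg_replace($pattern, '<a href="$1">$2</a>', $data);
echo $html, "\n";
```

The negated classes can never run past the closing bracket, so nothing depends on lazy matching here.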

 

this thread could go on for weeks, a great book is O'Reilly's Mastering Regular Expressions

 

EDIT: the s after the closing / means that the . (dot) also matches new lines
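A small sketch of why that matters for the image example, using the second tag from the original string (the one with a line break inside it). The variable names are illustration-only.

```php
<?php
// The image2 tag from the original example, with a line break before height.
$tag = '<img src="http://www.example.com/path/image2.gif" width="500" ' . "\n" . 'height="500">';

// With the s modifier the dot can cross the line break and reach the >.
$withS = preg_match('/<img\s.*?src="(.*?)".*?>/s', $tag, $m);

// Without it the dot stops at the line break, so the > is never reached.
$withoutS = preg_match('/<img\s.*?src="(.*?)".*?>/', $tag);

echo "With s:    $withS\n";    // 1
echo "Without s: $withoutS\n"; // 0
```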

What you put here:

$data = "[url=test.com]test[/url]";
$result = preg_replace('%\[url=(.*?)\](.*?)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);

 

I can see that you escaped all the brackets, but what the heck is with your modulus operators (% signs)? Also, it doesn't really make sense to escape the $ in "$1" part of the url, seeing how it's not really treating it as a special character of the regular expression, but rather as the literal variable. (If that made sense? xD)

 

 

$result = preg_replace('%\[url=([^\]]*)\]([^[]*)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);

 

Let me see if I can break it up:

 

%\[url=

No clue w/ %, but it escapes the bracket, treats it as a literal so it looks for "[url=" in the string.

 

([^\]]*)\]

The backreference1 is going to contain the following:

As many of anything except the "]" (hence the ^ part of it) and then a "]" at the end.

 

([^[]*)

The backreference2 is going to contain the following:

What the heck is going on here? Everything except everything? Never saw a "[]" in RegExp before.

 

\[/url\]%i

It's going to look for "[/url]", but I don't know what the %i at the end means o_o

What the heck is going on here? Everything except everything? Never saw a "[]" in RegExp before.

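LOL, my bad.. that was confusing, it should have been ([^\[]*), which means "anything except a [" (PCRE does accept the unescaped [ inside the class, but the escaped version is much easier to read),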

 

the % just tells preg_replace where the expression starts and ends, so the %i means end of the expression and use i (ignore case). The fact that I used % doesn't make any difference, you could use @ or | or #, anything you don't use in the expression..
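Here is a short sketch of the delimiter point, assuming the escaped-bracket version of the pattern and a made-up upper-case [URL= input to show the i modifier. It also touches on the \$1 question: PHP variable names can't start with a digit, so "$1" is not interpolated even in double quotes and the backslash is harmless.

```php
<?php
// Upper-case URL on purpose, to show the i (ignore case) modifier at work.
$data = '[URL=test.com]test[/url]';

$patterns = [
    '%\[url=([^\]]*)\]([^\[]*)\[/url\]%i',   // % as delimiter
    '#\[url=([^\]]*)\]([^\[]*)\[/url\]#i',   // # as delimiter
    '/\[url=([^\]]*)\]([^\[]*)\[\/url\]/i',  // / as delimiter, so the / in /url needs a backslash
];

$results = [];
foreach ($patterns as $p) {
    $results[] = preg_replace($p, 'URL=$1 Text=$2', $data);
}
print_r($results); // the same string three times: URL=test.com Text=test

// All three spellings produce the same replacement string.
var_dump("\$1" === '$1', "$1" === '$1');
```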

 

\[ = escaped [

 

