<?php
$string = <<<EOF
<img src="http://www.example.com/path/image.gif" height="500" width="500">
<img src="http://www.example.com/path/image2.gif" width="500" 
height="500">
<img src="http://www.example.com/path/image3.gif" >
<img     height="500"     width="500" 
src="http://www.example.com/path/image4.gif"     >
EOF; 
$pattern = '/<img\s.*?src="(.*?)".*?>/s';
$repl = '<img src="createthumb.php?src=$1&w=100">';
echo preg_replace($pattern, $repl, $string );
?> 

 

Saw this example, didn't have an explanation o_o It makes literally no sense to me. (I know basics of regex) Can someone break it down? :-D


OK i hope this helps...

 

<?php
//Set String
$string = '
<img src="http://www.example.com/path/image.gif" height="500" width="500">
<img src="http://www.example.com/path/image2.gif" width="500" 
height="500">
<img src="http://www.example.com/path/image3.gif" >
<img     height="500"     width="500" 
src="http://www.example.com/path/image4.gif"     >
';

//The Find
$pattern = '/<img\s.*?src="(.*?)".*?>/s';
//The Replace
$repl = '<img src="createthumb.php?src=$1&w=100">';
//Do the Find and Replace and print results
echo preg_replace($pattern, $repl, $string );
?> 

 

Okay..

The Find

my sample Text is this

hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world

1. Match the characters

<img

Now We have

<img

 

2. Match a single "whitespace character" (space, tab, line break, etc.), that's the \s

Now We have

<img 

(note the space at the end)

 

3. Match any character, as few times as possible: .*? (the ? after the * makes it lazy; because of the /s at the end, the . also matches line breaks)

Now We have

<img 

Okay, to explain a little more: a lazy match only grows until the next part of the pattern can match. The next part is 'src="', and in this sample it comes straight after the space, so .*? matches nothing and the result is no different.

 

4. Match the characters src=" literally

Now we have

<img src="

 

5. Match everything until the " (step 6) is found, and store it as backreference number 1: (.*?)

Now we have

<img src="http://www.example.com/path/image.gif

and backreference1 = http://www.example.com/path/image.gif

the reason it gets stored is that it's in brackets (), also known as parentheses (i'm english, we call them brackets ;)

 

6. Match the character "

Now we have

<img src="http://www.example.com/path/image.gif"

 

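7. Lazily match everything up to and including the next >: .*?>

The / after it just ends the pattern, and the s is a modifier that makes the . match line breaks as well, so the tag can be spread over several lines. It is not a "whitespace character" like \s.

 

Okay, that probably confused you, but I expect other members will help, or if you're confused, highlight which part.. but i'll continue..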

 

now remember the sample text was

hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world

 

the find found

<img src="http://www.example.com/path/image.gif" height="500" width="500"> 

and stored

http://www.example.com/path/image.gif

in backreference number 1
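
To see the find step on its own, here is a small sketch using preg_match() instead of preg_replace(). The variable names $text and $m are just picked for this illustration; the pattern and sample text are the ones from above.

```php
<?php
// Sample text from the walkthrough above.
$text = 'hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world';
$pattern = '/<img\s.*?src="(.*?)".*?>/s';

// preg_match() fills $m: $m[0] is everything the pattern found,
// $m[1] is what the first ( ) captured (backreference number 1).
if (preg_match($pattern, $text, $m)) {
    echo $m[0], "\n"; // the whole <img ...> tag
    echo $m[1], "\n"; // http://www.example.com/path/image.gif
}
```

Note that "hello " and " world" are not part of $m[0], so they are left alone by the replace later on.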

 

so we move on..

 

the Replace...

<img src="createthumb.php?src=$1&w=100">

what this does is replace what we found

<img src="http://www.example.com/path/image.gif" height="500" width="500"> 

with

<img src="createthumb.php?src=$1&w=100">

BUT..

Note the src=$1

the $1 means backreference number 1

so in fact we replace

<img src="http://www.example.com/path/image.gif" height="500" width="500"> 

with

<img src="createthumb.php?src=http://www.example.com/path/image.gif&w=100">
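
As a quick check of the above, this sketch runs the find and replace on the one-line sample text. $text and $result are names chosen for the example only.

```php
<?php
$text = 'hello <img src="http://www.example.com/path/image.gif" height="500" width="500"> world';
$pattern = '/<img\s.*?src="(.*?)".*?>/s';
// Single quotes, so PHP leaves $1 alone and preg_replace() fills it in.
$repl = '<img src="createthumb.php?src=$1&w=100">';

$result = preg_replace($pattern, $repl, $text);
echo $result, "\n";
// hello <img src="createthumb.php?src=http://www.example.com/path/image.gif&w=100"> world
```

Only the matched tag is swapped out; the text around it stays as it was.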

 

 

i'll await your questions

I'm a bit lost on two parts, the laziness, and the backreferences (I've never heard of those.. ever)

 

Laziness:

.*?

 

I know the "." means to match any non-whitespace character, and the "*" means 0 or more times, but what the heck is going on with the question mark? Or why does it even make it "lazy" for that matter?

 

Backreference:

( [stuff] )

 

So how do you know if it's backreference2 or backreference# (unless it's based on first=first, second=second, and so on?) And you're able to extract any part of text you want using a backreference? Does BBcode use backreferences to make links and all that?

 

IE:

[url=someplace.com]Go to Some Place[/url]

 

and then the pattern would be:

$pattern = '/[url=(.*?)](.*?)[/url]/';

 

(I just tried to copy-paste from what was below and change it around o_o)

 

Just noticed the "/s" at the end as well, that's supposed to mean ignore cases?

your example didn't escape the [ and ] characters

this would work

 

$data = "[url=test.com]test[/url]";
$result = preg_replace('%\[url=(.*?)\](.*?)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);

 

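1. the backreferences are numbered in order, counting the opening brackets from left to right, so the first ( ) is $1 and the second ( ) is $2

2. laziness

from the example above

(.*?)\]

Now if the Expression was

(.*)\]

the .* would be greedy: it grabs as much as it can and only gives characters back until a ] can match, so it runs on to the last ] on the line, that's no good..

so we use laziness, .*?, which grabs as little as it can: it grows one character at a time and stops at the first point where the following ] matches..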

 

i hope this makes sense..
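
The greedy/lazy difference only shows up when there is more than one ] to choose from, so here is a rough sketch with two links on one line. The input string and the names $greedy and $lazy are made up for the illustration. (Side note: the . matches any character except a line break, unless the /s modifier is used.)

```php
<?php
$data = '[url=a.com]A[/url] and [url=b.com]B[/url]';

// Greedy: .* takes as much as it can, then backs off to the LAST "]".
preg_match('%\[url=(.*)\]%', $data, $greedy);

// Lazy: .*? takes as little as it can and stops at the FIRST "]".
preg_match('%\[url=(.*?)\]%', $data, $lazy);

echo $greedy[1], "\n"; // a.com]A[/url] and [url=b.com]B[/url
echo $lazy[1], "\n";   // a.com
```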

as a side note

i would use this code

$result = preg_replace('%\[url=([^\]]*)\]([^[]*)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);
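
Here is a small sketch of how the two backreferences come out with that kind of pattern. It assumes the second class is written with the [ escaped, as [^\[], and $m is just a name chosen for the example.

```php
<?php
$data = '[url=test.com]test[/url]';
// [^\]]* = any run of characters that are not "]"
// [^\[]* = any run of characters that are not "["
$pattern = '%\[url=([^\]]*)\]([^\[]*)\[/url\]%i';

// Groups are numbered by their opening bracket, left to right.
preg_match($pattern, $data, $m);
echo $m[1], "\n"; // test.com  (first group)
echo $m[2], "\n"; // test      (second group)

$result = preg_replace($pattern, "URL=\$1\r\nText=\$2", $data);
echo $result, "\n";
```

Because each class can never run past its own closing character, there is no need for lazy quantifiers here.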

 

this thread could go on for weeks, a great book is Mastering Regular Expressions (O'Reilly)

 

EDIT: the s after the closing / means that the . (dot) also matches new lines
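
A quick sketch of what the s modifier changes, using an img tag split over two lines like image4.gif in the first post. The names $tag, $without, $with, $m1 and $m2 are only for this example.

```php
<?php
// An <img> tag with a line break between its attributes.
$tag = "<img height=\"500\"\nsrc=\"http://www.example.com/path/image4.gif\">";

// Without s, the dot cannot step over the line break, so nothing matches.
$without = preg_match('/<img\s.*?src="(.*?)".*?>/', $tag, $m1);

// With s, the dot matches the line break too, so the tag is found.
$with = preg_match('/<img\s.*?src="(.*?)".*?>/s', $tag, $m2);

var_dump($without); // int(0)
var_dump($with);    // int(1)
echo $m2[1], "\n";  // http://www.example.com/path/image4.gif
```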

What you put here:

$data = "[url=test.com]test[/url]";
$result = preg_replace('%\[url=(.*?)\](.*?)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);

 

I can see that you escaped all the brackets, but what the heck is with your modulus operators (% signs)? Also, it doesn't really make sense to escape the $ in "$1" part of the url, seeing how it's not really treating it as a special character of the regular expression, but rather as the literal variable. (If that made sense? xD)

 

 

$result = preg_replace('%\[url=([^\]]*)\]([^[]*)\[/url\]%i', "URL=\$1\r\nText=\$2", $data);

 

Let me see if I can break it up:

 

%\[url=

No clue w/ %, but it escapes the bracket, treats it as a literal so it looks for "[url=" in the string.

 

([^\]]*)\]

The backreference1 is going to contain the following:

As many of anything except the "]" (hence the ^ part of it) and then a "]" at the end.

 

([^[]*)

The backreference2 is going to contain the following:

What the heck is going on here? Everything except everything? Never saw a "[]" in RegExp before.

 

\[/url\]%i

It's going to look for "[/url]", but I don't know what the %i at the end means o_o

What the heck is going on here? Everything except everything? Never saw a "[]" in RegExp before.

LOL, my bad.. that one is confusing to read, it should have been ([^\[]*), with the [ escaped. It means "anything except a [".

 

the % just tells preg_replace where the expression starts and ends; the %i means end of the expression, then use i (ignore case). The fact that I used % doesn't make any difference, you could use @ or | or #, anything you don't use inside the expression..

 

\[ = escaped [
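
To show that the delimiter really is a free choice, here is a sketch that runs the same pattern with three different delimiters. The replacement string '$2 ($1)' and the names $a, $b, $c are made up for the illustration.

```php
<?php
$data = '[URL=test.com]test[/URL]';

// Same expression, three delimiters. The i after the closing delimiter
// ignores case, so the upper-case [URL=...] still matches.
$a = preg_replace('%\[url=(.*?)\](.*?)\[/url\]%i', '$2 ($1)', $data);
$b = preg_replace('#\[url=(.*?)\](.*?)\[/url\]#i', '$2 ($1)', $data);
// With / as the delimiter, the / inside [/url] has to be escaped.
$c = preg_replace('/\[url=(.*?)\](.*?)\[\/url\]/i', '$2 ($1)', $data);

echo $a, "\n"; // test (test.com)
echo $b, "\n"; // test (test.com)
echo $c, "\n"; // test (test.com)
```

Picking a delimiter that does not appear in the expression just saves you from escaping it.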

 
