
preg_replace explanation of code


stuckwithcode


Could someone please explain how the code below works

 

$text = preg_replace('#[\r\n]+#', "\n", $text);

 

especially the bold part.

 

The '#' characters at each end are the pattern delimiters. The pattern itself is [\r\n]+, which matches one or more consecutive carriage return or newline characters, and every such match is replaced with a single "\n". Read up on preg_replace to learn more about this function.
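To make that concrete, here is a small sketch of the call applied to a string with mixed line endings. The sample text and the variable name $clean are made up for illustration; only the preg_replace call comes from the question.

```php
<?php
// Hypothetical sample mixing Windows (\r\n), bare \r and Unix (\n) line endings.
$text = "line one\r\nline two\r\rline three\n\n\nline four";

// '#' is the delimiter; [\r\n]+ matches any run of CR and/or LF characters.
$clean = preg_replace('#[\r\n]+#', "\n", $text);

// json_encode() is used only to make the remaining newlines visible.
echo json_encode($clean), "\n";
// "line one\nline two\nline three\nline four"
```

Note that runs of blank lines are collapsed as well, since each run of line-break characters becomes one "\n".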


So I could use the code below instead of $text = preg_replace('#[\r\n]+#', "\n", $text);

 

$text = preg_replace('/[\r\n]+/', "\n", $text);

 

and the + means it looks for more than one occurrence of:

 

\r

\n

\r\n

\n\r

 

Or does the + mean it will look for the different combinations, which you said were

 

\r

\n

\r\n

\n\r

 

Sorry if the question sounds stupid.


@stuckwithcode,

 

To explain the delimiters further: in PCRE, the pattern must be enclosed in delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character (the first paragraph in the pcre introduction explains this). So yes, '/[\r\n]+/' behaves exactly the same as '#[\r\n]+#'.
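As a quick check of the delimiter point, the sketch below runs the same pattern with three different delimiters. The sample string and variable names are assumptions for illustration.

```php
<?php
// Hypothetical input with two consecutive Windows line breaks.
$text = "a\r\n\r\nb";

// The same pattern wrapped in three different delimiters.
$withHash  = preg_replace('#[\r\n]+#', "\n", $text);
$withSlash = preg_replace('/[\r\n]+/', "\n", $text);
$withTilde = preg_replace('~[\r\n]+~', "\n", $text);

// All three calls produce the same string.
var_dump($withHash === $withSlash && $withSlash === $withTilde); // bool(true)
```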

The carriage return (\r) and newline (\n) are inside a character class (denoted by the [...] square brackets). A character class matches exactly one character at a time: any single character that is listed in the class. Many newcomers read it as a sequence of characters, which is false; [\r\n] does not mean "\r followed by \n" the way \r\n outside a character class would. If the class is negated (via [^...]), it matches any single character that is not listed in the class. Whether the matched text is also captured depends only on whether the class sits inside a capturing group.

 

So to better illustrate, consider the following:

$str = 'I will be back soon!';
echo $str = preg_replace('#[ab]#', '*', $str);
//Output: I will *e **ck soon!

 

This will have the regex engine go through each character in the string $str one at a time, and if it is either an a or b, it will be replaced by an asterisk.
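Two further checks on the same string may help, offered as a sketch rather than anything from the manual: one shows that [ab] is not the sequence "ab", the other shows the negated form [^ab]. The variable names are made up for illustration.

```php
<?php
$str = 'I will be back soon!';

// The literal sequence "ab" never occurs in the string ("back" contains "ba").
$hasSequence = preg_match('#ab#', $str);

// The class [ab] matches as soon as a single 'a' or a single 'b' is found.
$hasEither = preg_match('#[ab]#', $str);

// [^ab] is the negated class: every character that is neither 'a' nor 'b'.
$negated = preg_replace('#[^ab]#', '*', $str);

var_dump($hasSequence); // int(0)
var_dump($hasEither);   // int(1)
echo $negated, "\n";    // *******b**ba********
```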

 

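The + is the "one or more" quantifier, so [\r\n]+ matches any unbroken run of \r and \n characters, whatever the order or length; it does not search for particular combinations. Given the above sample, the only difference between [ab] and, say, [ab]+ is that with the quantifier, any consecutive a's or b's are lumped together into one match and replaced as a unit: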

 

$str = 'I will be back soon!';
echo $str = preg_replace('#[ab]+#', '*', $str);
// Output: I will *e *ck soon! // note that unlike the first example, which has each a and b in 'back' converted, this one lumps them both together, then converts that.
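Applied to the original line-break pattern, the same difference can be seen by counting matches with preg_match_all. This is a minimal sketch; the input string and the names $single and $runs are assumptions for illustration.

```php
<?php
// Hypothetical input: "\r\n" and "\n\r" back to back, then a lone "\r" later on.
$text = "one\r\n\n\rtwo\rthree";

// Without +: one match per line-break character.
preg_match_all('#[\r\n]#', $text, $single);

// With +: one match per consecutive run, whatever mix of \r and \n it contains.
preg_match_all('#[\r\n]+#', $text, $runs);

echo count($single[0]), "\n";     // 5
echo count($runs[0]), "\n";       // 2
echo json_encode($runs[0]), "\n"; // ["\r\n\n\r","\r"]
```

So the quantifier does not pick between \r, \n, \r\n and \n\r; it simply swallows whatever run of those characters it finds.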

 

If you find that reading the php manual isn't helping you much, perhaps you might consider the following links to help you out:

 

http://www.phpfreaks.com/tutorial/regular-expressions-part1---basic-syntax

http://www.regular-expressions.info/

http://weblogtoolscollection.com/regex/regex.php

 

Obviously, googling for more regex tutorials will yield more results. Personally, aside from online references, I find that a good way to understand regex is simply to experiment and play around with it. By changing aspects of the string or pattern, you'll see the results first hand, which helps a lot.


Regarding pattern delimiters, there is now a page about them in the manual: http://php.net/regexp.reference.delimiters (no more linking to "the first paragraph in the introduction"!)

 

Nice to know there is a dedicated page for delimiters now, salathe. Enough people ask about them.. this way, they [both the delimiters and those who ask] will get the tender loving care / attention they deserve.

 

If the delimiter needs to be matched inside the pattern it will have to be escaped using a backslash.
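In the common case, the manual's statement quoted above looks like the sketch below. The sample path and variable names are made up for illustration; preg_quote with a delimiter as its second argument is a standard PHP function.

```php
<?php
$path = 'usr/local/bin';

// '/' is the delimiter here, so a literal slash in the pattern is written as \/
$escaped = preg_replace('/\//', '-', $path);

// Picking a delimiter that does not occur in the pattern avoids the escape.
$otherDelimiter = preg_replace('#/#', '-', $path);

// preg_quote() also escapes the delimiter when it is passed as the second argument.
$needle  = 'local/bin';
$pattern = '/' . preg_quote($needle, '/') . '/';
$quoted  = preg_replace($pattern, 'X', $path);

echo $escaped, "\n";        // usr-local-bin
echo $otherDelimiter, "\n"; // usr-local-bin
echo $quoted, "\n";         // usr/X
```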

 

To be pedantic, this isn't entirely true. If brackets are chosen as delimiters for some unknown, godforsaken reason, then so long as the pattern contains an equal number of opening and closing brackets of the same type as the delimiters, those inner ones will be treated as literals.

 

A very unlikely and odd example:

$txt = 'some text <_garbage_> some more text <_some more garbage_> and thats it!';
echo $txt = preg_replace('<<_[^_]+_>>', '<whatever>', $txt);

 

Note that had that pattern been something like '<<[^>]+>>' instead, this would fail (an Unknown modifier '>' warning would ensue), as there is a mismatched character (one '>' character too many).

But since hopefully no member will make a regular practice of this (personally, I don't think it is wise to use brackets as delimiters, if only to keep things uncomplicated), I agree with that statement in the manual as a generalized truth, though not an absolute one. The whole issue is probably not worth mentioning if the goal of that page is to keep things simple and general.
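For anyone who wants to try the bracket-delimiter behaviour described above, here is a hedged sketch covering the balanced pattern, the mismatched one, and the backslash-escaped fix. The variable names are made up, and the @ operator is used only to keep the expected warning out of the output.

```php
<?php
$txt = 'some text <_garbage_> some more text <_some more garbage_> and thats it!';

// Balanced: the inner < and > pair up, so they are treated as literals.
$balanced = preg_replace('<<_[^_]+_>>', '<whatever>', $txt);

// Mismatched: the '>' inside the class closes a bracket early, leaving a stray '>'
// that is read as a modifier. @ hides the warning; preg_replace() returns null.
$unbalanced = @preg_replace('<<[^>]+>>', '<whatever>', $txt);

// Escaping the extra '>' with a backslash restores the balance.
$fixed = preg_replace('<<[^\>]+>>', '<whatever>', $txt);

echo $balanced, "\n"; // some text <whatever> some more text <whatever> and thats it!
var_dump($unbalanced); // NULL
echo $fixed, "\n";    // some text <whatever> some more text <whatever> and thats it!
```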
