
preg_replace search patterns.


guitarist809


Hello,

I was just looking at php.net's manual on preg_replace (http://www.php.net/preg_replace) and I am totally lost on how it works. For example:

<?php
$string = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string);
?> 

 

I don't even understand what /(\w+) (\d+), (\d+)/i means. As I continued through the page, I saw these patterns: '/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/' and '/^\s*{(\w+)}\s*=/'... woah...

 

If somebody could tell me or direct me to a link on how all this works, that would be great.

 

thanks,

matt n


I think http://www.regexp.info is a good source for Regular Expressions.

 

Also, check out the Regex forums on this site.

 

The way preg_replace works is that it uses a Perl-compatible regular expression ($pattern) to search the string ($string), and replaces each match with the replacement ($replacement).
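
To make that concrete, here is a small sketch that reuses the pattern from the manual example quoted above. The variable names $reordered and $manual are just mine for illustration.

```php
<?php
// Same pattern as the manual example: word, space, digits, comma, space, digits.
$string  = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';

// $1, $2 and $3 refer to the three parenthesised groups, in order.
$reordered = preg_replace($pattern, '$2 $1 $3', $string);
echo $reordered, "\n"; // 15 April 2003

// ${1}1 means "group 1 followed by a literal 1"; a bare $11 would be ambiguous with group 11.
$manual = preg_replace($pattern, '${1}1,$3', $string);
echo $manual, "\n"; // April1,2003
```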

The + is a quantifier: it matches whatever comes right before it (a single character, a character class, or a group) 1 or more times. So if that element isn't found at least once, the whole pattern fails. The * matches it 0 or more times, so the pattern can still succeed when the element is missing.
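
A quick way to see the difference between + and * is to run both against a string with no digits in it. This is only a sketch; the test strings are made up.

```php
<?php
// \d+ needs at least one digit between a and b.
$plusHit  = preg_match('/a\d+b/', 'a123b'); // 1
$plusMiss = preg_match('/a\d+b/', 'ab');    // 0, no digit to match

// \d* accepts zero digits, so the same string now matches.
$starHit = preg_match('/a\d*b/', 'ab');     // 1

var_dump($plusHit, $plusMiss, $starHit);
```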

 

Those \ characters are escape characters. If you want to search for one of the regex reserved characters (such as + or *) you have to escape it first (for example, to match a literal + sign you'd use \+).
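
Here is a rough illustration of why the escape matters, using a made-up string '1+1=2'. The preg_quote() call at the end is PHP's built-in helper for escaping a literal string before putting it in a pattern.

```php
<?php
$subject = '1+1=2';

// Unescaped, + is a quantifier: "one or more 1s, then another 1", which is not in the string.
$unescaped = preg_match('/1+1=2/', $subject);  // 0

// Escaped, \+ is a literal plus sign.
$escaped = preg_match('/1\+1=2/', $subject);   // 1

// preg_quote() escapes the special characters for you; '/' is the delimiter used here.
$quoted     = preg_quote($subject, '/');
$viaQuoting = preg_match('/' . $quoted . '/', $subject); // 1

var_dump($unescaped, $escaped, $viaQuoting);
```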

 

The \w and \d are shorthands for character classes. A \d will match a digit, and \w will match a "word" character: in PCRE that is A-Z, a-z, any digit, and the underscore _.
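
If it helps, this sketch shows what the two shorthands pick out of a couple of made-up strings with preg_match_all.

```php
<?php
// \d+ picks out runs of digits.
preg_match_all('/\d+/', 'room 101, floor 3', $digits);
print_r($digits[0]); // 101 and 3

// In PCRE, \w covers letters, digits and the underscore, so user_42 is one match,
// while the hyphen in logged-in splits it into two.
preg_match_all('/\w+/', 'user_42 logged-in', $words);
print_r($words[0]); // user_42, logged, in
```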

 

For your *.txt example, the regex would be .*\.txt: it'll look for any character, any number of times, followed by .txt. To break it down:

 

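. is the anything matcher: it matches any single character (except a newline, by default).

* will tell it to repeat the previous element any number of times, including zero

\. will escape the ., meaning look for a literal dot

txt will look for txt in that exact order.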
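
The breakdown above can be checked with a few preg_match calls. The file names are invented for the example; the last two lines show what goes wrong if the dot is left unescaped.

```php
<?php
$pattern = '/.*\.txt/';

// .* takes "notes", \. takes the literal dot, txt takes the rest.
$hit = preg_match($pattern, 'notes.txt', $m);
echo $m[0], "\n"; // notes.txt

// * allows zero repetitions, so a bare ".txt" also matches.
$bare = preg_match($pattern, '.txt'); // 1

// With an unescaped dot, any character is accepted where the dot should be.
$loose  = preg_match('/.*.txt/', 'notesXtxt');  // 1
$strict = preg_match('/.*\.txt/', 'notesXtxt'); // 0
```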

 

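As for the / and /i: the two / characters are a pair of delimiters that mark where the pattern starts and ends, and the i after the closing / is a modifier that makes the match case-insensitive.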
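
For what it's worth, in PHP's preg functions the two slashes are delimiters around the pattern, and a letter after the closing one is a modifier (i means case-insensitive). A small sketch, with a made-up search word:

```php
<?php
$subject = 'April 15, 2003';

// Without the i modifier the match is case-sensitive.
$withoutI = preg_match('/april/', $subject);  // 0

// With i, upper and lower case are treated the same.
$withI = preg_match('/april/i', $subject);    // 1

// The delimiter does not have to be a slash; # works the same way.
$hash = preg_match('#april#i', $subject);     // 1

var_dump($withoutI, $withI, $hash);
```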

Hmm, it's starting to make some sense now :)

 

With the anything matcher (the .), does it match single characters such as 'a' or 'b', or does it match multiple characters, like 'abcd' or 'efgh'?

 

So, if there was a string like "one.txt two.txt. three.txt", would .*\.txt match each file name:

one.txt

two.txt

three.txt

or would it match things like o.txt, n.txt, e.txt, t.txt, w.txt, etc.?
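
One way to answer this is to just run it. A single . matches one character, but .* is greedy and grabs as much as it can, so on that string it ends up matching the whole thing in one go. The second pattern, \w+\.txt, is my own suggestion for getting the individual file names.

```php
<?php
$string = 'one.txt two.txt. three.txt';

// Greedy .* runs to the end of the string and backs up to the last ".txt".
preg_match_all('/.*\.txt/', $string, $greedy);
print_r($greedy[0]); // a single match: the whole string

// \w+ stops at spaces and dots, so each file name is matched separately.
preg_match_all('/\w+\.txt/', $string, $files);
print_r($files[0]); // one.txt, two.txt, three.txt
```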

To find this format of date in a text you can try this:

 

April 15, 2003

'[A-Za-z]{3,9} [0-9]{1,2}, [0-9]{4}' - this is one solution. I don't say it's perfect, but you can try it ;)

This means:

[A-Za-z]{3,9} - find a string of 3 to 9 letters, with a space after it

[0-9]{1,2}, - find a number with 1 or 2 digits, with a comma "," and a space after it

[0-9]{4} - find a number with exactly 4 digits
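
A sketch of how a pattern of this shape could be used with preg_match. Two things here are my assumptions: the comma sits after the day so it lines up with "April 15, 2003", and / delimiters are added because the preg functions require them. The sample sentence is made up.

```php
<?php
$text = 'The post was written on April 15, 2003 and archived later.';

// Month name (3 to 9 letters), space, 1 or 2 digit day, comma, space, 4 digit year.
$pattern = '/[A-Za-z]{3,9} [0-9]{1,2}, [0-9]{4}/';

$found = preg_match($pattern, $text, $m);
if ($found) {
    echo $m[0], "\n"; // April 15, 2003
}
```

It will also accept things that are not real dates (any 3 to 9 letters in place of the month), so treat it as a starting point.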

If you have questions, feel free to ask :)

