ragax


Posts
186
Joined
December 20, 2011
Last visited
April 15, 2014


Everything posted by ragax


Tut: Two Little-Known but Way-Cool Features of PHP Regex

ragax replied to ragax's topic in Regex Help

...part 2 of this post:

B. (?| ) LETS YOU USE ONE GROUP NUMBER FOR MULTIPLE CAPTURES

Sometimes, you have data where what you want to capture almost fits in one set of parentheses. Almost: you can fit it in an alternation, but you end up using multiple sets of parentheses. As a result, your data can find itself in Group 1, Group 2, Group 3... You don't know. To sort it out, you have to write some code to look at the array of results.

Here's an example:

    (?:shipping (\w+)|mailing (\w+) to): \w+

This could match either of these strings:

    shipping books: today
    mailing books to: john

In both cases, "books" would be captured... But in Group 1 for the first case, and Group 2 for the second. That's because group numbers are assigned from left to right as you read the regex, whether or not a given group takes part in the match. On the second string, Group 1 is not set, but Group 2 captures "books". You'll have to sort that out in PHP by examining your matches.

Well, PCRE has a magical feature that lets you capture "books" in Group 1 in both cases, even though they are captured by different sets of parentheses!!! That syntax is (?|, and it resets the group numbering each time you pass "|", the alternation marker, so every branch starts counting from the same number.

Here's the piece of magic syntax that always returns "books" in Group 1:

    (?|shipping (\w+)|mailing (\w+) to): \w+

Here's code to test it:

    <?php
    $regex=',(?|shipping (\w+)|mailing (\w+) to): \w+,';
    preg_match($regex, 'shipping books: today', $match);
    echo $match[1].' ';
    preg_match($regex, 'mailing books to: john', $match);
    echo $match[1].' ';
    ?>

Output:

    books books

In both cases, "books" is found in $match[1], which is the content of Group 1!

This feature is briefly mentioned on the PHP manual's subpattern page. See my page on regex (? syntax disambiguation for more on PCRE (?| group reset syntax.

Wishing you all a fun weekend!
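For contrast, here is a minimal sketch that runs the plain (?: ) version and the (?| ) version side by side on the second string. The variable names and the ~ delimiter are choices made for this illustration, not part of the original post.

```php
<?php
// Same alternation twice: once with a plain non-capturing group,
// once with the (?| ) branch reset group.
$plain = '~(?:shipping (\w+)|mailing (\w+) to): \w+~';
$reset = '~(?|shipping (\w+)|mailing (\w+) to): \w+~';

$subject = 'mailing books to: john';

preg_match($plain, $subject, $plainMatch);
preg_match($reset, $subject, $resetMatch);

// Without the reset, Group 1 took no part in the match, so PHP reports it
// as an empty string and the word lands in Group 2.
var_dump($plainMatch[1], $plainMatch[2]);

// With the reset, both branches number their parentheses from 1.
var_dump($resetMatch[1]);
```

With the plain group, calling code has to check which of $plainMatch[1] and $plainMatch[2] is non-empty; with the reset group, it can always read index 1.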
- January 28, 2012
- 1 reply
Tut: Two Little-Known but Way-Cool Features of PHP Regex

ragax posted a topic in Regex Help

Greetings, PHP heads!

A couple days ago, during a full revamp of my regex tutorial (see sig), I added two very sweet regex features that I haven't seen discussed in the online PHP world. These are features that I found buried in the PCRE documentation. One of them is briefly mentioned in the PHP manual. The other might be too, but I couldn't find it. I thought I'd make a quick tut on the forum to share these two "secret features" of PHP regex with my fellow regex lovers.

A. (?(DEFINE)) LETS YOU REUSE A PATTERN

You already know how a back-reference (named or numbered) lets you match a literal string previously captured by a set of parentheses. Both

    (\d\d) abc \1

and

    (?P<Nb>\d\d) abc (?P=Nb)

will match "12 abc 12", where "12" is captured in Group 1 or in a named capture group: "Nb"

What if instead of referring to a string already captured, you could refer to a regex pattern? This is what (?(DEFINE)) does. In the following example, the DEFINE statement defines "phone" as this regex pattern:

    (?:Tel|Fax):[ ]415-\d{3}-\d{4}

Then the regex uses this defined phone pattern multiple times in the expression. Run this:

    <?php
    $pattern=',(?x)(?(DEFINE)(?<phone>(?:Tel|Fax):[ ]415-\d{3}-\d{4}))
    ^start[ ](?&phone)[ ]////[ ](?&phone)
    [ ]----[ ]((?&phone)),';
    $string = 'start Tel: 415-555-1212 //// Fax: 415-555-0000 ---- Fax: 415-555-9999';
    if(preg_match($pattern, $string, $match))
        echo 'Properly formatted string. The third number is: '.$match[2].' ';
    ?>

Three phone numbers are matched, but the pattern to match a phone number is only given once! Note that the third number is captured by an additional set of parentheses. It is actually Group 2, because the named group inside the DEFINE statement takes up Group 1.

Now, you could also accomplish this with a subroutine call that reuses the pattern of Group 1: the (?1) syntax. For instance:

    <?php
    $string = 'start Tel: 415-555-1212 //// Fax: 415-555-0000 ---- Fax: 415-555-9999';
    $pattern=',(?x) ^start[ ]((?:Tel|Fax):[ ]415-\d{3}-\d{4})
    [ ]////[ ](?1)
    [ ]----[ ]((?1)),';
    if(preg_match($pattern, $string, $match))
        echo 'Properly formatted string. The third number is: '.$match[2].' ';
    ?>

Or, using a named group with an Oniguruma-style \g<Name> call:

    <?php
    $string = 'start Tel: 415-555-1212 //// Fax: 415-555-0000 ---- Fax: 415-555-9999';
    $pattern=',(?x) ^start[ ](?<Phone>(?:Tel|Fax):[ ]415-\d{3}-\d{4})
    [ ]////[ ]\g<Phone>
    [ ]----[ ](\g<Phone>),';
    if(preg_match($pattern, $string, $match))
        echo 'Properly formatted string. The third number is: '.$match[2].' ';
    ?>

So what is the benefit of the DEFINE syntax over these other techniques? Well, if you wanted, you could set up all your definitions at the beginning of the expression, which could be handy for a long regex!

    (?(DEFINE)(?<Gender>M|F))
    (?(DEFINE)(?<Age>\b\d\d\b))
    (?(DEFINE)(?<Name>\b[[:alpha:]]+\b))

Then you can pepper your names in the expression: (?&Age) to match an Age, (?&Gender) to match a Gender, and so on. Then, if you change your mind about a sub-pattern, all you have to do is change it at the top!

See my page on regex (? syntax disambiguation for more on PCRE regex DEFINE syntax. As of Jan 28 2012, I couldn't find this feature on the PHP manual, but if you find it please let me know.

In the next post, we will look at an even more interesting and useful feature of PHP regex: Capture Groups with Duplicate Numbers.
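As a rough sketch of the "definitions at the top" idea, the Gender, Age and Name definitions can be collected in one DEFINE block and then called from the body of the pattern. The record format "Name Age Gender", the sample strings, and the group names who, years and sex are assumptions made up for this illustration.

```php
<?php
// Hypothetical record format for illustration: "<Name> <Age> <Gender>".
// The x modifier lets the pattern be laid out over several lines.
$pattern = '~
    (?(DEFINE)
        (?<Gender> M|F )
        (?<Age>    \b\d\d\b )
        (?<Name>   \b[[:alpha:]]+\b )
    )
    ^ (?<who>(?&Name)) [ ] (?<years>(?&Age)) [ ] (?<sex>(?&Gender)) $
~x';

// A well-formed record matches and fills the named groups.
$ok = preg_match($pattern, 'Alice 34 F', $m);

// A one-digit age does not satisfy the Age definition.
$bad = preg_match($pattern, 'Alice 7 F');
```

Reading the results through named groups ($m['who'], $m['years'], $m['sex']) sidesteps the numbering shift caused by the groups inside DEFINE.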
- January 28, 2012
- 1 reply
Splitting string into three parts

ragax replied to Mcod's topic in Regex Help

That's true: as I mentioned in the post, I just gave MCod part 1, part 2, and part 3 so he could do independent tests on these variables. (And potentially report to the person who submitted the data that one particular part is broken.) The post by AyKay discussed doing the same faster by using explode().

Adam is quite right that you can validate the entire string in one go: valid AND valid AND valid. If it fails, you don't know where, so it's up to you to choose the approach that works best for your needs. Nothing wrong with Adam's approach!

MCod, small suggestions if you're going with Adam's expression:
1. You don't need the \ in the first bracket in front of the dot (\.)
2. You still need parentheses to capture the three parts since you said you wanted to split the string:
3. You said you want the last part to be 1, 2 or 3, but in the example you gave, $part3 was 0, so you may want to refine that in the last part of the regex, currently [1-3].

Wishing you all a fun weekend.
- January 27, 2012
- 10 replies
Splitting string into three parts

ragax replied to Mcod's topic in Regex Help

Aha... so it's more like a magic trick than pure psychic ability??? Thank you for explaining your art to your public! ;-)
- January 25, 2012
- 10 replies
Splitting string into three parts

ragax replied to Mcod's topic in Regex Help

Ah, just saw AyKay47's post... He's so right, explode() or preg_split() is a great way to do it, and right again, I posted a regex 2 minutes after his message! I envy your psychic powers, AyKay. (And the clarity of mind to go to the easiest solution first.)
- January 25, 2012
- 10 replies
Splitting string into three parts

ragax replied to Mcod's topic in Regex Help

Hi MCod! Run this:

Input:

    jim.h|1234567890123456|0

Code:

    <?php
    $regex=',([^|]*)\|([^|]*)\|(.*),';
    $string='jim.h|1234567890123456|0';
    $hit=preg_match($regex,$string,$part);
    if($hit) {
        echo "Part 1: "; if(isset($part[1])) echo $part[1]; else echo 'n/a'; echo ' ';
        echo "Part 2: "; if(isset($part[2])) echo $part[2]; else echo 'n/a'; echo ' ';
        echo "Part 3: "; if(isset($part[3])) echo $part[3]; else echo 'n/a'; echo ' ';
    }
    ?>

Output:

    Part 1: jim.h Part 2: 1234567890123456 Part 3: 0

Then you can do all the tests you want on $part[1], $part[2] and $part[3]. Let me know if this works for you!
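The explode() route mentioned elsewhere in this thread can be sketched as follows. The variable names are illustrative, and the limit of 3 is an assumption chosen to mirror the trailing (.*) of the regex.

```php
<?php
$string = 'jim.h|1234567890123456|0';

// The limit of 3 keeps any further "|" characters inside the last part,
// mirroring the (.*) at the end of the regex.
$parts = explode('|', $string, 3);

// array_pad() guards against input with fewer than three parts.
list($part1, $part2, $part3) = array_pad($parts, 3, null);
```

Each of $part1, $part2 and $part3 can then be tested on its own, and a null value signals a missing part.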
- January 25, 2012
- 10 replies
link replace but not mailto

ragax replied to drisate's topic in Regex Help

Good news, drisate, glad to hear it. :-)
- January 23, 2012
- 3 replies
link replace but not mailto

ragax replied to drisate's topic in Regex Help

First thing that comes to mind: insert a negative lookahead in your working regex.

    #href=(?!"mailto)['|\"](.+?)['|\"]#
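A minimal sketch of the same idea inside preg_replace(). The sample HTML and the /out?u= redirect prefix are invented for this illustration, and the pattern is a variation rather than the exact expression from the thread: the lookahead sits after the opening quote so that single-quoted mailto links are skipped too, and a back-reference makes the closing quote match the opening one.

```php
<?php
// Hypothetical input: one normal link and one mailto link.
$html = '<a href="http://example.com">site</a> <a href="mailto:a@b.c">mail</a>';

// (["\'])       Group 1: the opening quote, single or double
// (?!mailto:)   refuse to continue if the URL starts with mailto:
// (.+?)\1       Group 2: the URL, up to the same kind of quote
$pattern = '#href=(["\'])(?!mailto:)(.+?)\1#i';

// The /out?u= prefix is a made-up rewrite target.
$rewritten = preg_replace($pattern, 'href=$1/out?u=$2$1', $html);

echo $rewritten;
```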
- January 23, 2012
- 3 replies
Optimized Replacing

ragax replied to Mcod's topic in Regex Help

Hi Mcod! The way your particular copyright symbols are encoded (not just byte 169, but byte 194 in front of it), I would go for something like this:

Input:

    ©1 © leave it ©a ©abc ©2012

Code:

    <?php
    $regex=',[\xC2][\xA9]([[:alnum:]]),';
    $string='©1 © leave it ©a ©abc ©2012 ';
    echo '<pre>'.htmlentities(preg_replace($regex, '©$1', $string)).'</pre>';
    ?>

Output:

    ©1 Â© leave it ©a ©abc ©2012

The weird Â character seems to be part of how your © is encoded (byte 194 / xC2 in front of byte 169 / xA9). But I'm an old Ascii man, so don't ask me about character encoding! I'm sure many people here can explain. (Maybe you can!)

If you like, you can take out the Â by replacing [\xC2][\xA9] with \xA9
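For what it's worth, the two bytes C2 A9 are how UTF-8 stores the single character © (U+00A9). A minimal sketch, assuming PHP 7 or later (for the "\u{A9}" escape) and a UTF-8 string, that matches the character as a whole with the /u modifier. The "(c)" replacement text is a placeholder chosen for this illustration.

```php
<?php
// "\u{A9}" is the copyright sign; in a UTF-8 string it is stored as bytes C2 A9.
$copy = "\u{A9}";
$hex  = bin2hex($copy);

$string = "{$copy}1 {$copy} leave it {$copy}a {$copy}abc {$copy}2012";

// With /u, \x{A9} matches the whole character, so no stray C2 byte is left
// behind. A sign followed by a space is not matched and stays untouched.
$result = preg_replace('/\x{A9}([[:alnum:]])/u', '(c)$1', $string);

echo $hex, "\n", $result, "\n";
```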
- January 23, 2012
- 3 replies
Deleting many lines of text between specified characters?

ragax replied to terrypin's topic in Regex Help

Hi Terry,

From what you sent, I'd say very basic. But maybe there's more. The expression I sent is meant to work with a full-blown regex flavor.

The commas are delimiters. They're part of the php code I sent you. If you're not using php (although this is the phpfreaks forum), then omit the commas when you paste the expression in your tool. For instance it works in regexbuddy.

(?sm) turns on "dot matches new line" and "multiline" modes.
[^[] means anything that is not an opening square bracket. (The caret here stands for NOT.)
\r is a carriage return. Whether you need \r\n or \n depends on your OS: \r\n for Windows.
* means zero or more. That's what it means in .* and in [^[]*

Hope this helps, don't hesitate to ask more.
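A small sketch of the pieces described above, run on a made-up two-entry string with Windows line endings. The sample text, variable names and the ~ delimiter are assumptions for this illustration.

```php
<?php
$text = "[a.jpg]\r\nline one\r\nline two\r\n[b.jpg]";

// Without (?s), the dot stops at a line feed, so .* cannot span lines.
$withoutS = preg_match('~\[a\.jpg\].*two~', $text);

// With (?s), the dot also matches line breaks.
$withS = preg_match('~(?s)\[a\.jpg\].*two~', $text);

// [^[]* runs until the next opening square bracket (or the end of the string).
preg_match('~\](\r\n[^[]*)~', $text, $m);

var_dump($withoutS, $withS, $m[1]);
```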
- January 22, 2012
- 9 replies
Master regex person's help requested!

ragax replied to fapapfap's topic in Regex Help

You're very welcome, glad it helped.
- January 22, 2012
- 4 replies
Master regex person's help requested!

ragax replied to fapapfap's topic in Regex Help

Here you go, fapapfap. Run this code, let me know if it works for you. (There can be more or less space between the lines, it doesn't matter.)

Code:

    <?php
    $regex=',(?s)(?><tr>(?:[ \r\n]*)(?:<td.*?</td>(?:[ \r\n]*)){7})<td>[^>]+>([^<]+)(?:[^>]+>){4}([^<]+),';
    $string='<tr> <td>12.34.56.78</td> <td>GB</td> <td>random things</td> <td>randomthings</td> <td>random things</td> <td>random things</td> <td></td> <td>30.9500</td> <td>-2.2000</td> <td>random things</td> <td>random things</td> <td></td> <td></td> </tr>';
    preg_match($regex,$string,$match);
    echo $match[1].' ';
    echo $match[2].' ';
    ?>

Output:

    30.9500 -2.2000
- January 22, 2012
- 4 replies
Deleting many lines of text between specified characters?

ragax replied to terrypin's topic in Regex Help

Hi again Terry, If you don't have PHP, for the simple REPLACE approach I gave you above, I'd use a program that has regex search-and-replace capabilities. Two that I like: EditPadPro, Aba Search and Replace. There's also some regex replace functionality in some Adobe programs (Dreamweaver, Indesign). The regex flavor there is probably strong enough for the expression I gave you, which is fairly simple. Some of the IDEs have regex functionality: Code::Blocks, NetBeans. I haven't fully tested them. Let me know if you need any help with the two linked tools or the Adobe tools.
- January 22, 2012
- 9 replies
Deleting many lines of text between specified characters?

ragax replied to terrypin's topic in Regex Help

Hi Terrypin,

Didn't have time to look at Joe's solution, rushing out, just wanted to give you a preg_replace option. You can run this php code.

The Regex:

    ,(?sm)\[([^]]+\.jpg)\].*?- COMMENT -(\r\n[^[]*),

Code:

    <?php
    $regex=',(?sm)\[([^]]+\.jpg)\].*?- COMMENT -(\r\n[^[]*),';
    $string='[blackfordLane.jpg] File name = BlackfordLane.jpg Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\ Compression = JPEG, quality: 87, subsampling OFF Resolution = 96 x 96 DPI File date/time = 19/01/2012 / 15:01:23 - IPTC - Object Name - s bridge over the River Thames is not a footbridge but carries pipes. - COMMENT - Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton. [Castle Eaton Church.jpg] File name = Castle Eaton Church.jpg Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\ Compression = JPEG, quality: 87, subsampling OFF Resolution = 72 x 72 DPI File date/time = 19/01/2012 / 14:03:55 - EXIF - Make - FUJIFILM Model - FinePix2600Zoom Orientation - Top left XResolution - 72 YResolution - 72 ResolutionUnit - Inch - COMMENT - Castle Eaton Church [CastleEaton-2.jpg] File name = CastleEaton-2.jpg Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\ Compression = JPEG, quality: 75 Resolution = 0 x 0 DPI File date/time = 18/01/2012 / 15:40:05 - COMMENT - The Red Lion, Castle Eaton A warm welcoming pub on a cold winter\'s day, with the River Thames running at the bottom of the garden. ';
    $s=preg_replace($regex,'\1\2',$string);
    echo '<pre>'.$s.'</pre>';
    ?>

Output:

    BlackfordLane.jpg
    Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.
    Castle Eaton Church.jpg
    Castle Eaton Church
    CastleEaton-2.jpg
    The Red Lion, Castle Eaton A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.

Didn't have time to look at the fine details, let me know if that works for you.
- January 22, 2012
- 9 replies
Limit E-mail to 60 Characters

ragax replied to doubledee's topic in Regex Help

I don't seem to have trouble hanging on to Smarties, it's more M&Ms that give me trouble. It's that crunchy peanut inside the chocolate...
- January 21, 2012
- 12 replies
Limit E-mail to 60 Characters

ragax replied to doubledee's topic in Regex Help

You are the real programmer, Debbie... regex is just my Sunday crossword.
- January 21, 2012
- 12 replies
another regex question...

ragax replied to mck.workman's topic in Regex Help

Darnit, McK, that's a disappointment. I thought I was helping you build a spam robot.
- January 21, 2012
- 20 replies
Limit E-mail to 60 Characters

ragax replied to doubledee's topic in Regex Help

Mmm...
1. You could limit the size of each component (e.g., the name) with a quantifier such as {2,10}. Not a solution that would impress Bill Gates.
2. You could write a horrible OR tree to specify each of the characters (if you had 200 years to live).
3. You could use strlen() to check the input programmatically.
4. And... your favorite, I am sure: just before the $, you could insert a (?<=^.{1,60}), which is a lookbehind. But not in PHP, as it doesn't allow variable-width lookbehinds (.NET does).
I'll post more if they come to mind. Warmest wishes, A
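Option 3 can be sketched in a few lines. The helper name is_within_limit() is hypothetical, and the check counts bytes, which is the same as characters only for plain ASCII input.

```php
<?php
// is_within_limit() is a hypothetical helper name, not from the thread.
function is_within_limit(string $email, int $max = 60): bool
{
    // strlen() counts bytes, which equals characters for plain ASCII input.
    $length = strlen($email);
    return $length >= 1 && $length <= $max;
}

var_dump(is_within_limit('debbie@example.com'));
```

A length check like this would run alongside the format regex, not replace it.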
- January 21, 2012
- 12 replies
Limit E-mail to 60 Characters

ragax replied to doubledee's topic in Regex Help

No, your {,60} quantifier applies to the whole expression, so it would allow up to 60 email addresses. Correct. There's no need to bother about the lower boundary of the quantifier. (As you already knew, seeing your quantifier.) Fabulous. Good to hear your voice, Debbie, talk to you soon. -A
- January 21, 2012
- 12 replies
Limit E-mail to 60 Characters

ragax replied to doubledee's topic in Regex Help

P.S.: It's the same principle as for your strong password thread. Hope it works for you, let me know if you run into any probs. Wishing you a fun weekend, Andy
- January 21, 2012
- 12 replies
Limit E-mail to 60 Characters

ragax replied to doubledee's topic in Regex Help

Hi Debbie,

Try this. Without looking at the details of your expression, I inserted a lookahead at the very beginning. It checks that the string has between 1 and 60 characters.

    if (preg_match('#^(?=.{1,60}$)[A-Z0-9_\+-]+(\.[A-Z0-9_\+-]+)*@[A-Z0-9-]+(\.[A-Z0-9-]+)*\.([A-Z]{2,7})$#i', $trimmed['email'])){

It will match

    123@5678901234567890123456789012345678901234567890123456.com

but not

    123@56789012345678901234567890123456789012345678901234567.com

(One more digit before the .com)
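A quick way to check the 60-character boundary is to build the two test addresses programmatically. This sketch reuses the expression above unchanged; the variable names and the str_repeat() construction are choices made for this illustration.

```php
<?php
$regex = '#^(?=.{1,60}$)[A-Z0-9_\+-]+(\.[A-Z0-9_\+-]+)*@[A-Z0-9-]+(\.[A-Z0-9-]+)*\.([A-Z]{2,7})$#i';

// 4 characters for "123@", 52 digits, 4 characters for ".com": 60 in total.
$sixty = '123@' . str_repeat('5', 52) . '.com';

// One more digit before the .com makes 61.
$sixtyOne = '123@' . str_repeat('5', 53) . '.com';

$fits    = preg_match($regex, $sixty);
$tooLong = preg_match($regex, $sixtyOne);

var_dump(strlen($sixty), $fits, strlen($sixtyOne), $tooLong);
```

The lookahead only inspects the length and consumes nothing, so the rest of the expression still starts matching at the first character.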
- January 21, 2012
- 12 replies
another regex question...

ragax replied to mck.workman's topic in Regex Help

Okay, focus on this part of your expression:

    \d+((?<="<Send Email to ).+)

After the digits (\d+), you want to match STUFF (.+) that is preceded by "<Send Email to

But there is no such stuff. After the digits, you go straight to "<Send Email

Let me explain in detail, as this is a key point of lookarounds. See, the lookbehind does not JUMP over characters. After the digits, the regex engine is standing between the 9 and the "

At this stage, if you use a lookaround, you stay PLANTED in that position between the 9 and the "

With a lookbehind, you look to the left for "<Send, and of course you're not going to find that, there are only digits. If you used a lookahead, you'd be looking to the right of that spot between 9 and ", so you'd be seeing a double quote and some stuff. And after each lookbehind or lookahead, you're still standing in the same spot!

This might make your head spin for a moment because your current understanding of lookarounds is a different paradigm. It's like these images you can see with two geometries, with the stairs either going up or going down... Once it clicks, it will be clear as day.

Ctrl + F conditionals on my Tut for more on this topic. (I'm doing a major revamp but it's not ready.)

Talk soon bro!
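The "standing in the same spot" point can be sketched with three small patterns. The subject string is made up for this illustration (digits ending in 9, then the quote), and the third pattern is one possible fix, not necessarily what the original poster ended up using.

```php
<?php
// Hypothetical subject: digits ending in 9, followed by the quoted text.
$subject = 'id 789"<Send Email to bob';

// Lookbehind right after the digits: it inspects what sits to the LEFT of
// the current position, which is only "id 789", so it cannot succeed.
$behind = preg_match('~\d+(?<="<Send Email to ).+~', $subject);

// Lookahead at the same spot: it inspects what sits to the RIGHT, succeeds,
// and consumes nothing, so the overall match is still just the digits.
preg_match('~\d+(?="<Send Email to )~', $subject, $ahead);

// To get at the name, match the literal text and then capture what follows.
preg_match('~\d+"<Send Email to (.+)~', $subject, $consumed);

var_dump($behind, $ahead[0], $consumed[1]);
```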
- January 21, 2012
- 20 replies
another regex question...

ragax replied to mck.workman's topic in Regex Help

Ah, yes, I should go splash some cold water on my face to wake myself up. Can you paste some of the actual text that the pattern is supposed to match? Without that, I have a hard time troubleshooting an expression.
- January 21, 2012
- 20 replies
another regex question...

ragax replied to mck.workman's topic in Regex Help

Hey McK, If that's the actual code you're running, are you sure you have the right test string? For instance, I don't see SendEmail in the string.
- January 20, 2012
- 20 replies
another regex question...

ragax replied to mck.workman's topic in Regex Help

Hi McK, It looks to me like the quote in (?<=" closes the pattern string. On your earlier tests, you escaped the double quote, so it worked.
- January 20, 2012
- 20 replies

