Very Very Simple Regex

sh0wtym3 · April 7, 2010

How can I find a string that starts with "A" and ends with "5"? I thought:

$string = "ABCDE12345";

if (preg_match("/(^A)(5$)/", $string)) {
echo "Match found.";
}

... would work, but it doesn't.

premiso · April 7, 2010

if (preg_match('~^A.*5$~', $string)) {
    echo "Match Found.";
}

Untested, but barring any silly errors on my part, it should work the way you want.
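For anyone following along, here is a minimal sketch of why the first attempt fails and the second one matches. The variable names are only for illustration; the patterns are the two from the posts above.

```php
<?php
$string = "ABCDE12345";

// (^A)(5$) asks for "A" at the start IMMEDIATELY followed by "5" at the end,
// so the only string it can match is "A5".
$firstAttempt = preg_match('/(^A)(5$)/', $string);   // 0
$onlyA5       = preg_match('/(^A)(5$)/', 'A5');      // 1

// .* allows any run of characters (other than newlines) between the anchors.
$withDotStar  = preg_match('~^A.*5$~', $string);     // 1
```

The difference is the .* in the middle: without it there is nothing in the pattern that can consume "BCDE1234".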

sh0wtym3 · April 7, 2010

That works! Thanks

sh0wtym3 · April 7, 2010

Sorry, I have another problem... Sometimes my string will have line breaks, so while:

$string = "ABCDEF12345";

if (preg_match('~^\A.*5$~', $string, $matches)) {
  echo "Match was found <br />";
  echo $matches[0];
}

... will work, this won't:

$string = "ABCDEF

12345";

if (preg_match('~^\A.*5$~', $string, $matches)) {
  echo "Match was found <br />";
  echo $matches[0];
}

How do I make it ignore the line breaks?

premiso · April 7, 2010

By using the s modifier:

if (preg_match('~^\A.*5$~s', $string, $matches)) {

Should resolve that issue as it treats the string as being on one line. It may do you good to look at the regex resources I posted in my signature, if you plan on using them regularly. I like the cheatsheet the best.
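A small sketch of the difference the s modifier makes, assuming the line breaks are plain "\n" characters. It uses a literal A at the start, as in the first answer; the variable names are made up for the example.

```php
<?php
$string = "ABCDEF\n\n12345";

// Without s, the dot stops at the first newline, so .* can never reach the 5.
$withoutS = preg_match('~^A.*5$~', $string);    // 0

// With s, the dot matches newlines too, so .* can span all three lines.
$withS = preg_match('~^A.*5$~s', $string, $matches);   // 1
```

After the second call, $matches[0] holds the whole string, line breaks included.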

sh0wtym3 · April 7, 2010

Thanks, I will definitely check out your links as I want to get good at regex

sh0wtym3 · April 7, 2010

Sorry to keep resurrecting this thread, but I couldn't find any info on your links (or google for that matter) on how to search for a string that ends in a semicolon. I think regex uses semicolons as a delimiter which throws it off? But I'm not sure. Here is what I got:

$string = "ABCDEF;12345";

if (preg_match('~^\A.*;$~', $string, $matches)) {
  echo "Match was found <br />";
  echo $matches[0];
}

I tried escaping the semicolon but that didn't seem to work either:

$string = "ABCDEF;12345";

if (preg_match('~^\A.*\;$~', $string, $matches)) {
  echo "Match was found <br />";
  echo $matches[0];
}

sh0wtym3 · April 7, 2010

Nvm I figured it out myself. Here's the solution if anyone else is looking:

$string = "ABCDEF;12345";

if (preg_match('~^\A.*[;$]~', $string, $matches)) {
  echo "Match was found <br />";
  echo $matches[0];
}

Just enclose the ";$" in brackets
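A caveat on the bracketed version, offered as a sketch rather than a verdict: inside square brackets the $ is no longer an end-of-string anchor but a literal dollar sign, so [;$] means "a semicolon or a dollar sign" at any position. That would explain why it matches a string that does not actually end in a semicolon. The variable names below are only for illustration.

```php
<?php
$string = "ABCDEF;12345";

// [;$] is a character class: one ";" OR one literal "$", anywhere in the string.
preg_match('~^\A.*[;$]~', $string, $matches);
$matched = $matches[0];   // "ABCDEF;" (the match stops at the semicolon)

// The anchored version fails here because the string ends in "5", not ";".
$anchored = preg_match('~^\A.*;$~', $string);   // 0

// On a string that really ends in a semicolon, no brackets are needed.
$endsInSemicolon = preg_match('~^\A.*;$~', "ABCDEF12345;");   // 1
```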

premiso · April 8, 2010

Just to correct your line of thinking:

if (preg_match('~^\A.*[;]$~U', $string, $matches)) {

Should work. The ^ says the first character of the searched string must be A, and the $ says the string must end with the character preceding it: start of string, end of string.

I am unsure whether the ; is a special character in regex, but yes, putting it inside the brackets will work just fine. I added the U after it to make the match un-greedy, because if there are multiple ; in the line it will otherwise match up to the last one. If you want that, simply remove the U.

sh0wtym3 · April 8, 2010

If you could help me with this I'd be grateful, I've been working on this script nearly (7?) hours now and I'm ready to pull my hair out. I've got most of it done, but here's the last string:

$FullString = ".sideAds#tripleCalc h3{	
display:block;
width:120px;
background-image:url(images/sideAdH3_blank-top.png);
background-repeat:no-repeat;
background-position:top center;
padding-top:15px;
background-color:#909;
font-size:12px;
text-align:center;
}	
.sideAds#tripleCalc .h3Floor{	
display:block;
width:120px;
height:14px;
background-image:url(images/sideAdH3_blank-floor.png);
background-repeat:no-repeat;
background-position:bottom center;
background-color:#909;
}	



.sideAds#CustomButton{
}
.sideAds#CustomButton a{
display:block;
}
.sideAds#CustomButton h3{	
display:block;
width:120px;
background-image:url(images/sideAdH3_blank-top.png);
background-repeat:no-repeat;
background-position:top center;
padding-top:15px;
background-color:#EEE;
font-size:12px;
text-align:center;
}	
.sideAds#CustomButton .h3Floor{	
display:block;
width:120px;
height:14px;
background-image:url(images/sideAdH3_blank-floor.png);
background-repeat:no-repeat;
background-position:bottom center;
background-color:#EEE;
}";

Out of that string, I need a RegEx that will replace only:

.sideAds#tripleCalc h3{	
display:block;
width:120px;
background-image:url(images/sideAdH3_blank-top.png);
background-repeat:no-repeat;
background-position:top center;
padding-top:15px;
background-color:#909;
font-size:12px;
text-align:center;
}

I came up with the following, but it replaces EVERYTHING instead of just the first part. I think it's being greedy, like you mentioned, and looking for the last "}".

$FullString = preg_replace('~^\.sideAds#tripleCalc h3{.*}$~s', "Now Im Gone", $FullString);

If you have paypal I can send you $10-$20 for all your help, I just really need to get this done tonight

premiso · April 8, 2010

$FullString = preg_replace('~^\.sideAds#tripleCalc h3{.*}$~sU', "Now Im Gone", $FullString);

If greediness is the cause, adding a simple U as done above should alleviate the problem.

sh0wtym3 · April 8, 2010

Nah, it's still replacing EVERYTHING with "Now Im Gone"

premiso · April 8, 2010

Sorry, my mind is a little slow at the moment.

$FullString = preg_replace('~(\.sideAds#tripleCalc h3{.*})~sU', "Now Im Gone", $FullString);

It was because you needed to match the pattern with the ( ) in order to replace with preg_replace. The preg_match did not require this because we did not want to get the variable out of it in a match. We just wanted to see if that variable was in the string.

sh0wtym3 · April 8, 2010

Yes!!

Do you have a Paypal account? And if so what's your paypal e-mail address

salathe · April 8, 2010

sh0wtym3, if you are still needing help please let us know. premiso's posts contain a number of errors and falsehoods which might distract your learning about regular expressions. I'd go through everything but a) nobody likes a smart-arse, and b) I'm supposed to be working. :shy:

premiso · April 8, 2010

I'd go through everything but a) nobody likes a smart-arse, and b) I'm supposed to be working.

Well if when you get a chance you can explain yourself, I would appreciate it. Maybe next time I won't be such a retard with my explanations

premiso · April 8, 2010

Let me try and redeem myself, if only a little bit:

$FullString = preg_replace('~\.sideAds#tripleCalc h3{.*}~sU', "Now Im Gone", $FullString);

Should work. I stated before that the ( ) were needed; they were not. I am not certain of this, but they would only be needed if you wanted to reference the portion that matched the regex in the "replace with" portion.

The $ how you had it in this piece of code:

$FullString = preg_replace('~^\.sideAds#tripleCalc h3{.*}$~s', "Now Im Gone", $FullString);

Was causing it to match the final } because the $ indicated the end of the string and since we made the . ignore linebreaks with the s modifier, it was matching clear to the end. I still think you needed the U modifier, to make the matches un-greedy.

The s modifier causes the regex to match new line characters as well, as the . generally does not match the newline character. Using the s overrides this functionality and allows it to match new line characters.

If I have explained this poorly or wrongly, please correct me, Sally. Hopefully I did not do too much damage to the OP's theory / usage of regex with my poor misinterpretations.

cags · April 8, 2010

Just to clarify a few points...

As premiso has corrected, the brackets aren't required at all. When using preg_replace the entire matched pattern will be replaced; the only reason to use parentheses at all is if you wish to include part of the original match in the replacement. There is no real reason to ever create a capture group around the entire pattern, whether it be preg_match or preg_replace. The reason for this is that even if you require the contents of the matched data, it is automatically stored in capture group 0 and can be referenced in a replacement using \0.
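To illustrate the point about capture group 0, here is a short sketch on a made-up one-line subject (the $text value and variable names are not from the thread):

```php
<?php
$text = 'width:120px;';

// $0 (or \0) stands for the whole match, so no outer parentheses are needed.
$viaDollar    = preg_replace('~\d+px~', '[$0]', $text);   // "width:[120px];"
$viaBackslash = preg_replace('~\d+px~', '[\0]', $text);   // "width:[120px];"

// Parentheses earn their keep when only PART of the match is reused.
$partial = preg_replace('~(\d+)px~', '$1em', $text);      // "width:120em;"
```

The same holds for preg_match: the third argument always receives the full match at index 0, with or without a group around the whole pattern.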

The $ was indeed the culprit in the situation, but not exactly because it attempted to match the last }. What it was actually checking was whether the 'pointer' was at the end of the string at the point that the rest of the pattern had finished matching (semantics really, I think premiso knew what he meant). For that reason it is indeed required to make either the entire pattern ungreedy using the U modifier, or just that section by using .*?, or as an alternative in this case you could have used {[^}]*}, which would avoid the need for both the s modifier and the U modifier.
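The three options can be compared side by side. This is a sketch on a cut-down, hypothetical stylesheet (two short rules with "\n" line breaks) rather than the full string from the question:

```php
<?php
$css = ".a h3{\ncolor:red;\n}\n.b h3{\ncolor:blue;\n}";

// Greedy .* with s runs on to the LAST closing brace, swallowing both rules.
$greedy = preg_replace('~\.a h3{.*}~s', 'X', $css);        // "X"

// Option 1: U flips every quantifier in the pattern to ungreedy.
$viaU = preg_replace('~\.a h3{.*}~sU', 'X', $css);

// Option 2: .*? makes only this one quantifier ungreedy.
$viaLazy = preg_replace('~\.a h3{.*?}~s', 'X', $css);

// Option 3: [^}]* cannot cross a closing brace, and a negated class
// matches newlines by itself, so neither s nor U is needed.
$viaClass = preg_replace('~\.a h3{[^}]*}~', 'X', $css);

// Options 1 to 3 all leave the second rule alone: "X\n.b h3{\ncolor:blue;\n}"
```

The negated class is arguably the safest of the three here, since it says what must not be matched instead of relying on how eager the quantifier is.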

As described by premiso, the s modifier does indeed enable 'single line mode'. This is how I remember it, as it's easy to associate the s with single. In essence all it actually does is make the . match newline characters, which it doesn't do by default.

Disclaimer:- Salathe may still find an issue with what I've said, or indeed something we've not even mentioned, but it's been a long day, he's been doing it longer than us and frankly knows a lot more about Regex than pretty much anyone else on PHPF. I've learned most of what I know from him correcting my posts, so it's all gravy.

salathe · April 8, 2010

Great work guys with those latest couple of posts, you've covered key points that I was going to mention.

However, I'd still like to jump on a few other notes and possibly touch on what you've already discussed. Harking back to post #9, the following pattern was offered:

~^\A.*[;]$~U

When using regular expressions in PHP there are often many ways to do the same thing. Different ways are not necessarily wrong or right, so long as they get the job done! There are a number of 'special' characters to be considered when writing our patterns, and indeed the context in which characters are placed can give them different meanings. As an example, the dot character (.) is generally used to match "anything except a newline character". This meaning does not hold within a character class ([.]), yet oftentimes there is a tendency to, absolutely needlessly, escape the dot within character classes ([\.]); outside of a character class, that escape is what makes it lose its special meaning and simply match a dot. A similar situation has occurred (unless I'm mistaken) with the \A of premiso's regex.

I believe he was attempting to match a literal capital A character yet, whether it was a typo or deliberate escaping, did not do so. See, \A is one of the magical, mystical special sequences of characters: it matches the very start of the subject string (like ^ when not using the multiline pattern modifier). To this end, the first part of the regex (^\A) is effectively doing the same thing twice: making sure that we're currently at the start of the subject string. Happily, the following .* allowed any starting letter A to get matched, making it appear that the regex was doing what it was intended to do. Note, however, that it will also match "Oops;", which was not intended.

The next part of that regex is [;]. The semicolon was enclosed in square brackets because, by his own admission, premiso was not sure whether it is a special character. It does not have any special meaning, anywhere. So that's one regex down and a lot of waffle for a 12-character pattern.
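Both observations are easy to check. A minimal sketch, with subjects and variable names chosen just for the demonstration:

```php
<?php
// \A is a start-of-subject assertion, not a literal "A", so the pattern
// accepts any single-line string that ends in a semicolon.
$oops = preg_match('~^\A.*;$~', 'Oops;');      // 1

// With a literal A the same subject is rejected.
$literal = preg_match('~^A.*;$~', 'Oops;');    // 0

// The semicolon has no special meaning: plain, bracketed and escaped
// forms all behave the same way.
$subject = 'ABCDEF12345;';
$plain   = preg_match('~^A.*;$~', $subject);    // 1
$class   = preg_match('~^A.*[;]$~', $subject);  // 1
$escaped = preg_match('~^A.*\;$~', $subject);   // 1
```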

The later patterns have been covered very well in the previous couple of posts. The key points being that the parentheses around the later regex were not necessary since a) the entire match can always be referenced with $0 or \0 in the replacement string, and b) we did not want to use the matched text anyway.

As cags put forward, it would also have been a good idea to use something like [^}]* in place of .*; it is good practice to try to be as explicit as possible with regards to what you want to match (in this case, declaring precisely what we don't want to match). This would have helped with the issue of matching too much (it simply would not have happened) and the changes to the pattern as a result of that.

Whilst on the subject of being as specific as possible, those ^ and $ at the start and end of the regex could have been put to good use. Aside from matching the very beginning and very end* of the entire subject string, they can be used to match the start and end of any line within the subject by using the m modifier. This would have been useful to, for example, make sure that the closing brace (}) was on a line of its own (e.g. …^}$) or to make sure that the CSS selector was at the start of a line (^\.sideAds…). Also note that \A could have been used to make sure the selector was at the very beginning of the subject string (\A\.sideAds…).
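A rough sketch of the m modifier in action, using a shortened, hypothetical stylesheet with "\n" line endings (the selectors are simplified from the ones in the question):

```php
<?php
$css = "h3{\ncolor:red;\n}\n.sideAds h3{\ncolor:blue;\n}";

// Without m, ^ only matches at the very start of the subject.
$noM = preg_match('~^\.sideAds h3{~', $css);       // 0

// With m, ^ also matches immediately after each newline.
$withM = preg_match('~^\.sideAds h3{~m', $css);    // 1

// \A stays pinned to the start of the subject, even in multiline mode.
$withA = preg_match('~\A\.sideAds h3{~m', $css);   // 0

// ^}$ with m insists that the closing brace sits on a line of its own.
$result = preg_replace('~^\.sideAds h3{[^}]*^}$~m', 'X', $css);
// $result is "h3{\ncolor:red;\n}\nX"
```

If the input might use "\r\n" line endings, the ^}$ part would need adjusting, since the "\r" would sit between the brace and the end of the line.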

That's enough; you're probably bored by now! To lighten the mood, here's an animated badger:

(did your eyes skip here before reading everything?.. get back to reading!)

* I've posted about this before but, by default, the $ character will match at the very end of the string or immediately before a trailing newline character (in other words, /cake$/ will match "cake" when given the string "pancake\n"). To not allow matching if there is a trailing newline, use the D pattern modifier, à la /cake$/D. This has obvious repercussions if the $ is being used in multiline mode.
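The footnote's example can be run as is. A quick sketch, with \z added as a further option that is not mentioned in the thread:

```php
<?php
// By default, $ also matches just before a single trailing newline.
$default = preg_match('/cake$/', "pancake\n");      // 1

// D (dollar-end-only) makes $ match at the very end of the subject only.
$withD   = preg_match('/cake$/D', "pancake\n");     // 0
$noBreak = preg_match('/cake$/D', "pancake");       // 1

// \z always means "very end of subject", whatever modifiers are in play.
$withZ   = preg_match('/cake\z/', "pancake\n");     // 0
```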
