scanreg, posted January 19, 2010:
I have the following pattern that is supposed to match any word that begins and ends with the same letter:
(^.).*/1$
Results should be things like: blab, wow
However, I don't understand how the pattern above restricts the match to the same letter at each end of the word. I get that (^.) is the first character and that .*/1$ ends at the last character. But how does the pattern know that the first and last characters must match? Thanks
cags, posted January 19, 2010:
I'm not sure why it's captured like that; the caret doesn't need to be inside the capture group. The full stop in the brackets captures the first character of the input (the ^ ensures you are matching from the start of the string). The .* then matches the rest of the string (assuming there are no \n characters in it). The /1 is, I suspect, supposed to be a back reference, but I'm not sure what syntax it is using. With the PCRE engine (i.e. the functions beginning with preg_), a back reference is written with a backslash, i.e. \1. A back reference refers to a value previously captured in the pattern (in this case the first character of the string). The dollar sign signifies the end of the string.
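The explanation above can be tried out with a short script. This is only a sketch: it assumes the intended pattern is the original one with the slash replaced by a backslash and the caret moved outside the group, and the test words are taken from the question plus one made-up non-match ("blah").

```php
<?php
// Assumed corrected form of the pattern from the question:
// (.) captures the first character, .* consumes the middle,
// and \1 requires the last character to equal the captured one.
$pattern = '/^(.).*\1$/';

// 'blab' and 'wow' come from the question; 'blah' is a hypothetical non-match.
$inputs = ['blab', 'wow', 'blah'];

$results = [];
foreach ($inputs as $input) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    $results[$input] = preg_match($pattern, $input) === 1;
}

var_dump($results);
```

Note that this form needs at least two characters, because \1 has to match a second occurrence of the captured character.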
scanreg (author), posted January 19, 2010:
Ah, so the back reference \1 is a numerical reference to the first character, with the value 1? That's how it knows to match the first character. Am I on target? I think that's what you mean. Thanks so much
cags, posted January 19, 2010:
No, the \1 refers to the first capture group, which means the first set of brackets. Whatever characters are captured by the first set of brackets must appear again wherever you place \1 in the pattern. Whatever is captured by the second set of brackets is referred to with \2, and so on.
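To illustrate the numbering of capture groups, here is a minimal sketch. The pattern and the test strings are hypothetical and not from the thread: two groups are captured and then referenced in reverse order, so only four-character strings shaped like "abba" should match.

```php
<?php
// Hypothetical pattern: group 1 and group 2 each capture one character,
// then \2 and \1 require those same characters again in reverse order.
$pattern = '/^(.)(.)\2\1$/';

// The optional third argument receives the whole match and each capture.
$isAbba = preg_match($pattern, 'abba', $groups) === 1;
$isAbab = preg_match($pattern, 'abab') === 1;

// $groups[0] is the full match; $groups[1] and $groups[2] are the captures.
print_r($groups);
var_dump($isAbba, $isAbab);
```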
scanreg (author), posted January 19, 2010:
Super, thanks :-)
JAY6390, posted January 19, 2010:
Something like this should do it:
'/\b([a-z])[]a-z]*\1\b/i'
nrg_alpha, posted January 20, 2010:
Two issues with JAY6390's pattern, though: a) I think you accidentally inserted a ] in your a-z character class, and b) the character class is allowed to match zero times (because of the * quantifier). So your pattern would be problematic in something like:
$str = 'I took my dad to the bb range!';
preg_match_all('/\b([a-z])[a-z]*\1\b/i', $str, $matches);
echo '<pre>'.print_r($matches[0], true);
Both dad and bb will end up in the $matches array. I suspect you would need to make the quantifier a +, like so: [a-z]+
This would force the word to be at least 3 characters long (unless, of course, the OP doesn't mind matching "words" like bb, qq, ii, etc.).
salathe, posted January 20, 2010:
[ot]Surely words like "a" and "I" also start and end with the same letter, right?[/ot]
JAY6390, posted January 20, 2010:
Yes, the ] shouldn't be there. It should be:
/\b([a-z])[a-z]*\1\b/i
As for the *, the OP asked for words that begin and end with the same letter, so this works with any word two or more letters long. I suppose it should also work for words like "a" and "I", in which case it should be:
/\b([a-z])([a-z]*\1)?\b/i
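The three variants discussed so far can be compared side by side. This sketch reuses the sample sentence from nrg_alpha's post; the helper function name is made up for the example.

```php
<?php
// Hypothetical helper: returns every full match of $pattern in $text.
function matching_words(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Sample sentence from earlier in the thread.
$str = 'I took my dad to the bb range!';

// * allows an empty middle, so two-letter words such as "bb" match.
$star = matching_words('/\b([a-z])[a-z]*\1\b/i', $str);

// + needs at least one middle letter, so words must be 3+ letters long.
$plus = matching_words('/\b([a-z])[a-z]+\1\b/i', $str);

// Making the tail optional also accepts single-letter words such as "I".
$optional = matching_words('/\b([a-z])([a-z]*\1)?\b/i', $str);

print_r($star);      // dad, bb
print_r($plus);      // dad
print_r($optional);  // I, dad, bb
```

Which variant is right depends on whether one- and two-letter words should count, which is exactly the question raised in the posts around this point.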
nrg_alpha, posted January 21, 2010:
Yeah, I suppose... lol
[ot]Hmmm... do words like "a" and "I" make great palindromes (much like PHP)?[/ot]
scanreg (author), posted January 21, 2010:
How cool! Given the single-letter-word issue, might there be a way to exclude single-letter words but still find the others (to prevent unintended results)? Thanks so much :-)
JAY6390, posted January 21, 2010:
Yes, this will match any word of two or more letters whose first and last letters match:
/\b([a-z])[a-z]*\1\b/i
So aa will match, as will bb, cc, etc.
scanreg (author), posted January 21, 2010:
Thanks :-)