Lexas Posted July 10, 2009

Hello guys. I'm trying to use the WordPress plugin WordPress Easy Contents, but it has a "bug" that I'm trying to fix. The following expression is meant to catch HTML tags, h1 for example:

preg_match_all('#\<'.$element.'>(.+?)\</'.$element.'>#si', $content, $matches, PREG_SET_ORDER);

$element is the tag name to be caught, and $content is the input text to be searched. The problem is that this expression only works if the tag has no ID and no class. For example, <h1> works, but <h1 class="anything"> doesn't. I've tried a lot of combinations to mean "anything from here until '>'", but nothing worked. Any idea what can be used here?
thebadbad Posted July 10, 2009

preg_match_all("#<$element(\s+[^>]+)?>(.+?)</$element>#si", $content, $matches, PREG_SET_ORDER);

Added an optional subpattern: 1 or more whitespace characters followed by 1 or more characters that are not a >.
nrg_alpha Posted July 10, 2009

thebadbad wrote:
> preg_match_all("#<$element(\s+[^>]+)?>(.+?)</$element>#si", $content, $matches, PREG_SET_ORDER);
> Added an optional subpattern: 1 or more whitespace characters followed by 1 or more characters that are not a >.

Alternatively, you could simply use:

preg_match_all("#<{$element}[^>]*>(.+?)</{$element}>#si", $content, $matches, PREG_SET_ORDER);

In your case, should there be some attribute(s) after the $element tag name, you will be capturing them, which also moves the tag content from group 1 to group 2. If there is no need to capture, you can use non-capturing parentheses: (?: ... ), but I find simply using the negated character class easier.
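The three variants discussed so far can be compared side by side. This is only a sketch: the sample $content is made up for illustration, and the variable names $withGroup, $nonCapturing and $negated are not from the thread. The braces in {$element} are there because, inside a double-quoted PHP string, $element followed directly by [ would be read as an array offset.

```php
<?php
// Hypothetical sample input, not taken from the plugin.
$content = '<h1>Plain</h1><h1 class="anything">Styled</h1>';
$element = 'h1';

// Optional capturing group: attributes land in group 1, tag content in group 2.
$withGroup = "#<{$element}(\s+[^>]+)?>(.+?)</{$element}>#si";
preg_match_all($withGroup, $content, $a, PREG_SET_ORDER);

// Non-capturing group: tag content stays in group 1.
$nonCapturing = "#<{$element}(?:\s+[^>]+)?>(.+?)</{$element}>#si";
preg_match_all($nonCapturing, $content, $b, PREG_SET_ORDER);

// Negated character class: no group needed around the attributes at all.
$negated = "#<{$element}[^>]*>(.+?)</{$element}>#si";
preg_match_all($negated, $content, $c, PREG_SET_ORDER);

echo $a[1][1], "|", $a[1][2], "\n"; // attributes, then content of the second h1
echo $b[1][1], "\n";                // content of the second h1
echo $c[1][1], "\n";                // content of the second h1
```

Any code that reads $matches[n][1] for the tag content keeps working with the last two patterns, but would need to read index 2 with the first.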
thebadbad Posted July 11, 2009

nrg_alpha wrote:
> In your case, should there be some attribute(s) after the $element tag name, you will be capturing it. If there is no need to capture, you can use non-capturing parentheses: (?: ... ), but I find simply using the negated character class easier.

You're right that I should have used non-capturing parentheses; I simply forgot. Consider this sample string to see why I added the whitespace(s):

<acronym title="PHP Freaks"><a href="http://php.net/">PHP</a>F</acronym>

When $element = 'a', your pattern would (wrongfully) treat <acronym title="PHP Freaks"> as the opening tag and capture <a href="http://php.net/">PHP.
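The false match is easy to reproduce with the sample string from the post. The sketch below runs the negated-character-class pattern and the whitespace-based pattern against it; the variable names $loose and $strict are illustrative only.

```php
<?php
$content = '<acronym title="PHP Freaks"><a href="http://php.net/">PHP</a>F</acronym>';
$element = 'a';

// Negated character class only: "<a" also matches the start of "<acronym".
preg_match_all("#<{$element}[^>]*>(.+?)</{$element}>#si", $content, $loose, PREG_SET_ORDER);

// Whitespace required before any attributes: "<acronym" no longer qualifies.
preg_match_all("#<{$element}(?:\s+[^>]+)?>(.+?)</{$element}>#si", $content, $strict, PREG_SET_ORDER);

echo $loose[0][1], "\n";  // <a href="http://php.net/">PHP
echo $strict[0][1], "\n"; // PHP
```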
nrg_alpha Posted July 11, 2009

thebadbad wrote:
> Consider this sample string to see why I added the whitespace(s):
> <acronym title="PHP Freaks"><a href="http://php.net/">PHP</a>F</acronym>
> When $element = 'a', your pattern would (wrongfully) capture the green part.

Right, I see what you're saying now. In that case, we could simply insert a \b word boundary inside the opening tag in the pattern:

<$element\b[^>]*>

This way, if $element = 'a', it will ignore tags like <acronym> or <abbr> and will find the actual anchor tags (and thus bypass the need for an optional group that checks for a space followed by anything that is not a >).
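A minimal sketch of the word-boundary version follows, assuming a made-up $content that mixes <abbr> with real anchors. In a double-quoted PHP string \b is passed through to the regex engine unchanged, so no extra escaping is needed.

```php
<?php
// Hypothetical sample input containing a look-alike tag and two real anchors.
$content = '<abbr title="x">A</abbr><a href="#">link</a><a>bare</a>';
$element = 'a';

// \b demands a word boundary right after the tag name, so "<abbr" is skipped
// while "<a href=...>" and "<a>" both qualify.
$pattern = "#<{$element}\b[^>]*>(.+?)</{$element}>#si";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    echo $match[1], "\n"; // tag content, always in group 1
}
```

One caveat: a hyphen is not a word character, so \b would still accept a hypothetical tag such as <a-b> when $element = 'a'.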
thebadbad Posted July 11, 2009

True, using a word boundary would be more appropriate.
nrg_alpha Posted July 11, 2009

It was a good catch on your part, though. Looking at the OP's pattern, then looking at yours, I wasn't sure what you were getting at (hindsight has 20/20 vision, they say).
thebadbad Posted July 11, 2009

I didn't go into detail on purpose, to let you figure it out yourself.