JD* Posted August 8, 2008

I'm trying to figure out the best way to do some replacement on a file. I have an HTML page that I'm reading into my script, tearing out some code I don't want and then posting it again. I currently have it working with a lot of str_replace statements, but I wanted to see if I could trim this up. I'm removing class and style attributes from table code (so tags: table, th, tr and td). Right now all of the searches are hard-coded, and I'd like to make this more dynamic.

Here is an example of some of the code that I'm working with (it's a TV guide type thing):

<th style="background-color: #ffcc77; font-weight: normal; font-size: 14px; margin: 0px; padding: 2px 0px 2px 5px; width: 9em;">Time</th>
<tr>
  <td style="vertical-align: top; background-color: #ededed;">09:00:00 AM</td>
  <td style="vertical-align: top; background-color: #ededed;">
    <table style="width: 100%; padding: 0px; margin: 0px;">
      <tr>
        <td class="program_guide_filler" style="height: 40px; background-color: #ededed; border-top: thick solid #FFCC77;">
          <span style="font-weight: bolder;">Spring Sports Awards 2008</span>
          <br/>
          1 hour and 49 minutes
        </td>
      </tr>
    </table>
  </td>

So what I'd like to do is find a better way to say "Anything between a < >, remove style and class attributes." I've been reading up on the three subject-referenced functions but got lost real quick.

Thanks!
thebadbad Posted August 8, 2008

To remove the style attribute from all tags, this regex will work:

<?php
$str = 'html code';
$str = preg_replace('~<([^>]*?) style=".*?"~is', '<$1', $str);
?>

Just swap style with any other attribute you want to remove. The attribute values need to be enclosed in double quotes for it to work. If single quotes are used instead, swap the double quotes in the pattern for escaped single quotes.

Breakdown of the regex:

~ : pattern delimiter
< : literal match
([^>]*?) : a non-greedy match of zero or more of any character other than >, enclosed in parens to save what's matched and use it in the replacement ($1)
space : literal space
style=".*?" : literal match of style=", then zero or more of any character until the closing double quote is reached
~ : pattern delimiter
is : pattern modifiers (i makes the match case-insensitive, s lets the dot match newlines too)

As you can see, I'm replacing the matched portions with <$1, because we don't want to remove the opening < and what's saved in $1 (see above).
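Since the original question asked for something more dynamic, limited to the table, th, tr and td tags and covering both style and class, here is a minimal sketch of one way to build on the regex idea. The function name strip_attributes and its parameters are hypothetical (they do not come from the thread), and the code assumes PHP 7 or later. It also assumes attribute values are quoted and contain no > character, so treat it as a starting point and not a full HTML parser.

```php
<?php
// Hypothetical helper (not from the thread): remove the listed attributes,
// but only from opening tags whose name is in $tags.
function strip_attributes(string $html, array $tags, array $attrs): string
{
    // Escape each name so it is safe inside a pattern delimited by ~.
    $quote = function ($name) {
        return preg_quote($name, '~');
    };
    $tagAlt  = implode('|', array_map($quote, $tags));
    $attrAlt = implode('|', array_map($quote, $attrs));

    // A whole opening tag, e.g. <td class="x" style="y">.
    // \b stops "th" from also matching "thead".
    $tagPattern = '~<(?:' . $tagAlt . ')\b[^>]*>~i';

    // One attribute with a double- or single-quoted value,
    // including the whitespace in front of it.
    $attrPattern = '~\s+(?:' . $attrAlt . ')\s*=\s*(?:"[^"]*"|\'[^\']*\')~i';

    // For every matching tag, strip the unwanted attributes inside it.
    return preg_replace_callback(
        $tagPattern,
        function (array $m) use ($attrPattern) {
            return preg_replace($attrPattern, '', $m[0]);
        },
        $html
    );
}

$sample = '<td class="program_guide_filler" style="height: 40px;">'
        . '<span style="font-weight: bolder;">Spring Sports Awards 2008</span></td>';

echo strip_attributes($sample, ['table', 'th', 'tr', 'td'], ['style', 'class']), "\n";
// Prints: <td><span style="font-weight: bolder;">Spring Sports Awards 2008</span></td>
```

Working in two passes (find the tag first, then clean inside it) means every listed attribute in a tag is removed in one go, and tags outside the list, such as the span above, keep their attributes. To change what gets stripped, only the two arrays need editing.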