
Help with str_replace or preg_replace or ereg_replace...


JD*


I'm trying to figure out the best way to do some replacement on a file. I have an HTML page that I'm reading into my script, tearing out some code I don't want and then posting it again.

 

I currently have it working with a lot of str_replace statements, but I wanted to see if I could trim this up. I'm removing class and style attributes from table code (so tags: table, th, tr and td). Right now all the searches are hard-coded, and I'd like to make it more dynamic.

 

Here is an example of some of the code that I'm working with (it's a tv guide type thing):

<th style="background-color: #ffcc77; font-weight: normal; font-size: 14px; margin: 0px; padding: 2px 0px 2px 5px; width: 9em;">Time</th>
<tr>
      <td style="vertical-align: top; background-color: #ededed;">09:00:00 AM</td>
      <td style="vertical-align: top; background-color: #ededed;">
        <table style="width: 100%; padding: 0px; margin: 0px;">
<tr>
<td class="program_guide_filler" style="height: 40px; background-color: #ededed; border-top: thick solid #FFCC77;">
<span style="font-weight: bolder;">Spring Sports Awards 2008</span>
<br/> 1 hour and 49 minutes
</td>
</tr>
</table>
</td>      

So what I'd like to do is find a better way to say "Anything between a < >, remove style and class attributes". I've been reading up on the three functions in the subject line but got lost real quick.

Thanks!


To remove the style attribute from all tags, this regex will work:

 

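<?php
$str = 'html code'; // the HTML to clean
// Drop the first style="..." found in each tag; $1 keeps the tag name and any attributes before it.
$str = preg_replace('~<([^>]*?) style=".*?"~is', '<$1', $str);
?>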

 

Just swap style for any other attribute you want to remove. The attribute values need to be enclosed in double quotes for this to work. If single quotes are used instead, replace the double quotes in the pattern with escaped single quotes.
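If you'd rather not run one preg_replace per attribute and per quote style, something along these lines might work. It's only a rough sketch: strip_table_attributes is a made-up helper name (not a PHP built-in), it assumes PHP 7 or later for the type hints, and it assumes no attribute value contains a > character. It limits the stripping to table, th, tr and td tags and accepts either quote style.

```php
<?php
// Hypothetical helper (not a PHP built-in): strip the named attributes
// from opening table, th, tr and td tags only.
function strip_table_attributes(string $html, array $attributes = ['style', 'class']): string
{
    // Build an alternation such as "style|class", escaping regex metacharacters.
    $names = implode('|', array_map(function ($a) {
        return preg_quote($a, '~');
    }, $attributes));

    // Visit one opening table/th/tr/td tag at a time.
    return preg_replace_callback(
        '~<(?:table|th|tr|td)\b[^>]*>~i',
        function (array $m) use ($names): string {
            // Inside that tag, drop name="..." or name='...' plus the whitespace before it.
            return preg_replace('~\s+(?:' . $names . ')\s*=\s*("[^"]*"|\'[^\']*\')~i', '', $m[0]);
        },
        $html
    );
}

$html = '<td class="program_guide_filler" style="height: 40px;">x</td>'
      . '<span style="font-weight: bolder;">y</span>';
echo strip_table_attributes($html), "\n";
// <td>x</td><span style="font-weight: bolder;">y</span>
```

The span keeps its style because only the four table-related tags are visited. To clean every tag instead, the outer pattern could be loosened to something like '~<[a-z][^>]*>~i'.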

 

Breakdown of the regex:

 

~ : pattern delimiter

< : literal match

([^>]*?) : a non-greedy match of zero or more of any character other than >, enclosed in parens to save what's matched and use it in the replacement ($1)

(space) : literal space

style=".*?" : literal match of style=", then zero or more of any character until the closing double quote is reached

~ : pattern delimiter

is : pattern modifiers (i makes the match case-insensitive, s lets . match newlines as well)

 

As you can see, I'm replacing the matched portion with <$1, because we want to keep the opening < and whatever was saved in $1 (see above). Only the style attribute gets dropped.
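To see what actually ends up in $1, here is a quick check using one of the tags from the question (shortened for readability). It's just an illustration: $tag and $m are throwaway variable names, and the pattern is the same one as in the code above.

```php
<?php
// Sample tag based on the markup in the question, with the style value shortened.
$tag = '<td class="program_guide_filler" style="height: 40px;">';

// preg_match exposes the first capture group as $m[1].
preg_match('~<([^>]*?) style=".*?"~is', $tag, $m);
echo $m[1], "\n";
// td class="program_guide_filler"

// The replacement '<$1' writes back the "<" and the captured text,
// so only the style="..." part disappears.
echo preg_replace('~<([^>]*?) style=".*?"~is', '<$1', $tag), "\n";
// <td class="program_guide_filler">
```

Note that the closing > is never part of the match, which is why it survives without being mentioned in the replacement.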

