Need help with removing certain chunks of HTML code


tryingtolearn

I'm wondering if there is a way to accomplish this.

If a user inputs some HTML code, I only want to accept everything between the body tags.

If they input
[code]<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>test</title>
<link href="style.css" rel="stylesheet" type="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body disabled leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="center">
<table width="800" border="0" cellspacing="0" cellpadding="0">
<tr>
<td bgcolor="FE0000"><img src="img/logo2.gif" width="371" height="128"></td>
</tr>
<tr>
<td bgcolor="F0F0F0">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="1"></td>
<td width="258" valign="top"><a href="www.birddogsgarage.com">BDG</a></td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>[/code]

I need to remove
[code]<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>test</title>
<link href="style.css" rel="stylesheet" type="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body disabled leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">



</body>
</html>[/code]

But keep
[code]
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="center">
<table width="800" border="0" cellspacing="0" cellpadding="0">
<tr>
<td bgcolor="FE0000"><img src="img/logo2.gif" width="371" height="128"></td>
</tr>
<tr>
<td bgcolor="F0F0F0">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="1"></td>
<td width="258" valign="top"><a href="www.birddogsgarage.com">BDG</a></td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
[/code]

The problem I am running into is stripping off the top portion, because the body tag can have a lot of variation and the content itself can start with anything.

This is the closest I have come, but it still leaves the closing head tag and the body tag in place:

[code]<?php
$source = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>test</title>
<link href="style.css" rel="stylesheet" type="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body disabled leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
<p><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="center">
<table width="800" border="0" cellspacing="0" cellpadding="0">
<tr>
<td bgcolor="FE0000"><img src="img/logo2.gif" width="371" height="128"></td>
</tr>
<tr>
<td bgcolor="F0F0F0">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="1"></td>
<td width="258" valign="top"><a href="www.birddogsgarage.com">BDG</a></td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
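</html>';
// strstr() returns the rest of the string starting AT "</head>", so that tag is kept
$output = strstr ($source,"</head>");
// Cut everything from "</body>" onward; the opening <body ...> tag is still in $output
$output = substr ($output, 0, strpos ($output,"</body>"));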
$file_source = highlight_string($output, true);
echo '<textarea name="150" rows="20" cols="75" >'.$output.'</textarea>';
?>[/code]


Any ideas would be greatly appreciated.
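
A minimal sketch of one way to finish the substring approach above: find the opening body tag, skip to the `>` that ends it, and cut at the closing body tag. The function name `extract_body_inner` is made up for illustration, the type declarations assume PHP 7.1 or later, and the sketch assumes no attribute value inside the body tag contains a `>` character.

```php
<?php
// Hypothetical helper (not from the thread): returns the markup between
// <body ...> and </body>, or null if either tag cannot be found.
function extract_body_inner(string $html): ?string
{
    // Locate the start of the opening tag, ignoring case (<BODY>, <body>, ...)
    $open = stripos($html, '<body');
    if ($open === false) {
        return null;
    }

    // The opening tag can carry any attributes, so skip ahead to the ">" that ends it
    $start = strpos($html, '>', $open);
    if ($start === false) {
        return null;
    }
    $start++; // first character after the opening tag

    // Find the closing tag, searching only after the opening tag
    $end = stripos($html, '</body>', $start);
    if ($end === false) {
        return null;
    }

    return substr($html, $start, $end - $start);
}
```

Calling `extract_body_inner($source)` on the sample page should return only the nested table markup; wrap the result in `trim()` if the surrounding line breaks are unwanted.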
Do you want to strip all of the HTML tags? If you do, there is a PHP function called strip_tags(). You can also allow certain tags using this function, but you will have to read through the manual to figure out how it works :)

[url=http://us2.php.net/strip_tags]http://us2.php.net/strip_tags[/url]

Scot
As I understand it, the strip_tags() function will remove the entire HTML tag, for instance:

[code]
$text = '<p class="123">some text</p>';

$text = strip_tags($text);

echo $text;
[/code]

This should just echo 'some text'.

However, if you leave some tags in, you will also get the attributes of those tags in the '$text' variable.

You should just try it out and see if it works for you.
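
To make the point about attributes concrete, here is a small sketch comparing `strip_tags()` with and without its second argument. The sample string and variable names are only for illustration.

```php
<?php
$text = '<p class="123">some <b>bold</b> text</p>';

// No allowed tags: every tag is removed, only the text remains
$plain = strip_tags($text);

// Allow <p>: the <b> tags are removed, but <p> survives with its attributes intact
$keepP = strip_tags($text, '<p>');

echo $plain, "\n"; // some bold text
echo $keepP, "\n"; // <p class="123">some bold text</p>
```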

Scot
I stand corrected. You are right on the money; I was reading it wrong.

Now the only thing left behind is the text inside the title tag, if the user has put something there.

Guess I will have to find a way to strip that, and then use strip_tags().

Back to the drawing board.
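
One possible way to deal with the leftover title text, sketched under assumptions: remove the whole title element with `preg_replace()` first, then pass the rest to `strip_tags()`. The function name `strip_title_then_tags` and the allowed-tag list in the usage line are hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: drops <title>...</title> including its text,
// then lets strip_tags() remove every tag that is not in $allowed.
function strip_title_then_tags(string $html, string $allowed): string
{
    // i = case-insensitive, s = "." also matches newlines, .*? = stop at the first </title>
    $withoutTitle = preg_replace('/<title\b[^>]*>.*?<\/title>/is', '', $html);

    // preg_replace() returns null on a regex error; fall back to the input in that case
    if ($withoutTitle === null) {
        $withoutTitle = $html;
    }

    return strip_tags($withoutTitle, $allowed);
}

// Usage: keep only the table-related tags, links and images
$allowed = '<table><tr><td><a><img>';
```

The allowed list still has to name every tag worth keeping, which is the drawback of this route.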
This did the trick, but it seemed to make more sense to list the tags that you don't want rather than create a list of all the tags to allow.

It also strips the text between opening and closing tags, for example the title.

Thanks for the push in the right direction Scot!  I appreciate it.

[code]<?php
  $source ='<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>test</title>
<link href="style.css" rel="stylesheet" type="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body disabled leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="center">
<table width="800" border="0" cellspacing="0" cellpadding="0">
<tr>
<td bgcolor="FE0000"><img src="img/logo2.gif" width="371" height="128"></td>
</tr>
<tr>
<td bgcolor="F0F0F0">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="1"></td>
<td width="258" valign="top"><a href="www.birddogsgarage.com">BDG</a></td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>';
 
 
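  // Removes the listed tags from $source. With $stripContent = true the text between
  // an opening and closing tag is removed too, but only when that text contains no
  // line break, because "." does not match newlines without the "s" modifier.
  function strip_selected_tags($source, $tags = '<html><head><title><link><meta><body><!>', $stripContent = true)
  {
      // Pull the bare tag names ("html", "head", ...) out of the $tags string
      preg_match_all("/<([^>]+)>/i",$tags,$allTags,PREG_PATTERN_ORDER);
      foreach ($allTags[1] as $tag){
          $tag = preg_quote($tag, "/"); // escape regex metacharacters in the tag name
          if ($stripContent) {
              // U = ungreedy: stop at the first closing tag
              $source = preg_replace("/<".$tag."[^>]*>.*<\/".$tag.">/iU","",$source);
          }
          // Remove any remaining opening or closing tags of this name
          $source = preg_replace("/<\/?".$tag."[^>]*>/iU","",$source);
      }
      return $source;
  }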
  $clean = strip_selected_tags($source);
  echo '<textarea name="150" rows="20" cols="75" >'.$clean.'</textarea>';


?>[/code]
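
As an alternative to pattern matching, a parser-based sketch: load the markup into PHP's `DOMDocument` and serialize only the children of the body element. This assumes the DOM extension is available; the function name `body_inner_html` is made up for illustration, and the parser may normalize the markup (for example, quoting or re-casing attributes), so the output is not guaranteed to be byte-for-byte identical to the input.

```php
<?php
// Hypothetical helper: returns the serialized children of <body>,
// or an empty string when the document has no body element.
function body_inner_html(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings about old or sloppy markup instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child node of <body>; the <body> tag itself is left out
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the parser decides where the body starts and ends, variations in the body tag's attributes do not matter here.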
