
A standard function to strip attributes from an HTML tag



The following comes close to achieving this:

 

First I remove all tags except ul, li and br from my HTML string ($desc), so that I keep the client's bullet-point format and line breaks at the very least. The preg_replace calls are meant to remove any attributes that are left, e.g. ones that may be present in the li tags.

 

$output = strip_tags($desc,'<ul><li><br>');
$output = preg_replace("/<([^\s>]+)[^>]+\/>/i", "<\\1 />", $output);
$output = preg_replace("@<([^\s>]+)[^>]+[^\/]>@si", "<\\1>", $output);

 

The problem is that my client's source data does not put a </li> tag after each li section, and I think this is what makes the above go wrong: instead of putting </li> and </ul> at the end, it just puts </></>.

 

Can anyone advise?

 

Someone posted the following function on a website, but it doesn't work.

 

function strip_attributes($msg, $tag, $attr, $suffix=""){

/*
$msg. The text you want to strip attributes from.
$tag. The tag you want to strip attributes from (p, for instance).
$attr. An array with the names of the attributes you want to keep (all others are stripped). If the array is empty, the function will strip all attributes.
$suffix. An optional text to append to the tag. It may be a new attribute, for instance.
*/

    $lengthfirst = 0;
    while (strstr(substr($msg, $lengthfirst), "<$tag ") != "")
    {
        $tag_start = $lengthfirst + strpos(substr($msg, $lengthfirst), "<$tag ");

        $partafterwith = substr($msg, $tag_start);

        $img = substr($partafterwith, 0, strpos($partafterwith, ">") + 1);
        $img = str_replace(" =","=", $img);

        $out = "<$tag";
        for($i=0; $i < count($attr); $i++)
        {
            if (empty($attr[$i])) {
                continue;
            }
            $long_val =
                (strpos($img, " ", strpos($img, $attr[$i] . "=")) === FALSE) ?
                strpos($img, ">", strpos($img, $attr[$i] . "=")) - (strpos($img, $attr[$i] . "=") + strlen($attr[$i]) + 1) :
                strpos($img, " ", strpos($img, $attr[$i] . "=")) - (strpos($img, $attr[$i] . "=") + strlen($attr[$i]) + 1);
            $val = substr($img, strpos($img, $attr[$i] . "=" ) + strlen($attr[$i]) + 1, $long_val);
            if (!empty($val)) {
                $out .= " " . $attr[$i] . "=" . $val;
            }
        }
        if (!empty($suffix)) {
            $out .= " " . $suffix;
        }

        $out .= ">";
        $partafter = substr($partafterwith, strpos($partafterwith,">") + 1);
        $msg = substr($msg, 0, $tag_start). $out. $partafter;
        $lengthfirst = $tag_start + 3;
    }
    return $msg;
}

 

Can anyone understand why it doesn't work?

HTML tag names consist of letters (plus a digit in headings such as h1). The pattern therefore matches the run of consecutive letters right after the opening < and drops everything after it up to the closing > of the tag. You may want an additional expression to remove tags that begin with <!.
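
The pattern from that reply is not preserved in this thread, so what follows is only a sketch of the idea it describes, not the original code. The function name strip_all_attributes and both expressions are assumptions made for illustration. It captures an optional leading slash, the tag name and an optional self-closing slash, and throws away everything else inside the tag. Capturing the slash separately matters: the second pattern in the question appears to let its capture group shrink to just "/" on closing tags such as </ul>, which would explain the </></> output. The sketch assumes attribute values never contain a literal ">"; for arbitrary HTML a real parser such as DOMDocument is the safer choice.

```php
<?php
// Hypothetical helper (not from the thread): strips every attribute
// from every tag while keeping the tag names themselves.
function strip_all_attributes(string $html): string
{
    // Remove declarations such as <!DOCTYPE ...>.
    // Assumption: no ">" appears inside them.
    $html = preg_replace('~<![^>]*>~', '', $html);

    // $1 = optional "/" of a closing tag
    // $2 = tag name (a letter, then letters or digits, so h1 works too)
    // [^>]*? lazily skips the attributes
    // $3 = optional "/" of a self-closing tag
    return preg_replace('~<(/?)([a-z][a-z0-9]*)[^>]*?(/?)>~i', '<$1$2$3>', $html);
}

// Usage with the same strip_tags() whitelist as in the question.
// The unclosed <li> elements are deliberate, as in the client data.
$desc   = '<ul class="list"><li style="color:red">One<li id="b">Two</ul><br clear="all" />';
$output = strip_all_attributes(strip_tags($desc, '<ul><li><br>'));
echo $output; // <ul><li>One<li>Two</ul><br/>
```

Missing </li> tags are left alone here: the function neither adds nor removes tags, it only shortens the ones it finds.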