michaellunsford

Truncating a variable that contains HTML

this is an interesting problem...

On a homepage, I'm putting some "teaser" text about an article that the page contains. The problem is that the article contains HTML code for font colors, images, etc., so if my truncated text happens to cut right in the middle of a tag, the "read more" href that follows it is affected.

here's some sample HTML code of the problem:
[code]<p><font... [<a href="moreinfo_51.html">read more</a>][/code]

the browser treats the href as part of the unfinished font tag and ignores it, then picks up again after the href's closing greater-than bracket.

The code I'm using is pretty simple:

[code]substr($my_members['notes'],0,strpos($my_members['notes']," ",120))[/code]

it looks for the first space after 120 characters, and calls that the break point. using strip_tags() would eliminate the problem, but it would also strip out the formatting the client wants to see in the teaser text.

your ideas would be awesome.

i've actually had to tackle this myself, and had to make two custom functions to fix it:

[code]<?php
function kill_broken_tags($string)
{
    // check for an incomplete/broken tag right at the end of the string
    // (a "<" that never reaches its ">") - if there is one, remove it
    $pattern = '@<[\/\!]*?[^<>]*?$@si';
    return preg_replace($pattern, '', $string);
}

function get_open_tags($string, $tags = array('span', 'em', 'i', 'strong', 'b', 'u', 'table', 'tr', 'td', 'p', 'div'))
{
    $unclosed_tags = array();
    // go through each tag and check if there are any of its type that are unclosed
    foreach ($tags as $tagname)
    {
        // count the opening and closing tags of this type;
        // \b keeps <b from matching <br> and <p from matching <pre>
        $opened = preg_match_all('@<'.$tagname.'\b[^<>]*>@i', $string);
        $closed = preg_match_all('@</'.$tagname.'\s*>@i', $string);

        // more opening tags than closing tags means at least one is unclosed; append it
        if ($opened > $closed)
        {
            $unclosed_tags[] = $tagname;
        }
    }

    // return the array if it has got unclosed tags; otherwise return a null
    return (!empty($unclosed_tags)) ? $unclosed_tags : NULL;
}
?>[/code]

truncate your string, as you are now.  then run kill_broken_tags() on it, which will eradicate the remains of any tags that were only partially removed during the truncation (is that even a word?).  finally, run get_open_tags() on it, which returns a list of the tags that are unclosed in the string.  you may then go through the array returned (it will return NULL if there are no unclosed tags) and append the appropriate closing tags to the end of the string.
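to make those steps concrete, here's a rough sketch that rolls them into one function. truncate_html_teaser() and its parameters are made-up names for illustration, not something from the code above. it also swaps in a different technique for the last step: instead of calling get_open_tags(), it keeps a stack of the tracked tags that are still open, so the closing tags come out innermost-first. treat it as an untested-in-production assumption, not the approach described above.

```php
<?php
// hypothetical helper, not part of the original functions
function truncate_html_teaser($html, $min_length = 120, $tags = array('span', 'em', 'i', 'strong', 'b', 'u', 'table', 'tr', 'td', 'p', 'div'))
{
    // nothing to do if the text is already short enough
    if (strlen($html) <= $min_length) {
        return $html;
    }

    // 1. cut at the first space at or after $min_length (keep everything if there is none)
    $cut = strpos($html, ' ', $min_length);
    $teaser = ($cut === false) ? $html : substr($html, 0, $cut);

    // 2. drop a tag that was chopped in half at the very end
    $teaser = preg_replace('@<[^<>]*$@', '', $teaser);

    // 3. walk through the complete tags and track which ones are still open
    $stack = array();
    preg_match_all('@<(/?)([a-z][a-z0-9]*)\b[^<>]*>@i', $teaser, $found, PREG_SET_ORDER);
    foreach ($found as $tag) {
        $name = strtolower($tag[2]);
        if (!in_array($name, $tags, true)) {
            continue; // ignore tags we don't track (img, br, ...)
        }
        if ($tag[1] === '') {
            $stack[] = $name;        // opening tag
        } elseif (end($stack) === $name) {
            array_pop($stack);       // closing tag that matches the innermost open one
        }
    }

    // 4. close whatever is left, innermost first
    while ($stack) {
        $teaser .= '</' . array_pop($stack) . '>';
    }

    return $teaser;
}
```

with the data from the question, usage would be something along the lines of echoing truncate_html_teaser($my_members['notes']) followed by the "read more" link. tags that aren't in the $tags list (font, a, and so on) are left alone, so add them to the list if your articles use them.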

i'm not sure it's the most elegant solution, but as far as i've tested it (on HTML news articles), it works.  give it a whirl.

[b]EDIT:  this only reports each tag type once, so it won't tell you about multiple unclosed tags of one type (ie. two open <span>s or whatever).  you can use preg_match_all(), which returns the total number of matches for that tag type, to count the opening and closing tags and add the difference as part of the information returned by the function.[/b]
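a minimal sketch of that counting idea, assuming PHP 5.4 or later (where the third argument to preg_match_all() is optional). count_open_tags() is a hypothetical name, and the shorter default tag list is just for the example.

```php
<?php
// hypothetical helper: returns array('tagname' => number of unclosed tags of that type)
function count_open_tags($string, $tags = array('span', 'em', 'strong', 'b', 'p', 'div'))
{
    $counts = array();
    foreach ($tags as $tagname) {
        // \b stops <b from matching <br> and <p from matching <pre>
        $opened = preg_match_all('@<' . $tagname . '\b[^<>]*>@i', $string);
        $closed = preg_match_all('@</' . $tagname . '\s*>@i', $string);

        // only record tag types that have more openers than closers
        if ($opened > $closed) {
            $counts[$tagname] = $opened - $closed;
        }
    }
    return $counts;
}

$open = count_open_tags('<p><span>one <span>two</span> three');
// $open is array('span' => 1, 'p' => 1)
```

this tells you how many closing tags of each type to append, but not the order they were nested in, so deeply nested markup may still need a closer look.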

[b]EDIT THE SECOND:  i've also just noticed (it's all in the details) that you're ending it at the first space.  i'd still run kill_broken_tags() to be safe, but it may not be necessary if the tags you allow never use attributes.[/b]
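for what it's worth, here's a small made-up example of why cutting at a space doesn't fully protect you: a tag with attributes has a space inside it. the string and the offset of 14 are invented for the demo, and the pattern is a simplified version of the one in kill_broken_tags().

```php
<?php
$notes = '<p>Some text <a href="more.html">read</a></p>';

// the first space at or after offset 14 sits between "<a" and "href"
$teaser = substr($notes, 0, strpos($notes, ' ', 14));
// $teaser is now '<p>Some text <a'

// strip the "<" that never got its ">"
$clean = preg_replace('@<[^<>]*$@', '', $teaser);
// $clean is now '<p>Some text '
```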

thanks for the reply. by the way, can you run kill_broken_tags without running get_open_tags? I don't see where the two cross?

well, there's no crossing between the two.  but if you have any tags that were chopped short (and broken), they won't be processed properly by the get_open_tags() function.  you'd want to remove them before checking for any unclosed tags.  they both have independent roles.
