Truncating a variable that contains HTML


3 replies to this topic

#1 michaellunsford

michaellunsford
  • Members
  • Advanced Member
  • 1,023 posts
  • Location: Louisiana, USA

Posted 19 July 2006 - 01:42 AM

this is an interesting problem...

On a homepage, I'm putting some "teaser" text about an article that the page contains. The problem is that the article contains HTML code for font colors, images, etc., so if my truncated text happens to cut right in the middle of a tag, the "click here" href is affected.

here's some sample HTML code of the problem:
<p><font... [<a href="moreinfo_51.html">read more</a>]

the browser ignores the href, because everything up to the next greater-than bracket is read as part of the broken <font tag, and only picks up after the href's closing greater-than bracket.

The code I'm using is pretty simple:

substr($my_members['notes'],0,strpos($my_members['notes']," ",120))

it looks for the first space after 120 characters, and calls that the break point. using strip_tags() would eliminate the problem, but would also strip out the formatting the client wants to see in the teaser text.
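A minimal sketch of the same cut, wrapped in a helper so the edge cases are visible. The function name truncate_at_space() and the sample string are made up for illustration. It assumes that text shorter than the limit, or text with no space after the limit, should be returned untouched (strpos() returns false in the second case, and the bare one-liner would then produce an empty string).

```php
<?php
// Hypothetical helper: cut $text at the first space at or after $limit characters.
function truncate_at_space($text, $limit = 120)
{
	// short strings need no truncation, and an offset past the end would make strpos() fail
	if (strlen($text) <= $limit) {
		return $text;
	}

	// strpos() returns false when there is no space at or after the offset
	$cut = strpos($text, ' ', $limit);
	if ($cut === false) {
		return $text;
	}

	return substr($text, 0, $cut);
}

// made-up sample: the first space after character 30 sits inside the <a> tag
$notes  = '<p>Short intro text.</p> <a href="moreinfo_51.html" class="more">details</a>';
$teaser = truncate_at_space($notes, 30);
```

With this sample the cut lands between the two attributes of the <a> tag, which is exactly the situation described above: breaking on a space does not protect tags that carry attributes.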

your ideas would be awesome.

#2 akitchin

akitchin
  • Staff Alumni
  • Advanced Member
  • 2,516 posts
  • Location: Calgary, AB, Canada

Posted 19 July 2006 - 01:50 AM

i've actually had to tackle this myself, and had to make two custom functions to fix it:

<?php
function kill_broken_tags($string)
{
	// check for any incomplete/broken tags right at the end of it - if there are any, remove them
	$pattern = '@<[\/\!]*?[^<>]*?$@si';
	return preg_replace($pattern, '', $string);
}

function get_open_tags($string, $tags = array('span', 'em', 'i', 'strong', 'b', 'u', 'table', 'tr', 'td', 'p', 'div'))
{
	$unclosed_tags = array();
	// go through each tag and check if there are any of its type that are unclosed
	foreach ($tags AS $tagname)
	{
		// count opening and closing tags of this type separately
		// (\b stops 'b' from matching <br> and 'i' from matching <img>)
		$opened = preg_match_all('@<'.$tagname.'\b[^<>]*>@i', $string, $m);
		$closed = preg_match_all('@</'.$tagname.'\s*>@i', $string, $m);

		// more opening than closing tags means at least one is unclosed; append it
		if ($opened > $closed)
		{
			$unclosed_tags[] = $tagname;
		}
	}

	// return the array if it has got unclosed tags; otherwise return a null
	return (!empty($unclosed_tags)) ? $unclosed_tags : NULL;
}
?>

truncate your string, as you are now.  then run kill_broken_tags() on it, which will eradicate the remains of any tags that were only partially removed during the truncation (is that even a word?).  finally, run get_open_tags() on it, which returns a list of the tags that are unclosed in the string.  you may then go through the array returned (it will return NULL if there are no unclosed tags) and append the appropriate closing tags to the end of the string.
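The last step (appending the closing tags) is not shown in the post, so here is one way it might look. This is a sketch, not akitchin's code: finish_teaser() is a hypothetical name, and instead of calling the two functions above it strips the trailing partial tag and then walks the remaining tags with a stack, so the closers come out innermost first. It assumes the tracked tags are never written in self-closing form and that the text contains no literal "<" characters.

```php
<?php
// Hypothetical helper: drop a trailing partial tag, then close whatever is still open.
function finish_teaser($html, $tags = array('span', 'em', 'i', 'strong', 'b', 'u', 'p', 'div'))
{
	// step 1: remove an incomplete tag left at the very end by the truncation
	$html = preg_replace('@<[^<>]*$@', '', $html);

	// step 2: walk every complete tag in document order, tracking which are open
	$stack = array();
	preg_match_all('@<(/?)([a-z][a-z0-9]*)\b[^<>]*>@i', $html, $matches, PREG_SET_ORDER);
	foreach ($matches as $m) {
		$name = strtolower($m[2]);
		if (!in_array($name, $tags)) {
			continue; // not a tag we were asked to track (img, br and so on)
		}
		if ($m[1] === '') {
			$stack[] = $name; // opening tag
		} else {
			// closing tag: remove the most recent matching opener, if there is one
			$pos = array_search($name, array_reverse($stack, true));
			if ($pos !== false) {
				unset($stack[$pos]);
				$stack = array_values($stack);
			}
		}
	}

	// step 3: append the closers, innermost first
	foreach (array_reverse($stack) as $name) {
		$html .= '</' . $name . '>';
	}
	return $html;
}

// made-up truncated string ending in a chopped <a> tag
$teaser = finish_teaser('<p><b>bold and <i>italic text <a hre');
```

The teaser can then be followed by the "read more" link without the link being swallowed by a broken tag.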

i'm not sure it's the most elegant solution, but as far as i've tested it (on HTML news articles), it works.  give it a whirl.

EDIT:  i'm not sure how well this works for multiple unclosed tags of one type (i.e. two open <span>s or whatever).  you can use preg_match_all(), which returns the total number of matches for that tag type, and add that count to the information returned by the function.
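The counting idea from the EDIT could be sketched roughly like this. count_open_tags() is a hypothetical variant of get_open_tags(), not something from the original post: it returns an array of tag name => number of closers still needed, using the return value of preg_match_all() as the match count.

```php
<?php
// Hypothetical variant of get_open_tags(): tag name => number of unclosed tags.
function count_open_tags($string, $tags = array('span', 'em', 'i', 'strong', 'b', 'u', 'p', 'div'))
{
	$unclosed = array();
	foreach ($tags as $tagname) {
		// preg_match_all() returns how many times the pattern matched
		$opened = preg_match_all('@<' . $tagname . '\b[^<>]*>@i', $string, $ignored);
		$closed = preg_match_all('@</' . $tagname . '\s*>@i', $string, $ignored);
		if ($opened > $closed) {
			$unclosed[$tagname] = $opened - $closed;
		}
	}
	return $unclosed;
}

// made-up input: one <span>, one <b> and one <div> are left open
$counts = count_open_tags('<div><span>one <span>two</span> three <b>four');

// build the closers, repeating each one as many times as needed
$closers = '';
foreach ($counts as $tagname => $n) {
	$closers .= str_repeat('</' . $tagname . '>', $n);
}
```

Counting says how many closers each tag type needs but not how the tags were nested, so the closers come out in the order of the $tags list rather than innermost first. Browsers usually tolerate that, but it is a trade-off of this approach.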

EDIT THE SECOND:  i've also just noticed (it's all in the details) that you're ending it at the first space.  i'd still run kill_broken_tags() to be safe, but it may not be necessary if the tags you allow never use attributes.

#3 michaellunsford


Posted 19 July 2006 - 03:06 AM

thanks for the reply. by the way, can you run kill_broken_tags without running get_open_tags? I don't see where the two cross?

#4 akitchin


Posted 19 July 2006 - 03:59 AM

well, there's no crossing between the two.  but if you have any tags that were chopped short (and broken), they won't be processed properly by the get_open_tags() function.  you'd want to remove them before checking for any unclosed tags.  they both have independent roles.
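A small illustration of why the order matters, under the assumption that closing tags get appended to the end of the truncated string. append_closer() and the sample string are hypothetical; the regex is a simplified stand-in for kill_broken_tags().

```php
<?php
// Hypothetical helper: append $closer to a truncated string, optionally
// stripping a chopped-off tag from the end first.
function append_closer($html, $closer, $strip_first)
{
	if ($strip_first) {
		// same idea as kill_broken_tags(): drop an unterminated tag at the end
		$html = preg_replace('@<[^<>]*$@', '', $html);
	}
	return $html . $closer;
}

$truncated = '<p>Some <b>bold</b> text <span cla';

// closing without cleaning: the broken "<span cla" swallows the "</p>"
$wrong = append_closer($truncated, '</p>', false);

// cleaning first: the partial tag is gone before the closer is appended
$right = append_closer($truncated, '</p>', true);
```

In the first case the appended closer ends up inside the unterminated <span tag, so the paragraph is never actually closed; in the second the string ends cleanly.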



