
Truncating a variable that contains HTML


michaellunsford


This is an interesting problem...

On a homepage, I'm showing some "teaser" text from an article, followed by a link to the full article. The problem is that the article contains HTML code for font colors, images, etc., so if my truncated text happens to cut right in the middle of a tag, the "read more" link breaks.

Here's some sample HTML showing the problem:
[code]<p><font... [<a href="moreinfo_51.html">read more</a>][/code]

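The browser ignores the href: because the <font tag is never closed, everything up to the next greater-than bracket (the one that ends the <a href="..."> tag) is read as part of it, and rendering only picks up again after that bracket.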

The code I'm using is pretty simple:

[code]substr($my_members['notes'],0,strpos($my_members['notes']," ",120))[/code]

It looks for the first space after 120 characters and uses that as the break point. Using strip_tags() would eliminate the problem, but it would also strip out the formatting the client wants to see in the teaser text.

Your ideas would be awesome.
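Separate from the HTML issue, the one-liner above has two edge cases worth guarding against: a note shorter than 120 characters, and a note with no space after character 120. Here's a minimal sketch; truncate_at_space() is a hypothetical name, not something from this thread, and it counts bytes the same way strpos() does.

```php
<?php
// Hypothetical helper (not from this thread): cut $text at the first space at
// or after $limit bytes, as the one-liner does, but handle two edge cases:
//  - $text shorter than $limit: strpos() with an out-of-range offset throws a
//    ValueError on PHP 8 (older versions warn and return false);
//  - no space after $limit: strpos() returns false and substr() then
//    returns an empty string.
function truncate_at_space($text, $limit = 120)
{
    // Already short enough, so there is nothing to cut.
    if (strlen($text) <= $limit) {
        return $text;
    }

    $cut = strpos($text, ' ', $limit);

    // No space after $limit: keep the whole text instead of returning ''.
    if ($cut === false) {
        return $text;
    }

    return substr($text, 0, $cut);
}
```

Used in place of the one-liner, that would be `truncate_at_space($my_members['notes'], 120)`. It still cuts blindly through HTML, so the tag cleanup discussed below is still needed.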

I've actually had to tackle this myself, and ended up writing two custom functions to fix it:

[code]<?php
function kill_broken_tags($string)
{
    // check for an incomplete/broken tag right at the end of the string
    // (a '<' with no '>' after it) - if there is one, remove it
    $pattern = '@<[\/\!]*?[^<>]*?$@si';
    return preg_replace($pattern, '', $string);
}

function get_open_tags($string, $tags = array('span', 'em', 'i', 'strong', 'b', 'u', 'table', 'tr', 'td', 'p', 'div'))
{
    $unclosed_tags = array();
    // go through each tag and check if there are any of its type that are unclosed
    foreach ($tags as $tagname)
    {
        $name = preg_quote($tagname, '@');

        // count opening tags such as <b> or <b class="x">;
        // \b stops <b from also matching <br> or <body>, and <i from matching <img>
        $opened = preg_match_all('@<' . $name . '\b[^<>]*>@i', $string);

        // count closing tags such as </b>
        $closed = preg_match_all('@</' . $name . '\s*>@i', $string);

        // more openings than closings means at least one of this tag is still open
        if ($opened > $closed)
        {
            $unclosed_tags[] = $tagname;
        }
    }

    // return the array if it has got unclosed tags; otherwise return a null
    return (!empty($unclosed_tags)) ? $unclosed_tags : NULL;
}
?>[/code]

Truncate your string as you are now. Then run kill_broken_tags() on it, which will remove the remains of any tag that was only partly cut off by the truncation. Finally, run get_open_tags() on it, which returns a list of the tags that are still unclosed in the string (or NULL if there are none). You can then go through that array and append the matching closing tags to the end of the string, ideally innermost first (e.g. </b></p> for <p><b>) so the nesting stays valid.
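Since get_open_tags() reports tags by type rather than in the order they were opened, getting the closing order right takes a little more bookkeeping. One way is to walk the tags with a stack. This is a swapped-in technique, not the functions above; close_open_tags() and its void-element list are hypothetical, and it assumes the fragment was properly nested before the cut.

```php
<?php
// Hypothetical helper (a stack-based alternative, not the functions above):
// close every tag still open at the end of a truncated HTML fragment,
// innermost first. Assumes the fragment was properly nested before the cut.
function close_open_tags($html)
{
    // Drop a tag cut in half at the very end (same idea as kill_broken_tags()).
    $html = preg_replace('@<[^<>]*$@', '', $html);

    // Void elements never take a closing tag (this list is not exhaustive).
    $void = array('br', 'hr', 'img', 'input', 'meta', 'link');

    // Capture every opening/closing tag: group 1 is '/' or '', group 2 the name.
    preg_match_all('@<(/?)([a-z][a-z0-9]*)[^<>]*>@i', $html, $matches, PREG_SET_ORDER);

    $stack = array();
    foreach ($matches as $m) {
        $name = strtolower($m[2]);

        if ($m[1] === '/') {
            // Closing tag: forget the most recently opened tag with this name.
            for ($i = count($stack) - 1; $i >= 0; $i--) {
                if ($stack[$i] === $name) {
                    array_splice($stack, $i, 1);
                    break;
                }
            }
        } elseif (substr($m[0], -2) !== '/>' && !in_array($name, $void, true)) {
            // Opening tag that needs a matching close: remember it.
            $stack[] = $name;
        }
    }

    // Whatever is left is still open; close it innermost (last opened) first.
    while ($stack) {
        $html .= '</' . array_pop($stack) . '>';
    }

    return $html;
}
```

For example, close_open_tags('<p><b>bold text') gives '<p><b>bold text</b></p>'. The trade-off versus a fixed tag list is that it tracks whatever tags appear, so a stray '<' in plain text could confuse it.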

I'm not sure it's the most elegant solution, but as far as I've tested it (on HTML news articles), it works. Give it a whirl.

[b]EDIT: I'm not sure how well this works for multiple unclosed tags of one type (e.g. two open <span>s or whatever). You can use preg_match_all(), which returns the total number of matches for that tag type, and add that count to the information returned by the function.[/b]
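The preg_match_all() idea from the EDIT could look something like this: count openers and closers per tag type, and report how many of each are left open. count_open_tags() is a hypothetical name; the default tag list mirrors get_open_tags().

```php
<?php
// Sketch of the EDIT's suggestion (hypothetical function name): count openers
// and closers per tag type, so two open <span>s are reported as 2.
function count_open_tags($string, $tags = array('span', 'em', 'i', 'strong', 'b', 'u', 'table', 'tr', 'td', 'p', 'div'))
{
    $counts = array();
    foreach ($tags as $tagname) {
        $name = preg_quote($tagname, '@');

        // \b keeps <b from matching <br> or <body>, and <i from matching <img>
        $opened = preg_match_all('@<' . $name . '\b[^<>]*>@i', $string);
        $closed = preg_match_all('@</' . $name . '\s*>@i', $string);

        // Only report tag types with more openers than closers.
        if ($opened > $closed) {
            $counts[$tagname] = $opened - $closed;
        }
    }

    // tag name => number of still-open tags of that type (empty if none)
    return $counts;
}
```

You could then append closers with something like `str_repeat('</' . $tag . '>', $n)` for each entry. Counting by type still doesn't tell you the nesting order, so mixed tags may need more care.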

[b]EDIT THE SECOND: I've also just noticed (it's all in the details) that you're cutting at the first space after character 120. I'd still run kill_broken_tags() to be safe, but it may not be necessary if the tags you allow never use attributes: a tag like <b> contains no spaces, so a cut made at a space can never land inside it.[/b]
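To see whether a cut actually landed inside a tag, a cheap check is to compare the positions of the last '<' and the last '>'. ends_inside_tag() below is hypothetical, and for brevity the demo cuts at the very first space rather than after character 120.

```php
<?php
// Hypothetical helper: does $fragment end part-way through a tag?
// True when its last '<' comes after its last '>' (or there is no '>' at all).
function ends_inside_tag($fragment)
{
    $lt = strrpos($fragment, '<');
    if ($lt === false) {
        return false; // no tag has even been started
    }
    $gt = strrpos($fragment, '>');
    return $gt === false || $lt > $gt;
}

// Tag with attributes: the first space is inside <font ...>, so the cut breaks it.
$with_attrs = '<p><font color="red">Hi</font></p>';
$cut1 = substr($with_attrs, 0, strpos($with_attrs, ' '));   // '<p><font'

// Attribute-free tags: the first space falls in the text, after a complete tag.
$no_attrs = '<p><b>Hi</b> there</p>';
$cut2 = substr($no_attrs, 0, strpos($no_attrs, ' '));       // '<p><b>Hi</b>'

var_dump(ends_inside_tag($cut1)); // bool(true)
var_dump(ends_inside_tag($cut2)); // bool(false)
```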

Well, there's no overlap between the two. But if you have any tags that were chopped short (and therefore broken), they won't be processed properly by get_open_tags(), so you'd want to remove them before checking for unclosed tags. The two functions have independent roles.