michaellunsford Posted July 19, 2006

This is an interesting problem...

On a homepage, I'm putting some "teaser" text about an article that the page contains. The problem is that the article contains HTML code for font colors, images, etc., so if my truncated text happens to cut right in the middle of a tag, the "click here" href is affected.

Here's some sample HTML code of the problem:
[code]<p><font... [<a href="moreinfo_51.html">read more</a>][/code]

The browser ignores the href and only picks up again after the anchor tag's closing greater-than bracket.

The code I'm using is pretty simple:
[code]substr($my_members['notes'],0,strpos($my_members['notes']," ",120))[/code]

It looks for the first space after 120 characters and calls that the break point. Using strip_tags() would eliminate the problem, but it would also strip out the formatting the client wants to see in the teaser text.

Your ideas would be awesome.
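A side note on the one-liner in the question: strpos() returns false when no space follows the offset, in which case the length passed to substr() ends up as 0 and the teaser comes back empty. An offset past the end of the string also triggers a warning or an error, depending on the PHP version. Below is a minimal sketch of a guarded version. The helper name truncate_at_space() and its $min_length parameter are made up for illustration and are not from the thread.

```php
<?php
// Hypothetical helper (not from the thread): cut $text at the first space
// found at or after $min_length characters.
function truncate_at_space($text, $min_length = 120)
{
    // Short strings need no truncation; this also keeps the strpos() offset in range.
    if (strlen($text) <= $min_length) {
        return $text;
    }

    $cut = strpos($text, ' ', $min_length);

    // strpos() returns false when no space follows the offset; keep the text whole then.
    if ($cut === false) {
        return $text;
    }

    return substr($text, 0, $cut);
}
```

It would be called as truncate_at_space($my_members['notes']). It does nothing about HTML tags; it only makes the break point itself safe.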
akitchin Posted July 19, 2006

i've actually had to tackle this myself, and had to make two custom functions to fix it:

[code]<?php
function kill_broken_tags($string)
{
    // check for any incomplete/broken tags right at the end of it - if there are any, remove them
    $pattern = '@<[\/\!]*?[^<>]*?$@si';
    return preg_replace($pattern, '', $string);
}

function get_open_tags($string, $tags = array('span', 'em', 'i', 'strong', 'b', 'u', 'table', 'tr', 'td', 'p', 'div'))
{
    $unclosed_tags = array();
    // go through each tag and check if there are any of its type that are unclosed
    foreach ($tags AS $tagname)
    {
        // count the complete opening and closing tags of this type
        // (\b keeps 'p' from also matching <pre>, 'b' from matching <br>, etc.)
        $opened = preg_match_all('@<'.$tagname.'\b[^<>]*>@i', $string, $m);
        $closed = preg_match_all('@</'.$tagname.'\s*>@i', $string, $m);
        // more openers than closers means there are unclosed tags; append it
        if ($opened > $closed)
        {
            $unclosed_tags[] = $tagname;
        }
    }
    // return the array if it has got unclosed tags; otherwise return a null
    return (!empty($unclosed_tags)) ? $unclosed_tags : NULL;
}
?>[/code]

truncate your string, as you are now. then run kill_broken_tags() on it, which will eradicate the remains of any tags that were only partially removed during the truncation (is that even a word?). finally, run get_open_tags() on it, which returns a list of the tags that are unclosed in the string. you may then go through the array returned (it will return NULL if there are no unclosed tags) and append the appropriate closing tags to the end of the string.

i'm not sure it's the most elegant solution, but as far as i've tested it (on HTML news articles), it works. give it a whirl.

[b]EDIT: get_open_tags() only lists each tag type once, so i'm not sure how well this works for multiple unclosed tags of one type (ie. two open <span>s or whatever). preg_match_all() returns the total number of matches for that tag type, so you could add that count as part of the information returned by the function.[/b]

[b]EDIT THE SECOND: i've also just noticed (it's all in the details) that you're ending it at the first space. i'd still run kill_broken_tags() to be safe, but it may not be necessary if the tags you allow never use attributes.[/b]
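The last step described above, appending the closing tags, is left to the reader. One way it could be sketched is with a stack, so that nested and repeated tags are closed innermost first. This is a different technique from the per-tag check in the post, not akitchin's code. The name close_open_tags() and the $void_tags list are assumptions made for illustration, and the pattern assumes attribute values never contain a literal ">".

```php
<?php
// Hypothetical helper (not from the thread): strip a dangling partial tag,
// then append closing tags for whatever is still open, innermost first.
function close_open_tags($html, $void_tags = array('br', 'img', 'hr', 'input'))
{
    // Drop a tag fragment left at the very end, e.g. "<font col".
    $html = preg_replace('@<[^<>]*$@', '', $html);

    // Find every complete opening or closing tag, in document order.
    preg_match_all('@<(/?)([a-z][a-z0-9]*)\b[^<>]*>@i', $html, $matches, PREG_SET_ORDER);

    $stack = array();
    foreach ($matches as $m) {
        $name = strtolower($m[2]);

        // Void or self-closing tags never need a closing tag.
        if (in_array($name, $void_tags, true) || substr($m[0], -2) === '/>') {
            continue;
        }

        if ($m[1] === '') {
            // Opening tag: remember it.
            $stack[] = $name;
        } else {
            // Closing tag: discard everything from its most recent opener onward.
            $keys = array_keys($stack, $name, true);
            if ($keys) {
                $stack = array_slice($stack, 0, end($keys));
            }
        }
    }

    // Whatever remains on the stack is unclosed; close it in reverse order.
    while ($stack) {
        $html .= '</' . array_pop($stack) . '>';
    }

    return $html;
}
```

Run it on the already truncated string and put the "read more" link after the result, so the link sits outside any formatting tags the teaser opened.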
michaellunsford Posted July 19, 2006

Thanks for the reply. By the way, can you run kill_broken_tags() without running get_open_tags()? I don't see where the two cross.
akitchin Posted July 19, 2006

well, there's no crossing between the two, and they have independent roles. but if you have any tags that were chopped short (and broken), they won't be processed properly by the get_open_tags() function, so you'd want to remove them before checking for any unclosed tags.
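To see why the order matters, here is a small sketch with a made-up teaser string (the string and the variable names are only for illustration). It uses a plain preg_replace() with a pattern along the lines of the one in kill_broken_tags(), rather than the function itself.

```php
<?php
// Made-up teaser where the truncation landed inside a <font ...> tag.
$teaser = '<p>Latest <font col';

// A pattern for complete <font> tags finds nothing: the chopped tag has no '>'.
$complete_font_tags = preg_match_all('@<font\b[^<>]*>@i', $teaser, $m);

// Closing the <p> without cleaning first leaves the fragment in the markup,
// where it can swallow whatever follows it.
$unsafe = $teaser . '</p>';

// Removing the fragment first gives clean markup to check and close.
$cleaned = preg_replace('@<[^<>]*$@', '', $teaser);
$safe = $cleaned . '</p>';
```

Here $unsafe still ends in the broken "<font col</p>", which is the same kind of markup that ate the "read more" link in the original question, while $safe is plain "<p>Latest </p>".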