tijmenamsing Posted February 22, 2012

Hello,

I'm working on a page where users can add articles by writing text in textareas with a WYSIWYG editor. When they submit the form, it's saved in a database. As a summary of the article I grab the first 800 characters, but as you can imagine, the summary may contain HTML tags like <div> or <span> that are not closed. To prevent this from ruining my page layout when the articles are posted on the website, I could use strip_tags, but I'd like to keep the formatting, and it would also delete images.

I couldn't think of another solution than a function that checks for open tags and, if there are any, adds closing tags at the end of the summary. I made a similar function a while back, but that one only checked for <div> and <span>, as those are the worst. The nasty part is that I accidentally deleted that function, and I can't fully remember how I wrote it.

So what I would like is a function that checks for all unclosed HTML tags and adds the matching closing tags, in the right order, at the end of the summary. Any help getting on the right track is appreciated.

https://forums.phpfreaks.com/topic/257502-closing-all-html-tags/
scootstah Posted February 22, 2012

An easy way would be to count all opening tags, and then count all closing tags. If the number of closing tags is less than the number of opening tags, add as many as you need. It might screw up the layout of what they posted, but at least it will be confined to that area.
tijmenamsing (Author) Posted February 22, 2012

That's the way my function worked, as far as I can remember. I called it like tags("<div>", "</div>", $summary), the parameters being (opening tag, closing tag, haystack). I used to call it only for div and span, but now I would like a function that checks for all occurring opening tags automatically.
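For reference, the counting approach described above might look roughly like the following. This is a minimal sketch, not the lost original: only the name `tags` and the parameter order (opening tag, closing tag, haystack) come from the post, and the body is an assumption.

```php
<?php
// Hypothetical reconstruction of the lost helper; the body is an assumption.
// Appends as many closing tags as are needed to balance the opening tags.
function tags(string $open, string $close, string $haystack): string
{
    // Count literal occurrences of each tag string.
    $missing = substr_count($haystack, $open) - substr_count($haystack, $close);

    // Only add closers when there are more openers than closers.
    return $missing > 0 ? $haystack . str_repeat($close, $missing) : $haystack;
}

$summary = '<div><span>Hello <div>world</div>';
$summary = tags('<span>', '</span>', $summary);
$summary = tags('<div>', '</div>', $summary);
echo $summary;
```

Because it matches literal strings, this sketch does not count an opener that carries attributes, such as `<div class="x">`, and the order of the appended closers depends only on the order of the calls.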
Psycho Posted February 22, 2012

Since the tags aren't displayed, I wouldn't consider them part of the "first 800" characters. So I'd build a process that strips out every character after the first 800 that isn't part of a tag. You might end up with some empty tags towards the end, but since there would be no content between them, they wouldn't affect the display.

It's a little late, otherwise I might tinker with this.
Psycho Posted February 22, 2012

OK, I lied. I found this interesting and wanted to give it a shot. This is pretty sloppy, but it works with the testing I did. I'll leave it to you to pare it down as needed and clean it up.

    function getTextPart($text, $maxCount=800)
    {
        // Split the input into alternating runs of text and complete tags
        preg_match_all('#([^<]*)|(<[^>]*>)|([^<]*)#', $text, $matches);
        $output = '';
        $letterCount = 0;
        $maxLength = false;
        foreach($matches[0] as $line)
        {
            // Skip empty matches (a strict check, so a chunk of "0" is kept)
            if($line === '') { continue; }
            if($line[0]=="<")
            {
                // Tags are always kept and never counted
                $output .= $line;
            }
            elseif(!$maxLength)
            {
                // Add whole words until the character limit is reached
                $space = 0;
                foreach(explode(' ', $line) as $word)
                {
                    if(!$maxLength && (strlen($word) + $letterCount + $space) <= $maxCount)
                    {
                        if($space) { $output .= ' '; }
                        $output .= $word;
                        $letterCount += strlen($word) + $space;
                        $space = 1;
                    }
                    else
                    {
                        $maxLength = true;
                    }
                }
            }
        }
        return $output;
    }

    echo getTextPart($input);
xyph Posted February 22, 2012

Here's my version

    <?php

    $void_tags = array_fill_keys(array(
        'area','base','br','col','command','embed','hr','img',
        'input','keygen','link','meta','param','source'
    ),'');

    $content = <<<HEREDOC
    This thing of text <b><i>will have some</i> missing tags. Others will be<br>complete.
    Still others <input type="text" value="won't need to be"> completed, and
    shouldn't close at the end. <div> <img src="foobar.jpg" alt="baz" />
    A tag will even be closed without being opened</table>
    HEREDOC;

    // RegEx solution
    $pattern = '#<(/?)([a-z]+)[^>]*>#i';
    // Capture 1 will be empty for opening tag, '/' for closing tags
    // Capture 2 will be the type of tag
    if( !preg_match_all($pattern, $content, $matches, PREG_SET_ORDER) )
        die( 'RegEx error' );

    // This will hold the counts of each tag.
    $counts = array();
    // This will hold a string of the tags to prepend
    $before = '';
    // This will hold a string of the tags to append
    $after = '';

    foreach( $matches as $match ) {
        // Skip void tags, which never take a closing tag
        if( isset($void_tags[$match[2]]) )
            continue;
        // Check if this is a closing tag
        if( $match[1] == '/' ) {
            if( isset($counts[$match[2]]) )
                $counts[$match[2]]--;
            // If this happens, someone has closed a tag before opening one.
            else {
                $before .= '<'.$match[2].'>';
                $counts[$match[2]] = 0;
            }
        // This must be an opening tag
        } else {
            if( isset($counts[$match[2]]) )
                $counts[$match[2]]++;
            else
                $counts[$match[2]] = 1;
        }
    }

    // Now we should have an array containing tags for keys, and integers for
    // values. If a tag's value is 0, there are as many opening tags as closing tags
    // If it is negative, there are that many missing opening tags; positive, missing
    // closing tags.

    echo '<h3>Preview of $counts array</h3><pre>';
    print_r( $counts );
    echo '</pre>';

    foreach( $counts as $tag=>$count ) {
        while( $count > 0 ) {
            $after .= '</'.$tag.'>';
            $count--;
        }
        while( $count < 0 ) {
            $before .= '<'.$tag.'>';
            $count++;
        }
    }

    echo '<h3>Contents</h3><pre>';
    echo htmlspecialchars( $before.$content.$after );
    echo '</pre>';

    ?>

The only downside is that the tags aren't put back in the same order. This is possible to do though, with a little more effort.

Let me know if you have any questions.
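One way to get the closing tags back in the right order is to track open tags on a stack instead of counting them per tag name. The following is only a sketch of that idea: the function name `closeTagsInOrder` is hypothetical, and the void-tag list is copied from the post above.

```php
<?php
// Sketch only: closeTagsInOrder() is a hypothetical name, not code from the thread.
function closeTagsInOrder(string $html): string
{
    // Void elements never get a closing tag (list taken from the post above).
    $voidTags = array_flip(array(
        'area', 'base', 'br', 'col', 'command', 'embed', 'hr', 'img',
        'input', 'keygen', 'link', 'meta', 'param', 'source',
    ));

    // Capture 1 is '/' for closing tags, capture 2 is the tag name.
    preg_match_all('#<(/?)([a-z][a-z0-9]*)[^>]*>#i', $html, $matches, PREG_SET_ORDER);

    $stack  = array(); // currently open tags, innermost last
    $before = '';      // openers to prepend for closers that had no opener

    foreach ($matches as $match) {
        $tag = strtolower($match[2]);
        if (isset($voidTags[$tag])) {
            continue;
        }
        if ($match[1] !== '/') {
            $stack[] = $tag;
            continue;
        }
        // Closing tag: remove the most recent matching opener from the stack.
        $found = false;
        for ($i = count($stack) - 1; $i >= 0; $i--) {
            if ($stack[$i] === $tag) {
                array_splice($stack, $i, 1);
                $found = true;
                break;
            }
        }
        // Stray closer: its opener has to wrap everything seen so far.
        if (!$found) {
            $before = '<' . $tag . '>' . $before;
        }
    }

    // Close whatever is still open, innermost first.
    $after = '';
    while ($stack) {
        $after .= '</' . array_pop($stack) . '>';
    }

    return $before . $html . $after;
}

echo closeTagsInOrder('<div><b>Hi <i>there</i>');
```

The stack keeps the nesting information that per-tag counters lose, so the appended closers come out innermost first. Like the regex version above, it does not try to handle comments, scripts, or tags cut off in the middle.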
kicken Posted February 22, 2012

Here is a function I use that closes up tags. You could adapt it to also count the 800 non-tag characters and use it for extracting your summary.

    function fixupHtml($html){
        static $noClosers = array(
            'input', 'br', 'link', 'base'
        );

        $tagStack = array();
        $len=strlen($html);
        $pos = 0;
        while (($pos=strpos($html, '<', $pos)) !== false){
            $pos++;
            $isEnding = false;
            $isSelfClose = false;
            $foundTagEnd= false;
            for ($i=$pos; !$foundTagEnd && $i<$len; $i++){
                $ch = $html[$i];
                switch ($ch){
                    case "/":
                        $isEnding = true;
                        $isSelfClose = !($i==$pos);
                        break;
                    case '>':
                    case " ":
                    case "\r":
                    case "\n":
                        $foundTagEnd=true;
                        break;
                    default:
                }
            }

            $tagEnd = strpos($html, '>', $pos);
            if ($tagEnd === false){
                $foundTagEnd = false;
            }

            if ($foundTagEnd){
                $i--;
                $tag = substr($html, $pos, $i-($pos));
                if (!$isEnding){
                    if (!in_array($tag, $noClosers)){
                        array_push($tagStack, $tag);
                    }
                }
                else if (!$isSelfClose){
                    $tag=ltrim($tag, '/');
                    $tslen = count($tagStack);
                    if ($tagStack[$tslen-1] == $tag){
                        array_pop($tagStack);
                    }
                    else {
                        //Try and find it earlier in the stack
                        $found=false;
                        for ($i=$tslen-1; !$found && $i>=0; $i--){
                            if ($tagStack[$i] == $tag){
                                unset($tagStack[$i]);
                                $tagStack = array_values($tagStack);
                                $found=true;
                            }
                        }

                        if (!$found){
                            //Bad end tag found. Lets remove it.
                            $tagStart = $pos-1;
                            $endOfTag = strpos($html, '>', $tagStart);
                            $html = substr($html, 0, $tagStart).substr($html, $endOfTag+1);
                        }
                    }
                }
            }
            else {
                $html = substr($html, 0, $pos-1);
            }
        }

        while (count($tagStack) > 0){
            $tag = array_pop($tagStack);
            $html .= '</'.$tag.'>';
        }

        return $html;
    }
tijmenamsing (Author) Posted February 22, 2012

Wow, thanks all for the replies! I don't have the knowledge yet to understand all of the code, but I think kicken's function is the most complete and works out best for me. I get one warning though, which occurs when I add a closing tag that has no opening tag:

Warning: strpos() [function.strpos]: Offset not contained in string in #/test.php on line 12

Line 12 is:

    while (($pos=strpos($html, '<', $pos)) !== false){

Any idea how to fix this? And if it's not too much to ask, could you add some comments to the function?
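A likely cause, judging from the function as posted: when a stray closing tag is cut out of `$html`, the string gets shorter, but `$pos` still holds the old offset, so the next `strpos()` call can start past the end of the string. Below is a minimal sketch of a guard for that case. The helper name `findNextTag` is hypothetical and is not part of kicken's function.

```php
<?php
// Hypothetical helper, not part of the posted function: a bounds-checked strpos().
function findNextTag(string $html, int $pos)
{
    // strpos() emits a warning (PHP 5/7) or throws a ValueError (PHP 8) when
    // the offset lies past the end of the string, so treat that as "no more tags".
    if ($pos < 0 || $pos > strlen($html)) {
        return false;
    }
    return strpos($html, '<', $pos);
}

// Reproduces the situation: the tag was removed, but the offset was not moved back.
$html = 'Hello</table>';
$pos  = 6;                          // just past the '<' of the stray closing tag
$html = substr($html, 0, 5);        // the stray tag is removed; 5 characters remain
var_dump(findNextTag($html, $pos)); // false instead of a warning
```

In the posted function, the loop condition could then read `($pos = findNextTag($html, $pos)) !== false`. The guard only silences the warning, though. To keep scanning correctly, it would probably also help to set `$pos = $tagStart` right after a bad end tag is removed and to refresh `$len = strlen($html)` whenever `$html` changes, since otherwise a tag that directly follows the removed one can be skipped.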
The Little Guy Posted February 22, 2012

If you have tidy installed, you should use that: tidy_get_output
The Little Guy Posted February 22, 2012

Here is an example. As you can see, HTML Tidy adds the closing </b> tag:

    <?php
    $html = <<<HTML
    <div>
    <b>Hello world
    </div>
    HTML;

    $config = array(
        "show-body-only" => true
    );
    $tidy = new tidy();
    $out = $tidy->repairString($html, $config, 'UTF8');
    echo htmlentities($out);
    ?>

More config options: http://tidy.sourceforge.net/docs/quickref.html#show-body-only
tijmenamsing (Author) Posted February 22, 2012

Neat, I'd never heard of it, and it works like a charm without installing anything new. I have no idea how it works though ;p