imperialized, posted May 4, 2015 (edited)

Alright, so I have landed myself in a predicament. Let's say, for example, I have a blog post that has 500 words (not including any HTML markup within). The post stored in the DB could be something like this:

    <div style='text-align: left'> post post post post post post </div>
    <div style='text-align: left'> post post post post post post </div>
    <div style='text-align: right'> post post post post post post </div>
    <div style='text-align: left'> post post post post post post </div>
    <div style='text-align: left'> post post post post post post </div>
    <div style='text-align: center'> post post post post post post </div>
    <div style='text-align: left'> post post post post post post </div>
    <div style='text-align: left'> post post post post post post </div>
    <div style='text-align: right'> post post post post post post </div>
    <div style='text-align: left'> post post post post post post </div>
    <div style='text-align: left'> post post post post post post </div>
    <div style='text-align: center'> post post post post post post </div>
    <div style='text-align: left'> post post post post post post </div>
    <div style='text-align: left'> post post post post post post </div>

Doing a word count, a substr, or splitting it on a space could end in disaster if the cut lands in the middle of a style attribute or leaves out a closing tag. I've thought about doing a substr($post, 0, 200) to pull the first 200 characters, but that leaves the possibility of the above-mentioned issues. Doing a slice has the same issue:

    $postSummary = implode(" ", array_slice(explode(" ", $post), 0, 100));

Any ideas?

Edited May 4, 2015 by imperialized
Barand, posted May 4, 2015

Quote (imperialized): "Alright, so I have landed myself in a predicament."

Brought on by storing the markup with the data. Solution: Don't.
fastsol, posted May 4, 2015

I have run into the same thing before and this is how I did it. Granted, with this method you would always need to format the portion you want to show in its own <div> or <p>.

    // "s" lets "." match newlines inside the paragraph; nothing is printed when there is no match
    if (preg_match("/<p>(.*)<\/p>/Us", $a['art_body'], $matches)) {
        echo '<p>'.$matches[1].'..... </p>';
    }

This finds the first <p> and </p>, grabs everything between them, and stores it in $matches[1]. If you want it to find the div instead, just change that in the preg_match expression.
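The sample post in the question wraps its text in <div style='...'> blocks, which a literal <p> pattern would not match. A minimal sketch of the same first-block idea that tolerates attributes on the opening tag is shown here. The function name firstBlock() and its $tag parameter are hypothetical, and the sketch assumes the chosen tag is not nested inside itself.

```php
<?php
// Hypothetical helper (not from the thread): returns the inner HTML of the
// first <$tag> ... </$tag> block, or null when no such block exists.
// Assumption: blocks of this tag are not nested inside each other.
function firstBlock(string $html, string $tag = 'p'): ?string
{
    $name = preg_quote($tag, '/');
    // \b[^>]* allows attributes such as style='text-align: left'.
    // U = ungreedy, s = "." also matches newlines, i = case-insensitive.
    $pattern = '/<' . $name . '\b[^>]*>(.*)<\/' . $name . '>/Usi';

    if (preg_match($pattern, $html, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

$post = "<div style='text-align: left'> first block </div> "
      . "<div style='text-align: right'> second block </div>";
$summary = firstBlock($post, 'div');
```

Returning null instead of echoing directly leaves it to the caller to decide what to show when a post has no matching block.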
fastsol, posted May 4, 2015

Quote (Barand): "Brought on by storing the markup with the data. Solution: Don't."

There isn't really any other way when using something like the TinyMCE text editor, at least as far as I know.
Barand, posted May 4, 2015 (edited)

    $summary = substr(strip_tags($text), 0, 200);
    echo $summary;

maybe?

Edited May 4, 2015 by Barand
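A plain substr() on the stripped text can stop in the middle of a word, or in the middle of a multi-byte character. A minimal sketch of one way around that follows, assuming UTF-8 input. The name summarizeByChars() and its defaults are hypothetical and not part of the post above.

```php
<?php
// Hypothetical helper: plain-text summary of at most $maxChars characters,
// cut at a word boundary where possible. Assumes UTF-8 input.
function summarizeByChars(string $html, int $maxChars = 200, string $suffix = '...'): string
{
    // Drop markup, decode entities, collapse whitespace runs to single spaces.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/u', ' ', $text));

    // Count characters (not bytes) via the /u modifier.
    if (preg_match_all('/./us', $text) <= $maxChars) {
        return $text;
    }

    // Take one character more than allowed, then drop the trailing partial
    // word together with the whitespace in front of it.
    preg_match('/^.{0,' . ($maxChars + 1) . '}/us', $text, $m);
    $cut = preg_replace('/\s+\S*$/u', '', $m[0], -1, $count);

    if ($count === 0) {
        // No whitespace at all: fall back to a hard cut at $maxChars.
        preg_match('/^.{0,' . $maxChars . '}/us', $text, $m);
        $cut = $m[0];
    }
    return $cut . $suffix;
}

$summary = summarizeByChars('<p>hello <b>world</b> foo</p>', 11);
```

The suffix is only appended when something was actually cut, so short posts come back unchanged.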
requinix, posted May 4, 2015 (edited)

I derived this monstrosity for work*: we had arbitrary HTML and I needed to cut it down to a certain number of words, not counting headings, and preserving as much markup as possible. Slightly redacted. For regular use, one would probably want to make a couple adjustments like removing the check for "foo_" and "foohead" (both indicating heading markup).

    /**
     * Truncate raw content
     *
     * @param mixed $content
     * @param int $words
     * @param bool $headings
     * @return array
     */
    private static function truncateRaw($content, $words, $headings = false) {
        if(is_array($content)) {
            $ret = array('content' => array(), 'remaining' => $words);
            foreach($content as $key => $value) {
                if($ret['remaining'] <= 0) {
                    if($ret['remaining'] == 0) {
                        $ret['content'][$key] = '...';
                    }
                    break;
                }
                $remaining = $value->truncate($ret['remaining'], $headings);
                $ret['content'][$key] = $value;
                $ret['remaining'] = $remaining;
            }
            return $ret;
        } else {
            $ret = array('content' => '', 'remaining' => $words);
            if($words <= 0) {
                return $ret;
            }

            // some special content has html comments explicitly marking a preview area
            if(($pos1 = strpos($content, '<!-- BEGIN PREVIEW -->')) !== false && ($pos2 = strpos($content, '<!-- END PREVIEW -->', $pos1)) !== false) {
                // 22 for strlen('<!-- BEGIN PREVIEW -->')
                $len = $pos2 - $pos1 - 22;
                $ret['content'] = trim(substr($content, $pos1 + 22, $len));
                $ret['remaining'] = 0;
            } else {
                // cut the text in a way to preserve html tag structure
                $pieces = preg_split('#(</?(\w+).*?>)#is', $content, -1, PREG_SPLIT_DELIM_CAPTURE); // comes in triplets
                $excerpt = array();
                $tags = array();
                $header = 0; // counter for tag depth in headers
                $state = 0;

                // piece A: 0=text
                // piece B: 1=html tag
                // piece C: 2=open tag name, 3=opening tag name of header, 4=self-closing tag, 5=closing tag name
                //
                //   A      B      C
                // +----+------+------------------------+
                // | 0 -|-> 1 -|-> 2 if opening tag    -|-> 0 and header++ (if header>0)
                // +----+------+------------------------+
                // | 0 -|-> 1 -|-> 3 if opening header -|-> 0 and header++
                // +----+------+------------------------+
                // | 0 -|-> 1 -|-> 4 if self-closing   -|-> 0
                // +----+------+------------------------+
                // | 0 -|-> 1 -|-> 5 if closing tag    -|-> 0 and header-- (if header>0)
                // +----+------+------------------------+
                //
                // break text if header==0, otherwise only increase word count
                //
                // h1                           h2     h1   h0
                // 013                   0      12 0   15  015  014    012 0      15  0
                // <p class="foo_header">Header <b>Text</b> </p> <br /> <p>Content</p>
                // ^ header                                             ^ not header

                foreach($pieces as $piece) {
                    // 0. text
                    if($state == 0) {
                        if($header) {
                            if($headings) {
                                $cut = self::cutWords($piece, $ret['remaining']);
                                $excerpt[] = $cut['content'];
                                $ret['remaining'] = $cut['remaining'];
                            } else {
                                $excerpt[] = $piece;
                                $ret['remaining'] -= self::countWords($piece);
                            }
                        } else {
                            $cut = self::cutWords($piece, $ret['remaining']);
                            $excerpt[] = $cut['content'];
                            $ret['remaining'] = $cut['remaining'];
                            if($ret['remaining'] <= 0) {
                                break;
                            }
                        }
                        $state = 1;
                    }
                    // 1. html tag
                    else if($state == 1) {
                        // logic is easier to write in reverse order
                        if($piece[1] == '/') {
                            // closing
                            // closing tag logic will decide when to add itself to the excerpt
                            $state = 5;
                        } else if(substr($piece, -2, 1) == '/') {
                            // self-closing
                            $excerpt[] = $piece;
                            $state = 4;
                        } else if(strlen($piece) >= 3 && $piece[1] == 'h' && ctype_digit($piece[2])) {
                            // normal header
                            $excerpt[] = $piece;
                            $state = 3;
                        } else if(strpos($piece, 'foo_') !== false || strpos($piece, 'foohead') !== false) {
                            // old header
                            $excerpt[] = $piece;
                            $state = 3;
                        } else {
                            // text
                            $excerpt[] = $piece;
                            $state = 2;
                        }
                    }
                    // 2. opening tag
                    else if($state == 2) {
                        $header && $header++;
                        $tags[] = $piece;
                        $state = 0;
                    }
                    // 3. opening header
                    else if($state == 3) {
                        $header++;
                        $tags[] = $piece;
                        $state = 0;
                    }
                    // 4. self-closing
                    else if($state == 4) {
                        $state = 0;
                    }
                    // 5. closing tag
                    else if($state == 5) {
                        $header && $header--;
                        while($tags && $tag = array_pop($tags)) {
                            $excerpt[] = "</{$tag}>";
                            if($tag == $piece) {
                                break;
                            }
                        }
                        $state = 0;
                    }
                }

                // clean up any unclosed tags
                while($tags && $tag = array_pop($tags)) {
                    $excerpt[] = "</{$tag}>";
                }

                $ret['content'] = implode('', $excerpt);
            }
            return $ret;
        }
    }

The most important thing was that it put tags onto a stack in order to close them out properly when the content gets cut inside multiple tags - I had to deal with things like ULs and tables.

OR, instead of all this crazy work deciding how to get a summary: ask the writer to write one. Seriously. It's so much easier and their summary will be nicer than one you come up with automatically.

* I don't normally like sharing stuff done for my job, IP rights and whatnot, but we're fairly lax and this is one of those times when there can be significant benefit to the community.

Edited May 4, 2015 by requinix
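The tag-stack idea described above can be shown in a much smaller form. The following is a simplified sketch, not the code from the post: the function truncateHtmlWords(), its short void-tag list, and its handling of stray closing tags are all assumptions made for illustration. It also assumes reasonably well-formed HTML with no ">" inside attribute values, and it has no heading logic.

```php
<?php
// Simplified sketch of the tag-stack approach (hypothetical helper).
// Keeps at most $maxWords words of text and closes any tags left open.
function truncateHtmlWords(string $html, int $maxWords, string $suffix = '...'): string
{
    $voidTags = ['br', 'hr', 'img', 'input']; // assumed short list, not exhaustive
    // Split into text and tag tokens, keeping the tags themselves.
    $tokens = preg_split('#(<[^>]+>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $out = '';
    $stack = [];            // names of currently open tags
    $remaining = $maxWords;

    foreach ($tokens as $token) {
        if ($token[0] !== '<') {
            // Text token: keep it whole if it fits the remaining budget.
            $words = preg_split('/\s+/', $token, -1, PREG_SPLIT_NO_EMPTY);
            if (count($words) <= $remaining) {
                $out .= $token;
                $remaining -= count($words);
                continue;
            }
            // Budget runs out inside this text: keep what fits, then stop.
            $lead = preg_match('/^\s/', $token) ? ' ' : '';
            $kept = array_slice($words, 0, $remaining);
            $out .= rtrim($lead . implode(' ', $kept)) . $suffix;
            break;
        }

        if (preg_match('#^</\s*(\w+)#', $token, $m)) {
            // Closing tag: pop down to its opener, closing anything in between.
            $name = strtolower($m[1]);
            if (in_array($name, $stack, true)) {
                do {
                    $top = array_pop($stack);
                    $out .= "</{$top}>";
                } while ($top !== $name);
            }
            // Stray closing tags with no opener are dropped.
        } elseif (preg_match('#^<(\w+)#', $token, $m)) {
            // Opening tag: emit it and remember it unless it closes itself.
            $name = strtolower($m[1]);
            $out .= $token;
            if (substr($token, -2) !== '/>' && !in_array($name, $voidTags, true)) {
                $stack[] = $name;
            }
        } else {
            $out .= $token; // comments, doctype, and similar
        }
    }

    // Close whatever is still open at the cut point.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

$html = "<div style='text-align: left'>one two <b>three four</b> five</div><p>six</p>";
$excerpt = truncateHtmlWords($html, 3);
```

Because every opening tag is pushed onto the stack before its text is counted, a cut inside nested markup such as the <b> above still produces balanced output.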
imperialized (Author), posted May 5, 2015

Like the guy above mentioned, I am using TinyMCE, so I don't have much choice when it comes to storing the tags with the data. I used a combination of techniques to accomplish what I was trying to achieve, and it seems to be working. It may not be the ideal solution, but it will suffice for the purpose of this project. Thanks for the help. The following code is what I used:

    $summary = strip_tags($this->blogPost);
    $summary = implode(' ', array_slice(explode(' ', $summary), 0, 100));

The above code gets the first 100 words of the blog post.
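Two details can throw off a word count built on explode(' ', ...): newlines and repeated spaces are not treated as separators, and strip_tags() glues words together when blocks touch, as in "</p><p>". A minimal sketch that handles both is shown here. The name summarizeByWords(), the list of block tags, and the '...' suffix are assumptions for illustration.

```php
<?php
// Hypothetical helper: first $maxWords words of the post as plain text.
function summarizeByWords(string $html, int $maxWords = 100, string $suffix = '...'): string
{
    // Add a space after block-level closers and <br> so that stripping
    // "</p><p>" does not join the last and first words of two blocks.
    $spaced = preg_replace('#</(?:p|div|li|h[1-6])>|<br\s*/?>#i', '$0 ', $html);
    $text = strip_tags($spaced);

    // Split on any whitespace run (spaces, tabs, newlines), skipping empties.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    if (count($words) <= $maxWords) {
        return implode(' ', $words);
    }
    return implode(' ', array_slice($words, 0, $maxWords)) . $suffix;
}

$summary = summarizeByWords("<p>one  two</p><p>three\nfour</p>", 3);
```

Inline tags such as <b> are left alone before stripping, so a word that is only partly bold is still counted as one word.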
Psycho, posted May 5, 2015

How important is it to maintain the markup for the summary? I understand you want the markup for the actual output, but why not remove the tags to generate the summary?
imperialized (Author), posted May 5, 2015

Psycho, after rethinking it and looking at the posts provided, that is exactly what I did. It wasn't important to keep the formatting for the summary.