Mko Posted February 4, 2013 (edited)

Hello all,

I'm currently writing a script for the homepage that displays certain threads from certain forum categories. My SQL query and the fetching of the contents work well, except that I run into an odd issue when using substr() on the fetched contents to limit the number of characters displayed. Just so you're aware, I'm parsing the contents of the thread's post through vBulletin's BBCodeParser, but that's not the issue.

Here's a bit of background regarding my code/issue.

Current code (only the important parts):

    $parsed_text = $parser->do_parse($body);
    $message_pre = substr($parsed_text, 0, 500);
    $message = substr($message_pre, 0, strrpos($message_pre, ' '));

    echo '<div id="a1">';
    echo '<div class="b">';
    echo $message.'...';
    echo '<div class="c"></div>';
    echo '<div class="d">[<a href="">Read More...</a>]';
    echo '</div>';
    echo '<div class="e"></div></div></div>';

So, that's all fine. However, here are some example database contents:

    [b]bold[/b]
    [i]italic[/i]
    [u]underline[/u]
    [center] center [/center]
    [left]left [/left]
    [right]right [/right]
    [url="http://google.com"]google.com[/url]
    [url="http://google.com"]url1[/url]
    [url="http://google.com"]url2[/url]
    [email="1@2.com"]1@2.com[/email]
    [email=1@2.com]1@2.com2[/email]
    [img=http://google.com]
    [size=4]yo[/size]
    [size="4"]yo2[/size]
    [font="Book Antiqua"]test[/font]
    [font=Book Antiqua]test2[/font]
    [color="Red"]hey[/color]
    [color="#0048C0"]hey2[/color]
    [list]
    [*]hello
    [*]world
    [/list]
    [list=1]
    [*]list2
    [*]list2_1
    [/list]

Now, the BBCodeParser successfully parses the BBCode like it should and returns some HTML, which I store in the $parsed_text variable. However, I have an odd issue with the $message variable: some of the parsed HTML is not terminated correctly, which messes up my styling.
Here's an example of the issue in action (HTML output):

    <b>bold</b><br />
    <i>italic</i><br />
    <u>underline</u><br />
    <div align="[url=""]center[/url]"> center<br />
    </div><div align="[url=""]left[/url]">left<br />
    </div><div align="[url=""]right[/url]">right<br />
    </div><a href="[url="view-source:http://google.com/"]http://google.com[/url]" target="[url=""]_blank[/url]">google.com</a><br />
    <a href="[url="view-source:http://google.com/"]http://google.com[/url]" target="[url=""]_blank[/url]">url1</a><br />
    <a href="[url="view-source:http://google.com/"]http://google.com[/url]" target="[url=""]_blank[/url]">url2</a><br />
    <a href="[email="mark@mko.com"]mailto:1@2.com[/email]">1@2.com</a><br />
    <a href="[email="mark@mko.com"]mailto:1@2.com[/email]">1@2.com2</a><br />
    <img...<div class="[url=""]clear[/url]"></div><div class="[url=""]news_bottom[/url]">[<a href="">Read More...</a>]</div>

As you can most likely see, the contents of $message end with <img, because the cut lands on the space before src in <img src.

My question is: what is the correct way to limit the number of characters displayed AND prevent unclosed HTML tags from appearing at the end of the $message variable's content?

Thanks for any and all help,
Mark

Edited February 4, 2013 by Mko

Link to comment https://forums.phpfreaks.com/topic/274029-fetching-contents-issue-with-substr/
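The cut described above can be reproduced in isolation. The following is a minimal sketch, assuming a made-up $parsed_text value and a smaller limit (30 instead of 500) so the effect is easy to see; the real string would come from $parser->do_parse($body).

```php
<?php
// Hypothetical parsed output, standing in for what the BBCode parser returns.
$parsed_text = '<b>bold</b><br /> <img src="http://google.com" border="0" alt="" />';

// Same two-step cut as in the post: hard cut first, then back up to the last space.
$message_pre = substr($parsed_text, 0, 30);
$message = substr($message_pre, 0, strrpos($message_pre, ' '));

// The last space before the cut is the one inside the <img ...> tag,
// so the result ends with a dangling "<img".
echo $message, "\n"; // <b>bold</b><br /> <img
```

Because substr() and strrpos() treat the markup as plain characters, any space inside a tag is a valid cut point for them, which is how the unterminated tag ends up in the output.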
kicken Posted February 4, 2013

Rather than doing a blind substr() to a specific length, you'd need to create a sort of mini-parser that goes through the string and determines whether you're inside an HTML tag or not. Only count characters when you're not inside a tag, and also keep track of which tags have been opened. When you reach your target character count you can substr() to that position, then close any tags that are still open.

I believe I posted a function that does something like this quite a while ago; you could try searching for it. If I can find it I'll post the link.

Link to comment https://forums.phpfreaks.com/topic/274029-fetching-contents-issue-with-substr/#findComment-1410120
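The approach described in this reply could look roughly like the sketch below. It is not kicken's original function: the name truncate_html_sketch, the list of void tags, and the sample input are all assumptions made for illustration, and it assumes attribute values never contain a ">" character.

```php
<?php
// Hypothetical helper illustrating the "mini-parser" idea; not the function referred to above.
function truncate_html_sketch($html, $limit)
{
    $void = array('img', 'br', 'hr', 'input'); // tags that never get a closing tag
    $open = array();  // stack of tag names that are currently open
    $count = 0;       // visible (non-tag) characters seen so far
    $len = strlen($html);

    for ($i = 0; $i < $len; $i++) {
        if ($count === $limit) {
            break; // target reached: cut at position $i
        }
        if ($html[$i] === '<') {
            // Skip over the whole tag without counting its characters.
            $end = strpos($html, '>', $i);
            if ($end === false) {
                break; // malformed trailing tag: stop before it
            }
            $tag = substr($html, $i, $end - $i + 1);
            if (preg_match('#^<(/?)([a-z0-9]+)#i', $tag, $m)) {
                $name = strtolower($m[2]);
                if ($m[1] === '/') {
                    array_pop($open); // closing tag: drop the innermost open tag
                } elseif (substr($tag, -2) !== '/>' && !in_array($name, $void)) {
                    $open[] = $name;  // opening tag that will need closing
                }
            }
            $i = $end;
            continue;
        }
        $count++;
    }

    $out = substr($html, 0, $i);
    // Close whatever is still open, innermost first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

echo truncate_html_sketch('<b>Hello <i>world</i></b> again', 8), "\n"; // <b>Hello <i>wo</i></b>
```

The sketch counts bytes rather than characters and treats an HTML entity such as &amp; as several characters, so a production version would likely need multibyte and entity handling on top of this.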
requinix Posted February 4, 2013

The only problem is the cut-off <img> tag? Not the [url=""] things?

The technique I use is a preg_split() to alternate strings that can be cut (regular text) with strings that cannot (i.e., HTML tags). As you go, you keep track of which HTML tags you've opened and closed.

    $parts = preg_split('#capture opening *and closing* html tags#', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

    $cut = true;         // first in $parts is regular text
    $length = 0;         // so far
    $opentags = array(); // stack of tags needing to close
    $output = "";        // shortened version

    foreach ($parts as $p) {
        if ($cut) {
            // if you need to trim then go ahead, then break out of the loop
            // otherwise add to $length
        } else {
            // look at the captured html tag
            // if it opens and doesn't self-close then
            // - add the tag name to $opentags
            // if it closes then
            // - optionally check that it agrees with the top of the $opentags stack
            // - pop off $opentags
        }
        $output .= $p;
        $cut = !$cut;
    }

    // now close off the remaining open tags, innermost (top of the stack) first
    foreach (array_reverse($opentags) as $tag) {
        $output .= "</{$tag}>";
    }

Link to comment https://forums.phpfreaks.com/topic/274029-fetching-contents-issue-with-substr/#findComment-1410124
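To make the alternation concrete, here is a small sketch of what preg_split() returns with PREG_SPLIT_DELIM_CAPTURE. The pattern and the input string are assumptions chosen for illustration; the skeleton above deliberately leaves the pattern as a placeholder.

```php
<?php
// Made-up input; any HTML fragment would do.
$input = 'Hi <b>there</b>!';

// One capture group around the whole tag, so each tag is returned
// as its own element between the pieces of regular text.
$parts = preg_split('#(</?[a-z]+[^>]*>)#i', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($parts); // "Hi ", "<b>", "there", "</b>", "!"

// When the input starts with a tag, the first element is an empty string,
// so the text/tag alternation still begins with text.
$leading = preg_split('#(</?[a-z]+[^>]*>)#i', '<b>x</b>', -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($leading); // "", "<b>", "x", "</b>", ""
```

With a single capture group the elements strictly alternate text, tag, text, tag, which is what the $cut toggle in the skeleton relies on.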
Mko Posted February 5, 2013 (edited)

The [url=""] was just an issue with View Source for some reason :s

Anyway, I implemented your code, but for some reason I get this error when I run my script:

    Fatal error: Maximum execution time of 30 seconds exceeded in /home/mko/public_html/home.php on line 189

My current code:

    <?php
    $conn = new DB();
    $query = $conn->query("query here;");
    if (mysqli_num_rows($query) > 0) {
        while ($result = mysqli_fetch_array($query)) {
            $body = $result['pagetext'];
            $parser = new vB_BbCodeParser($vbulletin, fetch_tag_list(), true);
            $parsed_text = $parser->do_parse($body);

            $parts = preg_split("/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/", $parsed_text, -1, PREG_SPLIT_DELIM_CAPTURE);

            $cut = true;         // first in $parts is regular text
            $length = 0;         // so far
            $opentags = array(); // stack of tags needing to close
            $output = "";        // shortened version

            foreach ($parts as $p) {
                if ($cut) {
                    // if you need to trim then go ahead, then break out of the loop
                    // otherwise add to $length
                    if ($length > 250) {
                        break;
                    } else {
                        $length .= $p;
                    }
                } else {
                    // look at the captured html tag
                    // if it opens and doesn't self-close then
                    // - add the tag name to $opentags
                    // if it closes then
                    // - optionally check that it agrees with the top of the $opentags stack
                    // - pop off $opentags
                    if ($p.substr($p, 1, 1) != "/") {
                        $opentags .= $p;
                    } else if ($p.substr($p, 1, 1) == "/") {
                        unset($opentags[$p]);
                    }
                }
                $output .= $p;
                $cut = !$cut;
            }

            // now close off the remaining open tags
            foreach ($opentags as $tag) {
                $output .= "</{$tag}>";
            }

            echo '<div id="a1">';
            echo '<div class="b">';
            echo $output.'...';
            echo '<div class="c"></div>';
            echo '<div class="d">[<a href="">Read More...</a>]';
            echo '</div>';
            echo '<div class="e"></div></div></div>';
        }
    } else {
        echo 'No news!';
    }
    ?>

Am I implementing this correctly?

Thanks,
Mark

Edited February 5, 2013 by Mko

Link to comment https://forums.phpfreaks.com/topic/274029-fetching-contents-issue-with-substr/#findComment-1410329
requinix Posted February 6, 2013

I don't see anything that would cause an infinite loop, but I do see a few things to fix. Can you do some debugging to find out where the problem is?

I can improve upon what I said earlier, but now it's probably getting hard to follow me. So I'll just throw the whole thing at you.

    function shorten($text, $limit) {
        $selfclosing = array("img", "br");

        // two capture groups: the entire tag, then just the tag name,
        // so $parts repeats in the order text, entire tag, tag name
        $parts = preg_split('#(</?([a-z]+)[^>]*>)#i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

        $what = "text";      // "text", "html", or "tag"
        $action = "add";     // "add", "remove", or "ignore"
        $length = 0;         // so far
        $opentags = array(); // stack of tags needing to close
        $output = "";        // shortened version

        foreach ($parts as $p) {
            // just regular text
            if ($what == "text") {
                // if the new $p pushes the $length too long, cut it and stop
                // this is a good place for an ellipsis
                $l = strlen($p);
                if ($length + $l >= $limit) {
                    // keep only the characters that still fit under the limit
                    $output .= substr($p, 0, $limit - $length) . "...";
                    break;
                }
                // otherwise add it
                else {
                    $output .= $p;
                    $length += $l;
                }
                $what = "html"; // next step
            }
            // the entire html tag. see if it needs a separate closing tag
            else if ($what == "html") {
                // if it's a closing tag then it needs to be removed from the stack
                if ($p[1] == "/") {
                    $action = "remove";
                }
                // if it explicitly closes itself then ignore it
                else if (substr($p, -2, 1) == "/") {
                    $action = "ignore";
                }
                // otherwise it's an opening tag so add it
                else {
                    $action = "add";
                }
                $output .= $p;
                $what = "tag"; // next step
            }
            // just the tag name
            else {
                // maybe add the tag to the top (beginning) of the stack (array)
                if ($action == "add" && !in_array(strtolower($p), $selfclosing)) {
                    array_unshift($opentags, $p);
                }
                // remove whatever's on top
                else if ($action == "remove") {
                    array_shift($opentags);
                }
                $what = "text"; // reset
            }
        }

        // now close off the remaining open tags
        foreach ($opentags as $tag) {
            $output .= "</{$tag}>";
        }

        return $output;
    }

If I run that on

    <div itemprop="commentText" class='post entry-content '>
    Hello all,<br />
    I'm recently writing a script on the homepage that would display certain threads from certain forum categories.<br />
    My current SQL query and fetching the contents work well, except I encounter an odd issue when using the substring method on the fetched contents to limit the characters displayed.<br />
    Just so you're aware, I'm parsing the contents of the thread's post through vBulletin's BBCodeParser, yet that's not the issue.<br />
    <br />
    Here's a bit of background regarding my code/issue.<br />
    Current Code (only included the important stuff):<br />
    <pre class='prettyprint'>
    $parsed_text = $parser->do_parse($body);
    $message_pre = substr($parsed_text, 0, 500);
    $message = substr($message_pre, 0, strrpos($message_pre, ' '));
    echo '<div id="a1">
    echo '<div class="b">';
    echo $message.'...';
    echo '<div class="c"></div>';
    echo '<div class="d">[<a href="">Read More...</a>]';
    echo '</div>';
    echo '<div class="e"></div></div></div>';
    </pre>
    <br />
    So, that's all fine. However, let's get some example database contents:<br />

(a modified piece of the HTML source of your post) with a length of 250, I get

    <div itemprop="commentText" class='post entry-content '>
    Hello all,<br />
    I'm recently writing a script on the homepage that would display certain threads from certain forum categories.<br />
    My current SQL query and fetching the contents work well, except I encounter an odd issue when using the substring method on t...</div>

Link to comment https://forums.phpfreaks.com/topic/274029-fetching-contents-issue-with-substr/#findComment-1410345
Mko Posted February 6, 2013 (edited)

I did some debugging with my previous version. From what I could tell, the regular expression I had (/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/) wasn't functioning properly, which led to the 30-second execution time error.

Your example worked! I can follow everything you posted except for one thing regarding the regular expression. My question is: can you explain what the # and #i before and after the regular expression ('#(</?([a-z]+)[^>]*>)#i') do?

Thanks a bunch for your continued help,
Mark

Edited February 6, 2013 by Mko

Link to comment https://forums.phpfreaks.com/topic/274029-fetching-contents-issue-with-substr/#findComment-1410349
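For anyone comparing the two patterns, the sketch below (with a made-up input string) shows one reason the anchored expression cannot serve as a splitter: with ^ and $ it can only match when the entire string is a single element, so preg_split() finds nothing to split on. This does not by itself explain the timeout. Separately, inside a double-quoted PHP string, as used in the earlier code, \1 is read as an octal character escape rather than a backreference; the sketch uses single quotes to avoid that.

```php
<?php
// Made-up input for illustration.
$html = 'Hi <b>there</b>!';

// The pattern from the earlier attempt, written in single quotes so that
// \1 reaches the regex engine as a backreference.
$anchored = '/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/';

// The pattern from the working shorten() function.
$unanchored = '#(</?([a-z]+)[^>]*>)#i';

$a = preg_split($anchored, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
$b = preg_split($unanchored, $html, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($a); // one element: the untouched input
print_r($b); // "Hi ", "<b>", "b", "there", "</b>", "b", "!"
```

The second result also shows the three-step cycle that shorten() walks through: text, the entire tag, then just the tag name.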
requinix Posted February 6, 2013 (edited)

PCRE expressions need delimiters, but you've got a lot of freedom as to what they are. Slashes are traditional. However, if you want to use a slash in the expression itself, like I did with the /?, then you'd have to escape it lest it be interpreted as a delimiter. I don't like needlessly escaping things, so I just changed to a different delimiter: # (another popular one).

Between the two delimiters is the expression itself, and after the closing delimiter come optional "flags" (or "modifiers"). The /i flag (the shorthand tends to be written with the slash delimiter) means case-insensitivity. An [a-z] by itself is literally "a lowercase letter a-z" and would thus only match HTML tags written in lowercase. Of course they may all be lowercase for you, but the flag is cheap enough to add just in case that's not true.

The manual has everything listed out if you'd like to keep reading.

Edited February 6, 2013 by requinix

Link to comment https://forums.phpfreaks.com/topic/274029-fetching-contents-issue-with-substr/#findComment-1410365
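The points about delimiters and the i modifier can be checked with a short sketch. The $tag value and the variable names are made up for illustration; the patterns are the tag pattern from this thread with and without slash delimiters.

```php
<?php
// Made-up uppercase closing tag to test against.
$tag = '</DIV>';

// Slash delimiters: the slash inside the pattern has to be escaped.
$withSlashes = preg_match('/<\/?([a-z]+)[^>]*>/i', $tag, $m1);

// Hash delimiters: the same pattern, no escaping needed.
$withHashes = preg_match('#</?([a-z]+)[^>]*>#i', $tag, $m2);

// Without the i modifier, [a-z] does not match the uppercase tag name.
$caseSensitive = preg_match('#</?([a-z]+)[^>]*>#', $tag);

var_dump($withSlashes, $withHashes, $caseSensitive); // int(1) int(1) int(0)
echo $m2[1], "\n"; // DIV
```

Both delimiter styles compile to the same expression; the choice only affects which characters need escaping inside the pattern.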