Return X words from string


jarvis
Solved by Barand

Hi,

 

I'm going mad and in desperation am reaching out for some help.

 

I have a site that outputs a snippet of text (30 words) from a full description (string). The description includes HTML for formatting, which I need to keep in the snippet.

 

My issue is that the snippet never returns 30 words! Below is my code:

function limit_words($string, $word_limit) {
	#$words = explode(" ",$string);
	$words = preg_split('/\s+/', $string);
	return implode(" ",array_splice($words,0,$word_limit));
}

function limit_words2($string, $word_limit) {
	$words = explode(" ",$string);
	#$words = preg_split('/\s+/', $string);
	return implode(" ",array_splice($words,0,$word_limit));
}

$content = "
<p><p><strong>LOCATION</strong><br />Centrally located in Kent just of the High Street.</p>
<p><strong>ACCOMMODATION</strong><br />231 sq ft.</p>
<p><strong>AMENITIES</strong><br />Entry Phone System<br />
Central Heating<br />
Car Parking</p>
<p><strong>TERMS</strong><br />A new lease for a term to be agreed.</p>
<p><strong>OUTGOINGS</strong><br />To be assessed.</p>
<p><strong>VAT</strong><br />All prices and rents are quoted exclusive of VAT. Any intending purchaser or lessee must satisfy themselves as to the incidence of VAT in respect of any transaction.</p>
<p><strong>LEGAL COSTS</strong><br />Each party to be responsible for their own legal costs.</p>
<p><strong>SERVICE CHARGE</strong><br />Tenant to be responsible for a proportion of costs towards insurance, maintenance and repairs.</p>
</p>
"; 

$content2 = "
<p><p><strong>LOCATION</strong><br />A shop/office premises to let in Kent, close to NatWest, Holland & Barrett and Fat Face.</p>
<p><strong>DESCRIPTION</strong><br />A shop/office premises to let in Kent, close to NatWest, Holland & Barrett and Fat Face.</p>
<p><strong>ACCOMMODATION</strong><br />Approximately 159 sq ft. </p>
<p><strong>AMENITIES</strong><br />Attractive display window<br />
Laminate floor<br />
Display lighting<br />
Alarm</p>
<p><strong>TERMS</strong><br />Easy in easy out terms.</p>
<p><strong>OUTGOINGS</strong><br />We understand that the current rateable value is £2550.<br />
Current UBR – 48.2p in £<br />
Small business relief may be available.</p>
<p><strong>VAT</strong><br />All prices and rents are quoted exclusive of VAT. Any intending purchaser or lessee must satisfy themselves as to the incidence of VAT in respect of any transaction. The rent is also subject to VAT.</p>
<p><strong>LEGAL COSTS</strong><br />Each party responsible for their own legal costs.</p>
<p><strong>SERVICE CHARGE</strong><br />Insurance currently £223.32 per annum plus VAT.</p>
</p>
";
 
echo limit_words($content,30);
 
echo '<hr>'; 
 
echo limit_words($content2,30); 

echo '<hr>';

echo limit_words2($content,30);
 
echo '<hr>'; 
 
echo limit_words2($content2,30); 

What on earth am I doing wrong?

 

Any help is much appreciated!


If you output the array of words ($words), you'll see that some of the slots are taken up by empty strings or white space rather than words.

function limit_words($string, $word_limit) {
    #$words = explode(" ",$string);
    $words = preg_split('/\s+/', $string);
 
    echo '<pre>' . print_r($words, true) . '</pre>';
 
    return implode(" ",array_splice($words,0,$word_limit));
}
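
As a rough illustration of the point above, here is a minimal sketch of what the split produces for a short piece of the description. The helper name split_words and the $sample string are made up for this example; PREG_SPLIT_NO_EMPTY is a standard preg_split flag that drops the empty slots.

```php
<?php
// Hypothetical helper: split on runs of whitespace and drop empty tokens.
function split_words(string $string): array
{
    return preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);
}

// Made-up sample in the same shape as the original $content.
$sample = "\n<p><strong>LOCATION</strong><br />Centrally located in Kent.</p>\n";

$raw   = preg_split('/\s+/', $sample);  // keeps empty slots at both ends
$clean = split_words($sample);          // empty slots removed

print_r($raw);
print_r($clean);  // the markup is still glued to the neighbouring words
```

Dropping the empty slots helps with the count, but tokens such as "/>Centrally" show that the HTML still needs separate handling.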

So something like this:

function limit_words($string, $word_limit) {
    #$words = explode(" ",$string);
    $words = preg_split('/\s+/', $string);
	
    #echo '<pre>' . print_r($words, true) . '</pre>';	
	#echo '<pre>' .print_r(array_filter($words)) . '</pre>';
	
	$filter = array_filter($words);
	
    #return implode(" ",array_splice($words,0,$word_limit));
	return implode(" ",array_splice($filter,0,$word_limit));
}

Although that doesn't seem to work either?
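
One likely reason the filtered version still comes up short is that the tokens are not the words a visitor sees: tags stay attached to words, and a tag containing a space, such as `<br />`, can leave a token that is pure markup. The sketch below compares the two counts. The helper visible_word_count is hypothetical, and the sample is one block taken from $content. Note also that array_filter without a callback would drop a token of "0".

```php
<?php
// Hypothetical helper: count only the words a reader would see.
function visible_word_count(string $html): int
{
    $text = preg_replace('/<[^>]*>/', ' ', $html);   // turn every tag into a space
    return count(preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY));
}

$html = "<p><strong>AMENITIES</strong><br />Entry Phone System<br />\nCentral Heating<br />\nCar Parking</p>";

// Tokens as produced by the filtered split in the post above.
$tokens = array_filter(preg_split('/\s+/', $html));

echo count($tokens), "\n";             // 10 tokens, two of them are just "/>"
echo visible_word_count($html), "\n";  // 8 visible words
```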


...I'd like to keep the HTML formatting

 

Hmm...that increases the difficulty. You could try something like this:

function limit_words($string, $word_limit) {
    //PREPARE STRING
    $words = preg_replace('|<br[ /]*>|', ' ', $string); //replace <br> tags with spaces so that "LOCATION</strong><br />Centrally" is not considered one word after HTML tags are removed
    $words = strip_tags($words);                        //remove HTML tags
    $words = preg_split('/\s+/', $words);               //split string into words

    //LOCATE THE LAST WORD TO KEEP
    $currOffset = 0;
    $wordsFound = 0;
    foreach($words as $currWord) {
        //IF NOT BLANK, FIND CURRENT WORD IN ORIGINAL STRING
        if($currWord != '') {
            //IF WORD IS FOUND
            $newOffset = strpos($string, $currWord, $currOffset);
            if($newOffset !== false) {
                //UPDATE OFFSET AND WORD COUNTER
                $currOffset = $newOffset + strlen($currWord);  //move past the word so a repeated word is not matched at the same position twice
                $wordsFound++;

                //IF WORD LIMIT WAS REACHED, BREAK OUT OF THE LOOP
                if($wordsFound == $word_limit) {
                    break;
                }
            }
        }
    }

    //IF THE TEXT HAS FEWER WORDS THAN THE LIMIT, RETURN IT UNCHANGED
    if($wordsFound < $word_limit) {
        return $string;
    }

    //RETURN RESULT (note: tags left open at the cut-off point are not closed)
    return substr($string, 0, $currOffset);
}
 
 
echo limit_words($content,30);

  • Solution

This closes off any tags that are still open at the point where the text is cut.

$text = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam elementum ornare scelerisque.<br> <a href='xyz.com' target='_blank'>Vestibulum</a> iaculis mattis dui.</p>
<p>Aliquam <i>scelerisque</i> sapien at tellus accumsan varius. <img src='a.jpg'> Fusce facilisis ullamcorper dapibus. Aliquam dignissim</p>
<ul>
    <li>gravida</li>
    <li>dui eget</li>
    <li>aliquam</li>
</ul>
<p>Duis odio, semper eu sodales vel, sollicitudin eu enim. Cras tortor libero, pellentesque accumsan tempus in, ullamcorper nec augue. Mauris eu ipsum mauris, non imperdiet ipsum. In hac habitasse platea dictumst. Morbi ipsum mauris, tincidunt vitae pretium tempor, pretium a turpis. Nulla quis eros eu lorem aliquam congue non a nisl.</p>";

$voidtags = ['br','hr','img'];
$keeptags = '<a><b><i><br><p><ul><ol><li><u><strong><em>';
$limit = 30;

$summary = limitText($text, $limit, $voidtags, $keeptags);
echo $summary;

function limitText($text, $limit, $voidtags, $keeptags)
{
    $result = '';
    $p=0;
    $tags=[];            // stack of tags that are currently open
    $currtag = '';       // name of the tag being read
    $words = 0;
    $intag = $inword = 0;
    $text = strip_tags($text, $keeptags);
    $len = strlen($text);
    while ($p<$len) {
        $c = $text[$p];
        switch ($c) {
            case '<':
                if ($inword) {
                    $inword = 0;
                    $words++;
                    if ($words >= $limit) break 2;   // same test as the whitespace case
                }
                $intag = 1;
                break;
            case '>':
                if ($intag && $currtag != '') {
                    if (!in_array($currtag, $voidtags)) $tags[] = $currtag;
                    $currtag = '';
                }
                $intag = 0;
                break;
            case '/':
                if ($intag && $currtag == '') {      // "</" starts a closing tag; "<br/>" is left for the '>' case
                    array_pop($tags);
                    do {
                        $result .= $c;
                    }
                    while (($c=$text[++$p]) !='>');
                    $intag = 0;
                }
                break;
            case "\r":
            case "\n":
            case "\t":
            case ' ':
                if ($inword) {
                    $inword = 0;
                    $words++;
                    if ($words >= $limit) break 2;
                }
                elseif ($intag) {
                    // tag with attributes, e.g. <a href=...> or <br />
                    if ($currtag != '' && !in_array($currtag, $voidtags)) $tags[] = $currtag;
                    $currtag = '';                   // reset so the next tag name starts clean
                    do {
                        $result .= $c;
                    }
                    while (($c=$text[++$p]) !='>');
                    $intag = 0;
                }
                break;
            default:
                if ($intag) {
                    $currtag .= $c;
                }
                else $inword = 1;
                break;
        }
        $result .= $c;
        ++$p;
    }
    while ($t=array_pop($tags)) {
        $result .= "</{$t}>";  // close any open tags
    }
    return $result;
}

results

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam elementum ornare scelerisque.<br> <a href='xyz.com' target='_blank'>Vestibulum</a> iaculis mattis dui.</p>
<p>Aliquam <i>scelerisque</i> sapien at tellus accumsan varius.  Fusce facilisis ullamcorper dapibus. Aliquam dignissim</p>
<ul>
    <li>gravida</li></ul>
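
The tag-closing idea can also be looked at on its own. Below is a minimal sketch, not the code from the post: a hypothetical helper, close_open_tags, that scans a truncated fragment with a regular expression, keeps a stack of open tags, and appends the missing closing tags. It assumes the input is well nested and that the list of void tags is supplied by the caller.

```php
<?php
// Hypothetical helper: append closing tags for anything left open in a fragment.
// Assumes well-nested input; void tags (br, hr, img) are never pushed.
function close_open_tags(string $html, array $voidtags = ['br', 'hr', 'img']): string
{
    $open = [];  // stack of currently open tag names
    preg_match_all('~<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>~', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if (in_array($name, $voidtags, true)) {
            continue;                 // void tags never need a closing tag
        }
        if ($m[1] === '/') {
            array_pop($open);         // closing tag: drop the innermost open tag
        } else {
            $open[] = $name;          // opening tag: remember it
        }
    }
    while ($open) {
        $html .= '</' . array_pop($open) . '>';  // close innermost first
    }
    return $html;
}

echo close_open_tags("<ul>\n    <li>gravida");
```

Because the innermost tag is popped first, the list item is closed before the list that contains it.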