Can this function be modified to add a line break?

David-fethiye · July 29, 2009

Hi,

I want to format my emails to about 70 characters wide and wrap them so that words don't get cut up.

Normally PHP's wordwrap() would do the job, but unfortunately this function has a tendency to break up HTML tags, and as I have hyperlinks in my emails, I want them to remain intact.
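
To make the tag problem concrete, here is a minimal sketch of how it can be reproduced. The filler text and the example.com link are invented for illustration; only wordwrap() itself comes from PHP.

```php
<?php
// Made-up sample: 60 filler characters followed by a short link.
$text = str_repeat('x', 60) . ' <a href="http://example.com/">link</a>';

// wordwrap() knows nothing about HTML, so the space between "<a" and
// "href" is as good a break point as any other space in the string.
$wrapped = wordwrap($text, 70, "\n");

// The line break ends up inside the opening tag.
echo $wrapped, "\n";
```

With this input the newline lands between "<a" and "href", because the whole attribute would push the first line past 70 characters.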

In the PHP manual under wordwrap I found a function called htmlwrap(), which states that it does NOT add a "<br>" at the end of each 70-character line to force a line wrap. (This script safely wraps long words without destroying HTML tags, which wordwrap has a tendency to do, but it does not wrap the text at a certain width.)

What I want to do is add that line break so that it DOES force a line wrap, but I am not sure where to insert it in the function. Can anyone suggest which line of the following function to change?

/* htmlwrap() is a function which wraps HTML by breaking long words and
* preventing them from damaging your layout.  This function will NOT
* insert <br /> tags every "width" characters as in the PHP wordwrap()
* function. 
*/

function htmlwrap($str, $width = 70, $break = "\n", $nobreak = "") {

  // Split HTML content into an array delimited by < and >
  // The flags save the delimiters and remove empty variables
  $content = preg_split("/([<>])/", $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

  // Transform protected element lists into arrays
  $nobreak = explode(" ", strtolower($nobreak));

  // Variable setup
  $intag = false;
  $innbk = array();
  $drain = "";

  // List of characters it is "safe" to insert line-breaks at
  // It is not necessary to add < and > as they are automatically implied
  $lbrks = "/?!%)-}]\\\"':;&";

  // Is $str a UTF8 string?
  $utf8 = (preg_match("/^([\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})*$/", $str)) ? "u" : "";

  foreach ($content as $value) {
    switch ($value) {

      // If a < is encountered, set the "in-tag" flag
      case "<": $intag = true; break;

      // If a > is encountered, remove the flag
      case ">": $intag = false; break;

      default:

        // If we are currently within a tag...
        if ($intag) {

          // Create a lowercase copy of this tag's contents
          $lvalue = strtolower($value);

          // If the first character is not a / then this is an opening tag
          if ($lvalue[0] != "/") {

            // Collect the tag name   
            preg_match("/^(\w*?)(\s|$)/", $lvalue, $t);

            // If this is a protected element, activate the associated protection flag
            if (in_array($t[1], $nobreak)) array_unshift($innbk, $t[1]);

          // Otherwise this is a closing tag
          } else {

            // If this is a closing tag for a protected element, unset the flag
            if (in_array(substr($lvalue, 1), $nobreak)) {
              foreach ($innbk as $key => $tag) {
                if (substr($lvalue, 1) == $tag) {
                  unset($innbk[$key]);
                  break;
                }
              }
              $innbk = array_values($innbk);
            }
          }

        // Else if we're outside any tags...
        } else if ($value) {

          // If unprotected...
          if (!count($innbk)) {

            // Use the ACK (006) ASCII symbol to replace all HTML entities temporarily
            $value = str_replace("\x06", "", $value);
            preg_match_all("/&([a-z\d]{2,7}|#\d{2,5});/i", $value, $ents);
            $value = preg_replace("/&([a-z\d]{2,7}|#\d{2,5});/i", "\x06", $value);

            // Enter the line-break loop
            do {
              $store = $value;

              // Find the first stretch of characters over the $width limit
              if (preg_match("/^(.*?\s)?([^\s]{".$width."})(?!(".preg_quote($break, "/")."|\s))(.*)$/s{$utf8}", $value, $match)) {

                if (strlen($match[2])) {
                  // Determine the last "safe line-break" character within this match
                  for ($x = 0, $ledge = 0; $x < strlen($lbrks); $x++) $ledge = max($ledge, strrpos($match[2], $lbrks[$x]));
                  if (!$ledge) $ledge = strlen($match[2]) - 1;

                  // Insert the modified string
                  $value = $match[1].substr($match[2], 0, $ledge + 1).$break.substr($match[2], $ledge + 1).$match[4];
                }
              }

            // Loop while overlimit strings are still being found
            } while ($store != $value);

            // Put captured HTML entities back into the string
            foreach ($ents[0] as $ent) $value = preg_replace("/\x06/", $ent, $value, 1);
          }
        }
    }

    // Send the modified segment down the drain
    $drain .= $value;
  }

  // Return contents of the drain
  return $drain;
}

?>

Just to summarize: I would be quite happy to use PHP's wordwrap() to make the line width 70 characters (or as near as possible) if it were not for the fact that it damages the HTML tags.

I would also be quite happy to use htmlwrap() if it were not for the fact that it does not wrap (!!) at the width.

So I want a combination of the two: a function that will word-wrap at a given character count (or as near as it can get without breaking a word) while protecting the HTML tags.

Can this code, htmlwrap(), be converted to do a wordwrap as well?

Any help much appreciated

.josh · July 29, 2009

It's not entirely perfect, but it might be good enough for your needs...

preg_match_all('~<[^>]+>~', $content, $tags);

// Build one placeholder per tag: <1>, <2>, ...
$tagMasks = array();
$x = 0;
foreach ($tags[0] as $tag) {
  $tagMasks[] = "<" . ++$x . ">";
}
$content = str_replace($tags[0], $tagMasks, $content);
$content = wordwrap($content, 70);
$content = str_replace($tagMasks, $tags[0], $content);

It doesn't completely ignore tags. It replaces each tag with <n>, where n is a unique number, wordwraps with those <n>'s in place, and then swaps the <n>'s back for the original tags. This prevents wordwrap from breaking up the tags, because there are no spaces in <n>. But since <n> is not zero width, it will play a part in where the break occurs. Unfortunately, I can't really think of a way to completely ignore the tags and at the same time remember where they go. Could use regex to grab and remember the stuff around each tag, but that can't be guaranteed to work. I tested it out on some example content and IMO it works well enough, but that's just me.
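
One possible way to get closer to zero-width tags is to split the content into tags and text runs and only count the text. The sketch below assumes that approach; wrap_html_text() is a hypothetical name, not something from PHP or the manual, and it counts bytes with strlen(), so entities and multibyte characters will make lines come out a little short.

```php
<?php
// Hypothetical helper, not part of PHP or the manual's htmlwrap():
// wraps visible text at $width characters and treats tags as zero-width.
function wrap_html_text($html, $width = 70, $break = "\n")
{
    // Split into tags and text runs; the capture group keeps the tags.
    $parts = preg_split('~(<[^>]+>)~', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $out = '';
    $col = 0; // visible characters already on the current line

    foreach ($parts as $part) {
        // Tags are copied through untouched and do not count towards the width.
        if (preg_match('~^<[^>]+>$~', $part)) {
            $out .= $part;
            continue;
        }

        // Split the text run into words and whitespace runs.
        $tokens = preg_split('~(\s+)~', $part, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

        foreach ($tokens as $token) {
            if (ctype_space($token)) {
                if (strpos($token, "\n") !== false) {
                    $out .= $token; // keep an existing line break as-is
                    $col = 0;
                } elseif ($col > 0) {
                    $out .= ' ';    // collapse other whitespace to one space
                    $col++;
                }
                continue;
            }

            $len = strlen($token); // bytes: entities and UTF-8 are over-counted
            if ($col > 0 && $col + $len > $width) {
                $out = rtrim($out, ' ') . $break; // break before the word
                $col = 0;
            }
            $out .= $token;
            $col += $len;
        }
    }

    return $out;
}

// Made-up input with a space inside the href, wrapped at 10 characters.
$html = 'aaaa <a href="http://example.com/a b">bbbb</a> cccc';
echo wrap_html_text($html, 10), "\n";
```

With that sample input the space inside the href is left alone and the only break falls before "cccc". A word longer than the width is not cut, so very long URLs in visible text would still overflow.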

David-fethiye · July 29, 2009

Thanks for this,

It looks very interesting and I am going to give it a try.

I'll let you know how I get on!
