
Convert special HTML characters on include


muddy


Dear Freaks,

 

Any help on this would be greatly appreciated.

 

 

I have one file:  dailymenu.php  (the template for the daily menu)

 

 

All I need to do is include the file "menu.html" (the actual menu contents, exported from MS Word, which is a requirement) in the dailymenu.php template. However, menu.html is full of special characters, because French, Spanish, and Italian food names contain accented letters.

For example: í, è, ó, â, é, ñ, etc.

So when I do a simple:

 

<?php include("includes/menu.html"); ?>

 

in the dailymenu.php file, the output is garbled because the special characters are not being converted.

Is there any way to include the menu.html file in the dailymenu.php template and fix the special characters all at once?
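A likely cause (an assumption, since the thread never shows the file's actual bytes) is an encoding mismatch: Word often saves the accented letters in a single-byte Windows code page, while the page around them is served as UTF-8. A minimal sketch of one fix, assuming the page is served as UTF-8: read menu.html as a string instead of including it, and convert it before echoing. `loadMenuAsUtf8()` is a hypothetical helper, and the Windows-1252 fallback is an assumption to check against the charset Word writes into the file's `<meta>` tag.

```php
<?php
// Sketch: load the Word-exported menu and make sure it is UTF-8 before output.
// ASSUMPTION: if the file is not valid UTF-8, it was saved as Windows-1252
// (a common Word default on Western-language systems). Pass a different
// $fallbackEncoding if the <meta> charset in menu.html says otherwise.
function loadMenuAsUtf8(string $path, string $fallbackEncoding = 'Windows-1252'): string
{
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Already valid UTF-8: return it untouched so it is not converted twice
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }
    // Re-encode the single-byte accented letters (e.g. byte 0xE8 = è) as UTF-8
    return mb_convert_encoding($html, 'UTF-8', $fallbackEncoding);
}
```

In dailymenu.php the include line would then become something like `echo loadMenuAsUtf8("includes/menu.html");`, with `header("Content-Type: text/html; charset=UTF-8");` sent before any output so the browser reads the result as UTF-8.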

 

Thanks a million


You could run the actual Word file through htmlentities() when it is uploaded and save the result as a new copy. Then, when it comes time to show it, rebuild the file from that copy and decode it. The last time I had problems with Microsoft Word, I just used http://www.byte.com/documents/s=9502/byt1125943459937/0905_pournelle.html and I didn't really have any problems after that.
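One caveat worth checking before relying on this: htmlentities() also escapes `<`, `>`, `&` and quotes, so running a whole HTML file through it turns the menu's own markup into visible text such as `&lt;p&gt;`. A minimal sketch of an alternative for the saved copy, assuming the content is already UTF-8: encode only the characters outside ASCII as numeric entities and leave everything else alone. `encodeNonAscii()` is a hypothetical helper name.

```php
<?php
// Sketch: replace every non-ASCII character with a numeric entity (è -> &#232;)
// while leaving ASCII, and therefore all HTML tags and attributes, unchanged.
// ASSUMPTION: $html is UTF-8 (convert it first if Word saved Windows-1252).
function encodeNonAscii(string $html): string
{
    // convmap = [first code point, last code point, offset, mask]
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($html, $convmap, 'UTF-8');
}

echo encodeNonAscii("<p>Cr\u{E8}me br\u{FB}l\u{E9}e</p>"), "\n";
// <p>Cr&#232;me br&#251;l&#233;e</p>
```

Because the result is pure ASCII, the saved copy displays the same whether the page is declared as UTF-8 or ISO-8859-1, so it does not need decoding before output.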


There is a cool custom PHP "function" (really a small set of helper functions) that I found on the php.net site. It is kind of an UBER-htmlentities:

 

<?php
// Convert str to UTF-8 (if not already), then convert that to HTML named entities
// and numbered references. Compare to the native htmlentities() function:
// unlike that function, this will skip any already existing entities in the string.
// mb_convert_encoding() doesn't encode ampersands, so use makeAmpersandEntities() to convert those.
// mb_convert_encoding() won't usually convert to illegal numbered entities (128-159) unless
// there's a charset discrepancy, but just in case, correct them with correctIllegalEntities().
// Note: the "HTML-ENTITIES" target of mb_convert_encoding() is deprecated as of PHP 8.2.
function makeSafeEntities($str, $convertTags = 0, $encoding = "") {
  if (is_array($arrOutput = $str)) {
    // Recurse into arrays, passing both options through unchanged
    foreach (array_keys($arrOutput) as $key)
      $arrOutput[$key] = makeSafeEntities($arrOutput[$key], $convertTags, $encoding);
    return $arrOutput;
    }
  else if (!empty($str)) {
    $str = makeUTF8($str, $encoding);
    $str = mb_convert_encoding($str, "HTML-ENTITIES", "UTF-8");
    $str = makeAmpersandEntities($str);
    if ($convertTags)
      $str = makeTagEntities($str);
    $str = correctIllegalEntities($str);
    return $str;
    }
  // Empty input ("", "0", null) is returned as-is
  return $str;
  }

// Convert str to UTF-8 (if not already), then convert to HTML numbered decimal entities.
// If selected, it first converts any illegal chars to safe named (and numbered) entities
// as in makeSafeEntities(). Unlike mb_convert_encoding(), mb_encode_numericentity() will
// NOT skip any already existing entities in the string, so use a regex to skip them.
function makeAllEntities($str, $useNamedEntities = 0, $encoding = "") {
  if (is_array($str)) {
    $arrOutput = array();
    // Recurse into arrays, passing both options through unchanged
    foreach ($str as $key => $s)
      $arrOutput[$key] = makeAllEntities($s, $useNamedEntities, $encoding);
    return $arrOutput;
    }
  else if (!empty($str)) {
    $str = makeUTF8($str, $encoding);
    if ($useNamedEntities)
      $str = mb_convert_encoding($str, "HTML-ENTITIES", "UTF-8");
    $str = makeTagEntities($str, $useNamedEntities);
    // The regex below must run in UTF-8 mode, so set that before using it
    mb_regex_encoding("UTF-8");
    // Keep existing entities (group 1); encode every other non-space character (group 2).
    // A callback replaces the old "e" option, which was removed in PHP 8.0.
    // The mask 0x1FFFFF keeps code points above 0xFFFF intact.
    $str = mb_ereg_replace_callback('(&(?:[a-z]{0,4}\w{2,3};|#\d{2,5};))|(\S)',
      function ($m) {
        return !empty($m[1]) ? $m[1]
          : mb_encode_numericentity($m[2], array(0x0, 0x10FFFF, 0, 0x1FFFFF), 'UTF-8');
        },
      $str, "i");
    $str = correctIllegalEntities($str);
    return $str;
    }
  return $str;
  }



// Convert common characters to named or numbered entities
function makeTagEntities($str, $useNamedEntities = 1) {
  // Note that we should use &apos; for the single quote, but IE doesn't like it
  $arrReplace = $useNamedEntities ? array('&#39;','&quot;','&lt;','&gt;') : array('&#39;','&#34;','&#60;','&#62;');
  return str_replace(array("'",'"','<','>'), $arrReplace, $str);
  }

// Convert ampersands to named or numbered entities.
// Use regex to skip any that might be part of existing entities.
function makeAmpersandEntities($str, $useNamedEntities = 1) {
  return preg_replace("/&(?![A-Za-z]{0,4}\w{2,3};|#[0-9]{2,5};)/m", $useNamedEntities ? "&amp;" : "&#38;", $str);
  }

// Convert illegal HTML numbered entities in the range 128-159 to their legal counterparts
// (the Windows-1252 characters that use those byte values)
function correctIllegalEntities($str) {
  $chars = array(
    128 => '&#8364;',
    130 => '&#8218;',
    131 => '&#402;',
    132 => '&#8222;',
    133 => '&#8230;',
    134 => '&#8224;',
    135 => '&#8225;',
    136 => '&#710;',
    137 => '&#8240;',
    138 => '&#352;',
    139 => '&#8249;',
    140 => '&#338;',
    142 => '&#381;',
    145 => '&#8216;',
    146 => '&#8217;',
    147 => '&#8220;',
    148 => '&#8221;',
    149 => '&#8226;',
    150 => '&#8211;',
    151 => '&#8212;',
    152 => '&#732;',
    153 => '&#8482;',
    154 => '&#353;',
    155 => '&#8250;',
    156 => '&#339;',
    158 => '&#382;',
    159 => '&#376;');
  foreach ($chars as $num => $entity)
    $str = str_replace("&#".$num.";", $entity, $str);
  return $str;
  }

// Compare to native utf8_encode() function, which will re-encode text that is already UTF-8
function makeUTF8($str, $encoding = "") {
  if (!empty($str)) {
    if (empty($encoding) && isUTF8($str))
      $encoding = "UTF-8";
    if (empty($encoding))
      $encoding = mb_detect_encoding($str, 'UTF-8, ISO-8859-1');
    if (empty($encoding))
      $encoding = "ISO-8859-1"; // if charset can't be detected, default to ISO-8859-1
    return $encoding == "UTF-8" ? $str : @mb_convert_encoding($str, "UTF-8", $encoding);
    }
  // Empty input is returned as-is instead of null
  return $str;
  }

// Much simpler UTF-8-ness checker using a regular expression created by the W3C:
// returns true if $str is valid UTF-8 and false otherwise.
// From http://w3.org/International/questions/qa-forms-utf-8.html
// In /x mode PCRE comments start with #, not //, so the annotations below use #.
function isUTF8($str) {
  return (bool) preg_match('%^(?:
        [\x09\x0A\x0D\x20-\x7E]            # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
      |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
      |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
      |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
      |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
  )*$%xs', $str);
  }

?>

 

businessman332211 had the right idea by having you run the uploaded file through htmlentities.

Instead, though, run it through the makeSafeEntities function above; it was written to plug most of the holes in PHP's native htmlentities function.
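To see one of those holes concretely: by default htmlentities() re-encodes entities that are already in the text, so an `&eacute;` already present becomes `&amp;eacute;` and shows up literally on the page. A minimal sketch using only the native function, whose fourth parameter, `$double_encode`, switches that off. This covers only one hole; tags, for example, are still escaped.

```php
<?php
// Sketch: plain htmlentities() re-encodes entities that are already present.
// Its fourth argument, $double_encode, turns that off.
$text = "Caf&eacute; &amp; cr\u{E8}me";   // existing entities plus a raw è

// Default: the & of each existing entity is escaped again
echo htmlentities($text, ENT_QUOTES, 'UTF-8'), "\n";
// Caf&amp;eacute; &amp;amp; cr&egrave;me

// double_encode = false: existing entities are kept, the raw è is encoded
echo htmlentities($text, ENT_QUOTES, 'UTF-8', false), "\n";
// Caf&eacute; &amp; cr&egrave;me
```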


Dear Businessman and DBrimlow,

 

Thanks for the reply!

 

Businessman, when I click on your link, I get sent to some article about Microsoft Activation gripes.  Was there another link related to this topic, or do I need to register on that site to read more?

 

DBrimlow,

 

This looks like what I'm looking for. So, how do I "run it through" the makeSafeEntities function? I'm assuming I put this function in the dailymenu.php file, but how do I tell dailymenu.php to apply it to the menu.html file?
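The catch is that include() sends the file straight to the browser, so there is no string for a function to work on. A minimal sketch of one way around that, using output buffering to capture what include() would have printed; `includeToString()` is a hypothetical helper, not a built-in. For a plain .html file, `file_get_contents("includes/menu.html")` returns the same string without running the file as PHP, which is arguably safer for Word output.

```php
<?php
// Sketch: capture what include() would print, as a string, so it can be
// post-processed before it is echoed. includeToString() is hypothetical.
function includeToString(string $path): string
{
    ob_start();                      // buffer output instead of sending it
    try {
        include $path;               // the file's content goes into the buffer
    } finally {
        $captured = ob_get_clean();  // end buffering and keep what was printed
    }
    return $captured;
}
```

With the functions from dbrimlow's post pasted into dailymenu.php above this point, the include line would become something like `<?php echo makeSafeEntities(includeToString("includes/menu.html")); ?>`. Leaving `$convertTags` at its default of 0 keeps the menu's own HTML tags intact.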

 

Thanks again.


Oh Man, if I could just get this figured out, I could retire!

 

I'll pay if someone can help me figure out how to "run my HTML file through the makeSafeEntities function" that dbrimlow posted above. It seems so simple, but I don't know how, and I don't have the time to figure it out.

 

Thanks
