
[SOLVED] Character set encoding / smart quotes problem


grejon04


Hello; I know this problem is so irritating, but it's sadly become my problem too. I did do the required reading before I posted this, though...

It's the damn curly text. I tried the convert_smart_quotes function from the article, and htmlentities (which actually solved the problem for the people who ran the site with the same code before).

 

So, since the problem was solved on another server, could the problem be with the server's default character encoding? (do they have this?)

Mac OSX server, MySQL database with tables of Latin1 encoding.

 

There is a generic header file that is used with every page,

 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en-AU">
  <head>
    <link rel="stylesheet" type="text/css" href="css/new.css" media="screen" title="New CSS" />
    <link rel="stylesheet" type="text/css" href="css/new-printer.css" media="print" title="New Print CSS" />
<script src="includes/scripts.js" type="text/javascript"></script>

<title><?php echo  $title ?></title>

 

Notice there is no meta tag, specifying

 

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

 

as with the other fix. And I feel like this may have something to do with it, but don't want to add it if it isn't necessary.

 

I looked at the site in Firefox, and View -> Character Encoding was set to ISO-8859-1. But I don't know whether that means the site is set to that or whether it's just my browser; I suspect the latter.
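
On whether Firefox is reporting the site's encoding or just guessing: when neither an HTTP header nor a meta tag declares a charset (and the server isn't configured to add one), the browser falls back to its own default or a guess, so the menu alone doesn't settle it. A minimal sketch of declaring the charset from PHP instead, assuming the shared header file runs before anything is output; the helper name content_type_header is made up for illustration:

```php
<?php
// Hypothetical helper: builds the Content-Type header line for a given charset.
function content_type_header($charset)
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// header() only works before any output has been sent, so guard the call.
if (!headers_sent()) {
    header(content_type_header('ISO-8859-1'));
}
```

Whatever charset is declared here should match the bytes the page actually contains, otherwise the browser will misread them.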

 

Any help with this would be awesome. I know you all have heard this too many times.

 

j


The priorities are to prevent any translational errors from the database, and to get rid of the curly text.

 

MySQL tables default to latin1, but part of my question is that I don't know whether I can change that, or if I even need to, or if the problem lies in the fact that the HTML meta tags need to be specific. Would just inserting that type into the HTML header meta tags solve the problem?


Smart quotes are not in Latin-1. You need to decide what your site should be running, then handle it properly. If you expect to handle international characters at some point, convert everything to UTF-8 now. If you only work in ISO-8859-1 (Latin-1), then you need an up-front process that will catch the smart quotes and translate them to normal quotes before they are entered into the database.
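
For the "convert everything to UTF-8" route, a rough sketch of normalizing incoming text might look like the following. It assumes that anything which isn't already valid UTF-8 is Windows-1252 (the encoding smart quotes usually arrive in), and the helper name to_utf8 is made up for illustration:

```php
<?php
// Hypothetical helper: returns $text as UTF-8, assuming any non-UTF-8 input
// is Windows-1252.
function to_utf8($text)
{
    // An empty pattern with the /u modifier only matches valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    // iconv() returns false if it hits a byte Windows-1252 leaves undefined.
    return iconv('Windows-1252', 'UTF-8', $text);
}

// A Windows-1252 opening curly quote (byte 147) becomes the UTF-8 bytes e2809c.
echo bin2hex(to_utf8(chr(147))), "\n";
```

Callers would still need to check for a false return before storing the result.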


you could do something like

 

 

<?php
// Map Windows-1252 smart quotes and the em dash to plain ASCII equivalents.
function convert_smart_quotes($string) {
    $search = array(chr(145), chr(146), chr(147), chr(148), chr(151));
    $replace = array("'", "'", '"', '"', '-');
    return str_replace($search, $replace, $string);
}
?>


<?php
// Map Windows-1252 smart quotes and the em dash to plain ASCII equivalents.
function convert_smart_quotes($string) {
    $search = array(chr(145), chr(146), chr(147), chr(148), chr(151));
    $replace = array("'", "'", '"', '"', '-');
    return str_replace($search, $replace, $string);
}
?>

Try this function to convert smart quotes.

Also, the header needs to be UTF-8:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

 

To be more specific, if you want to update the entries already in the database, you could do something like

<?php
// Keep the return value; the function does not modify its argument in place.
$message = convert_smart_quotes($_GET['message']);
?>


OK, the curly text is gone. Awesome (but now there are ?s in black diamonds). I recall seeing previous forum posts about the question marks in the black diamonds. Will the convert function remove those?

 

I found the section of code that pulls the database entries...

$short_title = htmlspecialchars(substr($row['title'], 0, 55), ENT_QUOTES);
$title = htmlspecialchars($row['title'], ENT_QUOTES);

 

Will this do anything? Any other tips on the black question mark thingies before I go hunting?

 


Wait a second...those aren't regular question marks. They're kind of like the curly text...bad representations of something that's supposed to be there.

 

And what I don't get is, if I leave the page in iso-8859-1, then call convert_smart_quotes($title) on the title as I call it from the db, which has the curly text in it, why doesn't that pull the quotes off?

 

function convert_smart_quotes($string) 
{ 
    $search = array(chr(145), 
                    chr(146), 
                    chr(147), 
                    chr(148), 
                    chr(151)); 

    $replace = array("'", 
                     "'", 
                     '"', 
                     '"', 
                     '-'); 

    return str_replace($search, $replace, $string); 
} 
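
It may help to check which bytes are actually coming out of the database. chr(145) through chr(151) only match single-byte Windows-1252 characters; the same curly quotes stored as UTF-8 are three-byte sequences that str_replace would never match. The sketch below is a variant that handles both cases. The name convert_smart_quotes_any is made up, and the UTF-8 check is an addition for illustration, not something from the article:

```php
<?php
// Hypothetical variant of convert_smart_quotes() that handles both encodings.
function convert_smart_quotes_any($string)
{
    if (preg_match('//u', $string) === 1) {
        // Valid UTF-8: match the byte sequences for U+2018, U+2019,
        // U+201C, U+201D and U+2014.
        $search = array("\xE2\x80\x98", "\xE2\x80\x99", "\xE2\x80\x9C",
                        "\xE2\x80\x9D", "\xE2\x80\x94");
    } else {
        // Not valid UTF-8: assume single-byte Windows-1252.
        $search = array(chr(145), chr(146), chr(147), chr(148), chr(151));
    }
    $replace = array("'", "'", '"', '"', '-');
    return str_replace($search, $replace, $string);
}

// bin2hex() shows the raw bytes of a value: "93" would mean a Windows-1252
// opening quote, "e2809c" the UTF-8 one.
echo bin2hex(chr(147)), "\n";
```

Running bin2hex() on a title straight from the database should show which branch applies. A string that mixes both encodings is not handled by this sketch.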


<?php
// assuming '†' is actually UTF8, htmlentities will assume it's iso-8859  
// since we did not specify in the 3rd argument of htmlentities.
// This generates "â[bad utf-8 character]"
// If passed to any libxml, it will generate a fatal error.
$badUTF8 = htmlentities('†');

// iconv() can ignore characters which cannot be encoded in the target character set
$goodUTF8 = iconv("utf-8", "utf-8//IGNORE", $badUTF8);
?>


Actually, it seems like those two functions are what's causing those symbols.

 

what about

 

htmlentities($string, ENT_QUOTES, "UTF-8")
iconv("ISO-8859-1", "UTF-8", $string)

 

Looks like I'll have to perform these conversions wherever db data comes out. What a pain in the ass.
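
If the conversion really has to happen wherever data comes out, wrapping it in one helper keeps it to a single call per field. A rough sketch, assuming the rows hold single-byte Windows-1252 text and the page is served as UTF-8; the name db_to_html is made up. It uses Windows-1252 rather than ISO-8859-1 as the source encoding, because ISO-8859-1 maps bytes 145 to 151 to control characters instead of curly quotes:

```php
<?php
// Hypothetical helper: convert one database value to UTF-8 and escape it for HTML.
function db_to_html($value)
{
    $utf8 = iconv('Windows-1252', 'UTF-8', $value);
    if ($utf8 === false) {
        // Conversion failed (a byte undefined in Windows-1252); return an
        // empty string rather than emit broken output.
        return '';
    }
    // Pass the charset explicitly so the escaping matches the page encoding.
    return htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
}

// Example with a title wrapped in Windows-1252 curly quotes.
echo db_to_html(chr(147) . 'Tom & Jerry' . chr(148)), "\n";
```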

 

By the way, thank you for your help. I appreciate it, and I think I've almost got it.

 

Also, what do you think about converting the database tables to UTF-8? Would that cause conversion problems?
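
On converting the tables: MySQL can re-encode a table in place with ALTER TABLE ... CONVERT TO CHARACTER SET, and since MySQL's latin1 is actually Windows-1252, curly quotes that were stored correctly should survive the conversion. Take a backup first. A small sketch that only builds the statements, assuming placeholder table names and a MySQL version that has utf8mb4 (older servers would need utf8); the helper name build_convert_sql is made up:

```php
<?php
// Hypothetical helper: build one ALTER TABLE statement per table name.
function build_convert_sql(array $tables, $charset = 'utf8mb4')
{
    $statements = array();
    foreach ($tables as $table) {
        // Accept only plain identifiers so nothing unexpected ends up in the SQL.
        if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
            continue;
        }
        $statements[] = "ALTER TABLE `$table` CONVERT TO CHARACTER SET $charset";
    }
    return $statements;
}

// 'articles' and 'comments' are placeholder table names.
foreach (build_convert_sql(array('articles', 'comments')) as $sql) {
    echo $sql, ";\n";
}
```

The connection charset would likely need to match as well, otherwise MySQL converts the data again on the way out.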

 

 


<?php
/*
*    htmlentities() variant that supports ISO-8859-2
*
*    @param string
*    @return string
*    @author FanFataL
*/
function htmlentities_iso88592($string='') {
    $pl_iso = array('ê', 'ó', '±', '¶', '³', '¿', 
'¼', 'æ', 'ñ', 'Ê', 'Ó', '¡', '¦',
'£', '¬', '¯', 'Æ', 'Ñ');    
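    $entitles = get_html_translation_table(HTML_ENTITIES);
    // The table maps character => entity, so the excluded characters must be
    // removed by key; array_diff() would compare them against the entity values.
    $entitles = array_diff_key($entitles, array_flip($pl_iso));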
    return strtr($string, $entitles);
}
?>

