
Converting Smart Quotes to Regular Quotes


kittrellbj


I am having more trouble with this than is probably reasonable.  Here's what I'm trying to do:

 

1. Copy text from a word processor (Microsoft Office or OpenOffice) into a .txt file (Notepad, etc.).

2. Convert the .txt file into a .html file.

 

The problem I'm running into is that smart quotes (curly quotes), apostrophes, and the long dash (em dash) are turning into ? in the final document.  I've searched Google for a solution that converts these troublesome characters into regular straight quotes (" and '), but nothing I've found works.

 

I'm working on a Windows XP machine, using XAMPP as my work environment.  Most people submitting the .txt files will be on Windows computers.  (I know that Microsoft has done wonders in messing up the encoding system with regard to smart quotes...)

 

I've tried:

function convert_smart_quotes($string) {

$quotes = array(
    "\xC2\xAB"     => '"', // « (U+00AB) in UTF-8
    "\xC2\xBB"     => '"', // » (U+00BB) in UTF-8
    "\xE2\x80\x98" => "'", // ‘ (U+2018) in UTF-8
    "\xE2\x80\x99" => "'", // ’ (U+2019) in UTF-8
    "\xE2\x80\x9A" => "'", // ‚ (U+201A) in UTF-8
    "\xE2\x80\x9B" => "'", // ‛ (U+201B) in UTF-8
    "\xE2\x80\x9C" => '"', // “ (U+201C) in UTF-8
    "\xE2\x80\x9D" => '"', // ” (U+201D) in UTF-8
    "\xE2\x80\x9E" => '"', // „ (U+201E) in UTF-8
    "\xE2\x80\x9F" => '"', // ‟ (U+201F) in UTF-8
    "\xE2\x80\xB9" => "'", // ‹ (U+2039) in UTF-8
    "\xE2\x80\xBA" => "'", // › (U+203A) in UTF-8
);
// strtr() returns a converted copy, so return that result rather than the untouched input.
return strtr($string, $quotes);
}

 

and also

<?php 

function convert_smart_quotes($string) 
{ 
    $search = array(chr(145), 
                    chr(146), 
                    chr(147), 
                    chr(148), 
                    chr(151)); 

    $replace = array("'", 
                     "'", 
                     '"', 
                     '"', 
                     '-'); 

    return str_replace($search, $replace, $string); 
} 

?>
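The two attempts above look for different byte sequences: the first matches UTF-8 bytes, while chr(145) to chr(151) are single Windows-1252 bytes. A quick way to see which kind a file actually contains is to dump the bytes. This is only a diagnostic sketch; the sample strings are made up for illustration, and it assumes the mbstring extension is enabled.

```php
<?php
// Hypothetical samples: the same left curly double quote in each encoding.
$cp1252 = "\x93";          // chr(147), a single Windows-1252 byte
$utf8   = "\xE2\x80\x9C";  // U+201C encoded as three UTF-8 bytes

// bin2hex() shows the raw bytes, which makes the difference visible.
echo bin2hex($cp1252), "\n"; // 93
echo bin2hex($utf8), "\n";   // e2809c

// mb_check_encoding() reports whether a byte string is valid UTF-8.
var_dump(mb_check_encoding($cp1252, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
```

If the uploaded text fails the UTF-8 check, the strtr() table of UTF-8 sequences will never match anything in it, and the chr() version will never match UTF-8 input.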

 

and also trying to display the HTML characters for them instead...

 

<?php 

$replace = array('&lsquo;', 
                 '&rsquo;', 
                 '&ldquo;', 
                 '&rdquo;', 
                 '&mdash;'); 

?>
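For completeness, here is roughly how an entity replacement like that could be wired up with strtr(). It is a sketch, not a tested fix: quotes_to_entities() is a name made up for illustration, and it assumes the input is already UTF-8.

```php
<?php
// Hypothetical helper: swaps UTF-8 curly quotes and the em dash for named HTML entities.
// Assumes $utf8Text is valid UTF-8; Windows-1252 input would pass through unchanged.
function quotes_to_entities($utf8Text)
{
    $entities = array(
        "\xE2\x80\x98" => '&lsquo;', // U+2018 left single quote
        "\xE2\x80\x99" => '&rsquo;', // U+2019 right single quote / apostrophe
        "\xE2\x80\x9C" => '&ldquo;', // U+201C left double quote
        "\xE2\x80\x9D" => '&rdquo;', // U+201D right double quote
        "\xE2\x80\x94" => '&mdash;', // U+2014 em dash
    );
    return strtr($utf8Text, $entities);
}

echo quotes_to_entities("\xE2\x80\x9CHi\xE2\x80\x9D"), "\n"; // &ldquo;Hi&rdquo;
```

Because the output is pure ASCII, the result should display the same regardless of which charset the page declares.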

 

Nothing seems to work.  I know it has something to do with the encoding, but I can't seem to figure out a way to replace these little buggers and keep from having a million ? symbols throughout the file. :(
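Putting the pieces together, one approach that might cover both cases is to bring everything to UTF-8 first and only then replace. The sketch below assumes that anything which is not valid UTF-8 was saved by Notepad as "ANSI" (Windows-1252 on a Western Windows install), and that the mbstring extension is enabled. normalize_quotes() is a made-up name, and the guess could be wrong for files in other code pages.

```php
<?php
// Sketch only: normalize_quotes() is a hypothetical helper, not an established fix.
// Requires the mbstring extension (mb_check_encoding, mb_convert_encoding).
function normalize_quotes($text)
{
    // Assumption: input that is not valid UTF-8 is Windows-1252.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
    }

    // From here on the text is UTF-8, so one table of UTF-8 sequences is enough.
    $map = array(
        "\xE2\x80\x98" => "'", // U+2018 left single quote
        "\xE2\x80\x99" => "'", // U+2019 right single quote / apostrophe
        "\xE2\x80\x9C" => '"', // U+201C left double quote
        "\xE2\x80\x9D" => '"', // U+201D right double quote
        "\xE2\x80\x93" => '-', // U+2013 en dash
        "\xE2\x80\x94" => '-', // U+2014 em dash
    );
    return strtr($text, $map);
}

// Windows-1252 bytes: 0x93/0x94 curly double quotes, 0x97 em dash, 0x92 apostrophe.
echo normalize_quotes("\x93Hi\x94 \x97 it\x92s"), "\n"; // "Hi" - it's
```

Text that is already UTF-8 skips the conversion and goes straight to the replacement table, so the same function should handle either kind of upload.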

