
Help Converting Smart Quotes


schilly


I've got some text in a longtext MySQL field (latin1) that contains smart quotes, and they cause problems when I use this data to generate an XML file: the smart quotes break the XML structure.

 

I've tried a bunch of different conversion methods in PHP and MySQL with no luck. Does anyone have a concrete method for doing this?

 

I've tried search/replace with

$search = array(chr(145), chr(146), chr(147), chr(148), chr(151)); 

 

Changing longtext to binary then converting to UTF8 then converting back to longtext.

 

Using mb_convert_encoding(). Nothing seems to work.

 

 

Character encodings still cause a lot of confusion for me. Any help is appreciated. Thanks.

 


Yup that was the search replace code above:

 

This:


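<?php
// Replace Windows-1252 smart-quote bytes with plain ASCII equivalents.
function convert_smart_quotes($string) {
    // 145/146: single quotes, 147/148: double quotes, 151: em dash
    $search = array(chr(145), chr(146), chr(147), chr(148), chr(151));
    $replace = array("'", "'", '"', '"', '-');
    return str_replace($search, $replace, $string);
}
?>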

 

and

<?php
function sanitizeString($string = null) {
    if (is_null($string)) return false;

    // Replace MS Word "smart" punctuation, given here as UTF-8 byte sequences
    $badwordchars = array(
        "\xe2\x80\x98", // left single quote
        "\xe2\x80\x99", // right single quote
        "\xe2\x80\x9c", // left double quote
        "\xe2\x80\x9d", // right double quote
        "\xe2\x80\x94", // em dash
        "\xe2\x80\xa6"  // ellipsis
    );
    // Plain ASCII replacements, in the same order as above
    $fixedwordchars = array("'", "'", '"', '"', '-', '...');

    return str_replace($badwordchars, $fixedwordchars, $string);
}
?>

 

I still have some weird chars showing up as âÂ

 

I'm not sure if this is because it's a UTF8 character being submitted through our form then being stored as latin1. It looks like most of the issues are with smart quotes and mdash.

 

The text shows up fine in the web as UTF8 but when I generate an XML of it, it completely craps out.
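For what it's worth, here is a rough sketch of one way to get text like this into XML without stripping the smart quotes at all. It assumes the mbstring extension is available and that any bytes which are not valid UTF-8 are Windows-1252 (the latin1 superset that contains the smart quotes). The function name xml_safe_text is made up for this example.

```php
<?php
// Hypothetical helper: return $text as valid, escaped UTF-8 for use as XML element text.
function xml_safe_text(string $text): string
{
    // Assumption: anything that is not valid UTF-8 is Windows-1252.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
    }
    // Escape &, < and > for XML; smart quotes are left as real UTF-8 characters.
    return htmlspecialchars($text, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
}

// "\x92" is the Windows-1252 right single quote.
echo '<item>' . xml_safe_text("it\x92s <b>") . '</item>', "\n";
```

The validity check is only a heuristic, so it is worth testing against real rows before relying on it. The XML file itself should then be written and declared as UTF-8.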


Those are characters with accents, and the cent symbol. âÂ

 

If you want to replace every one of those with its non-accented counterpart, you're going to have a huge list :D

 

 

Odd that your XML craps out, all 3 characters are in ISO 8859-1


Those are characters with accents, and the cent symbol. âÂ

 

Well, in latin1 those are separate symbols, but in UTF-8 they are part of a smart quote, I think. Because it's a multi-byte character, latin1 shows it as several characters instead of one.
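A small demonstration of that effect, assuming the mbstring extension is available: the UTF-8 right single quote (U+2019) is three bytes, and reading those bytes as Windows-1252 turns them into three separate characters. The variable names are just for illustration.

```php
<?php
// UTF-8 encoding of the right single quote (U+2019): three bytes.
$quote = "\xE2\x80\x99";

// Misread those bytes as Windows-1252, then re-encode as UTF-8 for display.
$mojibake = mb_convert_encoding($quote, 'UTF-8', 'Windows-1252');

echo bin2hex($quote), "\n"; // e28099
echo $mojibake, "\n";       // â€™

// Reversing the misreading recovers the original three bytes.
$restored = mb_convert_encoding($mojibake, 'Windows-1252', 'UTF-8');
```

If the text has been through this misreading more than once, a single reverse pass will not be enough to restore it.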


Ya, I think I'm pretty screwed here. I tried the MySQL binary cast UTF8 conversion method and it didn't work. The single quote is still mangled, which breaks XML validity.

 

Sample after conversion:

 

With this in mind, and especially if you?¢‚Ǩ‚Ñ¢re an artist, musician, etc., it?¢‚Ǩ‚Ñ¢s extremely

 

There shouldn't be too many bad character types so I figure if I get a list of them all and do a find/replace of their latin1 counterpart it should work.
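Building that list by eye is error-prone, so here is a minimal sketch of one way to collect it. It counts every byte above 0x7F in a string, which works for single-byte data such as latin1 or Windows-1252. The function name find_high_bytes is made up for this example.

```php
<?php
// Hypothetical helper: list each non-ASCII byte in $text with its count.
function find_high_bytes(string $text): array
{
    $found = [];
    // count_chars() mode 1 returns only the byte values that occur, as byte => count.
    foreach (count_chars($text, 1) as $byte => $count) {
        if ($byte >= 0x80) {
            $found[sprintf('0x%02X', $byte)] = $count;
        }
    }
    return $found;
}

// Sample with Windows-1252 smart quotes as raw bytes.
print_r(find_high_bytes("it\x92s \x93ok\x94 \x92"));
```

Running this over the whole column would give the full set of bytes that need a replacement. On UTF-8 data it reports the individual bytes of each multi-byte character instead, so it is only a quick diagnostic.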


I might have got it through trial and error:

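<?php
function convert_smart_quotes($string) {
    // Windows-1252 bytes: 145/146 single quotes, 147/148 double quotes,
    // 151 (0x97) em dash, 0x96 en dash. 0xA3 and 0xB5 are the latin1 pound and micro signs.
    // The 0x80 and 0x85 mappings were found by trial and error on this data;
    // in Windows-1252 those bytes are the euro sign and the ellipsis.
    $search = array(chr(145), chr(146), chr(147), chr(148), chr(151), "\x80", "\x85", "\xA3", "\x96", "\xB5");
    $replace = array("'", "'", '"', '"', '-', '…', "\n", "£", "-", "µ");
    return str_replace($search, $replace, $string);
}
?>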

 

Just a few outlier characters I needed to track down and convert. Hopefully I don't run into more in the future.

 
