schilly Posted June 30, 2011
I've got some text in a longtext MySQL field (latin1) that contains smart quotes, and they cause problems when I use this data to generate an XML file: the smart quotes break the XML structure. I've tried a bunch of different conversion methods in PHP and MySQL with no luck. Does anyone have a concrete method for doing this?
I've tried:
- search/replace with $search = array(chr(145), chr(146), chr(147), chr(148), chr(151));
- changing longtext to binary, converting to UTF-8, then converting back to longtext;
- using mb_convert_encoding().
Nothing seems to work. Character encodings still cause a lot of confusion for me. Any help is appreciated. Thanks.
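A minimal sketch of one common approach, assuming the column really holds single-byte Windows-1252 text (MySQL's latin1 character set is in fact cp1252): convert the whole string to UTF-8 with mb_convert_encoding() before it goes into the XML, then escape it for XML. The helper name xml_text_element() is hypothetical, and the code needs the mbstring extension.

```php
<?php
// Sketch: assumes $raw is single-byte Windows-1252 (MySQL "latin1") text.
// xml_text_element() is a hypothetical helper name, not a PHP built-in.
function xml_text_element(string $tag, string $raw): string
{
    // Re-encode every byte as UTF-8; bytes 0x93/0x94 become real curly quotes.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

    // Escape &, <, >, ' and " for XML. Curly quotes are fine in UTF-8 XML.
    $escaped = htmlspecialchars($utf8, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return "<$tag>$escaped</$tag>";
}

echo xml_text_element('desc', "\x93Hello\x94 & goodbye"), "\n";
// <desc>“Hello” &amp; goodbye</desc>
```

The smart quotes are kept rather than stripped; they only break the XML when their raw single bytes end up in a document the parser reads as UTF-8.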
EdwinPaul Posted June 30, 2011
Did you also try str_replace() (http://php.net/manual/en/function.str-replace.php)?
schilly Posted June 30, 2011
Yup, that was the search/replace code above. This:

<?php
function convert_smart_quotes($string){
    // Windows-1252 bytes: 0x91/0x92 single quotes, 0x93/0x94 double quotes, 0x97 em dash
    $search = array(chr(145), chr(146), chr(147), chr(148), chr(151));
    $replace = array("'", "'", '"', '"', '-');
    return str_replace($search, $replace, $string);
}
?>

and

<?php
function sanitizeString($string = null){
    if(is_null($string)) return false;
    //-> Replace all of those weird MS Word quotes and other high characters (UTF-8 byte sequences)
    $badwordchars=array(
        "\xe2\x80\x98", // left single quote
        "\xe2\x80\x99", // right single quote
        "\xe2\x80\x9c", // left double quote
        "\xe2\x80\x9d", // right double quote
        "\xe2\x80\x94", // em dash
        "\xe2\x80\xa6"  // ellipsis
    );
    $fixedwordchars=array(
        "'",
        "'",
        '"',
        '"',
        '-',   // em dash becomes a plain hyphen
        '...'
    );
    return str_replace($badwordchars,$fixedwordchars, $string);
}
?>

I still have some weird characters showing up as â. I'm not sure if this is because UTF-8 characters are submitted through our form and then stored as latin1. It looks like most of the issues are with smart quotes and em dashes. The text shows up fine on the web as UTF-8, but when I generate an XML file from it, it completely craps out.
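The two functions above target different encodings: the first matches single Windows-1252 bytes and can even damage UTF-8 text (the UTF-8 em dash is the bytes E2 80 94, and it would turn the 0x94 into a double quote), while the second only matches UTF-8 sequences. A sketch that picks the replacement set with mb_check_encoding(); normalize_quotes() is a hypothetical name, and falling back to Windows-1252 for anything that is not valid UTF-8 is an assumption about the data.

```php
<?php
// Sketch: normalize curly quotes and dashes to ASCII, choosing the search
// list by encoding. normalize_quotes() is a hypothetical helper name.
function normalize_quotes(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        // Valid UTF-8: match the multi-byte sequences.
        $map = [
            "\u{2018}" => "'", "\u{2019}" => "'",  // single quotes
            "\u{201C}" => '"', "\u{201D}" => '"',  // double quotes
            "\u{2013}" => '-', "\u{2014}" => '-',  // en dash, em dash
            "\u{2026}" => '...',                   // ellipsis
        ];
    } else {
        // Not valid UTF-8: assume single Windows-1252 bytes.
        $map = [
            "\x91" => "'", "\x92" => "'",  // single quotes
            "\x93" => '"', "\x94" => '"',  // double quotes
            "\x96" => '-', "\x97" => '-',  // en dash, em dash
            "\x85" => '...',               // ellipsis
        ];
    }
    return str_replace(array_keys($map), array_values($map), $s);
}

echo normalize_quotes("It\u{2019}s"), "\n"; // It's
echo normalize_quotes("It\x92s"), "\n";     // It's
```

The check is only a heuristic: a short Windows-1252 string can happen to be valid UTF-8 (the bytes C3 A9, for example), and a string mixing both encodings is not handled.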
EdwinPaul Posted June 30, 2011
Is there anything helpful for you here? http://php.net/manual/en/function.utf8-encode.php
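One thing to keep in mind with utf8_encode(): it converts from ISO-8859-1, where bytes 0x80-0x9F are invisible control characters, while the smart quotes only live in that range in Windows-1252. A small comparison, using mb_convert_encoding() with 'ISO-8859-1' as a stand-in for utf8_encode() (which is deprecated as of PHP 8.2); it needs the mbstring extension.

```php
<?php
// Sketch: decoding the byte 0x93 as ISO-8859-1 vs Windows-1252.
// mb_convert_encoding(..., 'ISO-8859-1') does the same job as utf8_encode().
$byte = "\x93"; // left double quote in Windows-1252

$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n"; // c293   (U+0093, a C1 control character)
echo bin2hex($asCp1252), "\n"; // e2809c (U+201C, the curly quote)
```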
xyph Posted June 30, 2011
Those are characters with accents, and the cent symbol. â
If you want to replace every one of those with its non-accented counterpart, you're going to have a huge list.
Odd that your XML craps out; all 3 characters are in ISO 8859-1.
schilly Posted June 30, 2011
Quoting xyph: "Those are characters with accents, and the cent symbol. â"
Well, in latin1 those are the symbols, but in UTF-8 they are a smart quote, I think. Because it's a 3-byte character in UTF-8, latin1 shows it as three characters instead of one.
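A sketch of the byte-level view: the right single quote U+2019 is three bytes in UTF-8, and reading each byte as a Windows-1252 character gives three characters, the first of which is the â showing up in the output. show_as_cp1252() is a hypothetical helper; it needs the mbstring extension.

```php
<?php
// Sketch: why one smart quote shows up as several characters.
// show_as_cp1252() is a hypothetical helper name.
function show_as_cp1252(string $bytes): string
{
    // Read each byte as one Windows-1252 character, then re-encode the
    // result as UTF-8 so it can be printed.
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

$quote = "\u{2019}";                 // right single quotation mark
echo strlen($quote), "\n";           // 3
echo bin2hex($quote), "\n";          // e28099
echo show_as_cp1252($quote), "\n";   // â€™
```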
xyph Posted June 30, 2011
Ahh, I keep mixing up ASCII and 8859. The issue here is that your longtext column is 8859 (latin1), not UTF-8. Here's some reading: http://nicj.net/2011/04/17/mysql-converting-an-incorrect-latin1-column-to-utf8
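The linked article fixes the column inside MySQL by going through a binary type, i.e. by reinterpreting the stored bytes rather than converting characters. A rough PHP analogue, assuming the text was UTF-8, was read as Windows-1252, and was then saved again as UTF-8 (double encoding): map each character back to its Windows-1252 byte and keep the result only if nothing was lost and the bytes form valid UTF-8. undo_double_encoding() is a hypothetical name.

```php
<?php
// Sketch: undo "double encoding" (UTF-8 bytes that were read as Windows-1252
// and saved again as UTF-8). undo_double_encoding() is a hypothetical name.
function undo_double_encoding(string $s): string
{
    // Turn each character back into its single Windows-1252 byte.
    $bytes = mb_convert_encoding($s, 'Windows-1252', 'UTF-8');

    // Only trust the result if no character was lost on the way
    // and the recovered bytes form valid UTF-8.
    $lossless = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') === $s;
    return ($lossless && mb_check_encoding($bytes, 'UTF-8')) ? $bytes : $s;
}

echo undo_double_encoding("it\u{00E2}\u{20AC}\u{2122}s"), "\n"; // it’s
```

Text that is already correct UTF-8 (such as a plain "it’s") fails the validity check and is returned unchanged.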
schilly Posted July 5, 2011
Yeah, I think I'm pretty screwed here. I tried the MySQL binary-cast UTF-8 conversion method and it didn't work. The single quote is all messed up, which breaks XML validity. Sample after conversion:

With this in mind, and especially if you?¢‚Ǩ‚Ñ¢re an artist, musician, etc., it?¢‚Ǩ‚Ñ¢s extremely

There shouldn't be too many bad character types, so I figure if I get a list of them all and do a find/replace with their latin1 counterparts, it should work.
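For building that list, a hypothetical helper that reports every byte at or above 0x80 in a string, with how often it occurs, using count_chars(). Running it over the affected rows would show which byte values still need an entry in the search/replace arrays.

```php
<?php
// Sketch: list every non-ASCII byte in a string with its count, to build
// the find/replace list. list_high_bytes() is a hypothetical helper name.
function list_high_bytes(string $s): array
{
    $found = [];
    // count_chars mode 1: byte value => frequency, only for bytes that occur.
    foreach (count_chars($s, 1) as $byte => $count) {
        if ($byte >= 0x80) {
            // The "0x" prefix keeps the key a string instead of an integer.
            $found[sprintf('0x%02X', $byte)] = $count;
        }
    }
    return $found;
}

print_r(list_high_bytes("it\x92s \x93ok\x94 \x92"));
```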
schilly Posted July 6, 2011
I might have got it through trial and error:

<?php
function convert_smart_quotes($string){
    // Windows-1252 curly quotes and em dash, plus outlier bytes found by trial and error
    $search = array(chr(145), chr(146), chr(147), chr(148), chr(151), "\x80", "\x85", "\xA3", "\x96", "\xB5");
    $replace = array("'", "'", '"', '"', '-', '…', "\n", "£", "-", "µ");
    return str_replace($search, $replace, $string);
}
?>

Just a few outlier characters I needed to track down and convert. Hopefully I don't run into more in the future.
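If more outliers turn up and the list keeps growing, one PHP detail is worth knowing: str_replace() with arrays applies the pairs left to right, so a later search can match bytes that an earlier replacement inserted. Assuming the PHP file is saved as UTF-8, the '…' in the list above is the bytes E2 80 A6, so adding a hypothetical "\xA6" entry would break it. strtr() with an array replaces in a single pass and does not rescan inserted text.

```php
<?php
// Sketch: str_replace() vs strtr() when a replacement contains a searched byte.
// The "\xA6" => '|' entry is a hypothetical future addition to the list.
$search  = ["\x80", "\xA6"];
$replace = ["\u{2026}", '|'];   // "\u{2026}" is the UTF-8 bytes E2 80 A6

$viaStrReplace = str_replace($search, $replace, "\x80");
$viaStrtr      = strtr("\x80", array_combine($search, $replace));

var_dump(bin2hex($viaStrReplace)); // string(6) "e2807c" (ellipsis broken)
var_dump(bin2hex($viaStrtr));      // string(6) "e280a6" (ellipsis intact)
```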