schilly Posted June 30, 2011
I've got some text in a longtext MySQL field (latin1) that contains smart quotes, and they cause problems when I use the data to generate an XML file: the smart quotes break the XML structure. I've tried a bunch of different conversion methods in PHP and MySQL with no luck. Does anyone have a concrete method for doing this? So far I've tried search/replace with $search = array(chr(145), chr(146), chr(147), chr(148), chr(151)); changing longtext to binary, converting to UTF8, then converting back to longtext; and using mb_convert_encoding(). Nothing seems to work. Character encodings still cause a lot of confusion for me. Any help is appreciated. Thanks.
Link to comment https://forums.phpfreaks.com/topic/240816-help-converting-smart-quotes/
EdwinPaul Posted June 30, 2011
Did you also try str_replace()? http://php.net/manual/en/function.str-replace.php
schilly (Author) Posted June 30, 2011
Yup, that was the search/replace code above. This:

<?php
function convert_smart_quotes($string){
    // Windows-1252 single-byte smart quotes and em dash
    $search = array(chr(145), chr(146), chr(147), chr(148), chr(151));
    $replace = array("'", "'", '"', '"', '-');
    return str_replace($search, $replace, $string);
}
?>

and

<?php
function sanitizeString($string = null){
    if(is_null($string)) return false;
    //-> Replace all of those weird MS Word quotes and other high characters
    $badwordchars=array(
        "\xe2\x80\x98", // left single quote
        "\xe2\x80\x99", // right single quote
        "\xe2\x80\x9c", // left double quote
        "\xe2\x80\x9d", // right double quote
        "\xe2\x80\x94", // em dash
        "\xe2\x80\xa6"  // ellipsis
    );
    $fixedwordchars=array(
        "'",
        "'",
        '"',
        '"',
        '—',
        '...'
    );
    return str_replace($badwordchars,$fixedwordchars, $string);
}
?>

I still have some weird chars showing up, such as â. I'm not sure if this is because a UTF8 character is being submitted through our form and then stored as latin1. It looks like most of the issues are with smart quotes and mdash. The text shows up fine on the web as UTF8, but when I generate an XML file from it, it completely craps out.
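The two functions above target different encodings: the first looks for single Windows-1252 bytes (chr(145) to chr(151)), the second for three-byte UTF-8 sequences. Before choosing one, it can help to look at the raw bytes that actually come back from the database. The following is only a minimal sketch; find_non_ascii_runs is a hypothetical helper name, not something from this thread.

```php
<?php
// Hypothetical helper (not from the thread): return each distinct run of
// non-ASCII bytes in a string as lowercase hex, so the stored bytes are visible.
function find_non_ascii_runs(string $text): array
{
    // No /u modifier: the pattern matches raw bytes 0x80-0xFF, not code points.
    preg_match_all('/[\x80-\xFF]+/', $text, $matches);

    $runs = array();
    foreach (array_unique($matches[0]) as $run) {
        $runs[] = bin2hex($run);
    }
    return $runs;
}

// A Windows-1252 right single quote is the single byte 0x92 (chr(146)).
print_r(find_non_ascii_runs("you\x92re"));

// The same character stored as UTF-8 is the three bytes E2 80 99.
print_r(find_non_ascii_runs("you\xE2\x80\x99re"));
```

If the hex shows single bytes such as 92 or 97, the chr() list is the one that can match; if it shows e28099 and similar, the "\xe2\x80..." list is. Longer sequences suggest the text has been converted more than once.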
EdwinPaul Posted June 30, 2011
Is there anything helpful for you here: http://php.net/manual/en/function.utf8-encode.php ?
xyph Posted June 30, 2011
Those are characters with accents, and the cent symbol. â
If you want to replace every one of those with its non-accented counterpart, you're going to have a huge list.
Odd that your XML craps out; all 3 characters are in ISO 8859-1.
schilly (Author) Posted June 30, 2011
Quoting xyph: "Those are characters with accents, and the cent symbol. â"
Well, in latin1 those are separate symbols, but in UTF8 they are a smart quote, I think. Because it's a multi-byte character, latin1 shows it as several characters instead of one.
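The effect described here can be reproduced in a few lines. In UTF-8 a right single quote (U+2019) is the three bytes E2 80 99; if those bytes are treated as Windows-1252 (which is what MySQL's latin1 is based on), each byte becomes its own character. This sketch assumes the mbstring extension is available; the variable names are made up for illustration.

```php
<?php
// UTF-8 encoding of U+2019 (right single quotation mark): three bytes.
$utf8Quote = "\xE2\x80\x99";

// Treat those bytes as Windows-1252 and re-encode each one as UTF-8.
// This mimics UTF-8 text being read through a latin1 column or connection.
$misread = mb_convert_encoding($utf8Quote, 'UTF-8', 'Windows-1252');

echo strlen($utf8Quote), "\n";            // number of bytes in the original
echo mb_strlen($misread, 'UTF-8'), "\n";  // number of characters after the misread
echo bin2hex($misread), "\n";             // bytes of the garbled result
```

One character turns into three: 0xE2 becomes â, 0x80 becomes the euro sign, and 0x99 becomes the trademark sign. In strict ISO 8859-1 the bytes 0x80 and 0x99 are control characters, which may be why only the â is visible in some views.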
xyph Posted June 30, 2011
Ahh, I keep mixing up ASCII and 8859. The issue here is that your longtext is 8859 and not UTF-8. Here's some reading: http://nicj.net/2011/04/17/mysql-converting-an-incorrect-latin1-column-to-utf8
schilly (Author) Posted July 5, 2011
Ya, I think I'm pretty screwed here. I tried the MySQL binary cast UTF8 conversion method and it didn't work. The single quote is all messed up, breaking XML validity. Sample after conversion:
With this in mind, and especially if you?¢‚Ǩ‚Ñ¢re an artist, musician, etc., it?¢‚Ǩ‚Ñ¢s extremely
There shouldn't be too many bad character types, so I figure if I get a list of them all and do a find/replace of their latin1 counterpart it should work.
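For comparison, the idea behind a binary-cast conversion (reinterpret the stored bytes instead of translating them again) can also be tried on a single string in PHP. The sketch below is an assumption-laden illustration, not a fix confirmed in this thread: undo_one_mojibake_layer is a hypothetical name, it needs mbstring, and it only reverses one round of "UTF-8 read as Windows-1252" damage. Text garbled more than once, like the sample above, or garbled through a different charset, would need more than this.

```php
<?php
// Hypothetical helper: undo one layer of "UTF-8 bytes read as Windows-1252".
// Returns the input unchanged when it does not look damaged in that way.
function undo_one_mojibake_layer(string $text): string
{
    // Map each character back to the single byte it would have come from.
    $bytes = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');

    // If some character had no Windows-1252 byte, the mapping was lossy;
    // a round trip back to UTF-8 would then differ from the input.
    if (mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') !== $text) {
        return $text;
    }

    // Accept the result only if the recovered bytes are valid UTF-8.
    return mb_check_encoding($bytes, 'UTF-8') ? $bytes : $text;
}

// "youâ€™re" written with explicit bytes: the garbled form of a right single quote.
$garbled = "you\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2re";
echo bin2hex(undo_one_mojibake_layer($garbled)), "\n";
```

The round-trip and validity checks are there so that text which is already clean, or which mixes clean accented letters with damaged ones, is left alone. One known gap: a few bytes (0x9D, the last byte of a UTF-8 right double quote, is one) have no Windows-1252 character, so sequences containing them may not be recoverable this way.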
schilly (Author) Posted July 6, 2011
I might have got it through trial and error:

<?php
function convert_smart_quotes($string){
    $search = array(chr(145), chr(146), chr(147), chr(148), chr(151), "\x80", "\x85", "\xA3", "\x96", "\xB5");
    $replace = array("'", "'", '"', '"', '-', '…', "\n", "£", "-", "µ");
    return str_replace($search, $replace, $string);
}
?>

Just a few outlier characters I needed to track down and convert. Hopefully I don't run into more in the future.
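A hand-maintained list like this has to grow every time a new character turns up. A different approach, not the one used in this thread, is to convert the whole string from Windows-1252 to UTF-8 and then escape it for XML. The sketch below assumes the data really is single-byte Windows-1252 (the chr(145) to chr(151) matches suggest so), that mbstring is installed, and that the generated XML declares encoding="UTF-8". cp1252_to_xml_text is a hypothetical name.

```php
<?php
// Hypothetical helper: turn a Windows-1252 string into XML-safe UTF-8 text.
function cp1252_to_xml_text(string $raw): string
{
    // Translate every single-byte character to its UTF-8 equivalent,
    // so smart quotes and dashes survive instead of being flattened.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

    // Escape the characters that are special in XML (&, <, >, quotes).
    return htmlspecialchars($utf8, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// 0x92 is a right single quote and 0x97 an em dash in Windows-1252.
echo cp1252_to_xml_text("you\x92re & it\x92s \x97 done"), "\n";
```

The trade-off is that the output keeps the typographic characters as UTF-8 and does not replace them with plain ASCII, so whatever consumes the XML has to read it as UTF-8.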
This topic is now archived and is closed to further replies.