Minklet Posted February 18, 2011
Hi, I am getting question marks in place of some characters. However, this only happens with my own queries and echo statements; WordPress, reading the same database, handles them fine. Can anyone tell me why? I have added UTF-8 headers and meta tags, and the database is in UTF-8 encoding. I just can't figure out why WordPress can manage it and my PHP can't. I have tried htmlentities with no luck.
The offending page: http://subverb.net
The WordPress blog echoing out the exact same title from the same database: http://nottingham.subverb.net/blog/sounddhism/
Thank you
Minklet Posted February 18, 2011
I have also tried SET NAMES utf8, and it then prints the string with a question mark inside a diamond symbol instead.
Minklet Posted February 20, 2011
Anybody?
Pikachu2000 Posted February 20, 2011
Sorry, I am unable to validate this document because on line 516 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication. The error was: utf8 "\x96" does not map to Unicode
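For context: in UTF-8, a byte between 0x80 and 0xBF may only appear as a continuation of a multi-byte sequence, so a lone 0x96 is invalid. That is what the validator reports. In Windows-1252 (cp1252), 0x96 is the en dash (U+2013). Below is a minimal sketch of a diagnostic that reports the byte offsets of such stray bytes, loosely following the byte layout in RFC 3629. The function name invalid_utf8_offsets() is hypothetical. The sketch only checks lead and continuation structure; it does not check for overlong forms or surrogates.

```php
<?php
// Hypothetical helper: return the byte offsets of bytes that cannot start a
// UTF-8 sequence, or that are not followed by the continuation bytes their
// lead byte announces. Simplified sketch: overlong forms and surrogate
// ranges are not checked.
function invalid_utf8_offsets(string $s): array
{
    $bad = [];
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $need = 0;                      // ASCII, single byte
        } elseif ($b >= 0xC2 && $b <= 0xDF) {
            $need = 1;                      // two-byte sequence
        } elseif ($b >= 0xE0 && $b <= 0xEF) {
            $need = 2;                      // three-byte sequence
        } elseif ($b >= 0xF0 && $b <= 0xF4) {
            $need = 3;                      // four-byte sequence
        } else {
            // 0x80-0xC1 and 0xF5-0xFF never start a character (e.g. a stray 0x96).
            $bad[] = $i;
            $i++;
            continue;
        }
        // The lead byte must be followed by $need bytes of the form 10xxxxxx.
        $ok = $i + $need < $len;
        for ($k = 1; $ok && $k <= $need; $k++) {
            $ok = (ord($s[$i + $k]) & 0xC0) === 0x80;
        }
        if ($ok) {
            $i += $need + 1;
        } else {
            $bad[] = $i;
            $i++;
        }
    }
    return $bad;
}

// A raw cp1252 en dash between two ASCII words: reports offset 8.
print_r(invalid_utf8_offsets("Track 1 \x96 Artist"));
```

Running this over the stored field, or over the generated page, shows exactly which bytes need attention before any conversion is attempted.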
Minklet Posted February 20, 2011
I don't know how to rectify this, though. There are very few results on Google for this error, and I can't get any useful information from them. I've tried utf8_decode and utf8_encode with no results.
Minklet Posted February 21, 2011
I realise that it seems to be frowned upon to actually help people on this forum, but I genuinely can't find how to fix this. I've tried everything I can think of and had no joy; I've been looking all weekend. This is quite important, and I can't see why WordPress can manage it with the same charset encoding, yet I can't in regular PHP. Can someone give me a hand, please?
cssfreakie Posted February 21, 2011
Hi, I have no working solution, but I copy-pasted part of the error message into Google:
The error was: utf8 "\x96" does not map to Unicode
There are quite a few threads around the net about this, and in the end I landed on this page: http://php.net/manual/en/function.utf8-encode.php
Maybe have a look at that; some of the comments give example code. Some forums also said it could be caused by poor copy-pasting of code. And shouldn't the UTF-8 be utf-8 in your meta tag?
I hope this helps, because I have no real experience with this.
Minklet Posted February 21, 2011
Thanks for checking. I tried this:
<?php
function _convert($content)
{
    if (!mb_check_encoding($content, 'UTF-8')
        OR !($content === mb_convert_encoding(mb_convert_encoding($content, 'UTF-32', 'UTF-8'), 'UTF-8', 'UTF-32'))) {
        $content = mb_convert_encoding($content, 'UTF-8');
        if (mb_check_encoding($content, 'UTF-8')) {
            // log('Converted to UTF-8');
        } else {
            // log('Could not convert to UTF-8');
        }
    }
    return $content;
}
?>
It made no difference. What I don't understand is that WordPress manages to display the exact same database results. The offending content is copied directly from Facebook, which I HAVE to be able to do; there is no way I can ask clients to retype everything, and why would they? This is really irritating. I would go through the WordPress functions, but I really don't know where the relevant code would be, and it's very doubtful that it would be easily decipherable.
cssfreakie Posted February 21, 2011
Isn't this a good one?
<?php
function get_correct_utf8_mysql_string($s)
{
    if (empty($s)) return $s;
    $s = preg_match_all("#[\x09\x0A\x0D\x20-\x7E]|
[\xC2-\xDF][\x80-\xBF]|
\xE0[\xA0-\xBF][\x80-\xBF]|
[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|
\xED[\x80-\x9F][\x80-\xBF]#x", $s, $m);
    return implode("", $m[0]);
}
?>
Credits to http://www.php.net/manual/en/function.utf8-encode.php#99982
Minklet Posted February 21, 2011
Sadly, it hasn't made a difference.
Minklet Posted February 21, 2011
I've just tried copying the contents of the field directly from phpMyAdmin into my text editor, and it pasted the dodgy characters as question marks. So the problem exists even before the text reaches the browser. Does this mean anything? The database collation is utf8_general_ci.
Apparently that's just my text editor, as copying and pasting into a text field on another website displays the characters correctly. I'm getting seriously annoyed by this now.
Minklet Posted February 21, 2011
P.S. This is the series of characters that is throwing up a question mark: ∷∷∷∷∷∷ and also these: ○○○. They aren't in the UTF-8 table, so does that mean I should be using a different encoding? That wouldn't make sense, because UTF-8 is what the WordPress pages are using.
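For reference, both characters are part of Unicode, and each encodes to three bytes in UTF-8: ∷ is U+2237 (PROPORTION) and ○ is U+25CB (WHITE CIRCLE). Three-byte characters also fit MySQL's utf8 character set, so the encoding choice itself should not be what breaks them. The quick sketch below prints the raw bytes PHP holds for each character. It uses the \u{...} string escape, which assumes PHP 7.0 or later.

```php
<?php
// Code points from the Unicode charts; "\u{...}" needs PHP 7.0+.
$chars = [
    'U+2237 PROPORTION'   => "\u{2237}", // ∷
    'U+25CB WHITE CIRCLE' => "\u{25CB}", // ○
];
foreach ($chars as $name => $c) {
    // strlen() counts bytes; bin2hex() shows them.
    printf("%-20s %d bytes: %s\n", $name, strlen($c), bin2hex($c));
}
// U+2237 PROPORTION    3 bytes: e288b7
// U+25CB WHITE CIRCLE  3 bytes: e2978b
```

If bin2hex() on the value fetched from the database shows these same bytes, the stored data is fine, and the problem lies in how the page is declared or escaped.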
harristweed Posted February 21, 2011
Not that I know anything about it, but could the WordPress server be set up differently from the server running your code? Apache is outside my knowledge. I have had issues with character encoding for years, and I confess I've never actually found a perfect solution.
Minklet Posted February 21, 2011
In reply to harristweed: It's running from the same server, and it's the same database.
harristweed Posted February 21, 2011
Oh! This might also be a bit obvious, but what's the page encoding?
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
or
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
Minklet Posted February 21, 2011
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I tried charset=iso-8859-1, and it displayed the ••• characters correctly, but the ones I posted above are still wrong.
Minklet Posted February 23, 2011
This is still not working, and it is even breaking the RSS feed. Does anyone have ANY information, or ideas about where I can get some? This is actually driving me insane.
Minklet Posted February 23, 2011
Is there anything that might be going on at the server that I should ask about? But then, WordPress is managing to display it absolutely fine.
RSS validation is throwing this: Your feed appears to be encoded as "utf-8", but your server is reporting "US-ASCII" and 'utf8' codec can't decode byte 0x96 in position 3657: unexpected code byte (maybe a high-bit character?)
The character in question is just a dash, and it displays fine when rendered as content. F*cksake.
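The validator message describes two separate problems. First, a feed served as text/xml with no charset in the HTTP Content-Type header defaults to US-ASCII (RFC 3023), whatever the XML prolog says. Second, byte 0x96 is the same stray Windows-1252 en dash the HTML validator flagged. The sketch below addresses both, under a few assumptions: the helper name rss_item() is hypothetical, and the feed is assumed to be built by hand with PHP 5.4 or later. Text is escaped with htmlspecialchars() using an explicit 'UTF-8' charset, and ENT_SUBSTITUTE replaces any invalid byte sequence with U+FFFD. That replacement is the "question mark in a diamond" seen earlier in the thread. Without the flag, htmlspecialchars() returns an empty string for invalid input.

```php
<?php
// Hypothetical helper: build one RSS <item> with XML-escaped text.
// ENT_SUBSTITUTE turns invalid UTF-8 (such as a stray 0x96 byte) into
// U+FFFD instead of making htmlspecialchars() return an empty string.
function rss_item(string $title, string $link): string
{
    $flags = ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE;
    return "<item>\n"
        . '  <title>' . htmlspecialchars($title, $flags, 'UTF-8') . "</title>\n"
        . '  <link>' . htmlspecialchars($link, $flags, 'UTF-8') . "</link>\n"
        . "</item>\n";
}

// Declare the charset in the HTTP header as well as in the XML prolog,
// so the two cannot disagree. header() must run before any output.
if (!headers_sent()) {
    header('Content-Type: application/rss+xml; charset=UTF-8');
}
echo '<?xml version="1.0" encoding="UTF-8"?>', "\n";
echo rss_item("Saltwater \u{2013} tracklist & notes", 'http://nottingham.subverb.net/blog/saltwater/');
```

Substituting U+FFFD keeps the feed well-formed, but it does not restore the intended dash. That still requires fixing the bytes where they enter the database.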
Minklet Posted February 23, 2011
Sorry to keep bumping this, but I really need to get it sorted, and the stupid editing rules for posts on here mean I can't just add information to my earlier posts.
It seems that copying and pasting from other blogs, Facebook and other sites into the text inputs on my site produces something that makes it render incorrectly.
For instance, this tracklist, copied into the tracklisting field, is fine: http://forum.breakbeat.co.uk/tm.aspx?m=1972135876&mpage=1&key=shiverman
But this one is not, despite being EXACTLY THE SAME: http://nottingham.subverb.net/blog/saltwater/
I can't stop my users from copying and pasting from the main sites that they use, or even from the site that they are on! What is going on? Can someone PLEASE help me here? There has to be something that I am doing wrong, either on the inputs or on the outputs, but I CAN'T FIND OUT WHAT.
Minklet Posted February 24, 2011
Does anyone have ANY information about sanitizing (or whatever the term would be) text copied from websites into a text box? That might be worth an attempt at sorting this out. Seriously, no one can help me here?
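One common approach, sketched here under a few assumptions, has two parts. First, make the browser submit UTF-8 by serving the form page as UTF-8 and, optionally, adding accept-charset="UTF-8" to the <form>. Second, on the server, keep input that is already valid UTF-8 and, if it isn't, assume the stray bytes are Windows-1252 and convert them. Unlike the _convert() attempt earlier in the thread, this names the source encoding explicitly. When mb_convert_encoding() has no from-encoding, it falls back to mbstring's internal encoding, so it has no way of knowing the stray bytes were cp1252. A few caveats apply. The function name is hypothetical, and the code requires the mbstring extension. The Windows-1252 fallback is also only a guess: it will garble text that mixes valid UTF-8 with stray cp1252 bytes.

```php
<?php
// Hypothetical helper for submitted form text: keep valid UTF-8 as-is;
// otherwise assume Windows-1252 (where 0x96 is an en dash) and convert.
// Requires the mbstring extension.
function normalize_pasted_text(string $raw): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}

// Example: a title containing a raw cp1252 en dash.
$clean = normalize_pasted_text("Sounddhism \x96 live set");
var_dump(mb_check_encoding($clean, 'UTF-8')); // bool(true)
```

Running this before the INSERT keeps invalid bytes out of the database, so every later output path (HTML pages, RSS, phpMyAdmin) sees the same clean UTF-8.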
mattspriggs28 Posted February 24, 2011
I may be able to help you. I recently built a website that needed to display Russian and therefore required UTF-8 encoding. What I did was open the various scripts that weren't displaying the characters correctly in Notepad++ and encode them as UTF-8 WITHOUT BOM. I'm not sure whether this would work for you, but it worked a treat for me.
Minklet Posted February 24, 2011
In reply to mattspriggs28: It's worth a try. Appreciated.
Minklet Posted February 26, 2011
Sadly, it didn't make a difference. One of the characters is an ampersand and is output as &. How can there be a problem outputting this? Surely this is a clue?
I'm actually willing to pay someone to help me now, so if anyone can come up with a reasonable offer to help, give me a shout. PLEASE!
Minklet Posted February 28, 2011
Still can't get this to work. Surely someone can point me to something that will help? I refuse to believe that this issue is unfixable. These characters are handled fine by countless other sites, so surely there is an answer. Like I said, I can pay someone something (not much; I'm a student). I might do a Larry David and pay just for a response.
Minklet Posted March 1, 2011
Strangely, despite my having added it in the scripts already, adding UTF-8 to htmlentities and changing the MySQL charset did the trick.
Thanks to the people who tried to help. No thanks to the people who didn't (not one of you knew of a decent resource, or even a place to find a freelancer?). I can't remember the last time I got decent help on this forum; sarcastic posting of the manual is usually the best I get, regardless of the question. Dev Shed helped me within an hour and one post, so kudos to them.
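The fix described above corresponds to two settings. The first is the connection charset: with mysql_set_charset('utf8') or $mysqli->set_charset('utf8'), both server and client library know the charset. The PHP manual recommends this over a bare SET NAMES query, because SET NAMES alone does not tell the client library what charset to use when escaping. The second is an explicit charset argument to htmlentities(). Before PHP 5.4 its default charset was ISO-8859-1, which turns each byte of a multi-byte UTF-8 character into a separate Latin-1 character or entity. The sketch below shows only the output side so that it runs without a database. The helper name h() is hypothetical, and the connection lines are left as comments.

```php
<?php
// Hypothetical output helper: escape database text for HTML, stating the
// charset explicitly instead of relying on htmlentities()'s default.
function h(string $s): string
{
    return htmlentities($s, ENT_QUOTES, 'UTF-8');
}

// Connection side (not run here; credentials are placeholders):
//   $db = new mysqli($host, $user, $pass, $name);
//   $db->set_charset('utf8');

echo h("Saltwater \u{2013} Tom & Jerry's set"), "\n";
// Saltwater &ndash; Tom &amp; Jerry&#039;s set
```

With both settings in place, the bytes fetched from MySQL are UTF-8, and htmlentities() interprets them as UTF-8. This is presumably the same pairing WordPress already used, which would explain why it displayed the same rows correctly.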