UTF-8 encoding not working on select pages


Minklet

Hi, I am getting question marks in place of some characters. However, this only happens with my own queries and echo statements; WordPress reads the same database and handles them fine.

 

Can anyone tell me why?

 

I have added UTF-8 headers and meta tags, and the database is in UTF-8 encoding. I just can't figure out why WordPress can manage it and my PHP can't. I have tried htmlentities with no luck.

 

The offending page:

http://subverb.net

 

The WordPress blog echoing out the exact same title from the same database:

http://nottingham.subverb.net/blog/sounddhism/

 

 

Thank you

Link to comment
Share on other sites

Sorry, I am unable to validate this document because on line 516 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.

 

The error was: utf8 "\x96" does not map to Unicode
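
For what it's worth, a lone byte 0x96 is not valid UTF-8, but in Windows-1252 it is an en dash, which would suggest that some of the stored text is in that encoding. A minimal sketch of checking and converting such a string, assuming Windows-1252 really is the source encoding and that the mbstring extension is available (the function name to_utf8_from_cp1252 is made up for illustration):

```php
<?php
// Hypothetical helper: returns $text as UTF-8.
// Assumption: anything that is not already valid UTF-8 is Windows-1252.
function to_utf8_from_cp1252(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid, leave untouched
    }
    // Name the source encoding explicitly so byte 0x96 maps to U+2013.
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

$raw = "Artist \x96 Title";          // 0x96 is the byte from the validator error
$fixed = to_utf8_from_cp1252($raw);  // the dash becomes the bytes E2 80 93
echo bin2hex($fixed), "\n";
```

If the data turns out to be in some other encoding, the third argument to mb_convert_encoding would need to change accordingly.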

Link to comment
Share on other sites

I realise that it seems to be frowned upon to actually help people on this forum, but I genuinely can't find how to fix this. I've tried everything I can think of and had no joy; I've been looking all weekend. This is quite important, and I can't see why WordPress can manage it with the same charset encoding, yet I can't in regular PHP.

 

Can someone give me a hand please?

Link to comment
Share on other sites

Hi, I have no working solution, but I copied part of the error message into Google:

The error was: utf8 "\x96" does not map to Unicode

There are quite a few threads around the net about this.

In the end I ended up on this page: http://php.net/manual/en/function.utf8-encode.php

Maybe have a look at that. Some of the comments give example code.

Some forums also said it could be due to poor copy-pasting of code.

Also, shouldn't the UTF-8 in your meta tag be utf-8?

I hope this helps, because I have no real experience with this.

 

Link to comment
Share on other sites

Thanks for checking.

 

I tried this

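<?php
// Return $content as valid UTF-8.
// Assumption: bytes that are not valid UTF-8 are Windows-1252, which is
// common for pasted text (0x96 is an en dash in that encoding).
function _convert($content) {
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content; // already valid, nothing to do
    }

    // Name the source encoding explicitly. Without the third argument,
    // mb_convert_encoding() assumes the internal encoding (usually UTF-8),
    // so the broken bytes are not repaired.
    $converted = mb_convert_encoding($content, 'UTF-8', 'Windows-1252');

    if (mb_check_encoding($converted, 'UTF-8')) {
        // log('Converted to UTF-8');
    } else {
        // log('Could not convert to UTF-8');
    }
    return $converted;
}
?>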

 

With no result. What I don't understand is that WordPress manages to display the exact same database results. The offending content is copied directly from Facebook, which I HAVE to be able to do. There is no way I can ask clients to retype everything; why would they?

 

This is really irritating. I would go through the WordPress functions, but I really don't know where to look, and I doubt the code would be easy to decipher.

Link to comment
Share on other sites

Isn't this a good one?

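<?php
// Keep only well-formed UTF-8 sequences of 1 to 3 bytes and drop everything
// else. Note that this discards invalid bytes (such as a lone 0x96) and
// 4-byte sequences rather than converting them.
function get_correct_utf8_mysql_string($s)
{
    if(empty($s)) return $s;
    // preg_match_all() returns a match count, so don't assign it to $s;
    // the matched sequences are collected in $m[0].
    preg_match_all("#[\x09\x0A\x0D\x20-\x7E]|
[\xC2-\xDF][\x80-\xBF]|
\xE0[\xA0-\xBF][\x80-\xBF]|
[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|
\xED[\x80-\x9F][\x80-\xBF]#x", $s, $m );
    return implode("",$m[0]);
}
?>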

 

Credits to http://www.php.net/manual/en/function.utf8-encode.php#99982

Link to comment
Share on other sites

I've just tried copying the contents of the field directly from phpMyAdmin into my text editor, and it pasted the dodgy characters as question marks. So there is a problem even before the data gets to the browser. Does this mean anything?

 

The database is in utf8_general_ci

 

Apparently that's just my text editor, as copying and pasting into a text field on another website displays the characters correctly. I'm getting seriously annoyed by this now.

Link to comment
Share on other sites

P.S. This is the series of characters that is showing up as question marks:

 

∷∷∷∷∷∷

 

and these

 

○○○

 

And they aren't in the UTF-8 table I looked at, so does that mean I should be using a different encoding? That wouldn't make sense, because UTF-8 is what the WordPress pages are using.
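
As far as I can tell, both characters are ordinary Unicode code points, so UTF-8 can encode them: ∷ is U+2237 and ○ is U+25CB, and each takes three bytes. A quick sketch to confirm this, assuming PHP 7 or later for the \u{...} escape syntax and the mbstring extension (the variable names are only illustrative):

```php
<?php
// Build the two characters from their code points so the result does not
// depend on how this file itself was saved.
$proportion = "\u{2237}"; // the "∷" character
$circle     = "\u{25CB}"; // the "○" character

$proportionHex = bin2hex($proportion); // e288b7
$circleHex     = bin2hex($circle);     // e2978b

// Both are well-formed UTF-8, so no other encoding is needed for them.
$bothValid = mb_check_encoding($proportion . $circle, 'UTF-8');

echo $proportionHex, ' ', $circleHex, "\n";
```

So the characters themselves are fine in UTF-8; the question marks more likely come from a step that is not treating the data as UTF-8.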

Link to comment
Share on other sites

Not that I know anything about it but it could be that the Wordpress server is set up differently from the server running your code? Apache is outside of my knowledge. I have had issues with character encoding for years and I confess I've never actually found a perfect solution.

 

It's running from the same server and it's the same database

Link to comment
Share on other sites

Is there anything that might be going on at the server that I should ask about?

 

But then, WordPress is managing to display it absolutely fine.

 

RSS validation is throwing this:

Your feed appears to be encoded as "utf-8", but your server is reporting "US-ASCII"

 

and

'utf8' codec can't decode byte 0x96 in position 3657: unexpected code byte (maybe a high-bit character?)
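
The first validator message suggests the HTTP response does not declare a charset, which is separate from the meta tag inside the document. A minimal sketch of sending it from PHP, assuming the feed is generated by a PHP script and that application/rss+xml is the right media type for it (content_type_header is a made-up helper name):

```php
<?php
// Hypothetical helper: builds a Content-Type header line with a charset.
function content_type_header(string $mimeType, string $charset = 'utf-8'): string
{
    return sprintf('Content-Type: %s; charset=%s', $mimeType, $charset);
}

// header() only works before any output has been sent, so guard the call.
if (!headers_sent()) {
    header(content_type_header('application/rss+xml'));
}
```

This only fixes what the server reports; any 0x96 bytes in the content would still need converting to real UTF-8.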

 

 

The character in question is just a dash, that is actually being displayed fine when rendered as content. F*cksake.

Link to comment
Share on other sites

Sorry to keep bumping this, but I really need to get this sorted, and the stupid edit rules for posts on here mean I can't just add information to my earlier posts.

 

It seems that text copied and pasted from other blogs, Facebook and other sites into the text inputs on my site ends up being rendered incorrectly.

 

For instance, this tracklist copied into the tracklisting field is fine

http://forum.breakbeat.co.uk/tm.aspx?m=1972135876&mpage=1&key=shiverman

 

But this one is not

http://nottingham.subverb.net/blog/saltwater/

 

Despite being EXACTLY THE SAME

 

I can't stop my users from copying and pasting from the main sites that they use, or even from the site that they are on! What is going on? Can someone PLEASE help me here? There has to be something that I am doing wrong, either on the inputs or the outputs, but I CAN'T FIND OUT WHAT.

Link to comment
Share on other sites

I may be able to help you.

 

I recently built a website that needed to display Russian and therefore required UTF-8 encoding. What I did was open the various scripts that weren't displaying the characters correctly in Notepad++ and re-encode them as UTF-8 WITHOUT BOM.

 

I'm not sure whether this would work for you but worked a treat for me.
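
In case it helps, here is a rough way to check whether a file starts with a UTF-8 BOM (the three bytes EF BB BF) and to strip it. This is only a sketch; strip_utf8_bom and the UTF8_BOM constant are names invented for the example:

```php
<?php
// The UTF-8 byte order mark as raw bytes.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: removes a leading UTF-8 BOM if one is present.
function strip_utf8_bom(string $contents): string
{
    if (strncmp($contents, UTF8_BOM, 3) === 0) {
        return substr($contents, 3);
    }
    return $contents;
}

$withBom = UTF8_BOM . "hello";
$clean = strip_utf8_bom($withBom); // "hello"
```

A BOM at the top of a PHP script counts as output, so it can also stop later header() calls from working.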

Link to comment
Share on other sites

It's worth a try. Appreciated

Link to comment
Share on other sites

Sadly didn't make a difference.

 

One of the characters is an ampersand and is output as &. How can there be a problem outputting this? Surely this is a clue?

 

I'm actually willing to pay someone to help me now, so if anyone can come up with a reasonable offer to help me, give me a shout. PLEASE!

Link to comment
Share on other sites

Still can't get this to work.

 

Surely someone can point me to something to help me out? I refuse to believe that this issue is unfixable. These characters are handled fine by countless other sites, so surely there is an answer. Like I said, I can pay someone something (not much, I'm a student).

I might do a Larry David and pay for just a response.

Link to comment
Share on other sites

Strangely, even though I had already set UTF-8 in the scripts, adding UTF-8 to htmlentities and changing the MySQL charset did the trick.
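
For anyone landing here later, this is roughly what those two changes might look like. It is a sketch under assumptions, not the exact code from my site: the helper names are invented, it assumes the mysqli extension, and 'utf8' is chosen only to match the utf8_general_ci collation mentioned earlier.

```php
<?php
// Hypothetical helper: escape text for HTML output, telling htmlentities()
// explicitly that the input is UTF-8. Before PHP 5.4 the default charset
// was ISO-8859-1, which mangles multi-byte characters.
function escape_utf8(string $text): string
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: make the connection itself exchange data as UTF-8,
// independent of how the tables are declared. Not called in this sketch.
function use_utf8_connection(mysqli $db): bool
{
    return $db->set_charset('utf8');
}

$escaped = escape_utf8("Drum & Bass \u{2013} Mix");
echo $escaped, "\n"; // Drum &amp; Bass &ndash; Mix
```

The connection charset matters because a UTF-8 table read over a non-UTF-8 connection can still hand PHP bytes in another encoding.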

 

Thanks to the people who tried to help

 

No thanks to the people who didn't (not one of you knew of a decent resource or even a place to find a freelancer?). I can't remember the last time I got some decent help on this forum. Sarcastic posting of the manual is usually the best I get, regardless of the question. Dev Shed helped me within an hour and one post, so kudos to them.

Link to comment
Share on other sites
