[SOLVED] character sets and php


wrathican

Hey people, I have recently had a bit of trouble with character sets and PHP/MySQL.

 

A client is pasting stuff from Word documents into a form field. This pasted data seems to contain 'funny' characters.

 

When displaying the data on a page I see a diamond '?' where the character should be.

 

I've run some tests with strpos and preg_replace to see if I can replace the characters with normal characters, but to no avail.

 

I first did this little test:

<?php

$needle = '“';
$haystack = "“We felt it was important to contribute something useful.”";
$pos = strpos($haystack, $needle);
if ($pos !== false) {
 echo "The string '{$needle}' was found in the string '{$haystack}'";
 echo " and exists at position {$pos}<br><br><br><br><br>";
 $string = preg_replace('~(“)|(”)~', '"', $haystack);
 echo $string;
} else {
 echo "The string '{$needle}' was not found in the string '{$haystack}'";
}

?>

This worked, and converted the 'funny' character to what I wanted.

 

Then I tried the same thing using a value selected from the database, and apparently there was no match.

 

The only difference was that $haystack was equal to a database field.
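A possible explanation, offered as an assumption since the thread never shows the database code: strpos compares raw bytes, and the same curly quote is a different byte sequence in UTF-8 (likely what the PHP file is saved as) than in Windows-1252 (likely what a latin1 table hands back). The sketch below uses hex escapes so it does not depend on the file's own encoding; `$fromDatabase` is a made-up stand-in for the real field.

```php
<?php
// The left curly quote (U+201C) as raw bytes in two encodings.
$utf8Quote   = "\xE2\x80\x9C"; // UTF-8: three bytes
$cp1252Quote = "\x93";         // Windows-1252: one byte

// Hypothetical database value that arrives as Windows-1252 bytes.
$fromDatabase = $cp1252Quote . "We felt it was important." . "\x94";

// strpos() works on bytes, so a UTF-8 needle cannot match a
// Windows-1252 haystack even though both "look" like the same quote.
$foundAsUtf8   = strpos($fromDatabase, $utf8Quote) !== false;
$foundAsCp1252 = strpos($fromDatabase, $cp1252Quote) !== false;

var_dump($foundAsUtf8);   // bool(false)
var_dump($foundAsCp1252); // bool(true)
```

If that is what is happening, either the needle or the haystack has to be converted so both sides use the same encoding before searching.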

 

What can I do to convert these characters?

 

Thanks

 

Wrath

Link to comment
Share on other sites

Thanks for the reply,

 

I tried that, but again to no avail.

 

I echoed three variations of my output:

1 - normal echo.

2 - utf8_encode

3 - utf8_decode

 

1 output the normal string with inline 'funny' characters

 

2 output my string, but with  in front of all my 'funny' characters

 

3 did the same as 1

 

My database collation is latin1_swedish_ci, if that helps.

 

 

Link to comment
Share on other sites

Yeah, I know that pasting from MS Word isn't ideal.

 

But what I am asking is whether there is a way to detect what the charset of a string is, then convert the string to a normal charset.

 

I honestly do not think you can. The best you can do is assume that you have to replace certain characters. I hate MS Word for this exact reason: you have to check for those smart quotes, the - and a bunch of other nonsense and replace them.

 

The easiest way I found was to create two arrays, one with the bad values and one with the good values, and use them to replace the bad values with the good ones.
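The two-array idea can be sketched roughly like this. It assumes the text is already UTF-8, and the list of characters is only a guess at what Word typically inserts; `replaceSmartChars` is a name made up for this example.

```php
<?php
// "Bad" values: common Word smart characters, written as UTF-8 bytes.
$bad = [
    "\xE2\x80\x9C", // left double quote  (U+201C)
    "\xE2\x80\x9D", // right double quote (U+201D)
    "\xE2\x80\x98", // left single quote  (U+2018)
    "\xE2\x80\x99", // right single quote (U+2019)
    "\xE2\x80\x93", // en dash            (U+2013)
    "\xE2\x80\x94", // em dash            (U+2014)
    "\xE2\x80\xA6", // ellipsis           (U+2026)
];
// "Good" values: plain ASCII replacements, in the same order.
$good = ['"', '"', "'", "'", '-', '-', '...'];

// Hypothetical helper: str_replace() pairs the arrays up by index.
function replaceSmartChars(string $text, array $bad, array $good): string
{
    return str_replace($bad, $good, $text);
}

$input = "\xE2\x80\x9CWe felt it was important\xE2\x80\xA6\xE2\x80\x9D";
echo replaceSmartChars($input, $bad, $good); // "We felt it was important..."
```

str_replace is enough here because the bad values are fixed strings; preg_replace would only be needed for real patterns.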

 

The worst part was that this happened to me after I had my site running for about 6 months, so changing charsets was not practical. Wish I had known to use a different charset back then. Oh well.

 

Hope that helps.

Link to comment
Share on other sites

Yeah, I know that pasting from MS Word isn't ideal.

 

But what I am asking is whether there is a way to detect what the charset of a string is, then convert the string to a normal charset.

 

You can do it with multi-byte charsets, but I don't know if you can do it with single-byte charsets.
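As a rough illustration of the multi-byte case: a string can be checked for UTF-8 validity, because most single-byte text containing high characters is not valid UTF-8. The sketch below assumes the mbstring extension is loaded and assumes Windows-1252 as the fallback, which is a guess that fits Word pastes but cannot be proven from the bytes alone. `toUtf8` is a hypothetical helper name.

```php
<?php
// Sketch: keep the text if it validates as UTF-8, otherwise assume
// it is in the fallback single-byte charset and convert it.
function toUtf8(string $text, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8 (plain ASCII also lands here)
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

// Curly quotes as Windows-1252 bytes (0x93 and 0x94).
$cp1252 = "\x93quoted\x94";
$utf8   = toUtf8($cp1252);

echo bin2hex($utf8); // e2809c71756f746564e2809d
```

This only tells UTF-8 apart from "something else"; it cannot tell one single-byte charset from another, which matches the doubt expressed above.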

 

That being said, I don't think that the problem is with your charset. Word deals with charsets just fine (I actually use it to detect charsets sometimes when document encoding gets screwed up, as almost every site I deal with is in Japanese). The problem is that Word adds extra non-visible characters to the text, which show up weird when you paste them into the browser. This means that Word isn't just 'not an ideal solution'; rather, it's the wrong solution.

 

If your client insists on doing it this way, have them paste the text into Notepad or WordPad, and then copy it again and paste it into the browser. That *should* strip out all the extra characters.

Link to comment
Share on other sites

Thanks for the advice, and I will certainly take this up with my client.

 

I managed to figure out a workaround for the already inserted items.

 

I tried converting the string (using PHP's iconv function) into different charsets to find the correct one. Once I had, I did exactly what you said, premiso: I had an array of good and bad chars and used preg_replace to sort it out.
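The exact code was not posted, so the following is only a guess at what the trial-and-error step could look like: convert the stored bytes from each candidate charset to UTF-8 with iconv and see which one produces the expected curly quote. The candidate list and the sample `$stored` value are assumptions made for illustration.

```php
<?php
// Candidate source charsets to try, in order (an assumed list).
$candidates = ['ISO-8859-1', 'Windows-1252'];

// Hypothetical stored value: curly quotes as single bytes 0x93 / 0x94.
$stored = "\x93We felt it was important.\x94";

// The character we expect to see after a correct conversion (U+201C in UTF-8).
$expected = "\xE2\x80\x9C";

$match = null;
foreach ($candidates as $charset) {
    // iconv(from, to, string) returns false if the conversion fails.
    $converted = @iconv($charset, 'UTF-8', $stored);
    if ($converted !== false && strpos($converted, $expected) !== false) {
        $match = $charset;
        break;
    }
}

echo $match; // Windows-1252
```

ISO-8859-1 fails the check because it maps byte 0x93 to an invisible control character rather than a quote. Once the matching charset is known, the converted string can be run through the good/bad replacement arrays.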

 

I hope this will be of use to someone in the future.

 

Thanks for the advice people!
