msr Posted March 14, 2007

Hi folks, I would be thankful if someone could clarify the following points. I have been having a real hell of a time dealing with Unicode in PHP... in fact I want to dump PHP and move elsewhere!

1. I am using the str_replace function to replace a string, say: $x = str_replace("a", "è£", $x);. It is supposed to replace every "a" with "è£", but it replaces it with "裠" (probably a Japanese character). The byte values of the characters in "è£", as given by the ord function, are 232 and 163. But if I use echo chr(232).chr(163);, I get the same "裠". What is wrong here? I am using the UTF-8 charset. Experimenting with other charsets did not produce much different results.

2. A generic question: how does one handle Unicode in PHP? Is Python or something else better than PHP for handling Unicode extensively?

Thanks
-msr
Quote Link to comment https://forums.phpfreaks.com/topic/42694-a-question-of-character/ Share on other sites More sharing options...
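A minimal sketch of the byte-level picture behind both questions, assuming the mbstring extension is enabled (it is optional in many PHP builds). PHP strings are raw bytes with no encoding attached, so "è£" is two bytes if the script was saved as ISO-8859-1 and four bytes if it was saved as UTF-8. The byte escapes below make that explicit instead of depending on how the file was saved; the variable names are illustrative. The ord() values 232 and 163 from the question match the ISO-8859-1 form, which suggests the script file was not saved as UTF-8.

```php
<?php
// PHP strings are plain byte sequences; nothing records which encoding they use.
// Build "è£" explicitly in both encodings so the result does not depend on
// how this file happens to be saved.
$latin1 = "\xE8\xA3";          // è (0xE8) and £ (0xA3) in ISO-8859-1: one byte each
$utf8   = "\xC3\xA8\xC2\xA3";  // the same two characters in UTF-8: two bytes each

echo bin2hex($latin1), "\n";   // e8a3
echo bin2hex($utf8), "\n";     // c3a8c2a3

// strlen() counts bytes; mb_strlen() (mbstring) counts characters.
echo strlen($utf8), "\n";             // 4
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 2

// mb_check_encoding() reports whether a byte string is valid UTF-8.
// 0xE8 announces a three-byte sequence, but only one byte follows it here.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
```

The general rule in PHP is to pick one encoding (usually UTF-8), make sure every input really is in it, and use the mb_* functions whenever you need character rather than byte semantics.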
per1os Posted March 14, 2007

Do you have the charset declared in the HTML? If not, try that. If that doesn't help, try a different charset, and if a charset is already defined, remove it. See what happens.

--FrosT
effigy Posted March 14, 2007

In ISO-8859-1, è is 0xE8 and £ is 0xA3. The ideogram 裠 (which means a short skirt, by the way) has the UTF-8 encoding 0xE8 0xA3 0xA0. Do you see the connection? è and £ are two thirds of the encoding for 裠. Are you sure your replace is working? Are the a's being removed? What are the surrounding characters? Is the string you're processing encoded or decoded?
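To see where 0xE8 0xA3 0xA0 comes from, here is a sketch that spreads a code point's bits over the UTF-8 byte patterns by hand. The helper name codepoint_to_utf8() is hypothetical and does no range or surrogate checking; in practice mb_chr() (PHP 7.2+, mbstring) or a "\u{88E0}" string escape (PHP 7.0+) gives the same bytes.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 by hand,
// following the RFC 3629 bit layout (no validation of range or surrogates).
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);                               // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))                  // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));               // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))                 // 1110xxxx
             . chr(0x80 | (($cp >> 6) & 0x3F))         // 10xxxxxx
             . chr(0x80 | ($cp & 0x3F));               // 10xxxxxx
    }
    return chr(0xF0 | ($cp >> 18))                     // 11110xxx
         . chr(0x80 | (($cp >> 12) & 0x3F))            // 10xxxxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))             // 10xxxxxx
         . chr(0x80 | ($cp & 0x3F));                   // 10xxxxxx
}

echo bin2hex(codepoint_to_utf8(0x88E0)), "\n"; // e8a3a0 (裠)
echo bin2hex(codepoint_to_utf8(0x00E8)), "\n"; // c3a8   (è)
```

So any time the bytes 0xE8 0xA3 reach a browser that decodes the page as UTF-8, the next byte is taken as the third byte of a CJK ideograph in the U+88C0 to U+88FF range.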
msr (author) Posted March 14, 2007

Hi FrosT: yes, I did declare charset utf-8 in the HTML.

Hi effigy: thanks for your observation... yes, the replacement is working... the sad part is that only that is working, and not as intended. (And by the way, how did you find the lookup table for the skirt symbol? Probably a Chinese girl's!) I tried to print:

$ss1 = str_replace("2*3", chr(232).chr(163), $ss1);
echo $ss1;

It outputs the same short skirt!

Also, in a series of replacements, PHP is ignoring valid replacements. The case here is:

...
$st = str_replace("$2995;$3016;", "¬÷", $st);
$st = str_replace("$2979;$3016;", "¬í", $st);
$st = str_replace("$2970;$3016;", "¬ê", $st);
$st = str_replace("$2965;$3006;", "è£", $st);
..... (x)
$ss = "2*3"; // some dummy expression to test
$ss1 = str_replace("2*3", chr(232).chr(163), $ss);
echo $ss1;
...

In this case the statement marked (x) has been ignored, even though a valid replacement should be made.

Thanks!
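Chained str_replace() calls like the ones above run one after another over the whole string, so one rule's output can be rewritten by a later rule. A small sketch with made-up ASCII rules (not msr's actual table) shows the difference between that and strtr(), which scans the input once, tries the longest key first, and never revisits text it has already substituted.

```php
<?php
// Hypothetical rules where the first rule's output matches the second rule's input.
$from = ['ab', 'b'];
$to   = ['b',  'X'];

// str_replace() with arrays applies each pair in turn to the whole string,
// so the "b" produced by the first pair is rewritten again by the second.
$cascaded = str_replace($from, $to, 'ab');
echo $cascaded, "\n"; // X

// strtr() makes a single pass, longest key first, and does not rescan
// replaced text, so the first pair's output survives.
$single_pass = strtr('ab', array_combine($from, $to));
echo $single_pass, "\n"; // b
```

For a character conversion table, the single-pass behaviour is usually what you want, because the order of the rules no longer matters.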
effigy Posted March 14, 2007

U+88E0 (that is the code point of 裠).

What is in your string, and how is the string encoded? Keep in mind that some replacements may have an effect on others. What are you trying to achieve with all of these? Are you cleaning up incorrect data, or attempting a character-set conversion...? Is "¬÷" the actual replacement you want, or an encoding of another character? If you want the literal characters, you should pass them through utf8_encode.
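utf8_encode() converts ISO-8859-1 bytes to UTF-8. It was deprecated in PHP 8.2, so the sketch below does the same conversion with mb_convert_encoding(), assuming the mbstring extension is available; iconv('ISO-8859-1', 'UTF-8', $s) is an alternative if iconv is installed instead.

```php
<?php
// "è£" as the bytes an ISO-8859-1 editor would save.
$latin1 = "\xE8\xA3";

// Re-encode to UTF-8 (the job utf8_encode() did for ISO-8859-1 input).
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8), "\n"; // c3a8c2a3

// And back again, as utf8_decode() used to do.
$back = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
var_dump($back === $latin1); // bool(true)
```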
msr (author) Posted March 14, 2007

Hi effigy! As you guessed correctly, I am attempting a font conversion: the program converts Unicode to a given font and vice versa. I looked at the help for utf8_encode, but I felt it was not the one I wanted. The long strings of characters (as mentioned, e.g. $2995;$3016;) are the Unicode characters (ளை) that are to be replaced by those extended-ASCII symbols. The JavaScript version is working perfectly.

Thanks!!
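A sketch of the conversion as a lookup table, built from the str_replace() pairs msr posted earlier in the thread (decimal code points 2995, 3016 and so on are Tamil characters in the U+0B80 block). It assumes the legacy font indexes its glyphs by single-byte ISO-8859-1 values, so "¬÷" is written as the bytes 0xAC 0xF7; the table variable and the apply_table() helper are hypothetical. Flipping the table gives the "vice versa" direction.

```php
<?php
// Hypothetical table: Tamil sequence in UTF-8 => legacy font byte codes.
// Byte values assume the font uses ISO-8859-1 positions for its glyphs.
$tamil_to_font = [
    "\u{0BB3}\u{0BC8}" => "\xAC\xF7", // 2995 3016: ளை => "¬÷"
    "\u{0BA3}\u{0BC8}" => "\xAC\xED", // 2979 3016: ணை => "¬í"
    "\u{0B9A}\u{0BC8}" => "\xAC\xEA", // 2970 3016: சை => "¬ê"
    "\u{0B95}\u{0BBE}" => "\xE8\xA3", // 2965 3006: கா => "è£"
];

// Hypothetical helper: one strtr() pass, so rules cannot interfere.
function apply_table(string $text, array $table): string
{
    return strtr($text, $table);
}

$word = "\u{0B95}\u{0BBE}\u{0BB3}\u{0BC8}"; // காளை
$font_bytes = apply_table($word, $tamil_to_font);
echo bin2hex($font_bytes), "\n"; // e8a3acf7

// Reverse direction: font codes back to Unicode.
$unicode = apply_table($font_bytes, array_flip($tamil_to_font));
var_dump($unicode === $word); // bool(true)
```

A real table would need every letter/sign combination the font distinguishes; these four entries only mirror the ones in the thread.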
effigy Posted March 14, 2007

Here's an example:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<pre>
<?php
### Make a UTF-8 string (bytes for U+201C and U+201D).
$left_dbl_quote = pack('C*', 0xE2, 0x80, 0x9C);
$right_dbl_quote = pack('C*', 0xE2, 0x80, 0x9D);
print $utf8_string = $left_dbl_quote . 'quote' . $right_dbl_quote;
print '<br>';
### Convert it to ISO-8859-1 by replacing the curly quotes with plain ones.
$iso_8859_1_string = preg_replace('/[\x{201C}-\x{201D}]/u', '"', $utf8_string);
print $iso_8859_1_string;
?>
</pre>

You may want to look at this; I found it in the User Notes on php.net.
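The same conversion can be written with PHP 7+ "\u{...}" escapes instead of pack(). The sketch below also shows a pitfall relevant to this thread: with the /u modifier, PCRE refuses subjects that are not valid UTF-8, and preg_replace() returns null rather than a string. The Latin-1 test string is an illustrative assumption standing in for a script saved as ISO-8859-1.

```php
<?php
// effigy's curly-quote replacement, using string escapes for the UTF-8 input.
$utf8_string = "\u{201C}quote\u{201D}";
$ascii = preg_replace('/[\x{201C}-\x{201D}]/u', '"', $utf8_string);
echo $ascii, "\n"; // "quote"

// With /u, a subject that is not valid UTF-8 makes preg_replace() fail:
// it returns null, and preg_last_error() reports the reason.
$latin1_string = "\xE8\xA3 quote"; // "è£ quote" as ISO-8859-1 bytes
$result = preg_replace('/[\x{201C}-\x{201D}]/u', '"', $latin1_string);
$err = preg_last_error();
var_dump($result);                      // NULL
var_dump($err === PREG_BAD_UTF8_ERROR); // bool(true)
```

Checking for null (or calling mb_check_encoding() first) makes an encoding mismatch show up as an error instead of silently empty output.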
msr (author) Posted March 14, 2007

Thanks a lot, genius! I have a feeling that it might work now... I will get back with my notes.
msr (author) Posted March 14, 2007

Hi! The above example has not helped... the output this time was 裢! The class at http://mikolajj.republika.pl/ is excellent, but I don't know if it would help solve the problem easily.
effigy Posted March 15, 2007

Did you run the code I posted by itself? What version of PHP are you using? I get the following:

“quote”
"quote"

If you still have trouble, could you provide a specific example, like mine, that contains the original string and the desired string?
msr (author) Posted March 18, 2007

I found the problem: it was in fact the charset=utf-8 declaration. Once I removed it, everything worked fine. Interfacing HTML with PHP leads to such problems...

Thanks!!!
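This fix fits effigy's byte analysis: the font-mapping step outputs legacy single-byte codes, and a charset=utf-8 declaration makes the browser glue 0xE8 0xA3 onto whatever byte follows, giving 裠 or 裢. Removing the declaration lets the browser guess a single-byte encoding instead (often windows-1252, depending on browser and locale). A sturdier version of the same fix is to declare the encoding the bytes actually use. This is a minimal sketch, assuming the font codes are ISO-8859-1 byte values as in msr's table and that mbstring is available.

```php
<?php
// Output of the font-mapping step for காளை ("è£" + "¬÷"):
// legacy single-byte font codes, not UTF-8.
$legacy = "\xE8\xA3\xAC\xF7";

// Declare the encoding the bytes really are in (must run before any output).
header('Content-Type: text/html; charset=ISO-8859-1');

// Read as UTF-8 these bytes are invalid overall, and the first three
// (E8 A3 AC) form one three-byte sequence: a single CJK ideograph, U+88EC.
$valid = mb_check_encoding($legacy, 'UTF-8');
var_dump($valid);              // bool(false)
var_dump(bin2hex("\u{88EC}")); // string(6) "e8a3ac"
```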