dorm Posted July 24, 2007

Hello,

I'm a new user on this forum, from Israel. First, I would like to apologize for my bad English.

Now for my problem. I couldn't find a solution to it anywhere, so you're kind of my last hope.

I'm writing a system in PHP in which everything is encoded as UTF-8. To work with UTF-8 encoded strings I need to use special functions, the mbstring functions (mbstring stands for Multi-Byte String), which are designed for UTF-8 and other multibyte encodings. The problem is that there aren't enough mbstring functions for me to work well with UTF-8 encoded strings. Many important functions have no mbstring equivalent.

I wrote a list of regular functions, and I need to know whether they work correctly on UTF-8 encoded strings. Here is the list (links to the functions are included):

mysql_real_escape_string() http://il2.php.net/manual/en/function.mysql-real-escape-string.php
stripslashes() http://il2.php.net/manual/en/function.stripslashes.php
addslashes() http://il2.php.net/manual/en/function.addslashes.php
strstr() http://il2.php.net/manual/en/function.strstr.php
trim() http://il2.php.net/manual/en/function.trim.php
wordwrap() http://il2.php.net/manual/en/function.wordwrap.php
vsprintf() http://il2.php.net/manual/en/function.vsprintf.php
nl2br() http://il.php.net/manual/en/function.nl2br.php

The list above contains only some of the functions that I need to use with UTF-8 encoded strings.

Does anyone know whether the functions above are compatible with UTF-8 encoded strings? How can I tell which functions are suitable for UTF-8 encoded strings? If the functions above aren't compatible, what should I use to replace them? What is the solution?

Thank you very much!

Dor.
dorm (author) Posted July 25, 2007

Please help me with this. On every forum where I ask this question I get 0 answers. It's very important to me, and it is holding up my project.
teng84 Posted July 25, 2007

Do you know what UTF-8 is? The functions you listed work with regular characters, but I haven't tried them with Greek. I don't use special characters, so I don't know. Maybe try even one of those functions with your characters; if it works, I guess everything will work fine.
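The experiment suggested above could be sketched roughly like this: run a UTF-8 sample through a few of the listed functions and check that each result is still well-formed UTF-8. This is only a sketch under assumptions. The helper name `is_valid_utf8()` and the sample string are made up for illustration, and the check relies on PCRE's `/u` modifier rejecting invalid byte sequences. Passing this test shows only that the bytes were not corrupted for this particular input; it does not prove a function is correct for every UTF-8 string.

```php
<?php
// Hypothetical helper: true when $s is well-formed UTF-8.
// With the /u modifier PCRE refuses invalid byte sequences,
// so matching an empty pattern is enough to validate the subject.
function is_valid_utf8($s) {
    return preg_match('//u', $s) === 1;
}

// Hypothetical sample: the Hebrew word "shalom" written as raw UTF-8 bytes,
// padded with ASCII spaces, followed by backslash-escaped quotes and a newline.
$sample = "  \xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D \\'ok\\'\n";

$results = array(
    'stripslashes' => stripslashes($sample),
    'trim'         => trim($sample),
    'nl2br'        => nl2br($sample),
    // Search for a multibyte needle (the second Hebrew letter).
    'strstr'       => strstr($sample, "\xD7\x9C"),
);

foreach ($results as $name => $out) {
    echo $name, ': ', is_valid_utf8($out) ? 'still valid UTF-8' : 'BROKEN', "\n";
}
```

Where the mbstring extension is available, `mb_check_encoding($out, 'UTF-8')` should be usable for the same validity check.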
ss32 Posted July 25, 2007

You could try writing those functions yourself... coding-wise they are very simple (unlike a complex regex interpreter).
btherl Posted July 25, 2007

The functions you listed are fine for UTF-8 in MOST cases. UTF-8 extends ASCII by using bytes with the high bit set, so as long as the function only looks for characters from the standard ASCII set, you are OK. Basically, most functions will treat your extended UTF-8 characters as binary data and ignore them.

For example, stripslashes() deals with the character '\', which is standard ASCII, so it is safe. But calling trim() to trim a character above 0x7f may corrupt your UTF-8. Standard trim() is fine.

Functions like mysql_real_escape_string() are binary-safe, so you do not need to worry about which encoding you are using.

binary safe - mysql_real_escape_string() http://il2.php.net/manual/en/function.mysql-real-escape-string.php
safe - stripslashes() http://il2.php.net/manual/en/function.stripslashes.php
? - addslashes() http://il2.php.net/manual/en/function.addslashes.php
safe - strstr() http://il2.php.net/manual/en/function.strstr.php
safe for characters up to 0x7f - trim() http://il2.php.net/manual/en/function.trim.php
? probably safe - wordwrap() http://il2.php.net/manual/en/function.wordwrap.php
safe - vsprintf() http://il2.php.net/manual/en/function.vsprintf.php
safe - nl2br() http://il.php.net/manual/en/function.nl2br.php

I don't think addslashes() makes much sense on UTF-8. I would avoid it if possible. Whatever you use addslashes() for can be replaced with more specific escaping.

Note that trim() will only trim ASCII whitespace, and will not trim any UTF-8 characters that are "whitespace". You'll have to do that yourself if you happen to have some of those.
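To illustrate the two trim() caveats, here is a minimal sketch, assuming PCRE was built with UTF-8 support so that the `/u` modifier works. The helper names `utf8_trim_chars()` and `utf8_trim_space()` are hypothetical, not built-in PHP functions. trim() treats its character list as a set of single bytes, so a multibyte character in that list can strip half of a neighbouring character. A `/u` regular expression works on whole characters instead.

```php
<?php
// Hypothetical helper: strip whole UTF-8 characters listed in $chars
// from both ends of $s. Assumes $chars is non-empty, valid UTF-8.
function utf8_trim_chars($s, $chars) {
    // preg_quote() escapes regex metacharacters so they are literal in the class.
    $class = '[' . preg_quote($chars, '/') . ']+';
    return preg_replace('/^' . $class . '|' . $class . '$/u', '', $s);
}

// Hypothetical helper: strip ASCII whitespace plus Unicode separators
// (category Z, e.g. U+00A0 and U+3000) from both ends of $s.
function utf8_trim_space($s) {
    return preg_replace('/^[\s\p{Z}]+|[\s\p{Z}]+$/u', '', $s);
}

// The Hebrew word "shalom" as raw UTF-8 bytes; its last letter is D7 9D.
$word = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// Byte-wise trim: the list means "byte D7 or byte 9D", so the lead byte
// of the FIRST letter is stripped too, leaving a stray continuation byte.
$broken = trim($word, "\xD7\x9D");

// Character-wise trim: only the final letter is removed.
$fixed = utf8_trim_chars($word, "\xD7\x9D");

// A no-break space (U+00A0) before and an ideographic space (U+3000) after.
$padded = "\xC2\xA0hello\xE3\x80\x80";
$untouched = trim($padded);          // default trim() ignores both
$trimmed = utf8_trim_space($padded); // the /u pattern removes both
```

Both helpers return NULL if the input itself is not valid UTF-8, because preg_replace() fails on invalid subjects under `/u`, so a real implementation would probably want to check for that case.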