watsmyname Posted October 2, 2009

Well, let's say the text in my language is "वनमान्छे जोडीलाई सन्तान" (without quotes). This text is saved in the database in a format like "&#2319;&#2325;&#2376; &#2327;&#2366;" (that is, एकै गा). I have a search function on my website where the user types a search keyword in my language. The problem is that it doesn't work: my user types, say, "वनमान्छे" in the search box, but in the database it is saved in a different character format. So how do I convert text in my language into the format saved in the database, so that I can match the database against the converted characters?

Thanks in advance,
watsmyname

https://forums.phpfreaks.com/topic/176283-solved-converting-characters/
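For reference, the conversion being asked for can be sketched with mbstring's mb_encode_numericentity(). This sketch assumes the mbstring extension is loaded and that the database stores every non-ASCII character as a decimal reference such as &#2357;. The helper name to_numeric_refs() is hypothetical. The map [0x80, 0x10FFFF, 0, 0x1FFFFF] means: encode every code point from U+0080 upward with no offset, and leave ASCII (including spaces) untouched.

```php
<?php
// Hypothetical helper: turn UTF-8 search input into the decimal
// "&#NNNN;" form that (we assume) the database stores.
function to_numeric_refs(string $utf8): string
{
    // Map format: [first code point, last code point, offset, mask].
    // Everything from U+0080 up is encoded; plain ASCII is left alone.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

echo to_numeric_refs('वनमान्छे'), "\n";
// &#2357;&#2344;&#2350;&#2366;&#2344;&#2381;&#2331;&#2375;
```

The converted term could then be used in the search query in place of the raw input, ideally through a prepared statement.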
Mark Baker Posted October 2, 2009

You'd be better off storing the data in your database as UTF-8, or in the appropriate character set for your language.

The following code will convert an individual character such as न to its HTML-encoded equivalent (where possible):

function CHARACTER($character) {
    $character = self::flattenSingleValue($character);
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding('&#'.intval($character).';', 'UTF-8', 'HTML-ENTITIES');
    } else {
        return chr(intval($character));
    }
}

You could loop through your search string, performing this conversion on each character, to build up a new string that (hopefully) should match what's stored in your database.
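Following the UTF-8 advice, a one-off migration would need to turn the stored references back into real characters. Here is a minimal sketch, assuming the stored text uses numeric references like &#2319;. The name refs_to_utf8() is hypothetical, while html_entity_decode() is PHP's standard entity decoder.

```php
<?php
// Hypothetical migration helper: turn stored "&#NNNN;" references back
// into real UTF-8 text before saving it to a UTF-8 column.
function refs_to_utf8(string $stored): string
{
    // ENT_QUOTES | ENT_HTML5 also decodes quote entities; numeric
    // references such as &#2319; are decoded to UTF-8 characters.
    return html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo refs_to_utf8('&#2319;&#2325;&#2376; &#2327;&#2366;'), "\n"; // एकै गा
```

Be aware that html_entity_decode() also decodes named entities such as &amp;, so text that deliberately stores a literal "&amp;" would change as well.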
watsmyname Posted October 4, 2009 (Author)

Thanks for the reply, but the function didn't work; it showed a "flattenSingleValue undefined" error.
Mark Baker Posted October 4, 2009

Apologies, Rumplestiltskin, I simply cut and pasted it from a class. The line in question only traps the case where an array is passed into the function rather than a character, and can be removed completely:

function CHARACTER($character) {
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding('&#'.intval($character).';', 'UTF-8', 'HTML-ENTITIES');
    } else {
        return chr(intval($character));
    }
}

Your best solution would still have been to modify your database charset.
watsmyname Posted October 4, 2009 (Author)

Thanks for the quick reply, but that doesn't return the character in the format &#2366; (ा); it returns a weird, binary-looking character. I have seen an SMF forum's database, and it also saves characters of our language in the &#2366; format, but when I search that forum with words in my language it gives correct results. I don't know how they convert the user's Unicode input into this format and search the database. Any idea?
Mark Baker Posted October 4, 2009

Looking at it, the board seems to be mangling some of the code: anything that appears as &amp;# should be &#.

function CHARACTER($character) {
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding('&#'.intval($character).';', 'UTF-8', 'HTML-ENTITIES');
    } else {
        return chr(intval($character));
    }
}

The idea is to take the character and convert it to its HTML entity number. If that doesn't work, try:

function CHARACTER($character) {
    if (function_exists('mb_convert_encoding')) {
        return '&#'.intval($character).';';
    } else {
        return intval($character);
    }
}
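Mark's stated idea (character in, entity number out) can be written directly with mb_ord(), which is available since PHP 7.2. This is a hedged sketch. The name char_to_ref() is hypothetical, and the input is assumed to be a single UTF-8 character.

```php
<?php
// Hypothetical helper: one UTF-8 character -> "&#NNNN;".
// mb_ord() returns the Unicode code point, which is the number
// intval() was wrongly expected to produce.
function char_to_ref(string $char): string
{
    $codePoint = mb_ord($char, 'UTF-8');
    if ($codePoint === false) {
        throw new InvalidArgumentException('Not a valid UTF-8 character');
    }
    return '&#' . $codePoint . ';';
}

echo char_to_ref('न'), "\n"; // &#2344;
```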
watsmyname Posted October 4, 2009 (Author)

Thanks again. It is returning "" (i.e. nothing visible); it seems intval($character) returns 0 no matter what I put in the search box. intval() always returns 0 when the string doesn't start with a number, doesn't it? For example, intval("12.50") = 12 but intval("abc") = 0.
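A quick check of the intval() behaviour described above. It uses standard PHP and needs no extensions.

```php
<?php
// intval() parses a leading integer and stops at the first non-digit;
// a string with no leading digits (letters, Devanagari, ...) gives 0.
var_dump(intval('12.50')); // int(12)
var_dump(intval('abc'));   // int(0)
var_dump(intval('न'));     // int(0)
var_dump(intval('42abc')); // int(42)
```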
Mark Baker Posted October 4, 2009

Well, that does tell us that the mbstring extension for PHP isn't enabled. Without it, you're going to have a lot of difficulty manipulating non-ANSI characters in any way.

If you're going to be working with character sets like वनमान्छे (and can't use mbstring or iconv), then you need to use UTF-8 (or the actual character set in question) consistently across your page, the database, and all communication between them. You are going to have to use UTF-8 for your database rather than converting to HTML entities.
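If mbstring is missing but the iconv extension is available, you can still extract code points by converting to UTF-32BE, where every character occupies exactly four bytes. This is a sketch under that assumption, and code_points_via_iconv() is a hypothetical helper.

```php
<?php
// Hypothetical helper: list the code points of a UTF-8 string using
// iconv instead of mbstring. UTF-32BE stores each code point as four
// big-endian bytes (no byte-order mark for the explicit BE variant).
function code_points_via_iconv(string $utf8): array
{
    $utf32 = iconv('UTF-8', 'UTF-32BE', $utf8);
    if ($utf32 === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    if ($utf32 === '') {
        return [];
    }
    // 'N*' unpacks unsigned 32-bit big-endian integers; array_values()
    // re-indexes from 0 because unpack() numbers the results from 1.
    return array_values(unpack('N*', $utf32));
}

$refs = '';
foreach (code_points_via_iconv('गा') as $cp) {
    $refs .= '&#' . $cp . ';';
}
echo $refs, "\n"; // &#2327;&#2366;
```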
thebadbad Posted October 4, 2009

What? The OP is right: intval() returns 0 for strings/characters that can't be directly cast to an integer. That has nothing to do with the multibyte functions (correct me if I'm wrong). Simply try htmlentities() (code tags f**k up the characters):

<?php
$str = 'वनमान्छे जोडीलाई सन्तान';
$entities = htmlentities($str, ENT_QUOTES, 'UTF-8');
?>

Remember to use the appropriate second parameter, depending on how quotes are stored in your database. But I agree with Mark that you should store the data as UTF-8 characters instead; it takes up much less space.
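htmlentities() cannot produce the database format because it only replaces characters that have a named HTML entity. The small demonstration below shows this and also the storage cost mentioned in the post. It uses core PHP only.

```php
<?php
// htmlentities() only replaces characters that have a *named* HTML
// entity (é -> &eacute;, " -> &quot;, ...). Devanagari letters have no
// named entity, so they pass through unchanged.
$word = 'न';
var_dump(htmlentities($word, ENT_QUOTES, 'UTF-8') === $word); // bool(true)
var_dump(htmlentities('é', ENT_QUOTES, 'UTF-8'));             // string(8) "&eacute;"

// Storage cost: one Devanagari letter is 3 bytes in UTF-8,
// but 7 bytes as a decimal reference like &#2344;.
var_dump(strlen($word));     // int(3)
var_dump(strlen('&#2344;')); // int(7)
```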
Mark Baker Posted October 4, 2009

Quoting thebadbad: "What? The OP is right, intval() returns 0 on strings/characters that can't be directly casted to an integer. Has got nothing to do with the multibyte functions (correct me if I'm wrong)."

Correct: the use of intval() is a) not correct and b) not capable of handling multibyte characters. But that's also how the routine was trying to provide a fallback if mb_convert_encoding wasn't available, which has nothing to do with multibyte strings. There must have been some reason why I used intval() at the time, but I have no idea what it was now.
thebadbad Posted October 4, 2009

Well, you are still using intval() when mb_convert_encoding() is available, which renders the function useless.
Mark Baker Posted October 4, 2009

Having mulled it over, I've figured out my error. It was the wrong function for the OP's problem (whose solution is still to change his database): it was the reverse function, the multibyte equivalent of chr() rather than of ord(). Feed it a numeric value and it returns the UTF-8 character with that value.
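Mark's diagnosis, that CHARACTER() is really a multibyte chr(), matches what mb_chr() does on PHP 7.2+. Below is a sketch of that direction with a round trip through mb_ord(). The name code_point_to_char() is hypothetical. Note that the 'HTML-ENTITIES' pseudo-encoding used in the original function is deprecated for mbstring functions as of PHP 8.2.

```php
<?php
// Hypothetical helper: a multibyte chr(). Given a Unicode code point,
// return the corresponding UTF-8 character.
function code_point_to_char(int $codePoint): string
{
    $char = mb_chr($codePoint, 'UTF-8');
    if ($char === false) {
        throw new InvalidArgumentException("Invalid code point: $codePoint");
    }
    return $char;
}

echo code_point_to_char(2344), "\n"; // न
// Round trip with mb_ord(), the multibyte ord():
var_dump(mb_ord(code_point_to_char(2344), 'UTF-8')); // int(2344)
```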
watsmyname Posted October 5, 2009 (Author)

(Replying to thebadbad's htmlentities() suggestion.) This isn't working either. I just want to convert the given string to "&#2319;&#2325;&#2376; &#2327;&#2366;" so that I can search for it in the database. BTW, what do you call this "&#2319;&#2325;&#2376; &#2327;&#2366;" format?
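To answer the naming question: text like &#2319; is an HTML numeric character reference. It is written as "&#", the character's Unicode code point in decimal, then ";". A hexadecimal form (&#x...;) also exists and names the same character. The small illustration below uses core PHP only.

```php
<?php
// "&#2319;" is a decimal numeric character reference for U+090F (ए).
// The hexadecimal form "&#x90f;" refers to the same character.
$decimal = '&#2319;';
$hex     = '&#x' . dechex(2319) . ';'; // "&#x90f;"

$a = html_entity_decode($decimal, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$b = html_entity_decode($hex, ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($a === $b);  // bool(true)
var_dump($a === 'ए'); // bool(true)
```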
watsmyname Posted October 5, 2009 (Author)

Thanks, guys, I found a function that works exactly as I wanted:

<?php
// Note: ereg() and the preg_replace() /e modifier were removed in PHP 7.0,
// so this function only runs on PHP 5.x.
function charset_decode_utf_8 ($string) {
    /* Only do the slow convert if there are 8-bit characters */
    /* avoid using 0xA0 (\240) in ereg ranges. RH73 does not like that */
    if (! ereg("[\200-\237]", $string) and ! ereg("[\241-\377]", $string))
        return $string;

    // decode three byte unicode characters into "&#NNNN;"
    $string = preg_replace("/([\340-\357])([\200-\277])([\200-\277])/e",
        "'&#'.((ord('\\1')-224)*4096 + (ord('\\2')-128)*64 + (ord('\\3')-128)).';'",
        $string);

    // decode two byte unicode characters into "&#NNN;"
    $string = preg_replace("/([\300-\337])([\200-\277])/e",
        "'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'",
        $string);

    return $string;
}
?>
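The function above only runs on PHP 5, because ereg() and the preg_replace() /e modifier were both removed in PHP 7.0. Below is a hedged sketch of a PHP 7+ equivalent using preg_replace_callback(). The name utf8_to_numeric_refs() is hypothetical. Unlike the original, it also converts four-byte characters (such as emoji) and rejects invalid UTF-8.

```php
<?php
// Hypothetical PHP 7+ rewrite of charset_decode_utf_8(). With the /u
// flag, [^\x00-\x7F] matches one complete non-ASCII UTF-8 character,
// which the callback replaces with "&#NNNN;".
function utf8_to_numeric_refs(string $string): string
{
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        static function (array $m): string {
            // Bytes of the matched character (unpack() keys start at 1).
            $b = array_values(unpack('C*', $m[0]));
            switch (count($b)) {
                case 2: // 110xxxxx 10xxxxxx
                    $cp = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
                    break;
                case 3: // 1110xxxx 10xxxxxx 10xxxxxx
                    $cp = (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6)
                        | ($b[2] & 0x3F);
                    break;
                default: // 11110xxx followed by three continuation bytes
                    $cp = (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                        | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
            }
            return '&#' . $cp . ';';
        },
        $string
    );
    // preg_replace_callback() returns null when the input is not valid UTF-8.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

echo utf8_to_numeric_refs('एकै गा'), "\n"; // &#2319;&#2325;&#2376; &#2327;&#2366;
```

Where mbstring is available, mb_encode_numericentity() with the map [0x80, 0x10FFFF, 0, 0x1FFFFF] should give the same output in a single call. The manual decoding here just avoids that dependency.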