[SOLVED] Converting characters

watsmyname · October 2, 2009

Well,

Let's say the text in my language is "वनमान्छे जोडीलाई सन्तान" [without quotes]. This text is saved in the database in a format like "&#2319;&#2325;&#2376; &#2327;&#2366;". I have a search function on my website where the user types a search keyword in my language. The problem is that it doesn't work: my user types, say, "वनमान्छे " in the search box, but in the database the text is saved in a different character format. So how do I convert text in my language into the format saved in the database, so that I can match the converted text against the database?

Thanks


Mark Baker · October 2, 2009

You'd be better off storing the data in your database as UTF-8, or as the appropriate character set for your language.

The following code will convert an individual character such as न to its HTML encoded equivalent (where possible):

function CHARACTER($character) {
$character	= self::flattenSingleValue($character);

if (function_exists('mb_convert_encoding')) {
	return mb_convert_encoding('&#'.intval($character).';', 'UTF-8', 'HTML-ENTITIES');
} else {
	return chr(intval($character));
}
}

You could loop through your search string, performing this conversion on each character, to build up a new string that (hopefully) matches what's stored in your database.
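A minimal sketch of the per-character loop described above, assuming the mbstring extension and PHP 7.4 or later (for mb_str_split() and mb_ord()). The function name utf8_to_entities is hypothetical, and leaving ASCII characters unchanged is an assumption about how the database rows were written.

```php
<?php
// Hypothetical helper: turn each non-ASCII character into a decimal
// numeric character reference such as &#2344;.
function utf8_to_entities(string $text): string {
    $out = '';
    // Split on characters, not bytes, so multibyte letters stay intact.
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        $code = mb_ord($char, 'UTF-8');
        // Assumption: plain ASCII (spaces, digits, Latin letters) is stored as-is.
        $out .= ($code < 128) ? $char : '&#' . $code . ';';
    }
    return $out;
}

echo utf8_to_entities('एकै गा'), "\n";
```

The converted string could then be used as the search term in place of the raw user input.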

watsmyname · October 4, 2009

Thanks for the reply, but the function didn't work; it showed a "flattenSingleValue undefined" error.

Mark Baker · October 4, 2009

Apologies, Rumplestiltskin, I simply cut and pasted it from a class.

The line in question simply traps in case an array is passed into the function rather than a character, and can be completely removed.

function CHARACTER($character) {
if (function_exists('mb_convert_encoding')) {
	return mb_convert_encoding('&#38;#'.intval($character).';', 'UTF-8', 'HTML-ENTITIES');
} else {
	return chr(intval($character));
}
}

Your best solution would still be to change your database charset.

watsmyname · October 4, 2009

Thanks for the quick reply. That doesn't return the character in the format &#2366;; it returns some weird binary-like character. I have looked at an SMF forum's database, and they too save our language's characters in the format &#2366;, but when I search the forum with some words in my language it gives the correct result. I don't know how they are converting the Unicode characters the user types into this format and searching the database with them. Got any idea?

Mark Baker · October 4, 2009

Looking at it, the board seems to be messing with some of the code:

Anything that looks like &#38;# should be &#

function CHARACTER($character) {
if (function_exists('mb_convert_encoding')) {
	return mb_convert_encoding('&#'.intval($character).';', 'UTF-8', 'HTML-ENTITIES');
} else {
	return chr(intval($character));
}
}

The idea is to take the character and convert it to its HTML entity number.

If that doesn't work, try:

function CHARACTER($character) {
if (function_exists('mb_convert_encoding')) {
	return '&#'.intval($character).';';
} else {
	return intval($character);
}
}

watsmyname · October 4, 2009

Thanks again,

It is returning "&#0;" [without quotes].

It seems intval($character) is returning 0 no matter what I put in the search box.

intval() always returns 0 if the string isn't numeric, doesn't it? Like intval("12.50") = 12 and intval("abc") = 0?
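The behaviour asked about here can be checked with a few lines. This is only an illustrative sketch: intval() parses a leading number out of a string, so a Devanagari letter gives 0, while ord() looks at the first byte only and therefore does not give the code point either.

```php
<?php
$numeric = intval("12.50"); // leading digits are parsed, the fraction is dropped
$letters = intval("abc");   // no leading digits, so the result is 0
$nepali  = intval("न");     // same for a Devanagari letter

// ord() is byte-based: it returns the first byte of the UTF-8 sequence
// (E0 A4 A8 for "न"), not the Unicode code point 2344.
$firstByte = ord("न");

var_dump($numeric, $letters, $nepali, $firstByte);
```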

Mark Baker · October 4, 2009

Well that does tell us that the mbstring extension for PHP isn't enabled. Without that, you're going to have a lot of difficulty manipulating non-ANSI characters in any way.

If you're going to be working with character sets like वनमान्छे (and can't use mbstring or iconv), then you need to be using UTF-8 (or the actual character set in question) consistently between your page, the database, and all communications between them. You are going to have to use UTF-8 for your database rather than converting to HTML entities.
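As a rough sketch of what "UTF-8 everywhere" could look like, assuming a MySQL database accessed through PDO (the thread does not say which database or driver is in use). The function names and the `articles` table and `body` column are placeholders, not anything from the original site.

```php
<?php
// The page itself should also be served as UTF-8, for example with:
//   header('Content-Type: text/html; charset=UTF-8');

// Hypothetical helper: build a DSN that makes the connection speak UTF-8.
function utf8_dsn(string $host, string $db): string {
    // charset=utf8mb4 sets the connection character set (PHP 5.3.6+).
    return "mysql:host={$host};dbname={$db};charset=utf8mb4";
}

// Hypothetical search: with UTF-8 stored in the table, the keyword typed
// by the user can be matched directly, with no entity conversion.
function search_articles(PDO $pdo, string $keyword): array {
    // Note: % and _ inside $keyword are not escaped in this sketch.
    $stmt = $pdo->prepare('SELECT id, body FROM articles WHERE body LIKE ?');
    $stmt->execute(['%' . $keyword . '%']);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

A call such as `new PDO(utf8_dsn('localhost', 'news'), $user, $pass)` would then open the connection; the table and its columns would also need a UTF-8 character set such as utf8mb4.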

thebadbad · October 4, 2009

What? The OP is right, intval() returns 0 on strings/characters that can't be directly cast to an integer. It has nothing to do with the multibyte functions (correct me if I'm wrong).

Simply try using htmlentities() (code tags f**k up the characters):

<?php

$str = 'वनमान्छे जोडीलाई सन्तान';

$entities = htmlentities($str, ENT_QUOTES, 'UTF-8');

?>

Remember to use the appropriate second parameter, depending on how quotes are stored in your database.

But I agree with Mark in that you should store the data as UTF-8 characters instead. Takes up much less space.

Mark Baker · October 4, 2009

thebadbad wrote: "What? The OP is right, intval() returns 0 on strings/characters that can't be directly casted to an integer. Has got nothing to do with the multibyte functions (correct me if I'm wrong)."

Correct: the use of intval() is (a) wrong and (b) not capable of handling multibyte characters. But that's also how the routine was trying to provide a fallback if mb_convert_encoding() wasn't available, which has nothing to do with multibyte strings.

There must have been some reason why I used intval() at the time, but no idea what it was now.

thebadbad · October 4, 2009

Well, you are still using intval() when mb_convert_encoding() is available, rendering the function useless.

Mark Baker · October 4, 2009

Having mulled it over, I've figured out my error.

It was the wrong function for the OP's problem (the solution to which is still to change his database). It was the reverse function: the multibyte equivalent of chr() rather than of ord(). Feed it a numeric value and it returns the UTF-8 character with that value.
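The two directions mentioned here can be illustrated with mb_chr() and mb_ord(), which are the multibyte counterparts of chr() and ord() in PHP 7.2 and later (an assumption about the reader's PHP version; neither existed in 2009). html_entity_decode() is included as another way to go from the numeric form back to the character.

```php
<?php
// Code point -> character (the direction the CHARACTER() function covers).
$char = mb_chr(2344, 'UTF-8');

// Character -> code point (the direction the search actually needs).
$code = mb_ord('न', 'UTF-8');

// Numeric character reference -> character, without mb_convert_encoding().
$decoded = html_entity_decode('&#2344;', ENT_QUOTES, 'UTF-8');

var_dump($char, $code, $decoded);
```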

watsmyname · October 5, 2009

Replying to thebadbad's htmlentities() suggestion: this isn't working either.

I just want to convert the given string to "&#2319;&#2325;&#2376; &#2327;&#2366;" so that I can search for it in the database. BTW, what do you call this format, "&#2319;&#2325;&#2376; &#2327;&#2366;"?
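For reference, this format is usually called a decimal numeric character reference (an HTML numeric entity). htmlentities() only substitutes characters that have a named entity, and Devanagari letters have none, which is why it returns the text unchanged. A minimal sketch of the difference, assuming the mbstring extension is available for mb_encode_numericentity():

```php
<?php
$str = 'एकै गा';

// Named-entity conversion: there are no named entities for Devanagari,
// so the string comes back exactly as it went in.
$named = htmlentities($str, ENT_QUOTES, 'UTF-8');

// convmap layout: [first code point, last code point, offset, mask].
// This map covers everything outside ASCII.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$numeric = mb_encode_numericentity($str, $convmap, 'UTF-8');

var_dump($named === $str);
echo $numeric, "\n";
```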

watsmyname · October 5, 2009

Thanks guys, I found a function that works exactly as I wanted:

<?php
function charset_decode_utf_8 ($string) {
    /* Only do the slow convert if there are 8-bit characters */
    /* avoid using 0xA0 (\240) in ereg ranges. RH73 does not like that */
    if (! ereg("[\200-\237]", $string) and ! ereg("[\241-\377]", $string))
        return $string;

    // decode three byte unicode characters
    $string = preg_replace("/([\340-\357])([\200-\277])([\200-\277])/e",
    "'&#'.((ord('\\1')-224)*4096 + (ord('\\2')-128)*64 + (ord('\\3')-128)).';'",
    $string);

    // decode two byte unicode characters
    $string = preg_replace("/([\300-\337])([\200-\277])/e",
    "'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'",
    $string);

    return $string;
}
?>
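For readers on current PHP: ereg() and the /e modifier of preg_replace() were both removed in PHP 7, so the function above only runs on old installations. The following is a sketch of the same byte arithmetic using preg_match() and preg_replace_callback(). It needs no mbstring, and like the function above it handles only two- and three-byte UTF-8 sequences. The name utf8_bytes_to_entities is hypothetical.

```php
<?php
function utf8_bytes_to_entities(string $string): string {
    // Nothing to convert if the input is pure 7-bit ASCII.
    if (!preg_match('/[\x80-\xFF]/', $string)) {
        return $string;
    }

    // Three-byte sequences: 1110xxxx 10xxxxxx 10xxxxxx
    $string = preg_replace_callback(
        '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
        function (array $m): string {
            $code = (ord($m[1]) - 224) * 4096
                  + (ord($m[2]) - 128) * 64
                  + (ord($m[3]) - 128);
            return '&#' . $code . ';';
        },
        $string
    ) ?? $string;

    // Two-byte sequences: 110xxxxx 10xxxxxx
    return preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function (array $m): string {
            return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
        },
        $string
    ) ?? $string;
}

echo utf8_bytes_to_entities('एकै गा'), "\n";
```

The patterns deliberately omit the /u flag so that they match raw bytes, which is what the arithmetic expects.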
