Accent Folding for Multilingual PHP


random1

How can you create an "Accent-Folding" function in PHP that folds/unfolds foreign accents on characters?

 

For example, accent folding changes 'Hello, Jürgen!' to 'hello-juergen', and vice versa.

 

Does PHP have a built-in function for this?

 

References:

 

http://www.alistapart.com/articles/accent-folding-for-auto-complete/

http://www.gyro-php.org/posts/16/


You can use a function (faster than iconv):

EDIT: (The last 128 characters turn into entities on this forum, but you may not need them. Darn auto-encoding, eh?)

<?php
function removeaccent($str) {
  $a = array('À','Á','Â','Ã','Ä','Å','Æ','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ð','Ñ','Ò','Ó','Ô','Õ','Ö','Ø','Ù','Ú','Û','Ü','Ý','ß','à','á','â','ã','ä','å','æ','ç','è','é','ê','ë','ì','í','î','ï','ñ','ò','ó','ô','õ','ö','ø','ù','ú','û','ü','ý','ÿ','Ā','ā','Ă','ă','Ą','ą','Ć','ć','Ĉ','ĉ','Ċ','ċ','Č','č','Ď','ď','Đ','đ','Ē','ē','Ĕ','ĕ','Ė','ė','Ę','ę','Ě','ě','Ĝ','ĝ','Ğ','ğ','Ġ','ġ','Ģ','ģ','Ĥ','ĥ','Ħ','ħ','Ĩ','ĩ','Ī','ī','Ĭ','ĭ','Į','į','İ','ı','Ĳ','ĳ','Ĵ','ĵ','Ķ','ķ','Ĺ','ĺ','Ļ','ļ','Ľ','ľ','Ŀ','ŀ','Ł','ł','Ń','ń','Ņ','ņ','Ň','ň','ŉ','Ō','ō','Ŏ','ŏ','Ő','ő','Œ','œ','Ŕ','ŕ','Ŗ','ŗ','Ř','ř','Ś','ś','Ŝ','ŝ','Ş','ş','Š','š','Ţ','ţ','Ť','ť','Ŧ','ŧ','Ũ','ũ','Ū','ū','Ŭ','ŭ','Ů','ů','Ű','ű','Ų','ų','Ŵ','ŵ','Ŷ','ŷ','Ÿ','Ź','ź','Ż','ż','Ž','ž','ſ','ƒ','Ơ','ơ','Ư','ư','Ǎ','ǎ','Ǐ','ǐ','Ǒ','ǒ','Ǔ','ǔ','Ǖ','ǖ','Ǘ','ǘ','Ǚ','ǚ','Ǜ','ǜ','Ǻ','ǻ','Ǽ','ǽ','Ǿ','ǿ');
$b = array('A','A','A','A','A','A','AE','C','E','E','E','E','I','I','I','I','D','N','O','O','O','O','O','O','U','U','U','U','Y','s','a','a','a','a','a','a','ae','c','e','e','e','e','i','i','i','i','n','o','o','o','o','o','o','u','u','u','u','y','y','A','a','A','a','A','a','C','c','C','c','C','c','C','c','D','d','D','d','E','e','E','e','E','e','E','e','E','e','G','g','G','g','G','g','G','g','H','h','H','h','I','i','I','i','I','i','I','i','I','i','IJ','ij','J','j','K','k','L','l','L','l','L','l','L','l','l','l','N','n','N','n','N','n','n','O','o','O','o','O','o','OE','oe','R','r','R','r','R','r','S','s','S','s','S','s','S','s','T','t','T','t','T','t','U','u','U','u','U','u','U','u','U','u','U','u','W','w','Y','y','Y','Z','z','Z','z','Z','z','s','f','O','o','U','u','A','a','I','i','O','o','U','u','U','u','U','u','U','u','U','u','A','a','AE','ae','O','o');   
   return str_replace($a, $b, $str);
}
?>

 

Or:

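// Convert accented characters into unaccented ones. Note: iconv is a PHP extension backed by the platform's iconv library, not an external executable.
// It may still be sluggish compared to string replacing, and //TRANSLIT output can vary with the library and the current locale.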
$text = iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);
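
Neither snippet above produces the exact 'hello-juergen' form from the question: the arrays fold 'ü' to 'u', and the iconv result depends on the platform. Below is a minimal sketch of a slug helper, assuming German-style expansions (ü to ue) are what you want. The function name slugify and both small maps are hypothetical and only cover a handful of characters.

```php
<?php
// Hypothetical helper (not from the thread): fold a UTF-8 string into an ASCII slug.
function slugify(string $text): string
{
    // Assumed German-style expansions, chosen to match 'Jürgen' -> 'juergen'.
    $expansions = [
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
    ];
    // A few one-to-one folds for illustration; extend with a fuller table as needed.
    $folds = ['é' => 'e', 'è' => 'e', 'ñ' => 'n', 'ç' => 'c', 'ø' => 'o'];

    // strtr() with an array replaces each key wherever it occurs in the string.
    $ascii = strtolower(strtr($text, $expansions + $folds));

    // Collapse every run of characters other than a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $ascii);

    // Drop hyphens left over at either end (e.g. from a trailing '!').
    return trim($slug, '-');
}

echo slugify('Hello, Jürgen!'), "\n"; // hello-juergen
```

Any character missing from the two maps is treated as a separator and becomes a hyphen, so a real table needs to be much larger than this sample.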

Thanks, how about the reverse: going from unaccented to accented? e.g. juergen into Jürgen

 

Why not store the original accented text as UTF-8 and display the transliterated form only where you need it? Folding is lossy, so reversing it would take an impossible str_replace scheme that doesn't make much sense.
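
A rough sketch of that idea: keep the accented originals and look them up by a folded key, so nothing ever has to be "unfolded". The names foldKey and buildIndex are hypothetical, and the map is a tiny assumed sample rather than a complete table.

```php
<?php
// Hypothetical fold used only to build lookup keys; the map is a small assumed sample.
function foldKey(string $text): string
{
    $map = ['Ü' => 'Ue', 'ü' => 'ue', 'é' => 'e', 'ë' => 'e'];
    return strtolower(strtr($text, $map));
}

// Build an index from each folded key to the original spellings that produce it.
function buildIndex(array $names): array
{
    $index = [];
    foreach ($names as $name) {
        $index[foldKey($name)][] = $name;
    }
    return $index;
}

$index = buildIndex(['Jürgen', 'José', 'Zoë']);

// The query is folded the same way, then the stored original is returned.
$matches = $index[foldKey('juergen')] ?? [];
print_r($matches); // contains the original 'Jürgen'
```

In a database the same pattern would be a second column holding the folded value next to the original one.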

Fair enough :D I ended up with:

 

	public function removeAccents($string)
{
	// Accented Array
	$a = array('À','Á','Â','Ã','Ä','Å','Æ','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ð','Ñ','Ò','Ó','Ô','Õ','Ö','Ø',
	'Ù','Ú','Û','Ü','Ý','ß','à','á','â','ã','ä','å','æ','ç','è','é','ê','ë','ì','í','î','ï','ñ','ò','ó','ô','õ',
	'ö','ø','ù','ú','û','ü','ý','ÿ','Ā','ā','Ă','ă','Ą','ą','Ć','ć',
	'Ĉ','ĉ','Ċ','ċ','Č','č','Ď','ď','Đ','đ','Ē','ē',
	'Ĕ','ĕ','Ė','ė','Ę','ę','Ě','ě','Ĝ','ĝ','Ğ','ğ',
	'Ġ','ġ','Ģ','ģ','Ĥ','ĥ','Ħ','ħ','Ĩ','ĩ','Ī','ī',
	'Ĭ','ĭ','Į','į','İ','ı','Ĳ','ĳ','Ĵ','ĵ','Ķ','ķ',
	'Ĺ','ĺ','Ļ','ļ','Ľ','ľ','Ŀ','ŀ','Ł','ł','Ń','ń',
	'Ņ','ņ','Ň','ň','ŉ','Ō','ō','Ŏ','ŏ','Ő','ő','Œ',
	'œ','Ŕ','ŕ','Ŗ','ŗ','Ř','ř','Ś','ś','Ŝ','ŝ','Ş',
	'ş','Š','š','Ţ','ţ','Ť','ť','Ŧ','ŧ','Ũ','ũ','Ū',
	'ū','Ŭ','ŭ','Ů','ů','Ű','ű','Ų','ų','Ŵ','ŵ','Ŷ',
	'ŷ','Ÿ','Ź','ź','Ż','ż','Ž','ž','ſ','ƒ','Ơ','ơ',
	'Ư','ư','Ǎ','ǎ','Ǐ','ǐ','Ǒ','ǒ','Ǔ','ǔ','Ǖ','ǖ',
	'Ǘ','ǘ','Ǚ','ǚ','Ǜ','ǜ','Ǻ','ǻ','Ǽ','ǽ','Ǿ','ǿ');

	// Unaccented
	$b = array('A','A','A','A','A','A','AE','C','E','E','E','E','I','I','I','I','D','N','O','O','O','O','O','O',
	'U','U','U','U','Y','s','a','a','a','a','a','a','ae','c','e','e','e','e','i','i','i','i','n','o','o','o','o',
	'o','o','u','u','u','u','y','y','A','a','A','a','A','a','C','c','C','c','C','c','C','c','D','d','D','d','E',
	'e','E','e','E','e','E','e','E','e','G','g','G','g','G','g','G','g','H','h','H','h','I','i','I','i','I','i',
	'I','i','I','i','IJ','ij','J','j','K','k','L','l','L','l','L','l','L','l','l','l','N','n','N','n','N','n','n',
	'O','o','O','o','O','o','OE','oe','R','r','R','r','R','r','S','s','S','s','S','s','S','s','T','t','T','t','T',
	't','U','u','U','u','U','u','U','u','U','u','U','u','W','w','Y','y','Y','Z','z','Z','z','Z','z','s','f','O','o',
	'U','u','A','a','I','i','O','o','U','u','U','u','U','u','U','u','U','u','A','a','AE','ae','O','o');

	return str_replace($a, $b, $string);
}

 

Just needed to use str_replace instead of strreplace.
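
One thing to watch with two parallel arrays: if the replacement array is shorter than the search array, str_replace silently uses an empty string for the missing entries, so a single dropped element deletes characters instead of folding them. A minimal sketch of a guard, assuming you would rather fail loudly; buildFoldMap is a hypothetical helper, and the sample arrays stand in for the full ones.

```php
<?php
// Hypothetical helper: turn two parallel arrays into one map for strtr().
function buildFoldMap(array $accented, array $plain): array
{
    // Refuse misaligned input instead of letting characters vanish silently.
    if (count($accented) !== count($plain)) {
        throw new InvalidArgumentException('Search and replacement arrays differ in length.');
    }
    return array_combine($accented, $plain);
}

// Small sample arrays; in practice pass the full accented and unaccented arrays.
$map = buildFoldMap(['À', 'é', 'ü', 'ß'], ['A', 'e', 'u', 's']);

echo strtr('À la carte, Jürgen, café', $map), "\n";
```

Keeping the pairs together in one associative array also makes a wrong mapping easier to spot than counting positions in two long lists.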
