
Accent Folding for Multilingual PHP


random1


How can you create an "Accent-Folding" function in PHP that folds/unfolds foreign accents on characters?

 

That is, accent folding means changing 'Hello, Jürgen!' to 'hello-juergen' and vice versa.

 

Does PHP have a built-in function for this?

 

References:

 

http://www.alistapart.com/articles/accent-folding-for-auto-complete/

http://www.gyro-php.org/posts/16/


You can use a lookup-table function (likely faster than iconv, though I haven't benchmarked it):

EDIT: This forum turns the last 128 characters into HTML entities, but you may not need them. Darn auto-encoding, eh?

<?php
function removeaccent($str) {
  $a = array('À','Á','Â','Ã','Ä','Å','Æ','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ð','Ñ','Ò','Ó','Ô','Õ','Ö','Ø','Ù','Ú','Û','Ü','Ý','ß','à','á','â','ã','ä','å','æ','ç','è','é','ê','ë','ì','í','î','ï','ñ','ò','ó','ô','õ','ö','ø','ù','ú','û','ü','ý','ÿ','Ā','ā','Ă','ă','Ą','ą','Ć','ć','Ĉ','ĉ','Ċ','ċ','Č','č','Ď','ď','Đ','đ','Ē','ē','Ĕ','ĕ','Ė','ė','Ę','ę','Ě','ě','Ĝ','ĝ','Ğ','ğ','Ġ','ġ','Ģ','ģ','Ĥ','ĥ','Ħ','ħ','Ĩ','ĩ','Ī','ī','Ĭ','ĭ','Į','į','İ','ı','IJ','ij','Ĵ','ĵ','Ķ','ķ','Ĺ','ĺ','Ļ','ļ','Ľ','ľ','Ŀ','ŀ','Ł','ł','Ń','ń','Ņ','ņ','Ň','ň','ʼn','Ō','ō','Ŏ','ŏ','Ő','ő','Œ','œ','Ŕ','ŕ','Ŗ','ŗ','Ř','ř','Ś','ś','Ŝ','ŝ','Ş','ş','Š','š','Ţ','ţ','Ť','ť','Ŧ','ŧ','Ũ','ũ','Ū','ū','Ŭ','ŭ','Ů','ů','Ű','ű','Ų','ų','Ŵ','ŵ','Ŷ','ŷ','Ÿ','Ź','ź','Ż','ż','Ž','ž','ſ','ƒ','Ơ','ơ','Ư','ư','Ǎ','ǎ','Ǐ','ǐ','Ǒ','ǒ','Ǔ','ǔ','Ǖ','ǖ','Ǘ','ǘ','Ǚ','ǚ','Ǜ','ǜ','Ǻ','ǻ','Ǽ','ǽ','Ǿ','ǿ'); 
$b = array('A','A','A','A','A','A','AE','C','E','E','E','E','I','I','I','I','D','N','O','O','O','O','O','O','U','U','U','U','Y','s','a','a','a','a','a','a','ae','c','e','e','e','e','i','i','i','i','n','o','o','o','o','o','o','u','u','u','u','y','y','A','a','A','a','A','a','C','c','C','c','C','c','C','c','D','d','D','d','E','e','E','e','E','e','E','e','E','e','G','g','G','g','G','g','G','g','H','h','H','h','I','i','I','i','I','i','I','i','I','i','IJ','ij','J','j','K','k','L','l','L','l','L','l','L','l','l','l','N','n','N','n','N','n','n','O','o','O','o','O','o','OE','oe','R','r','R','r','R','r','S','s','S','s','S','s','S','s','T','t','T','t','T','t','U','u','U','u','U','u','U','u','U','u','U','u','W','w','Y','y','Y','Z','z','Z','z','Z','z','s','f','O','o','U','u','A','a','I','i','O','o','U','u','U','u','U','u','U','u','U','u','A','a','AE','ae','O','o');   
  return str_replace($a, $b, $str);
}
?>
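
The lookup table above only strips accents. The 'hello-juergen' form from the question also needs lowercasing, hyphenation, and language-specific expansions such as ü to ue. Here is a minimal sketch of that, assuming a hypothetical `slugify()` helper and a deliberately tiny map (the German-style expansions are an assumption chosen to match the example, not a universal rule). It uses `strtr()` with an array, which replaces in a single pass, whereas `str_replace()` with arrays applies each pair in sequence and can re-replace earlier output.

```php
<?php
// Hypothetical helper: folds accents with a small illustrative map, then builds a slug.
function slugify(string $text): string
{
    // Illustrative subset only; extend it for the languages you actually need.
    $map = [
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
        'é' => 'e', 'è' => 'e', 'ñ' => 'n', 'ç' => 'c',
    ];

    // strtr() with an array works in one pass (longest key first),
    // so text that was already replaced is never searched again.
    $folded = strtr($text, $map);

    // Lowercase the ASCII letters, then collapse every other run of bytes into a hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($folded));

    // Drop hyphens left over at either end.
    return trim($slug, '-');
}

echo slugify('Hello, Jürgen!'), "\n"; // hello-juergen
```

Characters missing from the map are simply turned into hyphens here, so a real table would need to be as complete as the one above.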

 

Or:

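// Convert accented characters to unaccented ones. Note: iconv() comes from a PHP extension (not an external executable); its //TRANSLIT output depends on the underlying iconv implementation and the current locale, and it may be slower than plain string replacement.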
$text = iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);
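
Because `//TRANSLIT` behaves differently across platforms, it can help to wrap the call and fall back gracefully. The sketch below is one possible approach, not a definitive one: `toAscii()` is a hypothetical name, and it assumes the intl extension's `Transliterator` class is preferable when available, with iconv as a second choice.

```php
<?php
// Hypothetical helper: prefers intl's Transliterator, then iconv, then plain stripping.
function toAscii(string $text): string
{
    if (class_exists('Transliterator')) {
        // 'Any-Latin; Latin-ASCII' is an ICU transform ID; create() returns null on failure.
        $t = Transliterator::create('Any-Latin; Latin-ASCII');
        if ($t !== null) {
            $out = $t->transliterate($text);
            if ($out !== false) {
                return $out;
            }
        }
    }

    if (function_exists('iconv')) {
        // Results vary with the iconv implementation and the LC_CTYPE locale.
        $out = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);
        if ($out !== false) {
            return $out;
        }
    }

    // Last resort: drop every non-ASCII byte.
    return preg_replace('/[^\x00-\x7F]/', '', $text);
}

echo toAscii('Jürgen'), "\n";
```

With ICU the result is typically 'Jurgen' rather than 'Juergen', so language-specific expansions still need a custom map.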


Thanks, how about the reverse: going from unaccented to accented? e.g. juergen into Jürgen

 

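Why not store the original accented text as UTF-8 and generate the transliterated form only when you display or look something up? Going the other way would need a str_replace scheme that can't work in general, because folding loses information: a plain 'u' could have come from 'u', 'ú', 'û', 'ü' and so on.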
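
A rough sketch of that idea: keep the original spellings and index them by a folded key, so a lookup for 'juergen' returns whatever accented forms were stored. The names `foldKey()`, `buildIndex()` and `lookup()` are hypothetical, and the map is a small illustrative subset.

```php
<?php
// Hypothetical folding helper; the map is an illustrative subset, not a full table.
function foldKey(string $text): string
{
    $map = [
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
        'Ä' => 'ae', 'Ö' => 'oe', 'Ü' => 'ue',
    ];
    return strtolower(strtr($text, $map));
}

// Keep the original UTF-8 names, grouped under their folded form.
function buildIndex(array $names): array
{
    $index = [];
    foreach ($names as $name) {
        $index[foldKey($name)][] = $name;
    }
    return $index;
}

// Fold the query the same way, then return the original spellings.
function lookup(array $index, string $query): array
{
    return $index[foldKey($query)] ?? [];
}

$index = buildIndex(['Jürgen', 'Juergen', 'Zoë']);
print_r(lookup($index, 'juergen')); // contains both 'Jürgen' and 'Juergen'
```

The lookup returning two different spellings for one key is exactly why "unfolding" cannot be done by a replacement table alone.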


Fair enough :D I ended up with:

 

	public function removeAccents($string)
{
	// Accented Array
	$a = array('À','Á','Â','Ã','Ä','Å','Æ','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ð','Ñ','Ò','Ó','Ô','Õ','Ö','Ø',
	'Ù','Ú','Û','Ü','Ý','ß','à','á','â','ã','ä','å','æ','ç','è','é','ê','ë','ì','í','î','ï','ñ','ò','ó','ô','õ',
	'ö','ø','ù','ú','û','ü','ý','ÿ','Ā','ā','Ă','ă','Ą','ą','Ć','ć',
	'Ĉ','ĉ','Ċ','ċ','Č','č','Ď','ď','Đ','đ','Ē','ē',
	'Ĕ','ĕ','Ė','ė','Ę','ę','Ě','ě','Ĝ','ĝ','Ğ','ğ',
	'Ġ','ġ','Ģ','ģ','Ĥ','ĥ','Ħ','ħ','Ĩ','ĩ','Ī','ī',
	'Ĭ','ĭ','Į','į','İ','ı','Ĳ','ĳ','Ĵ','ĵ','Ķ','ķ',
	'Ĺ','ĺ','Ļ','ļ','Ľ','ľ','Ŀ','ŀ','Ł','ł','Ń','ń',
	'Ņ','ņ','Ň','ň','ŉ','Ō','ō','Ŏ','ŏ','Ő','ő','Œ',
	'œ','Ŕ','ŕ','Ŗ','ŗ','Ř','ř','Ś','ś','Ŝ','ŝ','Ş',
	'ş','Š','š','Ţ','ţ','Ť','ť','Ŧ','ŧ','Ũ','ũ','Ū',
	'ū','Ŭ','ŭ','Ů','ů','Ű','ű','Ų','ų','Ŵ','ŵ','Ŷ',
	'ŷ','Ÿ','Ź','ź','Ż','ż','Ž','ž','ſ','ƒ','Ơ','ơ',
	'Ư','ư','Ǎ','ǎ','Ǐ','ǐ','Ǒ','ǒ','Ǔ','ǔ','Ǖ','ǖ',
	'Ǘ','ǘ','Ǚ','ǚ','Ǜ','ǜ','Ǻ','ǻ','Ǽ','ǽ','Ǿ','ǿ');

	// Unaccented
	$b = array('A','A','A','A','A','A','AE','C','E','E','E','E','I','I','I','I','D','N','O','O','O','O','O','O',
	'U','U','U','U','Y','s','a','a','a','a','a','a','ae','c','e','e','e','e','i','i','i','i','n','o','o','o','o',
	'o','o','u','u','u','u','y','y','A','a','A','a','A','a','C','c','C','c','C','c','C','c','D','d','D','d','E',
	'e','E','e','E','e','E','e','E','e','G','g','G','g','G','g','G','g','H','h','H','h','I','i','I','i','I','i',
	'I','i','I','i','IJ','ij','J','j','K','k','L','l','L','l','L','l','L','l','l','l','N','n','N','n','N','n','n',
	'O','o','O','o','O','o','OE','oe','R','r','R','r','R','r','S','s','S','s','S','s','S','s','T','t','T','t','T',
	't','U','u','U','u','U','u','U','u','U','u','U','u','W','w','Y','y','Y','Z','z','Z','z','Z','z','s','f','O','o',
	'U','u','A','a','I','i','O','o','U','u','U','u','U','u','U','u','U','u','A','a','AE','ae','O','o');

	return str_replace($a, $b, $string);
}

 

Note that the PHP function is str_replace, not strreplace.

