anarchoi Posted June 14, 2009

Hello, I am using the following pattern to replace words in a string:

"#\b$word(\n|s|\b)(?=\s|[.,?!;:]\s)#i";

$word is a name, like "Élisée Reclus" or "Emile Pouget". Right now the pattern works perfectly and replaces these names, but I am wondering if there is a way to match the same words with or without accents.

Example: if $word is "Élisée Reclus", I would like to match "Élisée Reclus" AND "Elisee Reclus" AND "Èlisêè Reclùs", etc. If $word is "Emile Pouget", I would like to match "Émile Pouget" AND "Emile Pouget" AND "Êmîlè Pôùgèt", etc.

Thanks a lot!
thebadbad Posted June 14, 2009

Not with regular expressions alone, as far as I know. I guess you could first 'normalize' every accented letter in $name (e.g. using strtr()) and then replace every character in $name with a character class of all the similar accented characters, like a => [aáàâäå]. That would be pretty long-winded, but it should work.
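The normalization step suggested above could be sketched roughly as follows. This is only an illustration: the map $accentMap and the function normalize_accents() are hypothetical names, the map covers just a handful of precomposed Latin letters, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical, deliberately small map from accented letters to base letters.
// A real map would need many more entries.
$accentMap = [
    'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ä' => 'A',
    'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a',
    'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E',
    'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
    'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
    'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'ö' => 'o',
    'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
    'ç' => 'c',
];

// Replace every accented letter found in the map with its base letter.
// The array form of strtr() replaces whole substrings, so multi-byte
// UTF-8 characters can be used as keys.
function normalize_accents(string $text, array $map): string
{
    return strtr($text, $map);
}

echo normalize_accents('Élisée Reclus', $accentMap), "\n"; // Elisee Reclus
```

Letters that are not in the map pass through unchanged, so the result is only as good as the map.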
thebadbad Posted June 14, 2009

Okay, I actually had some fun writing this, since it ended up working. Because the code contains some very odd Unicode characters, I uploaded it to my server instead of posting it here (the forum messes with the chars): http://kronb.org/php/accents.phps

Note that in my example I use ~ as the pattern delimiter and pass it to preg_quote() once. Also, my regex pattern uses the modifier u, which treats the pattern as UTF-8. That's important, since my function handles UTF-8 chars. And yeah, obviously your PHP file needs to be encoded in UTF-8 too.

Basically, my function accents() either 'normalizes' the input string:

accents('Ȩḷiséẽ Řeclůs', true): elisee reclus

Or returns an array of all different versions of the input character:

accents('a'): Array ( [0] => A [1] => a [2] => Á [3] => á [4] => À [5] => à [6] => Ă [7] => ă [8] => Ắ [9] => ắ [10] => Ằ [11] => ằ [12] => Ẵ [13] => ẵ [14] => Ẳ [15] => ẳ [16] => Â [17] => â [18] => Ấ [19] => ấ [20] => Ầ [21] => ầ [22] => Ẫ [23] => ẫ [24] => Ẩ [25] => ẩ [26] => Ǎ [27] => ǎ [28] => Å [29] => å [30] => Ǻ [31] => ǻ [32] => Ä [33] => ä [34] => Ǟ [35] => ǟ [36] => Ã [37] => ã [38] => Ȧ [39] => ȧ [40] => Ǡ [41] => ǡ [42] => Ą [43] => ą [44] => Ā [45] => ā [46] => Ả [47] => ả [48] => Ȁ [49] => ȁ [50] => Ȃ [51] => ȃ [52] => Ạ [53] => ạ [54] => Ặ [55] => ặ [56] => Ậ [57] => ậ [58] => Ḁ [59] => ḁ [60] => Ⱥ [61] => ⱥ [62] => ᶏ [63] => Ɐ [64] => ɐ [65] => Ɑ [66] => ɑ )

With that functionality you can normalize $name and then build a string of character classes to be used in a regular expression. See the example in the script.
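For readers who cannot reach the linked script, here is a rough sketch of how the character-class step described above might work. It is not the code from accents.phps: the table $variants and the function accent_insensitive_pattern() are hypothetical, the table is far smaller than the list shown above, and the input is assumed to be already normalized to unaccented letters. It does follow the post in using ~ as the delimiter, passing it to preg_quote(), and adding the u modifier.

```php
<?php
// Hypothetical table: base letter => every form that should be accepted.
// Both cases are listed so the classes do not rely on case folding.
$variants = [
    'a' => 'aàáâãäåAÀÁÂÃÄÅ',
    'e' => 'eèéêëEÈÉÊË',
    'i' => 'iìíîïIÌÍÎÏ',
    'o' => 'oòóôõöOÒÓÔÕÖ',
    'u' => 'uùúûüUÙÚÛÜ',
    'c' => 'cçCÇ',
];

// Build a pattern from an already-normalized (unaccented) name.
function accent_insensitive_pattern(string $normalizedName, array $variants): string
{
    $pattern = '';
    // Split into characters; the u modifier keeps UTF-8 sequences intact.
    foreach (preg_split('//u', $normalizedName, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $key = strtolower($char);
        $pattern .= isset($variants[$key])
            ? '[' . $variants[$key] . ']'   // letter with known accented forms
            : preg_quote($char, '~');       // anything else, escaped literally
    }
    // The lookarounds reject matches embedded in a longer word.
    // i = case-insensitive, u = pattern and subject are UTF-8.
    return '~(?<!\pL)' . $pattern . '(?!\pL)~iu';
}

$pattern = accent_insensitive_pattern('Elisee Reclus', $variants);
var_dump(preg_match($pattern, 'A text about Èlisêè Reclùs.')); // int(1)
```

The returned pattern can be passed to preg_replace() in the same way as the original one. The lookarounds are used here in place of \b as one possible way to delimit the name, not as a requirement of the approach.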