anarchoi, posted June 14, 2009

Hello, I am using the following pattern to replace words in a string:

"#\b$word(\n|s|\b)(?=\s|[.,?!;:]\s)#i";

$word is a name, like "Élisée Reclus" or "Emile Pouget". Right now the pattern works perfectly and replaces these names, but I am wondering if there is a way to match the same words with or without accents.

For example, if $word is "Élisée Reclus", I would like to match "Élisée Reclus" AND "Elisee Reclus" AND "Èlisêè Reclùs", etc. If $word is "Emile Pouget", I would like to match "Émile Pouget" AND "Emile Pouget" AND "Êmîlè Pôùgèt", etc.

Thanks a lot!
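A minimal sketch of how the pattern from the question might be wrapped in preg_replace(). Two assumptions are not in the post: the helper name replace_name() is hypothetical, and $word is passed through preg_quote() with the '#' delimiter so that names containing regex metacharacters cannot break the pattern. It also reproduces the problem being asked about: the plain spelling is replaced, the accented one is not.

```php
<?php
// Hypothetical helper wrapping the pattern from the question.
// ASSUMPTION: $word is escaped with preg_quote(); the post does not show this step.
function replace_name(string $word, string $replacement, string $text): string
{
    // Same pattern as in the post: '#' delimiter, "i" (case-insensitive),
    // an optional trailing newline or "s", then a lookahead requiring
    // whitespace, or punctuation followed by whitespace.
    $pattern = '#\b' . preg_quote($word, '#') . '(\n|s|\b)(?=\s|[.,?!;:]\s)#i';
    return preg_replace($pattern, $replacement, $text);
}

echo replace_name('Emile Pouget', '[name]', 'I read Emile Pouget today.'), "\n";
// prints "I read [name] today."

// "E" and "É" are different characters, so the accented spelling is left alone:
echo replace_name('Emile Pouget', '[name]', 'I read Émile Pouget today.'), "\n";
// prints "I read Émile Pouget today."
```

Because the (\n|s|\b) group is part of the match, a trailing "s" is replaced too: "Two Emile Pougets, here." becomes "Two [name], here."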
thebadbad, posted June 14, 2009

Not with regular expressions alone, as far as I know. I guess you could first 'normalize' every accented letter in $name (e.g. using strtr()) and then replace every character in $name with a character class of all the similar accented characters, like a => [aáàâäå], but that would be pretty long-winded. It should work, though.
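The two steps suggested above (normalize with strtr(), then swap each letter for a character class) can be sketched roughly as follows. This is not thebadbad's actual script; the function names and the small accent table are hypothetical, and the table only covers the vowels needed for the example names. It relies on the u modifier so that PCRE compares whole UTF-8 characters (and, in PHP, so that \b treats accented letters as word characters).

```php
<?php
// HYPOTHETICAL accent table: a small subset for the example names only.
// Each base letter maps to all variants (lowercase and uppercase) it should match.
function accent_classes(): array
{
    return [
        'a' => 'aáàâäåAÁÀÂÄÅ',
        'e' => 'eéèêëEÉÈÊË',
        'i' => 'iíìîïIÍÌÎÏ',
        'o' => 'oóòôöOÓÒÔÖ',
        'u' => 'uúùûüUÚÙÛÜ',
    ];
}

// Split a UTF-8 string into single characters.
function utf8_chars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

// Step 1: "normalize" with strtr(), mapping every variant to its lowercase
// base letter, e.g. "Élisée Reclus" -> "elisee reclus".
function strip_accents(string $s): string
{
    $map = [];
    foreach (accent_classes() as $base => $variants) {
        foreach (utf8_chars($variants) as $ch) {
            $map[$ch] = $base;
        }
    }
    // After strtr() only ASCII remains for the covered letters, so strtolower() is safe here.
    return strtolower(strtr($s, $map));
}

// Step 2: replace each base letter with a class of its variants,
// e.g. "e" -> "[eéèêëEÉÈÊË]"; every other character is escaped literally.
function accent_insensitive_pattern(string $word): string
{
    $classes = accent_classes();
    $body = '';
    foreach (utf8_chars(strip_accents($word)) as $ch) {
        $body .= isset($classes[$ch]) ? '[' . $classes[$ch] . ']' : preg_quote($ch, '~');
    }
    // "u": pattern and subject are UTF-8; "i": covers unaccented letters like "R"/"r".
    return '~\b' . $body . '\b~iu';
}

$pattern = accent_insensitive_pattern('Élisée Reclus');
foreach (['Élisée Reclus', 'Elisee Reclus', 'Èlisêè Reclùs'] as $name) {
    var_dump(preg_match($pattern, "I read $name today."));
}
```

As the reply says, the long-winded part is the table itself; the pattern-building loop stays the same however many letters it covers.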
thebadbad, posted June 14, 2009

Okay, I actually had some fun writing this, since it ended up working. Because the code contains some very odd Unicode characters, I uploaded it to my server instead of posting it here (the forum mangles the characters): http://kronb.org/php/accents.phps

Note that in my example I use ~ as the pattern delimiter and pass it to preg_quote() as its second argument, so any ~ in the input gets escaped. Also, my regex pattern uses the u modifier, which makes PCRE treat the pattern and subject as UTF-8. That's important since my function handles UTF-8 characters. And obviously your PHP file needs to be encoded in UTF-8 too.

Basically, my function accents() either 'normalizes' the input string:

accents('Ȩḷiséẽ Řeclůs', true): elisee reclus

or returns an array of all the different versions of the input character:

accents('a'): Array ( [0] => A [1] => a [2] => Á [3] => á [4] => À [5] => à [6] => Ă [7] => ă [8] => Ắ [9] => ắ [10] => Ằ [11] => ằ [12] => Ẵ [13] => ẵ [14] => Ẳ [15] => ẳ [16] => Â [17] => â [18] => Ấ [19] => ấ [20] => Ầ [21] => ầ [22] => Ẫ [23] => ẫ [24] => Ẩ [25] => ẩ [26] => Ǎ [27] => ǎ [28] => Å [29] => å [30] => Ǻ [31] => ǻ [32] => Ä [33] => ä [34] => Ǟ [35] => ǟ [36] => Ã [37] => ã [38] => Ȧ [39] => ȧ [40] => Ǡ [41] => ǡ [42] => Ą [43] => ą [44] => Ā [45] => ā [46] => Ả [47] => ả [48] => Ȁ [49] => ȁ [50] => Ȃ [51] => ȃ [52] => Ạ [53] => ạ [54] => Ặ [55] => ặ [56] => Ậ [57] => ậ [58] => Ḁ [59] => ḁ [60] => Ⱥ [61] => ⱥ [62] => ᶏ [63] => Ɐ [64] => ɐ [65] => Ɑ [66] => ɑ )

With that functionality you can normalize $name and then build a string with character classes to be used in a regular expression. See the example in the script.
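To illustrate why the u modifier and the preg_quote() delimiter argument matter, here is a small standalone sketch (it does not use the accents() script). Without u, PCRE sees the UTF-8 bytes of "é" (0xC3 0xA9) as two separate characters, so a class like [é] matches either byte on its own. The file must be saved as UTF-8 for the literals to be encoded that way.

```php
<?php
// "ã" is 0xC3 0xA3 in UTF-8: it shares the lead byte 0xC3 with "é" (0xC3 0xA9).
var_dump(preg_match('~[é]~', 'ã'));   // int(1): false positive on the shared byte
var_dump(preg_match('~[é]~u', 'ã'));  // int(0): compared as whole characters

// Without "u", a single accented letter is two "characters" to PCRE:
var_dump(preg_match('~^[é]$~', 'é'));  // int(0)
var_dump(preg_match('~^[é]$~u', 'é')); // int(1)

// Passing the delimiter to preg_quote() escapes it (plus regex metacharacters) in input.
echo preg_quote('Reclus~1.0', '~'), "\n"; // prints Reclus\~1\.0
```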