yaba (Author) Posted September 11, 2006
So are you saying I should "rebuild" every word I get from the DB as UTF-8? That seems like a lot of processing :/
Even if I did that, how do I replace, say, 'α' with $alpha dynamically?
effigy Posted September 11, 2006
I'm just using that to create the character for the sake of example; try it as you normally would, without extra processing. Also, the above code is not block/language/chart specific. They're basically Unicode-aware shorthands, just like [tt]\b[/tt], [tt]\w[/tt], and [tt]\s[/tt].
yaba (Author) Posted September 11, 2006
Right, roger that :)
Could you elaborate a bit on the pattern you used, please? I don't think I've encountered \p before, for example. And those '?<=' and '?=.' bits you used... And correct me if I'm wrong, the \1 is what is captured by the first set of parentheses, right? I.e. (?<=\p{Z})
I don't quite get it; it's too advanced for me. Is there any place I could check these out, with examples etc.? I couldn't find anything... Or some feedback? Thanks! ;)
effigy Posted September 11, 2006
[tt]\p{[i]property[/i]}[/tt] matches Unicode characters that have the property, whereas [tt]\P{[i]property[/i]}[/tt] matches Unicode characters that [u]do not[/u] have the property. (This is a common syntax for specifying "match" and "don't match"; compare [tt]\s[/tt] and [tt]\S[/tt], [tt]\w[/tt] and [tt]\W[/tt].) The "Z" property is for "Separators," which "mark the boundaries between units of text." The [tt](?<=...)[/tt] and [tt](?=...)[/tt] constructs are lookarounds, specifically a positive lookbehind and a positive lookahead. They [i]look[/i]--they don't match. You can find more information on these from the links in my signature.
Therefore, the pattern results in:
[code]/
   (?<=\p{Z})        ### Make sure the next character is preceded by a separator.
   (' . $alpha . ')  ### Match the character.
   (?=.)             ### Make sure the character is followed by another character, e.g., match "an" but not "a".
/xu[/code]
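For anyone wanting to try the pattern above, here is a minimal sketch of how it might be plugged into [tt]preg_replace[/tt]. The sample string, the replacement format, and the use of [tt]preg_quote[/tt] are assumptions made for illustration; none of them come from the thread.

```php
<?php
// Assumed sample data; the thread does not show the real input.
$alpha = 'α';
$text  = 'αβ αγ α';

// Same structure as the pattern above. The x flag allows whitespace and
// comments inside the pattern; the u flag treats pattern and subject as UTF-8.
$pattern = '/
    (?<=\p{Z})                         # preceded by a separator (looked at, not captured)
    (' . preg_quote($alpha, '/') . ')  # group 1: the character itself
    (?=.)                              # followed by at least one more character
/xu';

// Wrap each match in brackets so the matched positions are visible.
$result = preg_replace($pattern, '[$1]', $text);

echo $result, "\n"; // prints: αβ [α]γ α
```

Only the middle 'α' is wrapped: the first one has no separator before it, and the last one has no character after it. Lookarounds do not create capture groups, so [tt]$1[/tt] (or [tt]\1[/tt]) refers to the parenthesized character, not to [tt](?<=\p{Z})[/tt].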
kevins Posted September 25, 2006
If your server does not support Unicode completely, you can test the server's response here and work out some tricks accordingly:
http://www.nottodolist.com/turkishTest.php
The script is here. ( Just replace <> with < ):
http://www.nottodolist.com/turkishTest.html
http://www.nottodolist.com/test2.html