PHP + preg + international chars problem

yaba · September 11, 2006

So are you saying I should "rebuild" every word I get off the DB to UTF8?

That seems a lot of processing :/

Even if I did that, how do I replace say 'α' with $alpha dynamically?

effigy · September 11, 2006

I'm just using that to create the character for the sake of example; try it as you normally would without extra processing. Also, the above code is not block/language/chart specific. They're basically Unicode classes, just like [tt]\b[/tt], [tt]\w[/tt], and [tt]\s[/tt].

yaba · September 11, 2006

Right, roger that :)

Could you elaborate a bit on the pattern you used please?

I don't think I've encountered \p before, for example. And those '?<=' and '?=.' bits you used... And correct me if I'm wrong, the \1 is what is captured by the first set of parentheses, right? I.e. (?<=\p{Z})

I don't quite get it, too advanced for me.

Any place I could check these out with examples etc? Couldn't find anything... Or some feedback? Thanks! ;)

effigy · September 11, 2006

[tt]\p{[i]property[/i]}[/tt] matches Unicode characters that have the property, whereas [tt]\P{[i]property[/i]}[/tt] matches Unicode characters that [u]do not[/u] have the property. (This is a common syntax for specifying "match" and "don't match"; compare to [tt]\s[/tt] and [tt]\S[/tt], [tt]\w[/tt] and [tt]\W[/tt].) The "Z" property is for "Separators," which "mark the boundaries between units of text." The [tt](?<=...)[/tt] and [tt](?=...)[/tt] are lookarounds, specifically, a positive lookbehind and a positive lookahead. They [i]look[/i]--they don't match. You can find more information on these from the links in my signature.

Therefore, the pattern results in:

[code]
/
(?<=\p{Z}) ### Make sure the next character is preceded by a separator.
(' . $alpha . ') ### Match the character.
(?=.) ### Make sure the character is followed by another character, e.g., match "an" but not "a".
/xu
[/code]
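
For anyone who wants to try the pattern, here is a minimal sketch of how it might be used with preg_replace. The sample string, the '[\1]' replacement, and the $nu variable are assumptions made up for illustration, not taken from the code earlier in the thread. It also assumes the strings are UTF-8, so $alpha is built from raw bytes to keep the example independent of the file's encoding.

```php
<?php
// Assumption: all strings are UTF-8. The characters are written as raw
// bytes so the result does not depend on how this file is saved.
$alpha = "\xCE\xB1"; // U+03B1 GREEK SMALL LETTER ALPHA
$nu    = "\xCE\xBD"; // U+03BD GREEK SMALL LETTER NU (hypothetical sample data)

// Same three parts as the pattern above; preg_quote() is added only as a
// precaution in case the character is a regex metacharacter.
$pattern = '/
    (?<=\p{Z})                         # lookbehind: a separator comes just before
    (' . preg_quote($alpha, '/') . ')  # group 1: the character itself
    (?=.)                              # lookahead: one more character follows
/xu';

// Three alphas: at the start, before a nu, and at the end of the string.
$subject = $alpha . ' ' . $alpha . $nu . ' ' . $alpha;

// Lookarounds do not capture, so \1 is the character, not the separator.
$result = preg_replace($pattern, '[\1]', $subject);
echo $result, "\n";

// \p{Z} and \P{Z} are opposites: a space is a separator, a letter is not.
$spaceIsSeparator  = preg_match('/^\p{Z}$/u', ' ');
$letterIsSeparator = preg_match('/^\p{Z}$/u', $alpha);
```

Only the middle alpha gets wrapped in brackets: the first one has no separator before it, and the last one has nothing after it. Since lookarounds check without consuming or capturing, the first set of capturing parentheses is the one around the character, and that is what \1 refers to.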

kevins · September 25, 2006

If your server does not fully support Unicode, you can test how it responds here and work out some workarounds from the results:

http://www.nottodolist.com/turkishTest.php

The script is here. ( Just replace <> with < ):
http://www.nottodolist.com/turkishTest.html
http://www.nottodolist.com/test2.html
