yaba

PHP + preg + international chars problem

So are you saying I should "rebuild" every word I get off the DB to UTF8?

That seems like a lot of processing :/

Even if I did that, how do I replace, say, 'α' with $alpha dynamically?

I'm only using that to create the character for the sake of the example; try it as you normally would, without extra processing. Also, the above code is not block/language/chart specific. They're basically Unicode classes, used just like [tt]\b[/tt], [tt]\w[/tt], and [tt]\s[/tt].

Right, roger that :)

Could you elaborate a bit on the pattern you used please?

I don't think I've encountered \p before, for example. And those '?<=' and '?=.' bits you used... And correct me if I'm wrong, the \1 is what is captured by the first set of parentheses, right? I.e. (?<=\p{Z})

I don't quite get it, too advanced for me.

Any place I could check these out with examples etc? Couldn't find anything... Or some feedback? Thanks! ;)

[tt]\p{[i]property[/i]}[/tt] matches Unicode characters that have the property, whereas [tt]\P{[i]property[/i]}[/tt] matches Unicode characters that [u]do not[/u] have the property. (This is a common syntax for specifying "match" and "don't match"; compare [tt]\s[/tt] and [tt]\S[/tt], or [tt]\w[/tt] and [tt]\W[/tt].) The "Z" property is for "Separators," which "mark the boundaries between units of text." [tt](?<=...)[/tt] and [tt](?=...)[/tt] are lookarounds: specifically, a positive lookbehind and a positive lookahead. They [i]look[/i]--they check the surrounding text without consuming it, so it never becomes part of the match. They are not capturing groups either, so [tt]\1[/tt] refers to the first ordinary set of parentheses, not to the lookbehind. You can find more information on these from the links in my signature.
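
To make the difference between the two property forms and the "look, don't match" behaviour concrete, here is a minimal sketch. The sample string and variable names are made up for illustration, and it assumes the script file is saved as UTF-8 and that PCRE was built with Unicode property support.

```php
<?php
// Illustrative input only (not from the thread): alpha, space, beta, gamma.
$text = "α βγ";

// \p{Z} matches separators (here, the single space);
// \P{Z} matches every character that is NOT a separator.
$separators    = preg_match_all('/\p{Z}/u', $text);
$nonSeparators = preg_match_all('/\P{Z}/u', $text);

// The lookbehind only asserts that a separator comes first;
// the separator itself is not part of the match.
preg_match('/(?<=\p{Z})\p{L}/u', $text, $m);

echo $separators, "\n";     // 1
echo $nonSeparators, "\n";  // 3
echo $m[0], "\n";           // β
```

Note that the match is just the letter, without the space in front of it, which is what the lookbehind buys you.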

Therefore, the pattern results in:

[code]
/
(?<=\p{Z}) ### Make sure the next character is preceded by a separator.
(' . $alpha . ') ### Match the character.
(?=.) ### Make sure the character is followed by another character, e.g., match "an" but not "a".
/xu
[/code]
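
For reference, this is roughly how the annotated pattern could be assembled and run as real PHP. It is only a sketch: the sample subject string and the bracket replacement are hypothetical (the original replacement is not shown here), and adding [tt]preg_quote()[/tt] is my own precaution for building a pattern from a variable.

```php
<?php
// Assumption: $alpha holds the character to find, already encoded as UTF-8.
$alpha = 'α';

// preg_quote() escapes regex metacharacters and the "/" delimiter,
// so the variable can be dropped into the pattern safely.
$pattern = '/
    (?<=\p{Z})                        # preceded by a separator
    (' . preg_quote($alpha, '/') . ') # the character itself, captured as group 1
    (?=.)                             # followed by at least one more character
/xu';

// Hypothetical replacement: wrap the captured character in brackets.
$result = preg_replace($pattern, '[$1]', 'x αβ α');

echo $result, "\n"; // x [α]β α
```

The first α is replaced because a space precedes it and another character follows it; the last α is left alone because nothing follows it. Group 1 is the character itself, since the lookbehind and lookahead capture nothing.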

If your server does not fully support Unicode, you can test its response here and work out some tricks accordingly:

http://www.nottodolist.com/turkishTest.php

The script is here (just replace <> with <):
http://www.nottodolist.com/turkishTest.html
http://www.nottodolist.com/test2.html
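
Independently of the pages linked above, a quick local probe can tell you whether the PCRE build behind PHP accepts the [tt]/u[/tt] modifier and [tt]\p{..}[/tt] properties at all. This is a minimal sketch and not the linked script; the variable names are made up, and it assumes the file is saved as UTF-8.

```php
<?php
// preg_match() returns false (and raises a warning) when the pattern cannot
// be compiled, e.g. when PCRE lacks UTF-8 or Unicode property support.
// The @ operator only silences that warning for this probe.
$utf8Ok  = @preg_match('/./u', 'α') === 1;
$propsOk = @preg_match('/^\p{L}$/u', 'α') === 1;

echo 'UTF-8 mode (/u): ', $utf8Ok ? 'yes' : 'no', "\n";
echo 'Unicode properties (\p{..}): ', $propsOk ? 'yes' : 'no', "\n";
```

If either line prints "no", patterns like the one in this thread will not compile on that server.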
