exo_duz Posted January 27, 2009
Hi all, I've been scratching my head about this all week. Is there a way to write a regex that matches all Japanese Kanji characters? I am building a PHP website that uses pretty URLs and would like to allow Kanji in the URI. The problem is that with mb_ereg_replace(), the multibyte regex function, I can only pick up Hiragana and Katakana. http://jp2.php.net/mb_ereg_replace The first example on that page shows how to do it for those two, but not for Kanji. Is there a way to do it? Thanks a lot for your help.
effigy Posted January 27, 2009
Are all of these Kanji? You can use their code point ranges to match them, e.g. preg_match_all('/[\x{3000}-\x{303F}]+/u', $string, $matches);
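For readers who want to try the code point range approach, here is a minimal sketch. It assumes that "Kanji" means the CJK Unified Ideographs block, U+4E00 to U+9FFF (the range quoted in the post above, U+3000 to U+303F, is the CJK Symbols and Punctuation block). The function name extract_kanji_runs is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper: returns every run of consecutive Kanji in a UTF-8 string.
// Assumption: Kanji = CJK Unified Ideographs, U+4E00 to U+9FFF.
function extract_kanji_runs(string $text): array
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8,
    // which is what allows \x{...} escapes above U+00FF.
    preg_match_all('/[\x{4E00}-\x{9FFF}]+/u', $text, $matches);

    // $matches[0] holds the full-pattern matches, one entry per run.
    return $matches[0];
}

// Hiragana and Katakana are skipped; only the Kanji run is returned.
print_r(extract_kanji_runs('日本語のテキスト'));
```

PCRE also offers the script property \p{Han} (used with /u) as an alternative to a hand-written range; it covers a somewhat wider set of characters than this single block, so which one fits depends on what should be allowed in the URL.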
haku Posted January 27, 2009
I was working on this a couple of months ago, but I can't remember whether I figured out something that worked. The project is at the office and I'm not, so I'll check when I get there tomorrow. What charset are you using?
exo_duz Posted January 27, 2009
I am using UTF-8.
exo_duz Posted January 27, 2009
effigy, yes they are. I will try that. Thanks.
exo_duz Posted January 28, 2009
Thanks to effigy's input, I figured it out. For anyone else having trouble, this is what I did, based on the Japanese Unicode tables at http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml

//convert japanese characters
$url = mb_convert_kana($url, "asKHV");
//replace all symbols
//table provided at http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml
$pattern = '/[^\wぁ-ゔァ-ヺー\x{4E00}-\x{9FAF}_\-]+/u';
$url = preg_replace($pattern, '+', $url);

Just in case anyone ever needs to do this: the code first converts the characters using the mb_convert_kana function (http://php.net/mb_kana_convert), then replaces every run of other characters, including Japanese symbols, with '+'. What is left is word characters, underscores, hyphens, Hiragana, Katakana and Kanji. Hope this helps anyone having this problem.
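The two steps in the post above can be wrapped in a function, roughly as sketched here. The name japanese_slug is hypothetical, and three details are assumptions that are not in the original post: the character ranges are written as \x{...} escapes (the same code points as ぁ-ゔ, ァ-ヺ and ー) so the source file's encoding does not matter, the mb_convert_kana call is skipped when the mbstring extension is not loaded, and leading or trailing '+' signs are trimmed.

```php
<?php
// Hypothetical helper wrapping the approach from the post above.
function japanese_slug(string $url): string
{
    // Step 1: normalize width variants (full-width alphanumerics and spaces,
    // half-width kana). Assumption: skip quietly if mbstring is unavailable.
    if (function_exists('mb_convert_kana')) {
        $url = mb_convert_kana($url, 'asKHV', 'UTF-8');
    }

    // Step 2: replace every run of characters that is NOT a word character,
    // Hiragana (U+3041-U+3094), Katakana (U+30A1-U+30FA), the long vowel
    // mark (U+30FC), Kanji (U+4E00-U+9FAF), underscore or hyphen with '+'.
    $pattern = '/[^\w\x{3041}-\x{3094}\x{30A1}-\x{30FA}\x{30FC}\x{4E00}-\x{9FAF}_\-]+/u';
    $url = preg_replace($pattern, '+', $url);

    // Assumption (not in the original post): drop '+' at either end.
    return trim($url, '+');
}

echo japanese_slug('東京 タワー!'), "\n";
```

Note that preg_replace returns null if the subject is not valid UTF-8 when /u is used, so input from a request should be validated (for example with mb_check_encoding) before it reaches a function like this.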
haku Posted January 28, 2009
Nice. I'm bookmarking this thread. I went back and looked at the project I was working on (it's currently on hold), and I hadn't come up with a solution that worked yet, so it's good to know this. I spent a fair bit of time on Japanese sites trying to find out whether anyone there had come up with a solution, and I didn't find anything that worked particularly well.