Checking to see if a string is only zen-kaku characters

paullb · August 22, 2009

I am trying to write a function that checks whether a given string contains only full-width (zen-kaku) Japanese characters (kana as well as kanji). The opposite check (whether a string contains any half-width characters) would work too.

I am at a bit of a loss as to how to go about writing this, so any advice would be appreciated.

ignace · August 22, 2009

What you see on screen are really just little images. Your keyboard passes a numeric code to your computer, which translates that code into an image. For example, when you type an uppercase Z you actually pass 0x5A, and for a lowercase z you pass 0x7A.

print ord('Z');

Find the lowest and highest character for each of the three Japanese character types (the same way A, 0x41, and Z, 0x5A, bound the uppercase Latin letters), then check whether each typed character falls within any of those ranges:

$ascii = ord($char);
if ($ascii >= ord('A') && $ascii <= ord('Z')) {
    //$ascii is between A and Z
}

Daniel0 · August 22, 2009

Except Japanese characters aren't part of ASCII...

ignace · August 22, 2009

Yeah, I know, but it was an example. Nevertheless, whether he types Chinese, French or some other language, every character still has a numeric code and thus falls within some range, just like A-Z or a-z does. IMO it's worth a go. And if it doesn't work, which is most likely since ord() only handles ASCII, then I wonder what other ways there are to retrieve the hexadecimal value of a Japanese character. Would converting to the appropriate encoding help?

Daniel0 · August 22, 2009

Well, ord() only returns the value of the first byte of a string, so it only works for single-byte characters such as ASCII.
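To illustrate the point, here is a minimal sketch of what ord() returns for a multibyte character, next to one possible way of getting the real code point. It assumes the source file and the string are UTF-8 and that the mbstring extension is loaded; codepoint_of() is a hypothetical helper name, not a built-in. On PHP 7.2 and later, mb_ord() should do the same job.

```php
<?php
// Assumption: this file and its string literals are saved as UTF-8,
// and the mbstring extension is available.

// Hypothetical helper: Unicode code point of the first character of $char.
// It does not handle an empty string.
function codepoint_of(string $char): int
{
    // UCS-4BE stores each character as one 4-byte big-endian integer,
    // so unpack('N', ...) reads the code point directly.
    $first = mb_substr($char, 0, 1, 'UTF-8');
    $ucs4  = mb_convert_encoding($first, 'UCS-4BE', 'UTF-8');
    $parts = unpack('N', $ucs4);

    return $parts[1];
}

printf("%X\n", ord('Z'));           // 5A: a single byte, so ord() is fine
printf("%X\n", ord('あ'));          // E3: only the first of three UTF-8 bytes
printf("%X\n", codepoint_of('あ')); // 3042: the actual code point
```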

Hiragana characters are in the range U+3040 to U+309F and the katakana characters are in U+30A0 to U+30FF. The kanji are in U+4E00 to U+9FBF (the CJK Unified Ideographs block itself runs to U+9FFF).

So to check if a string solely consists of kana or kanji you'll have to check that each character in the string lies within these ranges in Unicode.

There are a few functions in the comments on the ord() manual page that allegedly work with Unicode.
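The range check described above could be sketched with a UTF-8 aware regular expression rather than a per-character loop. This is only one possible approach, assuming the input is valid UTF-8; both function names are hypothetical. The kanji range here covers the whole CJK Unified Ideographs block (up to U+9FFF), and the half-width check assumes "half-width" means printable ASCII plus the half-width katakana forms at U+FF61 to U+FF9F.

```php
<?php
// Assumption: $text is valid UTF-8. Function names are illustrative only.

// True if $text is non-empty and made up solely of hiragana, katakana or kanji.
function is_zenkaku_japanese(string $text): bool
{
    // \x{3040}-\x{30FF}: hiragana and katakana blocks
    // \x{4E00}-\x{9FFF}: CJK Unified Ideographs block (kanji)
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/\A[\x{3040}-\x{30FF}\x{4E00}-\x{9FFF}]+\z/u', $text) === 1;
}

// The opposite check: true if $text contains at least one half-width character.
function contains_hankaku(string $text): bool
{
    // Printable ASCII plus half-width katakana and punctuation forms.
    return preg_match('/[\x{0020}-\x{007E}\x{FF61}-\x{FF9F}]/u', $text) === 1;
}

var_dump(is_zenkaku_japanese('ひらがなカタカナ漢字')); // bool(true)
var_dump(is_zenkaku_japanese('ｶﾀｶﾅ'));               // bool(false)
var_dump(contains_hankaku('漢字ｶﾅ'));                // bool(true)
```

Note that full-width punctuation such as 。 and the iteration mark 々 fall outside these ranges, so the first function rejects them unless the character class is widened.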

ignace · August 22, 2009

Where did you find those ranges?

Daniel0 · August 22, 2009

All the ranges are specified on unicode.org.

ignace · August 23, 2009

Thx
