I am trying to write a function that checks whether a given string contains only full-width (zen-kaku) Japanese characters (kana as well as kanji). The opposite check (whether a string contains any half-width characters) would work too.

 

I am at a bit of a loss as to how to go about writing this, so any advice would be appreciated.

What you see on screen are really just little images (glyphs). Your keyboard passes a numeric code to your computer, which translates that code into a glyph. For example, when you type an uppercase Z you actually pass 0x5A, and a lowercase z is 0x7A.

 

print ord('Z'); // prints 90 (0x5A)

 

Find the lowest and highest character codes (like A 0x41 and Z 0x5A for uppercase Latin letters) for each of the three Japanese scripts, then check whether each typed character falls within any of these ranges:

 

$ascii = ord($char);
if ($ascii >= ord('A') && $ascii <= ord('Z')) {
    //$ascii is between A and Z
}

Except Japanese characters aren't part of ASCII...

 

Yeah, I know, but it was an example. Nevertheless, whether he types Chinese, French or some other language, each character is still stored as a numeric code and thus falls within some range, just like A-Z or a-z does. IMO it's worth a go. And if it doesn't work, which is most likely since ord() only handles ASCII, then I wonder what other ways there are to retrieve the hexadecimal value of a Japanese character. Would converting to the appropriate encoding help?

Well, ord() only looks at the first byte of a string, so it only works for single-byte (ASCII) characters.
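To see what goes wrong, here is a small sketch. It assumes the string is UTF-8 encoded, and the bytes of あ (U+3042) are written out explicitly so the result does not depend on how the source file is saved.

```php
<?php
// UTF-8 encoding of the hiragana character あ (U+3042): three bytes.
$a = "\xE3\x81\x82";

// ord() returns the value of the first byte only, not the code point.
$firstByte = ord($a);      // 0xE3, not 0x3042

// strlen() counts bytes, so a single kana reports a length of 3.
$byteLength = strlen($a);  // 3

printf("first byte: 0x%X, byte length: %d\n", $firstByte, $byteLength);
```

So comparing ord() results against a range only makes sense for single-byte characters such as A-Z.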

 

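Hiragana characters are in the Unicode range U+3040 to U+309F and the katakana characters are in U+30A0 to U+30FF. The kanji are in U+4E00 to U+9FFF (the CJK Unified Ideographs block). Half-width katakana sit in a separate range, U+FF61 to U+FF9F.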

 

So to check if a string solely consists of kana or kanji you'll have to check that each character in the string lies within these ranges in Unicode.
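One way this range check might look is a regular expression with the /u modifier, which lets PCRE work on UTF-8 code points. This is only a sketch: the function names are made up for illustration, the input is assumed to be UTF-8, the kanji range is taken up to the end of the CJK Unified Ideographs block (U+9FFF), and the half-width test assumes "half-width" means printable ASCII plus the half-width katakana block (U+FF61 to U+FF9F).

```php
<?php
// Hypothetical helper: true if $text is non-empty and consists only of
// hiragana, katakana or kanji code points.
function isKanaOrKanji(string $text): bool
{
    // \A and \z anchor to the very start and end of the string.
    // preg_match() returns false for invalid UTF-8, which also fails === 1.
    $pattern = '/\A[\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{4E00}-\x{9FFF}]+\z/u';
    return preg_match($pattern, $text) === 1;
}

// Hypothetical helper for the opposite check: true if $text contains at
// least one printable ASCII character or half-width katakana character.
function containsHalfWidth(string $text): bool
{
    return preg_match('/[\x{0020}-\x{007E}\x{FF61}-\x{FF9F}]/u', $text) === 1;
}

var_dump(isKanaOrKanji('ひらがな'));
var_dump(containsHalfWidth('ｶﾀｶﾅ'));
```

Note that full-width punctuation such as 。 (U+3002) is outside all three ranges, so the first function rejects it; add more ranges if those characters should be allowed.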

 

There are a few functions in the comments on the ord() manual page that allegedly work with Unicode.
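On newer PHP versions there may be no need for a user-contributed function: assuming PHP 7.2 or later with the mbstring extension loaded, mb_ord() returns the code point of a character. The sketch below uses it for a per-character check; the helper names are hypothetical, the input is assumed to be UTF-8, and the kanji range is taken up to U+9FFF.

```php
<?php
// Hypothetical helper: list the Unicode code points of a UTF-8 string.
function codePoints(string $text): array
{
    // Splitting on the empty pattern with /u yields one entry per character.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // invalid UTF-8 input
    }
    return array_map(function ($c) {
        return mb_ord($c, 'UTF-8');
    }, $chars);
}

// Hypothetical helper: is the code point hiragana, katakana or kanji?
function isJapaneseCodePoint(int $cp): bool
{
    return ($cp >= 0x3040 && $cp <= 0x309F)   // hiragana
        || ($cp >= 0x30A0 && $cp <= 0x30FF)   // katakana
        || ($cp >= 0x4E00 && $cp <= 0x9FFF);  // kanji
}

foreach (codePoints('あア日Z') as $cp) {
    printf("U+%04X %s\n", $cp, isJapaneseCodePoint($cp) ? 'yes' : 'no');
}
```

This is more verbose than a regular expression, but it makes it easy to report exactly which character failed the check.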


Where did you find those ranges?
