Checking to see if a string is only zen-kaku characters


paullb

I am trying to write a function that will check whether a given string consists only of full-width (zen-kaku) Japanese characters (kana as well as kanji). The opposite check, to see whether a string contains any half-width characters, would work too.

I am at a bit of a loss of how to go about writing this so any advice would be appreciated.

What you see on screen are actually just little images (glyphs). Your keyboard, however, passes a numeric code (usually written in hexadecimal) to your computer, which translates that code into an image. For example, when you type an uppercase Z you actually pass 0x5A, and a lowercase z is 0x7A.

print ord('Z'); // prints 90, which is 0x5A

Look up the lowest and highest code (like A = 0x41 and Z = 0x5A) for each of the three types of Japanese characters, then check whether the typed character falls within any of these ranges:

$char = 'M'; // example input
$ascii = ord($char);
if ($ascii >= ord('A') && $ascii <= ord('Z')) {
    // $char is an uppercase letter between A and Z
}

Except Japanese characters aren't part of ASCII...

Yeah, I know, but it was an example. Whether he types Chinese, French or some other language, each character is still sent as a numeric code and thus falls within some range, just like A-Z or a-z does. IMO it's worth a go. And if it doesn't work, which is most likely since ord() only works with ASCII, then I wonder what other ways there are to retrieve the numeric value of a Japanese character. Would converting to the appropriate encoding help?

Well, ord() only works for ASCII characters.
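
To illustrate, a small sketch, assuming the PHP source file is saved as UTF-8 (the example character and variable name are only for illustration): a single Japanese character takes several bytes, and ord() returns only the value of the first one.

```php
<?php
// Assumes this file is saved as UTF-8.
$char = 'あ'; // HIRAGANA LETTER A, code point U+3042

// strlen() counts bytes, not characters: U+3042 is encoded as E3 81 82.
echo strlen($char), "\n";   // 3
// ord() only looks at the first byte, 0xE3.
echo ord($char), "\n";      // 227
// The raw bytes, shown in hexadecimal.
echo bin2hex($char), "\n";  // e38182
```

So comparing ord() against code point ranges such as 0x3040 to 0x309F cannot work on UTF-8 input; the bytes have to be decoded first.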

Hiragana characters are in the range U+3040 to U+309F and katakana characters are in U+30A0 to U+30FF. The kanji are in U+4E00 to U+9FBF.

So to check whether a string consists solely of kana or kanji, you'll have to check that the Unicode code point of each character in the string lies within one of these ranges.
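
A minimal sketch of that check, assuming the input is UTF-8 and using exactly the ranges listed above; the function name is_kana_or_kanji() is hypothetical. Instead of looping over the characters, it lets PCRE do the work: with the /u modifier, \x{...} inside a character class matches Unicode code points rather than single bytes.

```php
<?php
/**
 * Hypothetical helper: true if $str is non-empty and every character is
 * hiragana (U+3040 to U+309F), katakana (U+30A0 to U+30FF) or kanji
 * (U+4E00 to U+9FBF). Invalid UTF-8 makes preg_match() return false,
 * so such input is also rejected.
 */
function is_kana_or_kanji(string $str): bool
{
    // \A and \z anchor the whole string (unlike $, \z does not allow a trailing newline).
    $pattern = '/\A[\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{4E00}-\x{9FBF}]+\z/u';
    return preg_match($pattern, $str) === 1;
}

var_dump(is_kana_or_kanji('ひらがなカタカナ漢字')); // bool(true)
var_dump(is_kana_or_kanji('ｶﾀｶﾅ'));               // bool(false): half-width katakana
var_dump(is_kana_or_kanji('abc'));               // bool(false)
```

Japanese punctuation such as 。 (U+3002) and full-width Latin letters (U+FF01 to U+FF5E) fall outside these ranges, so a string containing them is rejected; extend the character class if they should count as zen-kaku.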

There are a few functions in the user comments on the ord() manual page that allegedly work with Unicode.
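
The functions posted there vary, so here is a hedged stand-in rather than one of them: a hypothetical utf8_ord() that decodes a single UTF-8 character into its code point by hand, used to test each character of the string against the same ranges. On PHP 7.2 or later with the mbstring extension, mb_ord() returns the code point directly and could replace utf8_ord().

```php
<?php
/**
 * Hypothetical helper: returns the Unicode code point of one UTF-8 encoded
 * character, or -1 if the bytes are not a single 1 to 4 byte sequence.
 * A sketch only: it does not reject overlong encodings or surrogates.
 */
function utf8_ord(string $char): int
{
    $len = strlen($char);
    if ($len === 0) {
        return -1;
    }
    $b0 = ord($char[0]);
    // The lead byte tells us the sequence length and carries the top bits.
    if ($b0 < 0x80) {
        $n = 1; $cp = $b0;
    } elseif (($b0 & 0xE0) === 0xC0) {
        $n = 2; $cp = $b0 & 0x1F;
    } elseif (($b0 & 0xF0) === 0xE0) {
        $n = 3; $cp = $b0 & 0x0F;
    } elseif (($b0 & 0xF8) === 0xF0) {
        $n = 4; $cp = $b0 & 0x07;
    } else {
        return -1;
    }
    if ($len !== $n) {
        return -1;
    }
    for ($i = 1; $i < $n; $i++) {
        $b = ord($char[$i]);
        if (($b & 0xC0) !== 0x80) {
            return -1; // continuation bytes must look like 10xxxxxx
        }
        $cp = ($cp << 6) | ($b & 0x3F); // append 6 more bits
    }
    return $cp;
}

function is_kana_or_kanji_loop(string $str): bool
{
    // Split into UTF-8 characters; preg_split() returns false on invalid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false || $chars === []) {
        return false;
    }
    foreach ($chars as $c) {
        $cp = utf8_ord($c);
        $ok = ($cp >= 0x3040 && $cp <= 0x309F)   // hiragana
           || ($cp >= 0x30A0 && $cp <= 0x30FF)   // katakana
           || ($cp >= 0x4E00 && $cp <= 0x9FBF);  // kanji
        if (!$ok) {
            return false;
        }
    }
    return true;
}

echo dechex(utf8_ord('あ')), "\n";            // 3042
var_dump(is_kana_or_kanji_loop('カタカナ漢字')); // bool(true)
var_dump(is_kana_or_kanji_loop('ｶﾀｶﾅ'));       // bool(false)
```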

Where did you find those ranges?