
Add a space


haku


I need to add a space character to this regex:

 

/(?:\xEF\xBD[\xA1-\xBF]|\xEF\xBE[\x80-\x9F])/

 

and this regex as well

 

/^(\xe3(\x82[\xa1-\xbf]|\x83[\x80-\xb6]|\x83\xbc))*$/

 

I'm right crap with PCRE regex, and I'm even more crap with hexadecimal regex, so I really don't know how/where to add the space into this. Can anyone give me a hand?

 

 


Well,

 

/(?:\xEF\xBD[\xA1-\xBF]|\xEF\xBE[\x80-\x9F])/

 

either matches \xEF, \xBD and \xA1-\xBF (3 bytes total, i.e. one UTF-8 character) or \xEF, \xBE and \x80-\x9F (also 3 bytes total). Where do you want to allow whitespace?

 

/^(\xe3(\x82[\xa1-\xbf]|\x83[\x80-\xb6]|\x83\xbc))*$/

 

matches \xe3 followed by either \x82 and \xa1-\xbf, \x83 and \x80-\xb6, or \x83 and \xbc (3 bytes total in each case). That group is matched 0 or more times across the full string. Again: where do you want to allow whitespace?
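To make the "3 bytes" point concrete, here is a minimal sketch, assuming the data is UTF-8 encoded. The variable names are only for illustration. Without the u modifier PCRE works byte by byte, so each katakana character shows up as three bytes:

```php
<?php
// One half-width and one full-width katakana "a", written as raw UTF-8 bytes.
$halfWidthA = "\xEF\xBD\xB1"; // U+FF71
$fullWidthA = "\xE3\x82\xA2"; // U+30A2

// strlen() counts bytes, not characters.
echo strlen($halfWidthA), ' ', bin2hex($halfWidthA), "\n";
echo strlen($fullWidthA), ' ', bin2hex($fullWidthA), "\n";

// The first pattern from the thread, anchored to exactly one character.
$halfWidthPattern = '/^(?:\xEF\xBD[\xA1-\xBF]|\xEF\xBE[\x80-\x9F])$/';
var_dump(preg_match($halfWidthPattern, $halfWidthA)); // matches
var_dump(preg_match($halfWidthPattern, $fullWidthA)); // does not match
```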


The first one was missing the *$ - I figured that out after posting this.

 

I want to allow whitespace anywhere. And actually, I would like to combine the two statements into one if possible.

 

Any help is muchly appreciated.
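One possible way to combine the two patterns and allow spaces anywhere is sketched below. It assumes UTF-8 input and keeps the byte-level style of the original patterns (no u modifier). The function name isKatakanaOrSpace is made up for this example, and the two space alternatives (ASCII space and the full-width space U+3000) are an assumption about which spaces should be accepted.

```php
<?php
// Hypothetical helper: true if $value contains only the katakana ranges
// from the two original patterns, plus single-width and double-width spaces.
function isKatakanaOrSpace(string $value): bool
{
    $pattern = '/^(?:'
        . '\xEF\xBD[\xA1-\xBF]'   // half-width, U+FF61..U+FF7F
        . '|\xEF\xBE[\x80-\x9F]'  // half-width, U+FF80..U+FF9F
        . '|\xE3\x82[\xA1-\xBF]'  // full-width, U+30A1..U+30BF
        . '|\xE3\x83[\x80-\xB6]'  // full-width, U+30C0..U+30F6
        . '|\xE3\x83\xBC'         // prolonged sound mark, U+30FC
        . '|\x20'                 // ASCII space
        . '|\xE3\x80\x80'         // full-width space, U+3000 (assumed wanted)
        . ')*$/D';

    return preg_match($pattern, $value) === 1;
}

var_dump(isKatakanaOrSpace("\xE3\x83\x88\xE3\x83\xA0 \xEF\xBE\x84\xEF\xBE\x91"));
var_dump(isKatakanaOrSpace("\xE3\x83\x88\xE3\x83\xA0 tom"));
```

Because of the * quantifier an empty string also passes; swap it for + if an empty field should be rejected.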


It would really help if you explained what you're trying to do with this regex. And e.g. provide sample haystacks and expected matches/non-matches.

 

But here's a guess: Do you want to allow all the characters found in your patterns, including whitespace, and require the string to be between 0 and 3 in length? Do the allowed characters have to appear in a certain order? Not sure if that would make sense, but that's what I can decipher from your posts :)


You may not be able to see this, but I am looking to confirm that the entire field consists of only these characters:

 

アイウエオカキクケコサシスセソタチツテトラリルレロマミムメモナニヌネノワヲンガギグゲゴダヂヅデドザジズゼゾバビブベボパピプペポ  アイウエオカキクケコサシスセソタチツテトラリルレロマミムメモナニヌネノワヲンガギグゲゴダヂヅデドザジズゼゾバビブベボパピプペポ

 

(including the double-width and single-width spaces in the middle). There are no length limitations. I just want to make sure that the submitted value contains only these characters.


Nice! Thanks MadT. That's much easier to use than what I was using.

 

I have one last problem to solve with it now - I need to allow zenkaku spaces. This is a double-byte space. The hex code is 8140 (retrieved using bin2hex()). Any idea how I can add this?
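A note on the hex code, as a sketch under the assumption that the form data reaches the regex as UTF-8: 8140 is the Shift_JIS byte sequence for the full-width space (U+3000), which suggests the string passed to bin2hex() was Shift_JIS encoded. In UTF-8 the same character is the three bytes E3 80 80, and with the u modifier it can be written as \x{3000}. The variable names below are only for illustration.

```php
<?php
// The full-width (zenkaku) space as raw UTF-8 bytes.
$zenkakuSpace = "\xE3\x80\x80";
echo bin2hex($zenkakuSpace), "\n"; // e38080 in UTF-8, not 8140

// With /u, \x{3000} refers to the code point regardless of its byte encoding.
$katakanaWithSpace = '/^[\p{Katakana}\x{3000}]+$/u';
$sample = "\xE3\x83\x88\xE3\x83\xA0" . $zenkakuSpace . "\xE3\x83\x8F\xE3\x82\xAF";
var_dump(preg_match($katakanaWithSpace, $sample));
```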


Actually, I may have answered my own question. I did this:

/^\p{Katakana}|\x8140+$/iu

and it seems to work. Does that look right?

 

edit: Nope, it doesn't work. It allows this through:

トム tom

which it shouldn't because of the English.
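The reason the English gets through is most likely operator precedence: the | splits the whole pattern into "^\p{Katakana}" (string starts with one katakana character) or "\x8140+$", so any string beginning with katakana matches. Also, \x only takes two hex digits here, so \x8140 is read as \x81 followed by a literal "4" and "0". A minimal sketch of one way to fix it, assuming UTF-8 input and that both the ASCII space and the full-width space should be allowed:

```php
<?php
// The attempt from the post: alternation between two loosely anchored halves.
$attempt = '/^\p{Katakana}|\x8140+$/iu';

// Sketch of a fix: a single character class, repeated over the whole string.
$fixed = '/^[\p{Katakana}\x{3000}\x20]+$/uD';

$input = "トム tom";
var_dump(preg_match($attempt, $input)); // int(1): the first alternative matches
var_dump(preg_match($fixed, $input));   // int(0): Latin letters are rejected
```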

 


I stay up till that time every weekend. I would on weeknites too if it weren't for work!

 

Thanks for the mb_convert_kana tip Techie- but I'm trying to do this without relying on the mbstring functions, as they aren't enabled at runtime.


I tried to match some of your characters with \p{Katakana}, and not all of them matched. The easy solution is to simply add all the characters including a single and double-byte space inside a character class. I also tried that, and it worked.

 

~^[アイウエオカキクケコサシスセソタチツテトラリルレロマミムメモナニヌネノワヲンガギグゲゴダヂヅデドザジズゼゾバビブベボパピプペポ アイウエオカキクケコサシスセソタチツテトラリルレロマミムメモナニヌネノワヲンガギグゲゴダヂヅデドザジズゼゾバビブベボパピプペポ]*$~iuD

 

I'm not sure the double-byte space is added in there^, because of the forum, so you might have to add that afterwards in your script.
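To avoid depending on what the forum does to pasted characters, the class can also be written with code point escapes. This is only a sketch: the function name isKatakanaField is hypothetical, and the ranges are taken from the byte patterns in the first post, so they allow a few more characters (small kana, for instance) than the explicit list above.

```php
<?php
// Hypothetical helper: katakana ranges written as code points, so the
// double-byte space (U+3000) cannot get lost in copy and paste.
function isKatakanaField(string $value): bool
{
    $pattern = '~^['
        . '\x{30A1}-\x{30F6}'  // full-width katakana
        . '\x{30FC}'           // prolonged sound mark
        . '\x{FF61}-\x{FF9F}'  // half-width katakana and punctuation
        . '\x{3000}\x20'       // double-byte and single-byte space
        . ']*$~uD';

    // preg_match() returns false on invalid UTF-8, which is also rejected here.
    return preg_match($pattern, $value) === 1;
}

var_dump(isKatakanaField("トム\xE3\x80\x80ﾄﾑ"));
var_dump(isKatakanaField("トム tom"));
```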


Hi haku,

My Chinese friend 徐曌 (Rick) says:

"Kanji" could be "Han", i.e. if you put \p{Han} instead of \p{Kanji}, it should work, as Japanese kanji are actually ancient Chinese characters (AKA Traditional Chinese).

 

Also, we were playing with some Chinese today and found some characters were missing, i.e. commas. To resolve this we added \p{Common} to the check that only Chinese was entered.

 

This is what we used for Chinese:

<?php
$subject = "這是中文測試,这是中文测试哦。";

// Accept Han characters plus script-neutral ones such as punctuation.
if (preg_match('/^[\p{Han}\p{Common}]+$/iu', $subject)) {
   echo "ok";
} else {
   echo "failed";
}
?>

 

This is the correct way of dealing with Unicode, instead of just adding the characters.

 

I hope this helps
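One caveat worth checking before relying on \p{Common}: it covers every character that is not tied to a single script, which includes digits, ASCII punctuation and spaces, not just commas. A small sketch, assuming UTF-8 input (the variable name is only for illustration):

```php
<?php
$hanOrCommon = '/^[\p{Han}\p{Common}]+$/u';

var_dump(preg_match($hanOrCommon, "這是中文測試,这是中文测试哦。")); // int(1)
var_dump(preg_match($hanOrCommon, "123 !?"));   // int(1): digits and punctuation are Common
var_dump(preg_match($hanOrCommon, "abc"));      // int(0): Latin letters have their own script
var_dump(preg_match($hanOrCommon, "テスト"));    // int(0): katakana is a separate script
```

So the pattern rejects other scripts, but a value made up entirely of digits or punctuation would still pass.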


Quote: "This is the correct way of dealing with Unicode instead of just adding the characters"

 

I resorted to that because the Unicode scripts I tried either didn't match all the characters he wanted to allow or possibly allowed heaps of other characters he didn't specify.


Japanese kanji isn't exactly the same as traditional Chinese unfortunately. For the most part they are the same, but there are a few different kanji, so that wouldn't work. Fortunately I already have a regex for kanji, I was just wondering earlier in this thread if a shortcut existed.

 

This thread though was about katakana, which is a phonetic alphabet that doesn't exist in Chinese at all. There are three alphabets in Japanese (four if you count English letters) - kanji, which is Chinese characters, hiragana, which is phonetic characters used with Japanese words, and katakana which is phonetic characters used with words taken from other languages, or for onomatopoeia, or emphasizing words.

