phdphd Posted November 4, 2013
Hi All, I am trying to set up a preg_match() call that validates some data entered by the user. Since my website is intended for speakers of any European language (including German, Swedish, Spanish, etc.), I am testing it with a string that contains characters from most of those languages. I initially thought that a simple #[[:alnum:]]# would do the job. It actually works fine for English, French and Spanish, but it still does not accept the test string. Any help welcome. Thanks!
requinix Posted November 5, 2013
Character encoding? preg_match() needs to know the character encoding, which might be UTF-8 (and you use the /u flag) or something else (and you convert it to UTF-8 and use the /u flag).
phdphd Posted November 9, 2013 Author
Finally #[^[:alnum:]]# did the job.
phdphd Posted November 10, 2013 Author
Actually #[^[:alnum:]]# will accept "hello world öÄßáéíóúÁÉÍÓÚüÜñÑ" or just "öÄßáéíóúÁÉÍÓÚüÜñÑ", but not just "world". It seems it needs at least one "special" character (which could be a space). I think I finally found the solution by combining both regexes. The one below accepts strings like "solution", "öÄáéíóúÁÉÍÓÚ" and "a f öÄßáéíóúÁÉÍÓÚ t üÜz z z z ñÑ dsd dfd ?! hope I found the solution now :-(":
"[^[:alnum:]]|[[:alnum:]]"
requinix Posted November 10, 2013 (edited)
You didn't. Your new expression says "the string must either (a) contain an alphanumeric character or (b) contain a non-alphanumeric character". It'll match against anything but an empty string.
Edited November 10, 2013 by requinix
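To make that point concrete, here is a minimal sketch of how the combined pattern behaves. The helper name matchesEitherClass() and the sample strings are made up for illustration; only the pattern itself comes from the thread.

```php
<?php
// Hypothetical helper wrapping the combined pattern from the post above.
// Without anchors, preg_match() only needs ONE character that satisfies
// either branch, and every character is either alphanumeric or not.
function matchesEitherClass(string $input): bool
{
    return preg_match('#[^[:alnum:]]|[[:alnum:]]#', $input) === 1;
}

// Illustrative inputs: plain word, punctuation only, a single space,
// markup, and the empty string.
$samples = ['solution', '?!', ' ', '<script>', ''];

foreach ($samples as $sample) {
    // Prints true for every sample except the empty string.
    var_dump(matchesEitherClass($sample));
}
```

Since the pattern rejects nothing but the empty string, it cannot serve as validation.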
phdphd Posted November 10, 2013 Author
Indeed, this is the first thought that came to my mind when I woke up this morning. So coming back to your suggestion, and since the character encoding is UTF-8, what would be the right regex syntax combining [[:alnum:]] with the /u flag? All the regexes that I tried (like /#[[:alnum:]]#/u) either return false or raise a modifier error. Thanks.
requinix Posted November 10, 2013
You had # before as the delimiter. You don't need to add slashes on top of that. It's just #[[:alnum:]]#u. If you want to verify the entire string, and not just that there is an alphanumeric character somewhere in it, then you need to check every character from the beginning to the end:
#^[[:alnum:]]+$#u
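The difference between "contains an alphanumeric character" and "consists only of alphanumeric characters" can be sketched as below. This is an illustration under stated assumptions, not the pattern from the post: it swaps in the Unicode property classes \p{L} (letters) and \p{N} (numbers) in place of [[:alnum:]], the function names are hypothetical, and the test strings are written with byte escapes so the result does not depend on how the file is saved. Note that a space is not alphanumeric, so a phrase such as "Schönen Tag" only passes a fully anchored check if the space is allowed explicitly.

```php
<?php
// Hypothetical helpers. \p{L} and \p{N} are used here instead of
// [[:alnum:]]; both require the /u modifier to treat the subject as UTF-8.

// True only if EVERY character is a letter or a digit.
function isAlnumOnly(string $s): bool
{
    return preg_match('#^[\p{L}\p{N}]+$#u', $s) === 1;
}

// Same check, but a plain space is also allowed.
function isAlnumOrSpace(string $s): bool
{
    return preg_match('#^[\p{L}\p{N} ]+$#u', $s) === 1;
}

$word   = "Sch\xC3\xB6nen";      // "Schönen" as UTF-8 bytes
$phrase = "Sch\xC3\xB6nen Tag";  // "Schönen Tag" as UTF-8 bytes

var_dump(isAlnumOnly($word));      // letters only
var_dump(isAlnumOnly($phrase));    // fails because of the space
var_dump(isAlnumOrSpace($phrase)); // passes once the space is allowed
```

Which extra characters to allow (spaces, hyphens, apostrophes) depends on the field being validated.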
phdphd Posted November 10, 2013 Author
Thanks for your reply. Unfortunately neither works when special chars are involved, like in the German string "Schönen Tag". Is there something wrong in my UTF-8 setup? My PHP file is saved as UTF-8 and the contents of the POST are converted to UTF-8 using htmlspecialchars($_POST['regex'], ENT_NOQUOTES, 'UTF-8');.
Solution requinix Posted November 10, 2013 (edited)
Quoting phdphd: "the contents of the POST is converted to UTF-8 using htmlspecialchars($_POST['regex'], ENT_NOQUOTES, 'UTF-8');."
1. That's bad. Don't do that. The only time you should ever be using htmlspecialchars() is immediately before you're about to output something in HTML. Not any other point before then. Especially not when you're inserting it into your database.
2. That doesn't convert encodings. All you did was tell it that it should interpret the string as if it was UTF-8. If it wasn't to begin with then it still won't be after.
var_dump(preg_match('/^[[:alnum:]]+$/u', 'Schönen Tag'));
works for me as long as I make sure I put that code into a file and save it using UTF-8. If I don't then it might be ISO 8859-1 by default and I'd have to utf8_encode() (which converts from that encoding to UTF-8) the string first.
Edited November 10, 2013 by requinix
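Both points can be sketched in code: convert (or at least verify) the encoding before matching, and escape only when printing. This is a minimal sketch, not a definitive implementation. The helper toUtf8() is hypothetical, it assumes the mbstring extension is available, and it assumes that any input which is not valid UTF-8 is ISO 8859-1, which is a guess rather than a detection. It uses mb_convert_encoding() where the post mentions utf8_encode(), because utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: returns $raw unchanged if it is already valid UTF-8,
// otherwise converts it on the ASSUMPTION that it is ISO 8859-1.
function toUtf8(string $raw): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "Sch\xF6nen";   // "Schönen" encoded as ISO 8859-1 (not valid UTF-8)

// With /u, preg_match() returns false (an error, not "no match") when the
// subject is not valid UTF-8.
var_dump(preg_match('#^[\p{L}\p{N}]+$#u', $latin1));

// After conversion the same pattern can actually be evaluated.
$utf8 = toUtf8($latin1);
var_dump(preg_match('#^[\p{L}\p{N}]+$#u', $utf8));

// htmlspecialchars() is applied only at output time; it escapes HTML
// metacharacters and does not change the encoding of the string.
echo htmlspecialchars($utf8, ENT_NOQUOTES, 'UTF-8'), PHP_EOL;
```

Checking for a strict false return from preg_match() is a quick way to tell an encoding problem apart from a string that simply fails validation.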
phdphd Posted November 10, 2013 Author
Thanks a lot, Requinix. I think the problem was that my PHP file was initially a non-UTF-8 file that I had resaved as UTF-8. I created a new PHP file, saved it in UTF-8 format, then pasted some pieces of code into it. So, for anyone who may find it useful, the preg_match() call that works for me is
if (preg_match('#[[:alnum:]]#u', $_POST['regex']))
One can even mix chars from left-to-right and right-to-left alphabets. Thanks again, have a nice Sunday.