Looking For A Regex That Validates "öÄßáéóúÁÉÍÓÚüÜñÑ"



9 replies to this topic

#1 phdphd

Posted 04 November 2013 - 06:26 PM

Hi All,

I am trying to set up a preg_match() call that validates data entered by the user. Since my website is intended for speakers of any European language (German, Swedish, Spanish, etc.), I am testing it with a string that contains characters from most of those languages.

 

I initially thought that a simple 

#[[:alnum:]]#

would do the job. It actually works fine for English, French and Spanish, but it still does not accept the test string.

 

Any help welcome.

 

Thanks !

 

 



#2 requinix

Posted 04 November 2013 - 08:26 PM

Character encoding? preg_match() needs to know the character encoding, which might be UTF-8 (and you use the /u flag) or something else (and you convert it to UTF-8 and use the /u flag).
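
A minimal sketch of that check-then-convert idea. It assumes the input is either already UTF-8 or else ISO-8859-1 (that second part is a guess you would have to confirm for your own form), and that the iconv extension is available. The helper name to_utf8() is made up for this example.

```php
<?php
// Hypothetical helper: return $s as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function to_utf8(string $s): string
{
    // With the u modifier, preg_match() returns false on malformed UTF-8,
    // so an empty pattern doubles as a validity check.
    if (preg_match('//u', $s) === 1) {
        return $s;
    }
    // Every byte is a valid ISO-8859-1 character, so this conversion cannot fail.
    return iconv('ISO-8859-1', 'UTF-8', $s);
}

$latin1 = "Sch\xF6nen";       // "Schönen" as ISO-8859-1 bytes
$utf8   = to_utf8($latin1);   // now "Sch\xC3\xB6nen"

var_dump(preg_match('#^[[:alnum:]]+$#u', $utf8));   // int(1)
```

Strings that are already valid UTF-8 pass through unchanged, so the helper is safe to call on every input.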
When in doubt, CHECK YOUR ERROR LOG

#3 phdphd

Posted 09 November 2013 - 09:11 AM

Finally 

#[^[:alnum:]]#

did the job.



#4 phdphd

Posted 09 November 2013 - 07:17 PM

Actually

#[^[:alnum:]]#

will accept "hello world öÄßáéíóúÁÉÍÓÚüÜñÑ" or just "öÄßáéíóúÁÉÍÓÚüÜñÑ" but not just "world". It seems it needs at least one "special" character (that could be a space).

 

I think I finally found the solution by combining both regexes. The one below accepts strings like "solution", "öÄáéíóúÁÉÍÓÚ" and "a f öÄßáéíóúÁÉÍÓÚ t üÜz z z z ñÑ dsd dfd ?! hope I found the solution now :-("

"[^[:alnum:]]|[[:alnum:]]"


#5 requinix

Posted 09 November 2013 - 09:52 PM

You didn't. Your new expression says "the string must either (a) contain an alphanumeric character or (b) contain a non-alphanumeric character". It'll match against anything but an empty string.
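
A quick way to see this for yourself (the test strings are arbitrary picks for illustration):

```php
<?php
$pattern = '#[^[:alnum:]]|[[:alnum:]]#';

var_dump(preg_match($pattern, 'world'));    // int(1): "w" is alphanumeric
var_dump(preg_match($pattern, '?! :-('));   // int(1): "?" is not alphanumeric
var_dump(preg_match($pattern, ''));         // int(0): nothing there to match
```

Whatever the first character is, one of the two alternatives accepts it, so the pattern rejects nothing except the empty string.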

Edited by requinix, 09 November 2013 - 09:55 PM.


#6 phdphd

Posted 10 November 2013 - 05:23 AM

Indeed, this is the first thought that came to my mind when I woke up this morning :( . So coming back to your suggestion and since the character encoding is UTF-8, what would be the right regex syntax combining [[:alnum:]] with the /u flag ? All regexes that I tried (like /#[[:alnum:]]#/u) return either false or a modifier issue. Thanks.



#7 requinix

Posted 10 November 2013 - 05:46 AM

You had # before as the delimiter. You don't need to add /s on top of them. It's just #[[:alnum:]]#u.

If you want to verify the entire string, and not just that there is an alphanumeric character somewhere in it, then you need to check every character from the beginning to the end.
#^[[:alnum:]]+$#u
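
To make the difference between the two patterns concrete, here is a small sketch. The sample word is written with a \u{...} escape (PHP 7+) so the result does not depend on how the file itself is saved; that choice is mine, not part of the patterns.

```php
<?php
$word = "Sch\u{F6}nen";   // "Schönen" in UTF-8

// Unanchored: 1 as soon as one alphanumeric character appears anywhere.
var_dump(preg_match('#[[:alnum:]]#u', $word . ' ?!'));      // int(1)

// Anchored: every character from start to end must be alphanumeric.
var_dump(preg_match('#^[[:alnum:]]+$#u', $word));           // int(1)
var_dump(preg_match('#^[[:alnum:]]+$#u', $word . ' ?!'));   // int(0)
```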


#8 phdphd

Posted 10 November 2013 - 06:11 AM

Thanks for your reply. Unfortunately neither works when special chars are involved, like in the German string  "Schönen Tag".

Is there something wrong in my UTF-8 setting ? My php file is UTF-8 saved and the contents of the POST is converted to UTF-8 using  htmlspecialchars($_POST['regex'], ENT_NOQUOTES, 'UTF-8');.



#9 requinix

Posted 10 November 2013 - 06:43 AM   Best Answer

"the contents of the POST is converted to UTF-8 using htmlspecialchars($_POST['regex'], ENT_NOQUOTES, 'UTF-8');"

1. That's bad. Don't do that. The only time you should ever be using htmlspecialchars() is immediately before you're about to output something in HTML. Not at any other point before then. Especially not when you're inserting it into your database.
2. That doesn't convert encodings. All you did was tell it that it should interpret the string as if it were UTF-8. If it wasn't UTF-8 to begin with then it still won't be after.
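
A short sketch of point 2. The sample strings are made up for illustration: the same text once as valid UTF-8 and once as ISO-8859-1 bytes.

```php
<?php
$utf8   = "Sch\u{F6}nen <b>Tag</b>";   // valid UTF-8
$latin1 = "Sch\xF6nen <b>Tag</b>";     // same text as ISO-8859-1 bytes

// Valid UTF-8: only the HTML metacharacters change, the bytes of the
// umlaut stay exactly as they were.
var_dump(htmlspecialchars($utf8, ENT_NOQUOTES, 'UTF-8'));
// the content is: Schönen &lt;b&gt;Tag&lt;/b&gt;

// Not UTF-8: nothing gets converted. Without ENT_SUBSTITUTE or ENT_IGNORE
// the function gives back an empty string instead.
var_dump(htmlspecialchars($latin1, ENT_NOQUOTES, 'UTF-8'));   // string(0) ""
```

Either way the third argument only tells htmlspecialchars() how to read the bytes; it never rewrites them into another encoding.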
 
var_dump(preg_match('/^[[:alnum:]]+$/u', 'Schönen Tag'));
works for me as long as I make sure I put that code into a file and save it using UTF-8. If I don't then it might be ISO 8859-1 by default and I'd have to utf8_encode() the string first (which converts from that encoding to UTF-8).
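
One caveat when trying this with the sample string: "Schönen Tag" contains a space, and [[:alnum:]] does not cover whitespace, so the anchored pattern returns 1 for each word on its own but 0 for the two words together. A sketch that also allows spaces, if that is what the form should accept (adding the space to the class is an assumption on my part, not something stated above):

```php
<?php
$s = "Sch\u{F6}nen Tag";   // "Schönen Tag" in UTF-8, written with an escape (PHP 7+)

var_dump(preg_match('/^[[:alnum:]]+$/u', $s));    // int(0): the space is not alphanumeric
var_dump(preg_match('/^[[:alnum:] ]+$/u', $s));   // int(1): letters, digits and spaces
```

Also worth knowing: utf8_encode() is deprecated as of PHP 8.2. iconv('ISO-8859-1', 'UTF-8', $s) performs the same conversion.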

Edited by requinix, 10 November 2013 - 06:43 AM.


#10 phdphd

Posted 10 November 2013 - 07:10 AM

Thanks a lot, Requinix. I think the problem was that my PHP file was initially a non-UTF-8 file that I had resaved as UTF-8. I created a new PHP file, saved it in UTF-8 format, then pasted some pieces of code into it.

 

So, for anyone who may find it useful, the preg_match() call that works for me is

if (preg_match('#[[:alnum:]]#u', $_POST['regex']))

One can even mix chars from left-to-right and right-to-left alphabets.
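
A small sketch of that mixed-script case. The sample string is made up: "abc" followed by the Hebrew word "shalom", written with \u{...} escapes (PHP 7+) so the file encoding does not matter.

```php
<?php
// "abc" followed by four Hebrew letters (shin, lamed, vav, final mem).
$mixed = "abc\u{05E9}\u{05DC}\u{05D5}\u{05DD}";

// The unanchored check: at least one alphanumeric character somewhere.
var_dump(preg_match('#[[:alnum:]]#u', $mixed));      // int(1)
var_dump(preg_match('#[[:alnum:]]#u', '?!'));        // int(0)

// The anchored variant from earlier in the thread: nothing but alphanumerics.
var_dump(preg_match('#^[[:alnum:]]+$#u', $mixed));   // int(1)
```

Note that the unanchored form only proves the input contains one letter or digit; for validating the whole field, the anchored form is the stricter choice.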

 

Thanks again, have a nice Sunday.





