
UTF-8 filtering/validation


jammesz


First off, I have written a validation class that uses array_diff() to check whether a user has submitted any invalid characters.

Some extracts:

$this->chr_alpha_lower=array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z');
$this->chr_alpha_upper=array('A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
$this->chr_numeric=array('0','1','2','3','4','5','6','7','8','9');
$this->chr_symbol=array(' ','!','@','#','$','%','&','(',')','[',']','.',',',':',';','\'','"','/','=','\\','-','_','?');

...

// let's say only alphabetic and numeric characters are allowed
$allowed_characters=array();
$allowed_characters=array_merge($allowed_characters,$this->chr_alpha_lower,$this->chr_alpha_upper);
$allowed_characters=array_merge($allowed_characters,$this->chr_numeric);

$input_split=str_split($this->input);
$invalid=array_diff($input_split,$allowed_characters);

if(empty($invalid)){
return true;
}
return false;
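
For comparison, the same ASCII whitelist can be sketched as a single anchored regular expression. This is only an illustration, not part of the class above: the function name is_ascii_alnum() is hypothetical. One detail worth knowing about the array version is that str_split() works on bytes, so a multi-byte UTF-8 character is split into bytes that never match the ASCII arrays and is rejected. Its handling of an empty string depends on the PHP version (str_split('') returns an empty array from PHP 8.2, so the empty string would pass), whereas the sketch below always rejects it.

```php
<?php
// Hypothetical helper, not part of the original class.
// Returns true only if $input consists of one or more ASCII letters or digits.
function is_ascii_alnum(string $input): bool
{
    // \A and \z anchor to the real start and end of the string;
    // unlike $, \z does not tolerate a trailing newline.
    return preg_match('/\A[A-Za-z0-9]+\z/', $input) === 1;
}
```

Called as is_ascii_alnum($this->input), it rejects anything outside a-z, A-Z and 0-9, including a trailing newline and any non-ASCII byte.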

 

I have two questions.

First, for ASCII characters, is this method foolproof, or is there a way for someone to get around this validation script?

 

Second, a new site I'm developing is going to be built with UTF-8 in mind so that people can use it in their own language. How do I validate UTF-8 user input? I know about the mb_* string functions and about sanitizing UTF-8 input like this:

preg_match_all('/([\x09\x0a\x0d\x20-\x7e]'. // ASCII: tab, LF, CR and printable characters
'|[\xc2-\xdf][\x80-\xbf]'. // 2-byte (except overlongs)
'|\xe0[\xa0-\xbf][\x80-\xbf]'. // 3-byte (except overlongs)
'|[\xe1-\xec\xee\xef][\x80-\xbf]{2}'. // 3-byte (regular)
'|\xed[\x80-\x9f][\x80-\xbf])+/', // 3-byte (except UTF-16 surrogates)
$input, $clean_pieces );

$clean_output = join('?', $clean_pieces[0] );
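
As a point of comparison, here is a minimal sketch of the same two steps (check, then clean) using the mbstring extension. It assumes mbstring is loaded and PHP 7.2 or later for mb_scrub(); the function names is_valid_utf8() and scrub_utf8() are hypothetical. Note one difference: the pattern above has no branch for 4-byte sequences, so it also strips characters outside the Basic Multilingual Plane, while the mbstring functions accept them.

```php
<?php
// Hypothetical helper names; assumes the mbstring extension is available.

// True if $input is well-formed UTF-8 (no overlongs, no surrogates, no stray bytes).
function is_valid_utf8(string $input): bool
{
    return mb_check_encoding($input, 'UTF-8');
}

// Replaces ill-formed byte sequences with the mbstring substitute character
// ("?" by default). Requires PHP 7.2+ for mb_scrub().
function scrub_utf8(string $input): string
{
    return mb_scrub($input, 'UTF-8');
}
```

This only establishes that the bytes are valid UTF-8. Which characters to allow in a given field is a separate decision.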

But how do I know what to filter?

 

Example:

A field where the user enters their first name:

English: only alpha characters

Other language: ?????
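
One possible way to express "letters in any language" is with Unicode property classes in PCRE, sketched below. This is an assumption about what such a field might accept, not a definitive rule: the function name is_plausible_first_name() is hypothetical, and allowing apostrophes, spaces and hyphens after the first letter is a choice made purely for illustration.

```php
<?php
// Hypothetical helper; the set of permitted punctuation is an assumption.
function is_plausible_first_name(string $input): bool
{
    // The /u modifier treats pattern and subject as UTF-8; preg_match()
    // returns false for ill-formed UTF-8, so the === 1 check rejects it too.
    // \p{L} matches a letter in any script, \p{M} a combining mark.
    return preg_match('/\A[\p{L}\p{M}][\p{L}\p{M}\' -]*\z/u', $input) === 1;
}
```

With this pattern a name such as José, Александр or O'Brien passes, while R2D2 or a string containing invalid UTF-8 bytes does not.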

 

Any help is much appreciated.


