sw0o0sh Posted August 23, 2012

Hi, I'm trying to make something that ensures certain input only contains characters within a specified range, specifically \x20 through \x7e, as seen on the chart. I came up with this expression (using preg_match):

    /^[\x20-\x7e\t\s]+?$/

It seemed to work at first, but something is leaking through. I would only like to allow \x20-\x7e, tabs, spaces and new lines (\t\s). While the expression does block some characters outside that range, others still slip through and I am unsure how. Can anyone see the problem here?
Christian F. Posted August 23, 2012

First, I'd like to point out that a non-greedy quantifier is unnecessary when you're matching the entire string, and it might actually hurt performance. Second, you don't need to define the range with hex values; a literal character range does the same thing:

    $RegExp = "/^[ -~\\t\\n]+\\z/";

That said, I don't see any problem with the RegExp you have there. Do you have any examples of strings containing data that got through despite it? Also, what does your code look like where you're using that RegExp and storing the data?
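The two points above can be checked with a short sketch. The helper name `is_printable_ascii` is made up for illustration and does not come from the thread. One caveat, offered as an observation rather than something raised in the thread: the class `[ -~\t\n]` does not include `\r`, and browsers submit textarea line breaks as `\r\n`, so multi-line form input would fail this pattern unless `\r` is added or line endings are normalized first.

```php
<?php
// Hypothetical helper for illustration; the name is not from the thread.
function is_printable_ascii(string $input): bool
{
    // [ -~] is the literal range from space (0x20) to tilde (0x7E),
    // the same set as [\x20-\x7e]. Tab and newline are allowed explicitly.
    // A greedy + suffices because the anchors force a whole-string match.
    return preg_match('/^[ -~\t\n]+\z/', $input) === 1;
}

var_dump(is_printable_ascii("Hello, world!\tok\n"));  // bool(true)
var_dump(is_printable_ascii("caf\xC3\xA9"));          // bool(false): the UTF-8 bytes of "é" are outside the range
var_dump(is_printable_ascii("line one\r\nline two")); // bool(false): \r is not in the class
```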
sw0o0sh Posted August 23, 2012

I'll check that out soon, thank you for the tips. I had set up a test page (using a textarea) to send data to the regular expression. The PHP:

    <?php
    $response = isset($_POST['post']) ? $_POST['post'] : null;
    $result = null;
    if ( $response !== null ) {
        $response = trim($response);
        if ( preg_match('/^[\x20-\x7e\t\s]+?$/', $response) ) {
            $result = 'string length: ' . strlen($response) . ' Validated input ' . $response;
        } else {
            $result = 'nope';
        }
    }
    ?>
    <!DOCTYPE html>
    <html>
    <head>
        <title></title>
        <link rel="stylesheet" type="text/css" href="style/base.css" />
        <link rel="stylesheet" type="text/css" href="style/xform.css" />
    </head>
    <body>
        <?php echo $result; ?>
        <div class="xform">
            <form method="post" action="">
                <div class="inxform">
                    <fieldset>
                        <legend>Message</legend>
                        <div class="overlay">
                            <textarea name="post"><?php echo isset($_POST['post']) ? $_POST['post'] : null; ?></textarea>
                        </div>
                        <div class="overlay">
                            <input type="submit" name="submit_post" value="Post" />
                        </div>
                    </fieldset>
                </div>
            </form>
        </div>
    </body>
    </html>

For example, it'd validate īĬĭ and strange characters like that, but not , and so on with many random characters that I had generated with the following code:

    <?php for ($i = 0; $i < 1000; $i++) echo "&#" . $i . ";"; ?>

So of course all the weird false positives threw me off that expression altogether, when I was certain I was doing it right. I'm not sure if something else is causing it, since you say the initial expression should in theory work.
Christian F. Posted August 23, 2012

Hmm... It might be related to the fact that you're trying to validate a non-ASCII string. If you're using UTF-8 (which you should), then just add the "u" modifier to the RegExp to switch it to UTF-8 mode.
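As a rough illustration of what the "u" modifier changes: with it, PCRE treats pattern and subject as UTF-8 and rejects subjects that are not valid UTF-8, in which case preg_match() returns false rather than 0. For an ASCII-only class like this one, a valid multibyte character fails to match with or without the modifier. The variable name and sample strings below are illustrative only.

```php
<?php
$pattern = '/^[ -~\t\n]+\z/u';

// Valid UTF-8, but "Ĝ" (U+011C) is outside the allowed range: no match.
var_dump(preg_match($pattern, "abc\u{011C}")); // int(0)

// A lone 0xC4 byte is not valid UTF-8: preg_match() reports an error instead.
var_dump(preg_match($pattern, "abc\xC4"));           // bool(false)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

Checking for a strict `=== 1` result therefore treats both "no match" and "malformed input" as invalid.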
sw0o0sh Posted August 24, 2012

I tried your regular expression as follows:

    /^[ -~\\t\\n]+\\z/

It validated

    abcdefghijklmnopqrstuvwxyzABCDEFGHIKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*()_+`-=[]\{}|;':",./<>?Ĝĝ

but not

    abcdefghijklmnopqrstuvwxyzABCDEFGHIKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*()_+`-=[]\{}|;':",./<>?

I even changed mine following your tips, which resulted in:

    /^[\x20-\x7e\t\s]+$/u

but it produced the same false positives as yours did (I also added the u modifier to your regex). I'm still not sure what the proper solution may be.
Christian F. Posted August 24, 2012

Just tried on my local server:

    $String = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*()_+`-=[]\{}|;\':",./<>?Ĝĝ';
    $RegExp = '/^[ -~\\t\\n]+\\z/u';
    var_dump(preg_match($RegExp, $String)); // int(0)

    $String = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*()_+`-=[]\{}|;\':",./<>?';
    var_dump(preg_match($RegExp, $String)); // int(1)

PS: Sorry for the HTML entities in the post, the forum software seems to double escape sometimes.
sw0o0sh Posted August 24, 2012

I had the same results on my server when statically placing the input string as you have. However, for some reason, when the input is received through $_POST, it still accepts these characters as valid.
sw0o0sh Posted August 24, 2012

I'd like to say the issue's been figured out. Adding

    header('Content-type: text/html; charset=utf-8');

seemed to fix the problem regarding POST data. Thank you for the tips, ChristianF.
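One plausible explanation for why the header helped, offered as an assumption since the thread does not spell it out: when a page is not served as UTF-8, browsers commonly submit form characters that the page's encoding cannot represent as numeric character references such as &#284;. Those references consist only of printable ASCII, so they pass the range check, and echoing them back into HTML renders the original character. A minimal sketch with a made-up $posted value:

```php
<?php
// Assumed value of $_POST['post'] when "Ĝĝ" is submitted from a page
// whose encoding cannot represent those characters (hypothetical input).
$posted = 'abc&#284;&#285;';

$pattern = '/^[\x20-\x7e\t\s]+?$/'; // the expression from the original post

var_dump(preg_match($pattern, $posted));               // int(1): only ASCII bytes are present
var_dump(preg_match($pattern, "abc\u{011C}\u{011D}")); // int(0): the real UTF-8 bytes are rejected
```

This would also be consistent with the hard-coded test strings failing validation while the same characters arriving via $_POST passed.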
Christian F. Posted August 24, 2012

Ah, mixing charsets will cause issues, indeed. Glad you figured it out though, and happy to be of assistance.