doubledee Posted August 20, 2011

What is the best RegEx to use for a Comments field?

I suppose I would like people to be able to enter any character found on an English keyboard. (Is that too lax?) Should I restrict any particular characters, like quotes?

I am asking this question from two standpoints...

1.) It's a pain to create a RegEx that has every character on your keyboard typed out!
2.) From a security standpoint, I'm not sure what to allow?!

Thanks,

Debbie

Link to comment https://forums.phpfreaks.com/topic/245314-best-regex-for-comments-field/
cssfreakie Posted August 20, 2011

If you want to allow people to leave a comment without letting them perform an XSS (cross-site scripting) attack, you don't need regex. You need something like htmlspecialchars() or htmlentities(). These two functions convert certain special characters into HTML entities so that they can't cause any harm (such as injected JavaScript).

For instance, <script>alert('boe');</script> will be converted into &lt;script&gt;alert('boe');&lt;/script&gt; (the single quotes are encoded as well when the ENT_QUOTES flag is in effect).

If you look at the page source after running the input through those functions, you will see the HTML entities. So there is no need to build some sort of super whitelist of characters.
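A minimal sketch of that conversion, in case it helps to see it run. The helper name escape_for_html() is made up for this example; passing ENT_QUOTES and 'UTF-8' explicitly is an assumption about what you want, not something the post prescribes.

```php
<?php
// Hypothetical helper name; not part of PHP or of the thread.
function escape_for_html(string $raw): string
{
    // ENT_QUOTES encodes both double and single quotes.
    // The charset is passed explicitly instead of relying on the default.
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

$comment = "<script>alert('boe');</script>";

// prints &lt;script&gt;alert(&#039;boe&#039;);&lt;/script&gt;
echo escape_for_html($comment), PHP_EOL;
```

The browser shows the visitor the original text, but it never treats it as markup, so the script does not run.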
KevinM1 Posted August 20, 2011

To add to that, it's best to use one of the two functions (htmlentities or htmlspecialchars) before displaying that data. There's no need to use it on data insertion, so don't treat them as you would mysql_real_escape_string.

So, for absolute clarity, use mysql_real_escape_string when inserting data into your database. Use one of htmlentities/htmlspecialchars when pulling your data from your database and displaying it in the browser.

Regex is best used as validation for other, smaller inputs in your form, like an email address input.
voip03 Posted August 20, 2011

Nightslyr, for inserting data into the database I am using strip_tags and addslashes, but do you think mysql_real_escape_string() is best?
doubledee Posted August 21, 2011

KevinM1 wrote: Regex is best used as validation for other, smaller inputs in your form, like an email address input.

But isn't that what we are talking about also? Validating data...

So you are saying there is no key on my laptop right now that is inherently "evil"? (As long as I use mysqli_real_escape_string while INSERTing data into MySQL.)

I was thinking of something like...

    // "~" is the delimiter because "#" is one of the allowed characters.
    if (preg_match('~^[A-Z \'.,!@#$%^&*()_+=:;"?-]{2,100}$~i', $trimmed['body'])) {
        // Escape problematic characters.
        $body = mysqli_real_escape_string($dbc, $trimmed['body']);
    } else {
        $errors['body'] = 'Article Body must be 2-100 characters (A-Z \'.,!@#$%^&*()_+=:;"?-)';
    }

But I dunno...

Debbie
The Little Guy Posted August 21, 2011

voip03 wrote: for inserting data into your database I am using strip_tags and addslashes, but do u think mysql_real_escape_string() is best?

Of course mysql_real_escape_string is best. It wasn't added for kicks and giggles...

Always, Always, Always, Always, Always, Always, Always, Always, Always, Always, Always, Always, Always, Always use mysql_real_escape_string when inserting strings into a database.

When not using strings you can do this:

    <?php
    echo $int = (int)"12.123's";   // $int = 12
    echo "<br />";
    echo $float = (float)"34.234"; // $float = 34.234
    echo "<br />";
    echo $bool = (bool)"kitty";    // $bool = true (echoes 1)
    echo "<br />";
    echo $bool = (bool)null;       // $bool = false (echoes an empty string)
    ?>
cssfreakie Posted August 21, 2011

doubledee wrote: So you are saying there is no key on my laptop right now that is inherently "evil"?

Exactly!

On insert
If you use mysqli_real_escape_string (for strings) and type casting (for integers) when inserting into your database (or use prepared statements), you're good to go.

On output
Use htmlspecialchars or htmlentities on output and you're safe against XSS attacks (in all modern browsers; there is a small exception for IE6, but that is a bit technical).

Validating
Keep in mind, though, that validating depends on what you expect. If you expect a telephone number, you are looking for digits and a certain length. If you're looking for an email address, you validate that it is an email address. If you're looking for a name, you expect alphabetic characters.

Feel the difference between validating and sanitizing?

Here is some more reading if you have time on your hands: https://www.owasp.org/index.php/Interpreter_Injection#PHP_specific_examples
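To make the validating side concrete, here is a rough sketch of the three checks mentioned above. The function names, the 7 to 15 digit range and the set of characters allowed in a name are all assumptions picked for illustration; adjust them to whatever your form really expects.

```php
<?php
// Hypothetical helper names; the limits and patterns are assumptions.

// A phone number here means 7 to 15 digits and nothing else.
// The D modifier stops "$" from matching before a trailing newline.
function is_valid_phone(string $value): bool
{
    return preg_match('/^[0-9]{7,15}$/D', $value) === 1;
}

// filter_var() returns the address on success and false on failure.
function is_valid_email(string $value): bool
{
    return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
}

// A name here means ASCII letters, then letters, spaces, apostrophes or hyphens.
function is_valid_name(string $value): bool
{
    return preg_match("/^[A-Za-z][A-Za-z '-]*$/D", $value) === 1;
}

var_dump(is_valid_phone('5551234567'));           // bool(true)
var_dump(is_valid_email('thisisnotavalidemail')); // bool(false)
var_dump(is_valid_name("O'Brien"));               // bool(true)
```

None of these checks makes the data safe to store or display. They only tell you whether the visitor typed the kind of thing you asked for; escaping on insert and on output still applies.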
doubledee Posted August 21, 2011

cssfreakie wrote: Keep in mind though validating depends on what you expect.

So for fields like these...

- HTML Title
- Description
- Page Title
- Page Subtitle
- Body
- Reference Listing
- Endnote Listing

...it almost sounds like I should just skip using Regular Expressions because any character could be valid??

Debbie
cssfreakie Posted August 21, 2011

doubledee wrote: ...it almost sounds like I should just skip using Regular Expressions because any character could be valid??

Well, pretty much, yes. Although I assume you agree with me that you expect a different length for a title and a description than for the body of a message, right? So besides running stuff through htmlspecialchars() or htmlentities() (to prevent bad things), you probably also want to check, for instance, the length. But those have different purposes. The first is to prevent bad things; the second is to add, for instance, consistency or readability.

As an example, say I have the following wishlist. I want:

- a description of minimum 20 and maximum 150 characters
- a title of minimum 20 and maximum 100 characters
- a message of minimum 100 and maximum 1000 characters

We expect different things of them, but they all have at least one thing in common: we output them, so we must run them through either htmlentities() or htmlspecialchars() to prevent bad things, apart from our wish to make them a certain length. (This is true for any user input.)

A little check for the title could look like this (notice I only did the title one):

    <?php
    error_reporting(E_ALL);
    ini_set("display_errors", 1);
    header('Content-type: text/html; charset=utf-8');
    ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <title></title>
        <style type="text/css">label, textarea{display:block;}</style>
    </head>
    <body>
    <div id="wrapper">
    <?php
    if(isset($_POST['submit']) && !empty($_POST['title'])){ // if pressed submit and the value is not empty
        // check title value
        if(strlen($_POST['title']) < 101 && strlen($_POST['title']) > 19){ // title of 20 to 100 characters
            $title = htmlspecialchars($_POST['title']);
        }else{
            echo 'title is either too long or too short';
        }
        // check other stuff
    }else{
        echo 'insert stuff';
    }
    ?>
    <form action="<?php echo $_SERVER['SCRIPT_NAME']; ?>" method="post">
        <p>
            <label for="title">Title:</label>
            <textarea id="title" name="title" rows="3" cols="50"><?php echo isset($title)? $title : ''; ?></textarea>
            <label for="description">Description:</label>
            <textarea id="description" name="description" rows="5" cols="100"></textarea>
            <label for="messagebody">Message:</label>
            <textarea id="messagebody" name="messagebody" rows="10" cols="100"></textarea>
            <input type="submit" name="submit" value="submit form" />
        </p>
    </form>
    </div>
    </body>
    </html>

P.S. If you don't need to use regex, don't! It uses much more resources, and if you are not looking for a distinct pattern it's useless.
cssfreakie Posted August 21, 2011

Ah, I noticed a little error in my code. Nothing big, but to keep the input persistent even if it is outside the minimum and maximum number of characters, place the assignment for $title outside the success part of the clause, like so:

    <?php
    if(isset($_POST['submit']) && !empty($_POST['title'])){ // if pressed submit and the value is not empty
        // check title value
        $title = htmlspecialchars($_POST['title']); // MOVED here to keep the input persistent
        if(strlen($_POST['title']) < 101 && strlen($_POST['title']) > 19){ // title of 20 to 100 characters
            // all is good
        }else{
            echo 'title is either too long or too short';
        }
        // check other stuff
    }else{
        echo 'insert stuff';
    }
    ?>
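The title check above could be extended to all three fields of the wishlist roughly like this. It is only a sketch: FIELD_LIMITS and check_lengths() are names invented for the example, the array keys are assumed to match the form field names, and strlen() counts bytes, so mb_strlen() may be the better fit if you expect multibyte text.

```php
<?php
// Limits taken from the wishlist earlier in the thread; names are hypothetical.
const FIELD_LIMITS = [
    'title'       => [20, 100],
    'description' => [20, 150],
    'message'     => [100, 1000],
];

// Returns a list of error messages; an empty list means every length is fine.
function check_lengths(array $input): array
{
    $errors = [];
    foreach (FIELD_LIMITS as $field => [$min, $max]) {
        // A missing field is treated as an empty string.
        $length = strlen($input[$field] ?? '');
        if ($length < $min || $length > $max) {
            $errors[] = "$field must be between $min and $max characters";
        }
    }
    return $errors;
}

$errors = check_lengths([
    'title'       => 'Too short',
    'description' => str_repeat('d', 50),
    'message'     => str_repeat('m', 200),
]);

// Only the title fails here.
print_r($errors);
```

Note that the lengths are checked on the raw input. Escaping with htmlspecialchars() happens separately, when the value is written into the page.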
doubledee Posted August 21, 2011

cssfreakie wrote: So besides running stuff through htmlspecialchars() or htmlentities() (to prevent bad things) you probably also want to check for instance the length.

Okay, so maybe I'm trying too hard.

Debbie
teynon Posted August 21, 2011

doubledee, I don't know if you can try too hard when you think about the security of your program. However, in addition to what everyone else is saying, you can use RegEx for different purposes. You are proposing to use it to secure data, whereas in reality you can use what those above mentioned for the security. What, in my opinion, you should use RegEx for is validating.

You might think validating and securing are the same thing, but they are not. They do go hand in hand, though. We validate to ensure that the user is typing in the appropriate thing. The easiest way to explain this is like so:

User submits an email of "thisisnotavalidemail"

This string won't hurt our system, but it's not what you wanted. So you validate it using RegEx. Now you can slap the user in the face with "You must enter a valid email."

-------

Also, on a separate note... If you are letting users create pages and you want to allow them to insert HTML, make sure you use htmlspecialchars_decode before outputting it to the browser.

Tom
doubledee Posted August 21, 2011

teynon wrote: If you are letting users create pages and you want to allow them to insert HTML, make sure you use htmlspecialchars_decode before outputting it to the browser.

I am working on two things in tandem...

"Add an Article", which is just for me and needs to allow me to add HTML/Text.

"Add a Comment", which will allow Members to add comments on my articles and - for now - will be text-only and must be Admin-approved!

Debbie
The Little Guy Posted August 21, 2011

Here is basically how I like to do it: (untested)

    <?php
    function between($min, $max, $str){
        $len = strlen($str);
        return $len >= $min && $len <= $max;
    }

    function post($key, $default = ""){
        return isset($_POST[$key]) ? $_POST[$key] : $default;
    }

    $title = post('title');
    $comment = post('comment');
    $description = post('description');
    $errors = array();

    // Only validate once the form has actually been submitted.
    if($_SERVER['REQUEST_METHOD'] == 'POST'){
        if(!between(3, 20, $title)){
            $errors[] = "Title is an invalid length";
        }
        if(!between(10, 1000, $comment)){
            $errors[] = "Comment is an invalid length";
        }
        if(!between(3, 150, $description)){
            $errors[] = "Description is an invalid length";
        }
        if(count($errors) > 0){
            echo implode("<br />", $errors);
        }else{
            $title = mysql_real_escape_string($title);
            $descr = mysql_real_escape_string($description);
            $comment = mysql_real_escape_string($comment);
            mysql_query("insert into comments (title, descr, comment) values ('$title', '$descr', '$comment')");
            header("location: /comments.php#new");
            exit;
        }
    }
    ?>
    <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
        <input type="text" name="title" maxlength="20" value="<?php echo htmlspecialchars($title); ?>" />
        <input type="text" name="description" maxlength="150" value="<?php echo htmlspecialchars($description); ?>" />
        <textarea name="comment" cols="100" rows="5"><?php echo htmlspecialchars($comment); ?></textarea>
        <input type="submit" name="submit" value="Submit" />
    </form>
teynon Posted August 21, 2011

The Little Guy: I don't really see the relevance to the post.

DoubleDee, if you want HTML for admin and not for comments / users, I would do htmlspecialchars (with ENT_QUOTES) before you input it (regardless of what it is). Then, when you are approving comments, ensure that it has no HTML. You could use strip_tags on the comments when you are reviewing them to ensure they have no HTML. Also, when you are reviewing a comment, I would review it in a textarea so that you can easily see all the characters that were submitted.

Then, when you post it to the page, run htmlspecialchars_decode($value, ENT_QUOTES) to decode everything. If there isn't any HTML then it won't do anything.
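One way to sketch the "ensure that it has no HTML" step during review is to compare a comment with its strip_tags() result. contains_html() is a made-up helper name, and the check assumes you run it on the raw comment text, before any entity encoding.

```php
<?php
// Hypothetical helper: true when strip_tags() would remove something from the text.
function contains_html(string $text): bool
{
    return strip_tags($text) !== $text;
}

var_dump(contains_html('Nice article, thanks!')); // bool(false)
var_dump(contains_html('Nice <b>article</b>'));   // bool(true)
```

A comment that comes back true can be rejected or shown to the admin for a closer look before it is approved.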
doubledee Posted August 21, 2011

The Little Guy wrote: here is basically how I like to do it:

You know that all of that could be done with one line of RegEx?

Debbie
darkfreaks Posted August 21, 2011

Better yet, why not just use the HTML Purifier library? It has built-in regex and it sanitizes pretty well.
doubledee Posted August 21, 2011

darkfreaks wrote: better yet why not just use HTML PURIFIER library it has built in regex and it sanitizes pretty good.

Because I'm already feeling overwhelmed...

Debbie
darkfreaks Posted August 21, 2011

http://www.symantec.com/connect/articles/detection-sql-injection-and-cross-site-scripting-attacks
doubledee Posted August 21, 2011

What do you have to do to install HTMLPurifier?

What do you have to do to use HTMLPurifier?

Why use that versus a home-grown solution?

What "catches" are there?

Debbie
darkfreaks Posted August 21, 2011

http://htmlpurifier.org/
teynon Posted August 21, 2011

Debbie, I haven't used HTMLPurifier, but I'd bet a couple pennies that you use it similar to a class.

Advantages:
1) It has solutions built / accepted by a community.

Disadvantages:
1) You learn a library that quite possibly could be made obsolete in a couple years.

There's more for each, but I don't like relying on libraries because they oftentimes become more involved and time-consuming than just making a tailored solution for the specific project. That's just my two cents. I say keep on doing what you're doing and learn the best ways to handle each situation. That will help you more in the long run. You will learn more and find out what you should and shouldn't do. But again, that's just my two cents.

Tom
doubledee Posted August 21, 2011

teynon wrote: I say keep on doing what your doing and learn the best ways to handle each situation.

I'm a big believer of "home-grown" solutions and learning-as-you-go. (If I wanted "easy" I would have used WordPress and been done months ago?!)

Debbie
cssfreakie Posted August 21, 2011

Debbie, please reread the stuff above and notice the difference between validating and sanitizing. Again, DON'T use regex if you don't need to. PHP has htmlspecialchars() and htmlentities() for a reason. Just look it up in the manual or, for the sake of it, in any security guide. What you won't find there is regex. Regex is used, for instance, to check if an email address is valid...

But if you want to, please do; I do not mind. Pretty much all examples have been given and it should be clear now.
phpSensei Posted August 21, 2011

doubledee wrote: I'm a big believer of "home-grown" solutions and learning-as-you-go. (If I wanted "easy" I would have used WordPress and been done months ago?!)

Exactly, I don't see how you learn anything by just installing a class to do everything for you. Doubledee, do it yourself, then compare your method with other methods and see what you could improve on.

Real security issues go beyond this validating and sanitizing stuff, in my opinion...