Best Regex for Comments field


doubledee

What is the best RegEx to use for a Comments field?

 

I suppose I would like people to be able to enter any characters found on an English keyboard.  (Is that too lax?)  :shrug:

 

Should I restrict any particular characters like quotes?

 

I am asking this question from two standpoints...

 

1.) It's a pain to create a Regex that has every character on your keyboard typed out!

 

2.) From a security standpoint, I'm not sure what to allow?!

 

Thanks,

 

 

Debbie

 


If you want to let people leave comments without opening the door to an XSS (cross-site scripting) attack, you don't need regex. You need something like:

htmlspecialchars();

or

htmlentities();

 

These two functions convert certain special characters into HTML entities so that they can't cause any harm (such as running JavaScript).

For instance:

<script>alert('boe');</script>

will be converted into

&lt;script&gt;alert('boe');&lt;/script&gt;

(With the ENT_QUOTES flag, the single quotes become &#039; as well.)

If you view the page source after running the input through one of those functions, you will see the HTML entities.

 

So there is no need to add some sort of super white list of characters.
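
As a minimal sketch of the conversion described above (the sample string is the one from this post; the explicit ENT_QUOTES flag and 'UTF-8' charset are my own choices, not something the post specifies):

```php
<?php
// Comment text as a visitor might submit it.
$comment = "<script>alert('boe');</script>";

// Escape at output time. ENT_QUOTES also converts single quotes.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// prints: &lt;script&gt;alert(&#039;boe&#039;);&lt;/script&gt;

// Decoding gives back the original text, so nothing is lost by escaping.
var_dump(htmlspecialchars_decode($safe, ENT_QUOTES) === $comment); // bool(true)
```

The browser renders the entities as the literal characters, so the visitor sees the text they typed, but nothing is executed.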

 

 

 

 

Link to comment
Share on other sites

To add to that, it's best to use one of the two functions (htmlentities or htmlspecialchars) before displaying that data.  There's no need to use it on data insertion, so don't treat them as you would mysql_real_escape_string.

 

So, for absolute clarity, use mysql_real_escape_string when inserting data into your database.  Use one of htmlentities/htmlspecialchars when pulling your data from your database and displaying it in the browser.

 

Regex is best used as validation for other, smaller inputs in your form, like an email address input.
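
A rough sketch of that split (protect the query on insert, escape HTML on display) might look like the following. Note the assumptions: it swaps in a PDO prepared statement in place of mysql_real_escape_string, and it uses an in-memory SQLite database with a made-up `comments` table so that it runs without a MySQL server.

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL; table and column
// names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

$raw = "It's <b>bold</b> & \"quoted\"";

// Insert: the placeholder keeps the value out of the SQL text, and the
// comment is stored exactly as typed (no HTML escaping at this point).
$insert = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$insert->execute([':body' => $raw]);

// Display: escape only when the value is written into the page.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), "\n";
// prints: It&#039;s &lt;b&gt;bold&lt;/b&gt; &amp; &quot;quoted&quot;
```

Storing the raw text keeps the database reusable for other outputs (plain-text email, JSON, and so on), each of which needs its own kind of escaping.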

But isn't that what we are talking about also?  Validating data...

 

So you are saying there is no key on my laptop right now that is inherently "evil"?  (As long as I use mysqli_real_escape_string while INSERTing data into MySQL.)

 

I was thinking of something like...

 

// Use / as the delimiter: # appears inside the character class and would end the pattern early.
if (preg_match('/^[A-Z \'.,!@#$%^&*()_+=:;"?-]{2,100}$/i', $trimmed['body'])){
// Escape problematic characters.
$body = mysqli_real_escape_string($dbc, $trimmed['body']);
}else{
$errors['body'] = 'Article Body must be 2-100 characters (A-Z \'.,!@#$%^&*()_+=:;"?-)';
}

 

But I dunno...

 

 

Debbie

 


Nightslyr

 

For inserting data into your database I am using strip_tags and addslashes, but do you think mysql_real_escape_string() is best?

 

Of course mysql_real_escape_string is best :) It wasn't added for kicks and giggles...

 

Always, Always, Always, Always, Always, Always, Always, Always, Always, Always, Always, Always, Always, Always use mysql_real_escape_string when inserting strings into a database.

 

When not using strings you can do this:

 

<?php
echo $int = (int)"12.123's";  // $int = 12
echo "<br />";
echo $float = (float)"34.234";  // $float = 34.234
echo "<br />";
echo $bool = (bool)"kitty"; // $bool = true (echoes 1)
echo "<br />";
echo $bool = (bool)null; // $bool = false (echoes an empty string)
?>


So you are saying there is no key on my laptop right now that is inherently "evil"? 

exactly!

 

On insert

If you use mysqli_real_escape_string (for strings) and type casting (for integers) when inserting into your database, or use prepared statements, you're good to go.

On output

If you run everything through htmlspecialchars or htmlentities on output, you're safe against XSS attacks (in all modern browsers; there is a small exception for IE6, but that is a bit technical).

 

Validating

Keep in mind, though, that validating depends on what you expect. If you expect a telephone number, you are looking for digits and a certain length. If you're looking for an email address, you validate that it is an email address. If you're looking for a name, you expect alpha characters.

Feel the difference between validating and sanitizing?

 

Here is some more reading if you have time on your hands: https://www.owasp.org/index.php/Interpreter_Injection#PHP_specific_examples
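
To make the validating/sanitizing distinction concrete, here is a small sketch. The function names, the 7 to 15 digit range for a phone number, and the characters allowed in a name are all assumptions for illustration, not rules from this thread.

```php
<?php
// Validation: does the value look like what we expect?
function isValidPhone($value) {
    // Digits only, 7 to 15 of them (hypothetical limits).
    return preg_match('/^[0-9]{7,15}$/D', $value) === 1;
}

function isValidName($value) {
    // Letters, with a single space, hyphen or apostrophe between parts.
    return preg_match('/^[A-Za-z]+(?:[ \'-][A-Za-z]+)*$/D', $value) === 1;
}

// Sanitizing: make any value safe for the place it is going (HTML here).
function forHtml($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

var_dump(isValidPhone('5551234567'));      // bool(true)
var_dump(isValidPhone('555-CALL-ME'));     // bool(false)
var_dump(isValidName("Mary-Jane O'Neil")); // bool(true)
var_dump(isValidName('<b>Bob</b>'));       // bool(false)
echo forHtml('<b>Bob</b>'), "\n";          // &lt;b&gt;Bob&lt;/b&gt;
```

A value can fail validation and still be perfectly safe to display once sanitized, and a value that passes validation still needs sanitizing when it is output.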

 

 

 

 

 

So for fields like these...

 

  - HTML Title

  - Description

  - Page Title

  - Page Subtitle

  - Body

  - Reference Listing

  - Endnote Listing

 

...it almost sounds like I should just skip using Regular Expressions because any character could be valid??  :shrug:

 

 

Debbie

 

 

Well, pretty much, yes. Although I assume you agree that you expect a different length for a title and a description than for the body of a message, right? So besides running input through htmlspecialchars() or htmlentities() (to prevent bad things), you probably also want to check, for instance, the length. But those checks have different purposes: the first prevents bad things, the second adds consistency or readability.

 

As an example, say I have the following wishlist. I want:

- a description of at least 20 and at most 150 characters

- a title of at least 20 and at most 100 characters

- a message of at least 100 and at most 1000 characters

 

We expect different things of them, but they all have at least one thing in common: we output them, so apart from our wish to keep them a certain length, we must run them through either htmlentities() or htmlspecialchars() to prevent bad things. (This is true for any user input.)

 

A little check for the title could look like this (notice I only did the title):

 

<?php error_reporting(E_ALL);
ini_set("display_errors", 1);
header('Content-type: text/html; charset=utf-8');
?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <title></title>
        <style type="text/css">label, textarea{display:block;}</style>
    </head>
    <body>
       <div id="wrapper">
           <?php

            if(isset($_POST['submit']) && !empty($_POST['title'])){ // form was submitted and the title is not empty
                // Check the title length.
                if(strlen($_POST['title'])<101 && strlen($_POST['title']) > 19){ // title is 20 to 100 characters
                    $title = htmlspecialchars($_POST['title']);
                }else{
                    echo 'title is either too long or too short';
                }
                // check other stuff
            }else{
                echo 'insert stuff';
            }
            ?>
           <form action="<?php echo $_SERVER['SCRIPT_NAME']; ?>" method="post">
            <p>
                <label for="title">Title:</label>
                <textarea id="title" name="title" rows="3" cols="50"><?php echo isset($title)? $title : ''; ?></textarea>
                <label for="description">Description:</label>
                <textarea id="description" name="description" rows="5" cols="100"></textarea>
                <label for="messagebody">Message:</label>
                <textarea id="messagebody" name="messagebody" rows="10" cols="100"></textarea>
                <input type="submit" name="submit" value="submit form" />
            </p>
           </form>
       </div>
    </body>
</html>

 

P.S. If you don't need regex, don't use it! It costs more resources than a plain string function, and if you are not looking for a distinct pattern it's useless.


Ah, I noticed a little error in my code. Nothing big, but to keep the input persistent even when it is outside the min and max lengths, place the assignment for $title outside the success branch, like so:

 

<?php
if(isset($_POST['submit']) && !empty($_POST['title'])){ // form was submitted and the title is not empty
    // Assign first, so the escaped value is redisplayed even when validation fails.
    $title = htmlspecialchars($_POST['title']);
    // Check the title length.
    if(strlen($_POST['title'])<101 && strlen($_POST['title']) > 19){ // title is 20 to 100 characters
        // all is good
    }else{
        echo 'title is either too long or too short';
    }
    // check other stuff
}else{
    echo 'insert stuff';
}
?>
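
The title check can be extended to the whole wishlist (description 20 to 150, title 20 to 100, message 100 to 1000 characters). One possible sketch, where the `$rules` table and the `checkLengths()` helper are hypothetical names and the field keys follow the form's `name` attributes:

```php
<?php
// Length limits from the wishlist: field => [min, max].
$rules = [
    'title'       => [20, 100],
    'description' => [20, 150],
    'messagebody' => [100, 1000],
];

// Returns a list of error messages; an empty list means every field passed.
function checkLengths(array $input, array $rules) {
    $errors = [];
    foreach ($rules as $field => [$min, $max]) {
        $value = isset($input[$field]) ? trim($input[$field]) : '';
        // strlen() counts bytes; mb_strlen() is the usual choice for UTF-8 text.
        $length = strlen($value);
        if ($length < $min || $length > $max) {
            $errors[] = "$field must be between $min and $max characters";
        }
    }
    return $errors;
}

// Hypothetical submission standing in for $_POST.
$sample = [
    'title'       => 'Too short',
    'description' => str_repeat('d', 50),
    'messagebody' => str_repeat('m', 200),
];
print_r(checkLengths($sample, $rules));
```

Only the title fails here, so the returned list holds a single message. The values would still go through htmlspecialchars() when they are echoed back into the form.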

Okay, so maybe I'm trying too hard.

 

 

Debbie

 

 


doubledee,

 

  I don't know if you can try too hard when you think about the security of your program. However, in addition to what everyone else is saying, you can use RegEx for a different purpose.

 

  You are proposing to use it to secure data, whereas in reality you can use what the others mentioned above for security. What you should use RegEx for, in my opinion, is validating. You might think validating and securing are the same thing, but they are not. They do go hand in hand, though. We validate to ensure that the user is typing in the appropriate thing. The easiest way to explain this is like so:

 

User submits email of "thisisnotavalidemail"

 

This string won't hurt our system, but it's not what you wanted. So you validate it using RegEx. Now you can slap the user in the face with "You must enter a valid email."
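
A sketch of that check, using the example value from this post. The pattern is deliberately loose and is only an assumption about what "valid" should mean; the `looksLikeEmail()` name is made up.

```php
<?php
// Loose pattern: something, an @, something, a dot, something,
// with no whitespace and no extra @ signs.
function looksLikeEmail($value) {
    return preg_match('/^[^@\s]+@[^@\s]+\.[^@\s]+$/D', $value) === 1;
}

$errors = [];
$email = 'thisisnotavalidemail';
if (!looksLikeEmail($email)) {
    $errors['email'] = 'You must enter a valid email.';
}

print_r($errors);
var_dump(looksLikeEmail('debbie@example.com')); // bool(true)
```

PHP's built-in `filter_var($email, FILTER_VALIDATE_EMAIL)` is an alternative to writing the pattern by hand.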

 

-------

 

Also, on a separate note... If you are letting users create pages and you want to allow them to insert HTML, make sure you use htmlspecialchars_decode before outputting it to the browser.

 

Tom

I am working on two things in tandem...

 

"Add an Article" which is just for me and needs to allow me to add HTML/Text

 

"Add a Comment" which will allow Members to add comments on my articles and - for now - will be text-only and must be Admin-approved!

 

 

Debbie

 

 


here is basically how I like to do it:

 

(untested)

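<?php
// Return true when the length of $str is between $min and $max (inclusive).
function between($min, $max, $str){
	$len = strlen($str);
	return $len >= $min && $len <= $max;
}
// Fetch a POST value, or $default when the key is missing.
function post($key, $default = ""){
	return isset($_POST[$key]) ? $_POST[$key] : $default;
}

$title = post('title');
$comment = post('comment');
$description = post('description');
$errors = array();
// Only validate once the form has actually been submitted.
if($_SERVER['REQUEST_METHOD'] == 'POST'){
	if(!between(3, 20, $title)){
		$errors[] = "Title is an invalid length";
	}
	if(!between(10, 1000, $comment)){
		$errors[] = "Comment is an invalid length";
	}
	if(!between(3, 150, $description)){
		$errors[] = "Description is an invalid length";
	}
	if(count($errors) == 0){
		// Escape for SQL only at insert time. (The mysql_* functions were removed
		// in PHP 7; mysqli or PDO prepared statements are the current equivalent.)
		$titleSql = mysql_real_escape_string($title);
		$descrSql = mysql_real_escape_string($description);
		$commentSql = mysql_real_escape_string($comment);
		mysql_query("insert into comments (title, descr, comment) values ('$titleSql', '$descrSql', '$commentSql')");
		header("location: /comments.php#new");
		exit;
	}
}
// The messages are our own strings, so they can be echoed as-is.
echo implode("<br />", $errors);
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES); ?>" method="post">
<input type="text" name="title" maxlength="20" value="<?php echo htmlspecialchars($title, ENT_QUOTES); ?>" />
<input type="text" name="description" maxlength="150" value="<?php echo htmlspecialchars($description, ENT_QUOTES); ?>" />
<textarea name="comment" cols="100" rows="5"><?php echo htmlspecialchars($comment, ENT_QUOTES); ?></textarea>
<input type="submit" name="submit" value="Submit" />
</form>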


The Little Guy: I don't really see the relevance to the post.

 

DoubleDee, if you want HTML for admin and not for comments / users, I would run htmlspecialchars (with ENT_QUOTES) before you input it (regardless of what it is). Then, when you are approving comments, ensure that they contain no HTML. You could use strip_tags on the comments when you are reviewing them to make sure. Also, when you are reviewing a comment, I would review it in a textarea so that you can easily see all the characters that were submitted. Then, when you post it to the page, run htmlspecialchars_decode($value, ENT_QUOTES) to decode everything. If there isn't any HTML, it won't do anything.


Debbie,

 

  I haven't used HTMLPurifier, but I'd bet a couple pennies that you use it similar to a class.

 

Advantages:

 

  1) It has solutions built / accepted by a community.

 

Disadvantages:

 

  1) You learn a library that quite possibly could be made obsolete in a couple of years.

 

There's more for each, but I don't like relying on libraries because they oftentimes become more involved and time-consuming than just making a tailored solution for the specific project. That's just my two cents. I say keep on doing what you're doing and learn the best ways to handle each situation. That will help you more in the long run. You will learn more and find out what you should and shouldn't do. But again, that's just my two cents.

Tom

I'm a big believer of "home-grown" solutions and learning-as-you-go.  (If I wanted "easy" I would have used WordPress and been done months ago?!)

 

 

Debbie

 

 


Debbie,

 

Please reread the posts above and notice the difference between validating and sanitizing.

 

Again, DON'T use regex if you don't need to. PHP has htmlspecialchars() and htmlentities() for a reason. Just look in the manual or, for that matter, any security guide. What you won't find there is regex. Regex is used, for instance, to check whether an email address is valid...

But if you want to, please do; I don't mind. Pretty much all the examples have been given, and it should be clear now.  :(

Exactly, I don't see how you learn anything by just installing a class to do everything for you. Doubledee, do it yourself, then compare your method with other methods and see what you could improve on. Real security issues are past this validating and sanitizing stuff in my opinion...
