validate user input


orange08

Hi. When I was first learning PHP and building my first PHP site, I ignored the need to validate user input. The more I learn and read online, the more I realize how important it is, so now I need to go back through my entire site and add validation for user input.

I know a little about this, but I realize it's not enough and I may still be missing some cases, so I hope you can share your advice and experience.

I think there are a few situations where I need to validate user input:

1. When I process user input and insert it into my database.

Here, I use preg_match() and mysql_real_escape_string().

2. Checking the user login.

Here, I use mysql_real_escape_string(). Do I also need preg_match() here?

3. Search criteria.

Here, I use mysql_real_escape_string(). Do I also need preg_match() here?

4. $_GET values.

Here, I use mysql_real_escape_string(). Do I also need preg_match() here?

If you think I have left out any validation, please feel free to suggest it. Your help is appreciated. Thanks!

Why are you using preg_match() in item #1? You should always use mysql_real_escape_string() when using any input in a query. But there are many different kinds of validation that should be done and various methods of accomplishing them. It all depends on the particular context.

 

Here are some examples of typical validations that I do. This is not all-inclusive, only what came to mind just now; I typically analyze each input and determine the appropriate validations.

 

If the user has a select list of static values I never assume that the value being passed is one of those values. I will typically have an array of those values and ensure the submitted value is in that array. If the list is generated from a list in the database I will ensure the value does exist in the database [using mysql_real_escape_string() of course].
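A minimal sketch of the static-list check described above. The option values and the function name `isAllowedOption()` are made up for illustration; the only real API used is PHP's `in_array()` with strict comparison.

```php
<?php
// Hypothetical options offered in the <select>; replace with your own list.
$allowedColors = ['red', 'green', 'blue'];

/**
 * Returns true only if $value is exactly one of the allowed options.
 * The third argument to in_array() turns on strict (===) comparison,
 * so values such as 0 or true cannot match by type juggling.
 */
function isAllowedOption($value, array $allowed): bool
{
    return is_string($value) && in_array($value, $allowed, true);
}

var_dump(isAllowedOption('green', $allowedColors));  // bool(true)
var_dump(isAllowedOption('purple', $allowedColors)); // bool(false)
```

The `is_string()` guard is there because a request parameter can arrive as an array (for example `color[]=red`), which should be rejected as well.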

 

If the value should be a number I will ensure that the submitted value is a number and, if applicable, that it is an integer and/or a positive number.
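One possible way to do the number check, sketched with PHP's `filter_var()` and `FILTER_VALIDATE_INT`. The helper name `parsePositiveInt()` is an assumption for this example, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
/**
 * Returns the submitted value as a positive integer,
 * or null if it is not a whole number >= 1.
 */
function parsePositiveInt($value): ?int
{
    $result = filter_var(
        $value,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]] // reject zero and negatives
    );

    return $result === false ? null : $result;
}

var_dump(parsePositiveInt('42'));  // int(42)
var_dump(parsePositiveInt('-5'));  // NULL
var_dump(parsePositiveInt('4.2')); // NULL
var_dump(parsePositiveInt('abc')); // NULL
```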

 

For text inputs I will always trim the value before doing any further validation. With very few exceptions, text input should always be trimmed. This is especially important when validating required fields.
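A small sketch of why trimming matters for required fields: without it, a value made up only of spaces would pass a "not empty" check. The function name `cleanRequiredText()` is hypothetical.

```php
<?php
/**
 * Trims the input and returns it, or returns null when nothing
 * is left (so a field containing only whitespace counts as empty).
 */
function cleanRequiredText($value): ?string
{
    if (!is_string($value)) {
        return null; // arrays and other non-string input are rejected
    }

    $trimmed = trim($value);

    return $trimmed === '' ? null : $trimmed;
}

var_dump(cleanRequiredText("  Bob \n")); // string(3) "Bob"
var_dump(cleanRequiredText('    '));     // NULL
```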

 

If you have any date inputs you would need to validate that they are in fact a date if you are storing the value as a date type in the database.
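Here is one way the date check could look, assuming the form submits dates as `YYYY-MM-DD` (that format and the name `parseIsoDate()` are assumptions for this sketch). `DateTimeImmutable::createFromFormat()` silently rolls impossible dates over into the next month, so the sketch formats the result back and compares it with the input.

```php
<?php
/**
 * Returns a DateTimeImmutable for a valid YYYY-MM-DD string, or null.
 */
function parseIsoDate($value): ?DateTimeImmutable
{
    if (!is_string($value)) {
        return null;
    }

    // The leading "!" resets the time fields to midnight.
    $date = DateTimeImmutable::createFromFormat('!Y-m-d', $value);

    // A date such as 2023-02-30 parses as 2023-03-02, so the
    // round trip below is what actually rejects it.
    if ($date === false || $date->format('Y-m-d') !== $value) {
        return null;
    }

    return $date;
}

var_dump(parseIsoDate('2024-02-29') !== null); // bool(true), leap day
var_dump(parseIsoDate('2023-02-30'));          // NULL
var_dump(parseIsoDate('tomorrow'));            // NULL
```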

 

In some instances one input is dependent upon another, such as the selection of a state and a city when selected from "linked" select lists. I will validate that the two values submitted are appropriate together.
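A rough sketch of the linked-list check. The lookup array and the name `isValidStateCity()` are invented for the example; on a real site the pairs would more likely come from a database query.

```php
<?php
// Hypothetical state => cities lookup used only for this sketch.
$citiesByState = [
    'CA' => ['Los Angeles', 'San Diego'],
    'TX' => ['Austin', 'Houston'],
];

/**
 * Returns true only if the state exists AND the city belongs to it.
 */
function isValidStateCity($state, $city, array $citiesByState): bool
{
    return is_string($state)
        && is_string($city)
        && isset($citiesByState[$state])
        && in_array($city, $citiesByState[$state], true);
}

var_dump(isValidStateCity('TX', 'Austin', $citiesByState)); // bool(true)
var_dump(isValidStateCity('CA', 'Austin', $citiesByState)); // bool(false)
```

Each value is valid on its own in the second call; it is the combination that fails, which is the point of validating the two inputs together.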

 

Some other validations are optional. For example you can validate that an email address is in the proper format. This is more to help the user who may have made a mistake since someone can still enter a bogus email address. Other examples of this would be phone numbers, zip codes or anything that "should" accept certain character ranges or has a predetermined format.
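As a sketch of these format checks: PHP's built-in `FILTER_VALIDATE_EMAIL` filter covers the email case, and a `preg_match()` pattern can cover a fixed format. The US ZIP pattern and both function names are assumptions chosen for illustration.

```php
<?php
/**
 * True if the value has the shape of an email address.
 * This says nothing about whether the mailbox exists.
 */
function looksLikeEmail($value): bool
{
    return is_string($value)
        && filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
}

/**
 * True for 12345 or 12345-6789 (US ZIP format, assumed here).
 * \z anchors at the very end so a trailing newline is not accepted.
 */
function looksLikeZip($value): bool
{
    return is_string($value)
        && preg_match('/^\d{5}(?:-\d{4})?\z/', $value) === 1;
}

var_dump(looksLikeEmail('user@example.com')); // bool(true)
var_dump(looksLikeEmail('not-an-email'));     // bool(false)
var_dump(looksLikeZip('12345-6789'));         // bool(true)
var_dump(looksLikeZip('1234'));               // bool(false)
```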

Thanks for sharing. For item 1, I use preg_match() to prevent users from entering special characters such as ; ~ ^ # < > and so on.

Do you use any other functions or techniques to prevent being hacked?

Validation generally comes down to either whitelisting or blacklisting. Whitelisting is when you create a set of allowed values. For example, in the drop down list that mjdamato mentioned, he creates an array of all the drop down values, then checks the user input to make sure that the input is one of the values. Since only a certain number of values are allowed, this is whitelisting.

 

Blacklisting is when you let in everything except for what is in the black list. For example, you may have a textarea, and want to disallow it anytime there is a swear word in the input. Since all values except for swear words are allowed, this is a blacklist.

 

Look at each of your inputs and decide whether you need to whitelist or blacklist them, then set the values accordingly.
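To make the two approaches concrete, here is a small sketch of each. The lists and function names are placeholders invented for this example.

```php
<?php
// Whitelist: the only values that are accepted (hypothetical list).
$allowedSortColumns = ['name', 'date', 'price'];

// Blacklist: everything is accepted except these (placeholder words).
$bannedWords = ['darn', 'heck'];

function passesWhitelist(string $value, array $allowed): bool
{
    return in_array($value, $allowed, true);
}

function passesBlacklist(string $text, array $banned): bool
{
    foreach ($banned as $word) {
        // \b matches whole words only, so "check" is not caught by "heck".
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $text) === 1) {
            return false;
        }
    }

    return true;
}

var_dump(passesWhitelist('price', $allowedSortColumns));    // bool(true)
var_dump(passesWhitelist('password', $allowedSortColumns)); // bool(false)
var_dump(passesBlacklist('What the HECK', $bannedWords));   // bool(false)
var_dump(passesBlacklist('Please check this', $bannedWords)); // bool(true)
```

A whitelist fails closed: anything you forgot to list is rejected. A blacklist fails open, so anything you forgot to ban gets through, which is why it tends to suit content rules better than security checks.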

Is there a particular reason you wanted to strip those characters? There may be a valid reason for doing so, but it would have nothing to do with validation or security. mysql_real_escape_string() will take care of those characters just fine when doing a database query. However, you did make me think of another validation I didn't mention: HTML characters in user input.

 

Again, this all depends on the context in which the input will be used. Let's say the input will be displayed within HTML on the page and you don't want the user's input to mess up the page display. Then you should simply use htmlentities() or htmlspecialchars() when writing the values to the page. Some people would argue that you should escape the value before saving, but I disagree. If you want to allow the user to edit the input, you would want to display the original content in a text input. I will almost always save the original input to the database and make sure to use the proper code when displaying the input, depending on the context.
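A minimal sketch of "store the original, escape on output". The helper name `escapeHtml()` and the sample value are made up; `htmlspecialchars()` with `ENT_QUOTES` and an explicit `'UTF-8'` charset is the real PHP call.

```php
<?php
/**
 * Escapes a raw string for output inside HTML.
 * ENT_QUOTES converts both double and single quotes.
 */
function escapeHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Pretend this came back from the database exactly as the user typed it.
$stored = '<script>alert("hi")</script>';

// Escaping happens only at the moment the value is written to the page.
echo '<p>' . escapeHtml($stored) . '</p>', PHP_EOL;
// <p>&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;</p>
```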

I'm confused now. What's the difference between htmlentities() and htmlspecialchars()?

And in which cases should I use these functions? For any user input? For input that will be saved to the database? For input that will be displayed on the web page? For input that will be used in the search function?

Each function has its particular purpose. It is your responsibility to decide what to use and when. Read the documentation for each function (including the user comments). You will learn a lot.

 

I'll give you a quick rundown of some of those you asked about:

 

You should ALWAYS use mysql_real_escape_string() for any user input when using a query (saving, updating, searching, etc).
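Note that the old mysql_* extension, including mysql_real_escape_string(), was deprecated in PHP 5.5 and removed in PHP 7.0. The sketch below swaps in a different technique: PDO prepared statements, where the input is passed separately from the SQL text instead of being escaped into it. It uses an in-memory SQLite database only so that it runs on its own (this assumes the pdo_sqlite driver is installed); the table and function names are invented for the example.

```php
<?php
// In-memory SQLite is an assumption made so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

/** Inserts a user; the name travels as a bound parameter, not as SQL. */
function addUser(PDO $pdo, string $name): void
{
    $stmt = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
    $stmt->execute([':name' => $name]);
}

/** Returns all rows whose name matches exactly. */
function findUsersByName(PDO $pdo, string $name): array
{
    $stmt = $pdo->prepare('SELECT id, name FROM users WHERE name = :name');
    $stmt->execute([':name' => $name]);

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

addUser($pdo, "O'Brien");                    // quote is stored literally
addUser($pdo, "x'; DROP TABLE users; --");   // stored as plain text too

var_dump(count(findUsersByName($pdo, "O'Brien"))); // int(1)
```

The same pattern applies to inserts, updates, and searches, with MySQL or any other PDO driver.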

 

Almost all user input is displayed on a page in some way or another. When it is, you need to decide how it needs to be displayed. Let me give you a few examples:

 

Let's say the user entered "<b>Some text</B>"

 

If you are going to display the input within the body of the HTML page you need to decide if you want the HTML code in the user input to be interpreted (so that "Some text" appears in bold) or if you want it displayed exactly as they entered it ("<b>Some text</B>"). There are valid reasons for wanting the HTML in the user input to be interpreted, but they are rare, because allowing the user's input to be interpreted as HTML can allow them to seriously harm your pages. That is why almost all forums use modified tags such as [ b ] for only the tags they want to allow; those are then translated into their HTML equivalents when displayed. So, in most cases you want the text to be displayed verbatim. That is when you would use htmlentities() or htmlspecialchars(), which will translate certain characters, such as the opening and closing HTML brackets, into their escaped character equivalents.
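The forum-style approach mentioned above could look roughly like this. It is only a sketch handling a single hypothetical `[b]...[/b]` tag (written without the spaces used in the post), and `renderBoldTag()` is an invented name: escape everything first, then translate just the tag you chose to allow.

```php
<?php
/**
 * Escapes all HTML in the input, then turns [b]...[/b] into <b>...</b>.
 * Any real <b> or <script> the user typed stays escaped.
 */
function renderBoldTag(string $text): string
{
    // Step 1: neutralise every HTML character in the raw input.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Step 2: translate only the allowed tag (case-insensitive, non-greedy).
    $html = preg_replace('/\[b\](.*?)\[\/b\]/is', '<b>$1</b>', $safe);

    // preg_replace() returns null on a regex error; fall back to the safe text.
    return $html ?? $safe;
}

echo renderBoldTag('[b]Hi[/b] <i>there</i>'), PHP_EOL;
// <b>Hi</b> &lt;i&gt;there&lt;/i&gt;
```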

 

So, the above example would be translated into something like "&lt;b&gt;Some text&lt;/B&gt;". But you would only want to do that when displaying the input on the page. It might seem simpler to just save it that way to the database so you don't have to "escape" it whenever you want to display it on the page. But if you ever want the user to be able to edit their content, you wouldn't be able to populate a text field with it in its original form. So, save the content exactly how the user entered it. Then use one of the two functions above to escape it when displaying within HTML. That includes the value of a text field you repopulate: the browser decodes the entities, so the user still sees and edits the original text.
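On the earlier question of how the two functions differ, a short comparison may help. `htmlspecialchars()` converts only the characters that are special in HTML (`&`, `<`, `>` and quotes), while `htmlentities()` also converts every other character that has a named entity. The sample string is made up; both calls assume UTF-8 input.

```php
<?php
$input = '<b>Café</B>';

$special  = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $special, PHP_EOL;  // &lt;b&gt;Café&lt;/B&gt;
echo $entities, PHP_EOL; // &lt;b&gt;Caf&eacute;&lt;/B&gt;
```

Both are safe for display. On a UTF-8 page htmlspecialchars() is usually enough, since the accented characters do not need escaping.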

 

One other note. Although I state above that you should save the user content exactly how the user entered it, that is not a "global" statement. There are situations where you should remove certain characters or perform other validations. But, what characters you don't allow or changes you make will always depend on how that data is stored and saved (a date for example). You need to make that decision for each piece of data.

Yes, I have decided to save the user input to my database exactly as the user entered it, and to use htmlspecialchars() only when I display it in the browser.

But then I thought of one problem.

For my form that accepts user input, I store the input in a $_SESSION variable and redisplay it in the corresponding field, so that if the user is sent back to the form by a validation check they don't need to re-enter everything. In this case, do I need to use htmlspecialchars() when displaying the input? Otherwise, if the input contains an XSS attack, it will take effect when it is displayed in the field.

Another case: if all the inputs from a form are emailed to an email account, should htmlspecialchars() be used on all the inputs? Otherwise, anyone who opens a mail containing input with an XSS attack would be at risk.

 

thanks!
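For the form-repopulation case asked about above, a sketch under the assumption that the value is written into a `value="..."` attribute. The helper name `textInput()` is invented. Escaping with `ENT_QUOTES` keeps a stray quote from closing the attribute, and the browser decodes the entities so the user sees what they originally typed.

```php
<?php
/**
 * Builds a text input whose value is safely escaped for the attribute.
 */
function textInput(string $name, string $value): string
{
    return sprintf(
        '<input type="text" name="%s" value="%s">',
        htmlspecialchars($name, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($value, ENT_QUOTES, 'UTF-8')
    );
}

// A value that would break out of the attribute if written as-is.
echo textInput('nickname', '"><script>alert(1)</script>'), PHP_EOL;
// <input type="text" name="nickname" value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```

The same helper works whether the value comes from $_SESSION, $_POST, or the database, because the escaping depends on where the value is written, not where it came from.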
