XSS prevention


chrisrulez001
Solved by Jacques1

Hi there,

It's been a few months since I've touched PHP.

I've read that you only use htmlspecialchars() when outputting data (for example from a database). Is that the correct way of doing it?

But to prevent XSS from getting into the database through the form, couldn't you use preg_match() to whitelist what can actually be entered into the field?

Thanks 

  • Solution

XSS has nothing to do with databases or input validation. It's an output problem caused by programmers who naïvely insert data (from any source) into HTML contexts.

This cannot be solved with validation, because

  • formal validity doesn't mean that the data is safe in every possible context. For example, I could give you a perfectly valid e-mail address which is an XSS vector at the same time. Why? Because the format of e-mail addresses was never meant to protect web applications from XSS attacks. Why should it?
  • it's impossible to predict the context in which the data will be used. There's not just HTML. There are thousands of different languages and data formats with distinct syntax rules, and the data may be a threat to every single one of them.
  • a lot of data cannot be validated at all. For example, how would you “validate” the posts on this forum? We obviously have to write down HTML markup and JavaScript code all the time. That's the whole point of this site.
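The second point can be made concrete with a small sketch: one untrusted string needs a different encoding for each place it is written to. The example uses only built-in PHP functions (htmlspecialchars(), json_encode() with its JSON_HEX_* flags, rawurlencode()); the sample value and the three contexts are chosen for illustration, and other contexts (CSS, SQL, shell commands and so on) each have their own rules that are not covered here.

```php
<?php
// One untrusted value, three output contexts, three different encoding rules.
$value = '</script><script>alert("x")</script>';

// 1. HTML text or a quoted attribute value.
$forHtml = htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// 2. A string literal inside an inline <script> block. htmlspecialchars() is the
//    wrong tool here; json_encode() with the JSON_HEX_* flags emits <, >, &, ' and "
//    as \u00XX escapes, so the value can end neither the string nor the script element.
$forJs = json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

// 3. A single query-string parameter. rawurlencode() leaves only letters, digits,
//    "-", "_", ".", "~" and %XX sequences, so the result is also safe in a quoted href.
$forUrl = rawurlencode($value);

echo "<p>$forHtml</p>\n";
echo "<script>var msg = $forJs;</script>\n";
echo "<a href=\"/search?q=$forUrl\">search</a>\n";
```

None of the three results is interchangeable with the others: each is only correct for the context it was produced for.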

XSS must be prevented during the HTML rendering process. The best solution is to use a proper template engine like Twig, which automatically applies HTML-escaping to all outbound data. The second-best solution is to write a wrapper for the htmlspecialchars() function. Using htmlspecialchars() directly is not recommended, because it's extremely error-prone (you have to pass the right flags and character encoding on every single call). In my experience, almost nobody understands how to use it correctly.
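A minimal sketch of such a wrapper, assuming the pages are served as UTF-8. The name html_escape() is hypothetical; the point is that the flags and the encoding are fixed in one place instead of being repeated (and occasionally forgotten) at every call site.

```php
<?php
/**
 * Hypothetical wrapper: escapes a value for an HTML text node or a quoted
 * attribute value (single or double quotes). Assumes UTF-8 output.
 */
function html_escape(string $raw, string $encoding = 'UTF-8'): string
{
    // ENT_QUOTES:     escape both " and ', so either attribute quote style is safe.
    // ENT_SUBSTITUTE: replace invalid byte sequences with U+FFFD instead of returning "".
    return htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, $encoding);
}

// Usage: escape at the point where the value is written into HTML,
// regardless of whether it came from a form, the database or a file.
$comment = "<img src=x onerror=alert(1)> Tom's post";
echo "<p title='" . html_escape($comment) . "'>" . html_escape($comment) . "</p>\n";
```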

In addition to HTML-escaping, you should use Content Security Policy. This allows you to define strict rules for JavaScript execution and block many attacks.
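A minimal sketch of sending a policy from PHP. The build_csp() helper is hypothetical, and the directives are an example starting point rather than a recommended policy: a real policy has to list every origin your pages actually load scripts, styles and other resources from. Because script-src does not contain 'unsafe-inline', the browser refuses inline script blocks and on* event-handler attributes, which stops many injected payloads even if an escaping mistake slips through.

```php
<?php
/**
 * Hypothetical helper: turns ['directive' => ['source', ...], ...] into a
 * Content-Security-Policy header value.
 *
 * @param array<string, list<string>> $directives
 */
function build_csp(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        // Each directive is "name source1 source2 ..."; directives are separated by "; ".
        $parts[] = trim($name . ' ' . implode(' ', $sources));
    }
    return implode('; ', $parts);
}

// Example policy only; adjust it to the resources your site really uses.
$policy = build_csp([
    'default-src' => ["'self'"],
    'script-src'  => ["'self'"],  // no 'unsafe-inline': inline scripts and on* handlers are blocked
    'object-src'  => ["'none'"],
    'base-uri'    => ["'self'"],
]);

// Headers must be sent before any output.
header('Content-Security-Policy: ' . $policy);
```

While testing a new policy, the same value can be sent in a Content-Security-Policy-Report-Only header instead, which reports violations without blocking anything.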

  • 4 weeks later...

Ok thank you for your informative post Jacques1 :)

I'll have a look at Twig and implementing a Content Security Policy.

With regard to htmlspecialchars(), I see from your other post that you use ENT_QUOTES | ENT_SUBSTITUTE. Are these the best flags to use?

The ENT_QUOTES flag is crucial for security. If you leave it out, only double quotes are escaped, so single-quoted attributes aren't safe at all.
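A short demonstration of why the flag matters, using a made-up value that tries to break out of a single-quoted attribute. ENT_COMPAT is what htmlspecialchars() used by default before PHP 8.1; since PHP 8.1 the default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, but passing the flags explicitly gives the same result on every version.

```php
<?php
// Attacker-controlled value aimed at a single-quoted attribute.
$name = "x' onmouseover='alert(1)";

// ENT_COMPAT escapes double quotes only, so the single quotes pass through.
$compat = htmlspecialchars($name, ENT_COMPAT, 'UTF-8');
// ENT_QUOTES escapes single quotes too (as &#039; with the default HTML 4.01 doctype).
$quotes = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

echo "<input value='$compat'>\n"; // ' ends the value early; onmouseover becomes a real attribute
echo "<input value='$quotes'>\n"; // the whole string stays inside the value attribute
```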

ENT_SUBSTITUTE isn't security-related, but it's still important for Unicode encodings (like UTF-8). Without ENT_SUBSTITUTE, htmlspecialchars() returns an empty string if the input contains an invalid byte sequence. That's usually not what you want. A more reasonable approach is to replace the invalid bytes with the Unicode replacement character (U+FFFD) while leaving the rest of the input intact, and that's exactly what ENT_SUBSTITUTE does.
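The difference is easy to see with a string that is not valid UTF-8, for example Latin-1 text that was never converted (the sample value is made up for illustration):

```php
<?php
// "café" stored as ISO-8859-1: the lone 0xE9 byte is not valid UTF-8.
$input = "caf\xE9 & more";

$without = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$with    = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($without); // string(0) "": the whole value is silently lost
var_dump($with);    // the bad byte becomes U+FFFD; "& more" is still escaped to "&amp; more"
```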
