Properly sanitize HTML input


IchBin

I'm writing a script that lets users submit news articles. I realize the caveats of accepting this kind of input, but I trust the couple of users I allow to post. These two like to design in their own apps, which use some kind of WYSIWYG editor to build the HTML layout. What I don't know, as a coder, is the proper way to do my best to sanitize this data. I'd rather not pull in a library that does all of this for me; I'd just like a rundown of how it could be done in my own code.

 

I've done a few searches, but I'm not sure I know the right search terms to get what I'm looking for. If you have any links to other topics, that would be great.

 

Should all the HTML characters be converted to entities? Is it necessary? That would mean I'd have to decode the HTML for display. Do I gain any security from doing that? It doesn't sound like it to me.

 

Should I use addslashes()?

 

Of course mysql_real_escape_string() would be used on the query to insert. But I'm thinking I need to do more with the data before it gets inserted.

 

Basic steps to protect myself are all I'm looking for. Thanks for any input.

 

 

--edit--

 

Sorry, posted in installation. Would a mod please move me to the PHP code board...


That's just it, I don't really want to strip any tags. I want to allow these two people to post any html they want. I just want to make sure I am able to protect my server as much as possible.

 

I'm guessing as long as I lock it down to them and make sure I do the appropriate escaping/cleaning before putting it into the database, that might be all I can do.


If you aren't worried about your users injecting harmful HTML into the news articles, then sanitizing isn't really a problem. htmlentities() and html_entity_decode() might be handy so you don't have to deal with quotes when inserting into the database (with ENT_QUOTES it turns both kinds of quotes into entities as well), though you should still escape the value for the query; either way would be OK. One could argue that if you trust your users enough not to put harmful HTML into the news articles, you could trust them enough not to attempt SQL injection.
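To make the entity question concrete, here is a minimal sketch of the encode-on-save, decode-on-display round trip. The function names `encode_for_storage()` and `decode_for_display()` and the sample article are made up for illustration; only `htmlentities()` and `html_entity_decode()` are real PHP functions.

```php
<?php
// Hypothetical helper: entity-encode an article before it is stored.
function encode_for_storage(string $html): string
{
    // ENT_QUOTES encodes both single and double quotes.
    return htmlentities($html, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: turn the stored text back into markup for display.
function decode_for_display(string $stored): string
{
    return html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
}

// Made-up article body containing quotes and a script element.
$article   = '<p class="lead">It\'s "news"</p><script>alert(1)</script>';
$stored    = encode_for_storage($article);
$displayed = decode_for_display($stored);
```

The stored form has no raw tags or quotes, but decoding it hands back the original markup, script element included. So the round trip is a storage convenience at best; it does not make the displayed HTML any safer.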

 

Personally, I would strip all HTML tags and use a BBCode-type system, but I don't trust anyone.
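For anyone who hasn't built one, a BBCode-type system can be sketched roughly like this: escape everything the author typed, then turn a small whitelist of bracket tags back into HTML. The function name `render_bbcode()` and the choice of tags are assumptions for the example, not a standard.

```php
<?php
// Sketch of a tiny BBCode renderer: escape first, then re-enable a whitelist.
function render_bbcode(string $text): string
{
    // Neutralise every raw HTML tag and quote the author typed.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Simple paired tags that map directly onto one HTML element.
    $html = preg_replace('~\[b\](.*?)\[/b\]~is', '<strong>$1</strong>', $html);
    $html = preg_replace('~\[i\](.*?)\[/i\]~is', '<em>$1</em>', $html);

    // Links: only http and https URLs match, so javascript: URLs stay plain text.
    $html = preg_replace(
        '~\[url=(https?://[^\]\s]+)\](.*?)\[/url\]~is',
        '<a href="$1">$2</a>',
        $html
    );

    // Keep the author's line breaks.
    return nl2br($html);
}

// Example call with made-up input.
$rendered = render_bbcode('[b]Hi[/b] <script>alert(1)</script>');
```

Because the escaping happens before any tag is converted, the only HTML that can reach the page is what the replacement patterns emit themselves.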


BBCode is the way to go.

 

At the very least, remove any script, iframe, etc. tags that allow outside content to be loaded into your site.

 

In addition to this, remember that people can use onXxx attributes (like onClick, onFocus, etc.) to insert JavaScript into your page. OnClick events may not seem very dangerous, but an event like onLoad running malicious JavaScript could be devastating. However, this assumes that you do not trust the user inputting the data. Simply stripping the tags that xyph suggested may very well suffice, since you mentioned you trust the user input to a point.
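If you do want to keep the authors' HTML and only remove the pieces mentioned above, one way is to parse the markup and edit the tree rather than use regular expressions. The sketch below assumes PHP's DOM extension is available; `strip_active_content()` and the list of removed tags are my own choices, and the style attribute is dropped too on the assumption that you'd rather be strict. It is not a complete XSS filter (for example, it does not inspect javascript: URLs in href or src).

```php
<?php
// Sketch: remove script-capable elements and attributes from an HTML fragment.
function strip_active_content(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix is a widely used trick to make loadHTML()
    // read the input as UTF-8; the wrapper div gives us a known root to export.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Drop whole elements that load or run outside content.
    foreach ($xpath->query('//script | //iframe | //object | //embed') as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Drop event-handler attributes (onclick, onload, ...) and style.
    foreach ($xpath->query('//@*') as $attr) {
        $name = strtolower($attr->nodeName);
        if (strpos($name, 'on') === 0 || $name === 'style') {
            $attr->ownerElement->removeAttributeNode($attr);
        }
    }

    // Serialise only what sits inside the wrapper.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    if ($wrap === null) {
        return '';
    }
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Example call with made-up input.
$clean = strip_active_content(
    '<p onclick="evil()" style="color:red">Hi</p><script>alert(1)</script>'
);
```

A blacklist like this only catches what it names, which is why it suits the "trusted to a point" case rather than open submissions.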


Don't trust anyone. Use flexible BBCode. Even the style attribute can be used for XSS attacks.

 

If you allow font options, paragraphs, tables, lists, images with alignment, and possibly YouTube or SWF embeds (though SWF embeds can be dangerous), what else do you need?

