marklarah Posted June 21, 2012
Hi. This is more of a general question - obviously when we have user input in our applications, the traditional defence against XSS and injection and all that kind of thing is to validate user input (and add slashes) and then encode it each time it is output. This is as opposed to sanitising it on input. So the age-old question: what's really wrong with encoding the input, rather than the output? Then you don't have to encode data each time you call it! Say this is what my clean function would look like:
<?php
function clean($dirty) {
    // Treat a missing value as an empty string
    if ($dirty === FALSE) return '';
    // Convert all applicable characters, including both quote types, to HTML entities
    $dirty = htmlentities($dirty, ENT_QUOTES, "UTF-8");
    return trim($dirty);
}
?>
and I run that on all user input before entering it into the DB. Why is this considered bad practice?
Link to comment https://forums.phpfreaks.com/topic/264541-handling-xss/
scootstah Posted June 21, 2012
By doing that you are potentially limiting what you can do with the data, because you have mangled it. It's usually preferred to maintain data integrity and only adjust it when it's needed. For example there might be a format where you literally want a < instead of &lt;. Sure you could just reverse the process, but you can't be sure that the data will be exactly as it was.
requinix Posted June 21, 2012
Common example: HTML versus XML. You can't stick things like &copy; or &acute; in XML, so your clean() will not work for it.
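A rough sketch of the HTML-versus-XML point, assuming the stored value went through htmlentities() as in clean(). The sample string and the <title> element are made up for illustration. XML predefines only five entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot;, &amp;apos;), so an HTML name such as &amp;copy; is undefined there unless a DTD declares it. The parsing step is guarded because it needs the SimpleXML extension.

```php
<?php
// "\xC2\xA9" is the UTF-8 byte sequence for the copyright sign.
$raw = "\xC2\xA9 2012 Tom & Jerry";

// What clean() would have stored: HTML named entities.
$htmlEncoded = htmlentities($raw, ENT_QUOTES, 'UTF-8');
echo $htmlEncoded, "\n"; // &copy; 2012 Tom &amp; Jerry

// Escaping the raw value for XML instead: only the XML-special characters change.
$xmlEncoded = htmlspecialchars($raw, ENT_QUOTES | ENT_XML1, 'UTF-8');
echo $xmlEncoded, "\n";

// Optional check with an XML parser, if SimpleXML is available.
if (function_exists('simplexml_load_string')) {
    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    // Expected to fail: &copy; is not a defined entity in plain XML.
    var_dump(simplexml_load_string("<title>$htmlEncoded</title>"));
    // Expected to parse: the text contains only predefined XML entities.
    var_dump(simplexml_load_string("<title>$xmlEncoded</title>"));
}
```

With the raw value stored, each output format can get its own escaping; with the HTML-encoded value stored, it has to be decoded first.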
marklarah Posted June 21, 2012
Hi; thanks for the responses! Just curious though... I agree there could be these situations, but then surely the inverse of the clean function would return it to its original formatting (minus any extraneous whitespace)? htmlentities is a one-to-one mapping, as is html_entity_decode, so any 'mangled' data can be returned to its original form, and then used in such situations. To my mind, it's just as easy (if not easier) to encode the data on input, rather than having to encode it each time on output, and simply run an unclean() function on the odd occasion you may need to create an RSS feed or whatever. This is all of course, as matters appear to my naïve mind.... I could perhaps be completely wrong haha. Essentially, all I'm trying to say is that for every argument made for NOT encoding data on input, the same can be said for the converse. Thanks!
scootstah Posted June 21, 2012
There are ways to automatically cleanse the data on output so that you don't have to explicitly do it all over the place. For example, if you use templating then you can cleanse any variable data sent to the template.
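The idea of cleansing everything handed to a template could be sketched roughly like this. The helper names e() and render() and the {{name}} placeholder syntax are invented for illustration; they are not taken from any particular templating library, and a real engine would do considerably more.

```php
<?php
// Hypothetical helper: escape one value for use in HTML text or attribute values.
function e($value) {
    return htmlspecialchars((string) $value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical renderer: every variable is escaped on its way into the template,
// so individual templates never call e() themselves.
function render($template, array $vars) {
    $replacements = array();
    foreach ($vars as $name => $value) {
        $replacements['{{' . $name . '}}'] = e($value);
    }
    // strtr() with an array does not rescan text it has already replaced.
    return strtr($template, $replacements);
}

// The raw, unencoded value comes straight from storage.
echo render('<p>Hello, {{name}}!</p>', array('name' => '<script>alert(1)</script>')), "\n";
// prints: <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>
```

The escaping lives in one place, which addresses the "every. single. output." concern without changing what is stored.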
marklarah Posted June 21, 2012
Hmm, I'm not entirely convinced that's a brilliant solution though - and what about those of us who don't template? idk, it seems there's not really any strong YOU MUST NOT DO THIS OR APACHE WILL EXPLODE kind of reasoning behind not sanitising. As long as you remember to do it for every input, and have a method for getting the original data, you can't go wrong!
requinix Posted June 22, 2012
marklarah wrote: "As long as you remember to do it for every input, and have a method for getting the original data, you can't go wrong!"
Yeah. As long as you remember to convert back from HTML every time you want to use something like strlen() or string replacing or RSS feeds or someone's API or... Because it's easier that way!
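A small illustration of the string-function problem, assuming the stored value was encoded with htmlentities() as in clean(). The sample string is made up.

```php
<?php
$raw    = 'Tom & Jerry';
$stored = htmlentities($raw, ENT_QUOTES, 'UTF-8'); // 'Tom &amp; Jerry'

// Length checks count the entity, not the character the user typed.
var_dump(strlen($raw));    // int(11)
var_dump(strlen($stored)); // int(15)

// Truncating for a preview can cut an entity in half.
var_dump(substr($stored, 0, 6)); // string(6) "Tom &a"

// Search and replace sees the entity markup as ordinary text.
var_dump(str_replace('&', 'and', $stored)); // string(17) "Tom andamp; Jerry"
```

Each of these works as expected on the raw value, which is why the decode step would have to come before every such operation.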
marklarah Posted June 22, 2012
I earnestly believe it to be, yes. I find you actually "need" the original data in those sorts of situations far less often than you simply output it. In which case, the same argument can be made: you need to remember to encode on every. single. output. each. time. In any case, if it's just a question of convenience, then imo both methods are valid, and it's just down to preference, and what your application calls for.
Pikachu2000 Posted June 22, 2012
If the original data has been manipulated before being stored, you can never be certain that anything you do will produce the same data as it was in its original form.
scootstah Posted June 22, 2012
Most people like to keep user data as close to its original form as possible. Store it the way the user enters it, and modify it only when you need to. The only time you need to sanitize for XSS is when outputting the data as HTML. There are tons more ways to use data that don't require it to be sanitized for XSS. Ultimately, it is your decision whether you sanitize on input or output. It will work either way, but if you sanitize on input then you are only creating more work for yourself.
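To make "modify it only when you need to" concrete, here is a minimal sketch of one raw value being escaped differently for three destinations. The sample string is made up, and the three contexts are only examples; json_encode() assumes the JSON extension is available.

```php
<?php
// The value exactly as the user entered it, stored unmodified.
$raw = 'Tom & "Jerry" <3';

// HTML page: escape the HTML-special characters.
$asHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
echo $asHtml, "\n"; // Tom &amp; &quot;Jerry&quot; &lt;3

// JSON API response: JSON has its own quoting rules and knows nothing of entities.
$asJson = json_encode($raw);
echo $asJson, "\n"; // "Tom & \"Jerry\" <3"

// URL path segment or query value: percent-encoding.
$asUrl = rawurlencode($raw);
echo $asUrl, "\n"; // Tom%20%26%20%22Jerry%22%20%3C3
```

Only the first destination wants HTML entities; had the value been stored HTML-encoded, the other two would carry &amp;amp; and &amp;quot; into places where they are just noise.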
marklarah Posted June 22, 2012
Pikachu2000 wrote: "If the original data has been manipulated before being stored, you can never be certain that anything you do will produce the same data as it was in its original form."
Why? If html_entity_decode is simply the exact reverse of htmlentities, the only differences from the original data would be lost whitespace or unsuitable characters... which we wouldn't want to store anyway. Of course if your application has to perform lots of operations on the stored unencoded data, then yes it would make sense to store it as such. But I think for most applications (forums, blogs etc), storing it encoded seems to be a much more hassle-free way of doing it. Anyway, I appreciate the arguments you've made, thanks. For large scale applications, I will store it unencoded, but as long as I know all I'm doing with the data is displaying it, I can't see any reason not to store it encoded. Thanks!
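On the "Why?": a few cases where the round trip is not exact can be sketched as below. clean() is the function from the first post; unclean() is only named in the thread, so its body here is an assumption, as are the sample inputs.

```php
<?php
// clean() as posted at the top of the thread.
function clean($dirty) {
    if ($dirty === FALSE) return '';
    $dirty = htmlentities($dirty, ENT_QUOTES, "UTF-8");
    return trim($dirty);
}

// Assumed inverse; the thread mentions unclean() but never defines it.
function unclean($stored) {
    return html_entity_decode($stored, ENT_QUOTES, "UTF-8");
}

// 1. Input that is not valid UTF-8 (here a lone Latin-1 byte). Without
//    ENT_IGNORE or ENT_SUBSTITUTE, htmlentities() returns an empty string,
//    so the whole value is gone, not just the odd character.
var_dump(clean("caf\xE9")); // string(0) ""

// 2. Leading and trailing whitespace is removed by trim() and cannot be restored.
$padded = "  indented line\n";
var_dump(unclean(clean($padded)) === $padded); // bool(false)

// 3. Decoding only reverses the encoding if the flags match. With ENT_COMPAT
//    the single-quote entity written by ENT_QUOTES is left in place.
$stored = clean("it's"); // it&#039;s
var_dump(html_entity_decode($stored, ENT_COMPAT, "UTF-8"));
```

Ordinary text does survive unclean(clean(...)) intact, so the scheme appears to work until one of these cases turns up in stored data.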