kratsg, posted September 14, 2009:

Recently, I've noticed that there can be workarounds to even a simple preg_replace:

    $blockedarray = array(
        '/alert=/i', '/alert\(/i', '/iframe/i', '/<me/i', '/<object/i',
        '/\.cookie/i', '/<app/i', '/mysql/i', '/document\.location/i', '/\0/',
        '/@import/i', '/<xml/i', '/<meta/i', '/s\nc\nr\ni/i', '/<emb/i',
        '/<java/i', '/moz[^A-Z]/i', '/onload/i', '/\+ document/i', '/\;p\?/i',
        '/\' \+ \'/', '/\' \+\'/', '/\'\+\'/', '/\'\+ \'/',
        '/-binding:/i', '/-binding :/i'
    );
    $input = preg_replace($blockedarray, '*blocked*', $input);

Here, instead of only filtering out the HTML, we're also trying to filter JavaScript. It turns out that, given the right combination of filtered words, you can craft a string that slips through the filter. For example:

    // preg_replace patterns need delimiters, hence '/apple/' rather than 'apple'
    $replace = array('/apple/', '/banana/', '/cookie/');
    $input = preg_replace($replace, '', $input);

Let's say my inputs and outputs are as follows:

    $input = 'apple';        // output: ''
    $input = 'applebanana';  // output: ''
    $input = 'appbananale';  // output: 'apple'

Let's rearrange the array:

    $replace = array('/banana/', '/cookie/', '/apple/');
    $input = preg_replace($replace, '', $input);

Now we have:

    $input = 'appbananale';  // output: ''

The 'appbananale' case shows the problem with preg_replace: the patterns are applied one after another in array order, so removing one match can assemble a word whose pattern has already been checked. This is a problem in most situations.

A simple workaround would be to replace anything filtered out with '***' or something similar, or even to rearrange the array. But with the rearranged array, one can enter 'banappleana' and it slips through again (the output is 'banana').

So I wonder: should we repeat the replacement until the input stops changing? Or should we simply replace anything filtered out with '***'? For example:

    $replace = array('/banana/', '/cookie/', '/apple/');
    $counter = 0;
    // Keep filtering until a pass leaves the input unchanged
    while (($filtered = preg_replace($replace, '', $input)) !== $input) {
        $input = $filtered;
        $counter++;
    }

Then, with:

    $input = 'banappleana';  // output: '', $counter = 2
                             // (0->1 after replacing 'apple', 1->2 after replacing 'banana')

What's your view? Comments? Suggestions?
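The two options raised in this post can be compared side by side. This is a minimal sketch using the toy word list from the post; the function names `filter_until_stable` and `filter_with_marker` and the `$maxPasses` guard are illustrative assumptions, not part of the original code.

```php
<?php
// Toy patterns from the post, with regex delimiters added.
$patterns = array('/banana/', '/cookie/', '/apple/');

// Option 1: delete matches and repeat until a pass changes nothing.
// $maxPasses is a hypothetical safety guard, not something from the post.
function filter_until_stable(array $patterns, $input, $maxPasses = 50)
{
    for ($pass = 0; $pass < $maxPasses; $pass++) {
        $filtered = preg_replace($patterns, '', $input);
        if ($filtered === $input) {
            break; // fixed point reached: nothing left to remove
        }
        $input = $filtered;
    }
    return $input;
}

// Option 2: replace matches with a visible marker in a single pass.
// The marker keeps the surrounding fragments from joining up.
function filter_with_marker(array $patterns, $input, $marker = '***')
{
    return preg_replace($patterns, $marker, $input);
}

$stable = filter_until_stable($patterns, 'banappleana');
$marked = filter_with_marker($patterns, 'banappleana');
```

With deletion, each changing pass shortens the string, so the loop always terminates. The marker variant needs only one pass here because '***' separates 'ban' from 'ana', but that holds only as long as the marker itself cannot form part of a blocked pattern.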
.josh, posted September 14, 2009:

So basically what you're saying is someone can type "asfuckshole", you replace "fuck" with nothing, and you wind up with "asshole"? I suppose if you really wanted to address that, a recursive function would do the trick.
kratsg (author), posted September 14, 2009, replying to .josh:

I wasn't sure if you could really curse on these forums >.< But yeah, I was just curious whether one approach would be better than the other. The main issue with the JavaScript was that we wanted to filter all of it out perfectly, yet some people were able to work around this by mixing in curse words and other filtered terms, using the methods above.
corbin, posted September 14, 2009:

As far as escaping JS goes, why not just use htmlentities?
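For reference, a minimal sketch of what that suggestion does, assuming a made-up `$input` string. Note that it neutralizes every tag, not only script tags.

```php
<?php
// Hypothetical user input mixing harmless markup with a script tag.
$input = '<b>hi</b><script>alert(1)</script>';

// Encode HTML special characters so the browser renders the markup as text.
$escaped = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
```

After this call, neither the bold tag nor the script tag is interpreted by the browser, which is why it only fits sites that do not need to accept HTML from users.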
kratsg (author), posted September 14, 2009, replying to corbin:

So here's the fun part: we are allowing HTML, but we don't want JavaScript >.< At least, that's what the site owner wants. So that's what makes it a tad difficult.
corbin, posted September 14, 2009:

Oh... I had to do something like that recently. I went with whitelisting, where I parsed every tag, then parsed the attributes, and so on.

In your case, though, you could probably just recursively strip out script tags, on<blah>= attributes, and attribute="javascript:" values.
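A rough sketch of that repeated stripping, under loud assumptions: the three patterns and the `strip_js` name are illustrative only, they do not cover every way to run script in HTML (an unclosed script tag passes through, for example), and a blacklist like this should not be treated as a complete defense.

```php
<?php
// Illustrative patterns only; not an exhaustive list.
$patterns = array(
    '~<script\b[^>]*>.*?</script\s*>~is',             // paired script elements
    '~\son\w+\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)~i',  // on<event>= attributes
    '~javascript\s*:~i',                              // javascript: URLs
);

// Re-run the whole pattern list until a pass changes nothing, so that
// fragments which join up after a removal are caught on the next pass.
function strip_js(array $patterns, $html)
{
    do {
        $before = $html;
        $html = preg_replace($patterns, '', $html);
    } while ($html !== $before);
    return $html;
}

$clean = strip_js(
    $patterns,
    '<p onclick="x()">hi</p><scr<script></script>ipt>alert(1)</script>'
);
```

In the example input, removing the inner empty script element joins the surrounding fragments into a new script element, which only the second pass removes. That is the same effect as the 'appbananale' case from the first post.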
kratsg (author), posted September 14, 2009, replying to corbin's whitelisting approach:

Wow, that sounds painful to code. I assume you put this into one giant pattern or something and used preg_match/preg_replace?
corbin, posted September 14, 2009:

Well, it wasn't too bad since I only allowed certain elements: p, a, b, i, img, center, div and so on. It was quite a pain, though, to parse the style="var: val; var2: val2;" pairs to make sure they were allowed lol. (There's a reason BBCode developed, but there were issues with it.)

It was a pretty simple design, actually. I had an array of allowed tags, and for some tags I had handlers. Then I parsed everything with preg_replace() and the /e modifier. If the tag name was mapped to true in the array, the tag was blindly returned; if it was mapped to a string, the tag, its content and its attributes were passed to a method. The method would then parse further and decide what to return.

It was much simpler than that sounded lol. (I would share the class with you, but I used it in a paid project...)
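Since the class itself was not shared, the following is only a guess at the general shape of such a whitelist, not corbin's implementation. Assumptions: it uses preg_replace_callback() instead of the /e modifier (which was deprecated in PHP 5.5 and removed in PHP 7.0), it handles tags only rather than tags plus content, it drops all attributes on tags mapped to true instead of returning them blindly, and the names `$allowedTags`, `handle_anchor` and `whitelist_html` are hypothetical.

```php
<?php
// Hypothetical whitelist: true = keep the bare tag, string = handler function.
$allowedTags = array(
    'p' => true,
    'b' => true,
    'i' => true,
    'a' => 'handle_anchor',
);

// Hypothetical handler: keep only a double-quoted http(s) href on <a>.
function handle_anchor($closing, $attributes)
{
    if ($closing) {
        return '</a>';
    }
    if (preg_match('~\bhref\s*=\s*"(https?://[^"]*)"~i', $attributes, $m)) {
        return '<a href="' . htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8', false) . '">';
    }
    return '<a>';
}

function whitelist_html(array $allowedTags, $html)
{
    return preg_replace_callback(
        '~<(/?)([a-z][a-z0-9]*)\b([^>]*)>~i',
        function ($m) use ($allowedTags) {
            $name = strtolower($m[2]);
            if (!isset($allowedTags[$name])) {
                // Unknown tag: show it as text instead of letting it render.
                return htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
            }
            if ($allowedTags[$name] === true) {
                // Allowed tag: rebuild it without any attributes.
                return '<' . $m[1] . $name . '>';
            }
            // Tag with a handler: pass the closing flag and raw attributes.
            return call_user_func($allowedTags[$name], $m[1] === '/', $m[3]);
        },
        $html
    );
}

$result = whitelist_html(
    $allowedTags,
    '<p onclick="x()">Hi <a href="http://example.com" onmouseover="y()">link</a></p>'
    . '<script>alert(1)</script>'
);
```

The regex is not a real HTML parser: an attribute value containing ">" would split the tag, so a production filter would more likely build on a proper parser or an established sanitizing library.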
kratsg (author), posted September 14, 2009:

Hmm, wow. About that "e" modifier, how were you using it? I've never really seen an example with it, and since I was able to do normal substitutions with $1, $2, etc., I never got the point of it. For the most part, I assumed it was a stupid form of "eval($replacement)".
.josh, posted September 14, 2009:

Kinda off-topic, but here's an example of using the 'e' modifier with preg_replace:

    $string = "A AA AAA AAAA AAAAA AAAA AAA AA A";
    $string = preg_replace("~(\w{3,})~e", "strtolower('$1')", $string);
    echo $string;

Output:

    A AA aaa aaaa aaaaa aaaa aaa AA A
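The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP the same transformation would be written with preg_replace_callback(). A minimal equivalent of the example above:

```php
<?php
$string = "A AA AAA AAAA AAAAA AAAA AAA AA A";

// Lowercase every run of three or more word characters.
$string = preg_replace_callback(
    '~\w{3,}~',
    function ($match) {
        return strtolower($match[0]);
    },
    $string
);

echo $string, "\n"; // A AA aaa aaaa aaaaa aaaa aaa AA A
```

The callback receives the match array directly, so no replacement string is evaluated as code, which is the main safety gain over /e.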