kratsg Posted August 7, 2009

I was googling around for ideas on how to expand the filter we already have for one of our AJAX chatrooms. It lets the moderators dynamically add and delete words from the list of filtered words using a command inside the chatroom. To catch more bad words, I was wondering how to extend it (we already have it set up using preg_replace(), with all the words matched case-insensitively).

The list looks something like:

someword
cursing
swears
whee

The filter builds the list of words by putting them all into an array, with each word surrounded by forward slashes and the case-insensitive "i" at the end, e.g. .....,'/someword/i',........

So basically, how can I expand that scope so the filter looks not just for someword, but also for s0meword, somew0rd, s0mew0rd, som3word, s0m3word, som3w0rd, s0m3w0rd (looping through and switching in similar characters)? Or with "cursing", the i can be L, 1, !, etc.

Any ideas?

Link to comment: https://forums.phpfreaks.com/topic/169165-a-dynamic-word-filter/
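One way to cover the lookalike spellings asked about above without listing each of them is to turn every letter into a small regex character class. The following is only a sketch: the $lookalikes table and the build_word_pattern() helper are made-up names for illustration, and the table would need extending for real use.

```php
<?php
// Hypothetical lookalike table: plain letter => character class of substitutes.
$lookalikes = array(
    'o' => '[o0]',
    'e' => '[e3]',
    'i' => '[i1!l]',
);

// Turn a plain word into a case-insensitive pattern that also matches
// its lookalike spellings. Letters without an entry are matched literally.
function build_word_pattern($word, array $lookalikes)
{
    $pattern = '';
    foreach (str_split(strtolower($word)) as $char) {
        $pattern .= isset($lookalikes[$char])
            ? $lookalikes[$char]
            : preg_quote($char, '/');
    }
    return '/' . $pattern . '/i';
}

$pattern = build_word_pattern('someword', $lookalikes);
// $pattern is now: /s[o0]m[e3]w[o0]rd/i
$clean = preg_replace($pattern, '****', 'S0m3w0rd and someword');
// $clean is now: **** and ****
```

A single pattern per word keeps the moderators' list short: they still add only "someword", and the variants are handled when the pattern is built.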
monkeytooth Posted August 7, 2009

Unfortunately, I personally think you're going to have to build your own dictionary/array for the filter to work properly.

However, I suppose (and this might not be the best concept) you could expand the function so that, when you add a word to the array, it runs str_replace() or a similar function multiple times on the original string to add the spellings commonly used in 1337 talk or Hax0r-style talk.

For example, say you're adding monkey for whatever reason, using your filter and expanding on it to look for other variations:

$string = "monkey";
$altversion = str_replace('e', '3', $string);

Like I said, it's not the best notion in the world, but there's bound to be a way to build an array and a loop that can do that based on this concept.
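The "array and loop" idea could look roughly like the sketch below, which generates every variant of a word from a small substitution table. The $swaps table and the leet_variants() function are hypothetical names, not part of any existing filter.

```php
<?php
// Hypothetical substitution table: letter => lookalike replacement.
$swaps = array('o' => '0', 'e' => '3');

// Return every spelling of $word obtained by independently keeping or
// swapping each letter that has an entry in $swaps.
function leet_variants($word, array $swaps)
{
    $variants = array('');
    foreach (str_split($word) as $char) {
        $choices = array($char);
        if (isset($swaps[$char])) {
            $choices[] = $swaps[$char];
        }
        $next = array();
        foreach ($variants as $prefix) {
            foreach ($choices as $choice) {
                $next[] = $prefix . $choice;
            }
        }
        $variants = $next;
    }
    return $variants;
}

$variants = leet_variants('monkey', $swaps);
// 'monkey' has one 'o' and one 'e', so there are 2 * 2 = 4 variants:
// monkey, monk3y, m0nkey, m0nk3y
```

The number of variants doubles with every swappable letter (and grows faster if a letter has several lookalikes), so this approach gets expensive for long words.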
kratsg (Author) Posted August 7, 2009

Now, what about detecting excessive punctuation or spaces, e.g. "m!o!n!k!e!y" and "m on key"? I would imagine that the loop itself wouldn't be too hard to manage, but how would you get something to actually recognize the extra punctuation and spacing?
mikesta707 Posted August 7, 2009

Something like that would be very advanced. I think that monkeytooth's method is the best so far (well, I can't think of a better way).

You could probably do some wicked regex and work it out: for example, take a bad word, explode it into its individual characters, and build a regex that searches for all those characters with any number of spaces in between each one. And then, of course, you could expand that to allow any number of other characters in between.
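A minimal sketch of that regex idea follows, assuming that "other characters" means anything that is not a letter or digit. The build_spaced_pattern() name is made up for this example.

```php
<?php
// Build a pattern that allows any run of non-alphanumeric characters
// (spaces, punctuation) between the letters of $word.
function build_spaced_pattern($word)
{
    $chars = array();
    foreach (str_split($word) as $char) {
        $chars[] = preg_quote($char, '/');
    }
    return '/' . implode('[^a-z0-9]*', $chars) . '/i';
}

$pattern = build_spaced_pattern('monkey');
// $pattern is now: /m[^a-z0-9]*o[^a-z0-9]*n[^a-z0-9]*k[^a-z0-9]*e[^a-z0-9]*y/i
$clean = preg_replace($pattern, '****', 'you m!o!n!k!e!y, you m on key');
// $clean is now: you ****, you ****
```

One caveat: because the pattern ignores spaces, it can also match across innocent words (for instance "lemon key" contains "mon key"), so a filter built this way will produce some false positives.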
kratsg (Author) Posted August 11, 2009

See, I'd imagine that in almost any web programming language we run into a seemingly overwhelming number of permutations when we deal with filters. On the other hand, filtering out HTML tags (like <div> and <span>) is quite easy, because you cannot replace the i with a 1, and capitalization doesn't matter with case-insensitive matches.

As a side note, does anyone know why some HTML tags do not get stripped by the strip_tags() function? (I don't know all the exceptions, but <script> and <embed> do not get stripped out, for example.)

@mikesta707, thinking about your idea: couldn't one just prepare a table of replacements ahead of time? I made the following off the top of my head as a basic version. If there's any insight into a better way of doing what's illustrated below, don't hesitate to post :-D Any idea or thought would be nice. User feedback and input = <3

<?php
$str = "mixes";
// Table of replacements: plain letter => regex group of lookalikes.
// Note: there are only 26 letters in the alphabet, so this table can't
// grow very large even if we tried to cover everything.
$replace = array(
    "i" => "(i|1|!|l)",
    "x" => "(x|cks)",
    "e" => "(e|3)",
);
// First, break the word into letters and swap each one for its regex group.
// Doing this letter by letter means the groups themselves are never re-processed.
$parts = array();
foreach (str_split(strtolower($str)) as $char) {
    $parts[] = isset($replace[$char]) ? $replace[$char] : preg_quote($char, "/");
}
// Next, put it back together, allowing whitespace between letters.
$str = implode('\s*', $parts);
// At this point, our word should look like: m\s*(i|1|!|l)\s*(x|cks)\s*(e|3)\s*s
// We can store this line either in the database (after escaping it) or in a text file.
// We can read it back along with other "smart" words and use it.
// Assume it was in a file, each line = a new word.
$filter = array();
$filename = "filter_list.txt";
$file = fopen($filename, "r");
while (($line = fgets($file)) !== false) {
    $line = trim($line); // remove line breaks and extra whitespace
    if ($line === "") {
        continue; // skip blank lines, which would give an empty pattern
    }
    $filter[] = "/$line/i"; // add forward-slash delimiters and the case-insensitive flag
}
fclose($file);
$message = preg_replace($filter, "****", $message); // $message is a message the user posted
?>
oni-kun Posted August 11, 2009

Quote: "Speaking of this as a side note, anyone know why there are some html tags that do not get stripped via strip_tags() function (I don't know all the exceptions, but <script> and <embed> do not get stripped out, for example)."

Some multi-line tags may accidentally pass through; it may be a bug. You could do something like this:

$htmlstring = preg_replace("'<embed[^>]*>.*</embed>'siU",'',$htmlstring);

All in all, that suggestion to 'brute-force' lookalike characters is about all you can do. You may also want to match repeated characters, as in 'daaaamn':

$content = "grrrrrrrrrrr arggggg loooool shiiiiiit";
$pattern = '{([a-zA-Z])\1+}';
$replacement = '$1$1';
$filtered = preg_replace($pattern, $replacement, $content);

Another suggestion is to use 'iconv' to convert characters such as 'ú' into 'u' beforehand, so 'fú*k' can't pass through unfiltered.
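On the strip_tags() question, one possible explanation (an assumption, since the thread does not show the actual input) is that strip_tags() removes the tags themselves but keeps the text between them, so the contents of a <script> element still show up in the output. A small sketch of the difference:

```php
<?php
$html = 'before<script>alert(1)</script>after';

// strip_tags() removes the tags but keeps the text between them.
$stripped = strip_tags($html);
// $stripped is now: beforealert(1)after

// Removing the whole element, content included, needs a separate step,
// here a non-greedy, case-insensitive, dot-matches-newline regex.
$removed = preg_replace('#<script\b[^>]*>.*?</script>#si', '', $html);
// $removed is now: beforeafter
```

A regex like this only handles simple, well-formed markup; it is meant to illustrate the difference rather than to serve as a complete HTML sanitizer.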
kratsg (Author) Posted August 11, 2009

I will try to test these tags using a textarea to see what is really going on, and I'll apply your suggestion.

I was afraid of trying to match the repeated characters because there are legitimate double-letter words (mississippi or sssshhhiiittt), which we could fix by using \1{2,} instead of \1+, so that only runs of three or more of the same letter are collapsed. I just think of regex as always being greedy xD
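The {2,} idea can be sketched as follows. Note that this version collapses each long run down to a single letter ('$1') rather than two; that is a choice made for this example, and collapse_long_runs() is a made-up name.

```php
<?php
// Collapse runs of three or more identical letters down to one letter.
// Runs of exactly two (as in "mississippi") are left alone, because
// \1{2,} requires at least two repeats after the captured letter.
function collapse_long_runs($text)
{
    return preg_replace('/([a-z])\1{2,}/i', '$1', $text);
}

$kept = collapse_long_runs('mississippi');
// $kept is still: mississippi
$collapsed = collapse_long_runs('grrrrrrrrrrr arggggg loooool');
// $collapsed is now: gr arg lol
```

Running this on a message before applying the word filter means the filter list only needs the normal spelling of each word.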