Chappers Posted December 29, 2008

Hi everyone,

Wonder if someone could help with my swear filter please. I've written it myself but have now reached the extent of my limited abilities... This is a shortened version of the code:

    <form method='post' action='<?php echo $_SERVER['PHP_SELF'] ?>'>
    <table cellpadding='0' cellspacing='0' border='0'>
    <tr><td><textarea name='comment' rows='4' cols='80'></textarea></td></tr>
    <tr><td colspan='2' align='center'><input class='submitbutton' type='submit' name='submit' value='Post'></td></tr>
    </table>
    </form>

    <?php
    function SwearFilter($str) {
        $swearwords = array("arse", "ass", "bitch");
        $replacement = "<span class='badwords'>[censored]</span>";
        foreach ($swearwords as $swearword) {
            $str = eregi_replace($swearword, $replacement, $str);
        }
        return $str;
    }

    if (isset($_POST['submit'])) {
        $comment = $_POST['comment'];
        $comment = SwearFilter($comment);
        echo $comment;
    }
    ?>

The trouble with this is that a word from the swearword list can be contained within another word and still be found, i.e. "class" becomes "cl[censored]". I wasn't overly bothered about that until I found one result of it: if the first word in the swearlist matches the comment data (so it finds "arse"), the string then has <span class='badwords'>[censored]</span> added to it. When the function moves on to the next word in the swearlist, which is "ass", it finds a match in "class" from the previous replacement and changes "<span class='badwords'>[censored]</span>" into "<span cl[censored]='badwords'>[censored]</span>".

I realise I could simply put "ass" as the first word in the swearlist, and then only people submitting the form with "ass" in it at any point get censored rather than my "span class...", but what would be even better is if I could add a list of permitted words. The other option is to ensure only whole words matching those in my list get grabbed, but I'd rather have a permitted words list if that's possible?

Thanks for any help or advice!

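The double replacement described in this post can be reproduced in a few lines. This is only a sketch: `naive_filter()` is a hypothetical name, and `str_ireplace()` stands in for `eregi_replace()` as an assumption, since the `ereg` functions were removed in PHP 7. Both match case-insensitively, so the behaviour for these plain words is the same.

```php
<?php
// Reproduces the double-replacement problem from the post above.
// naive_filter() is a hypothetical name; str_ireplace() is used in place of
// eregi_replace() (assumption: the ereg functions are unavailable in PHP 7+).
function naive_filter($str) {
    $swearwords = array("arse", "ass", "bitch");
    $replacement = "<span class='badwords'>[censored]</span>";
    foreach ($swearwords as $swearword) {
        // Each pass rescans the whole string, including any markup
        // inserted by an earlier pass.
        $str = str_ireplace($swearword, $replacement, $str);
    }
    return $str;
}

// "arse" is replaced first; the "ass" pass then matches inside "class".
echo naive_filter("arse"), "\n";
```

The second pass finds "ass" inside the `class` attribute that the first pass inserted, which is why the markup ends up broken.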
opalelement Posted December 29, 2008

Try replacing this:

    $str = eregi_replace($swearword, $replacement, $str);

with this:

    $str = preg_replace("/\b".$swearword."\b/is", $replacement, $str);

That should mean it only matches where there is a word boundary before and after the word (such as a space, newline, hyphen, comma, etc.).

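A minimal sketch of the word-boundary idea from this reply, written with `preg_replace()`. The function name `filter_whole_words()` is hypothetical, and joining the list into a single alternation is an assumption rather than what the reply proposes; it means the text is scanned once, so markup inserted for one word is never rescanned for the next.

```php
<?php
// Whole-word, case-insensitive filter using PCRE word boundaries (\b).
// filter_whole_words() is a hypothetical name used for illustration.
function filter_whole_words($str, array $swearwords, $replacement) {
    // preg_quote() escapes any regex metacharacters in the list entries.
    $quoted = array();
    foreach ($swearwords as $word) {
        $quoted[] = preg_quote($word, '/');
    }
    // One alternation, one pass over the input.
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    return preg_replace($pattern, $replacement, $str);
}

$replacement = "<span class='badwords'>[censored]</span>";
echo filter_whole_words("My class is a pain in the ARSE.", array("arse", "ass", "bitch"), $replacement), "\n";
```

Because `\b` requires a non-word character (or the string edge) on each side, "class" is left alone while a standalone "ass" followed by punctuation is still caught.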
Chappers Posted December 29, 2008

Thanks for that. Before I use it, is there no way to use a kind of 'permitted' list of words instead, so any words in the permitted list, whether they match parameters of the swearword list or not, get ignored?

Another issue I've just come across is that although the text in the comment field seems to be stored in the MySQL database with line breaks where they should be, upon retrieving the text from the database and echoing it onto the page, all the line breaks have gone and the whole text is just one long, long sentence. How might I get around that? Thanks again.

Mark Baker Posted December 29, 2008

"Another issue I've just come across is that although the text in the comment field seems to be stored in the MySQL database with line breaks where they should be, upon retrieving the text from the database and echoing it onto the page, all the line breaks have gone and the whole text is just one long, long sentence. How might I get around that?"

Use nl2br() before echoing to the page.

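A short sketch of the `nl2br()` suggestion. The line breaks are still in the stored text; the browser simply collapses them when rendering HTML. `$stored` is a stand-in for the value read back from the database (an assumption, no query is shown), and escaping with `htmlspecialchars()` first is an addition that the reply does not mention.

```php
<?php
// $stored stands in for the comment text read back from the database
// (an assumption for illustration; no real query is shown).
$stored = "First line\nSecond line";

// Escape HTML first, then convert newlines, so the inserted <br /> tags
// are not escaped themselves.
$safe = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
$html = nl2br($safe);

echo $html;
```

`nl2br()` keeps the original newline and inserts a `<br />` in front of it, so the stored text itself never needs to change.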
opalelement Posted December 29, 2008

If you want a permitted words list, what I would do is something like this (untested):

    <?php
    function SwearFilter($str) {
        $cleanwords = array("class", "assistant", "massive");
        $cleanreps = array("REP***1", "REP***2", "REP***3");
        // Hide each permitted word behind its placeholder
        foreach ($cleanwords as $i => $cleanword) {
            $str = preg_replace("/" . $cleanword . "/i", $cleanreps[$i], $str);
        }
        // "ass" goes first so it cannot match "class" inside the inserted span
        $swearwords = array("ass", "arse", "bitch");
        $replacement = "<span class='badwords'>[censored]</span>";
        foreach ($swearwords as $swearword) {
            $str = preg_replace("/" . $swearword . "/i", $replacement, $str);
        }
        // Put the permitted words back (plain string match, the placeholders are not regexes)
        foreach ($cleanwords as $i => $cleanword) {
            $str = str_replace($cleanreps[$i], $cleanword, $str);
        }
        return $str;
    }

    if (isset($_POST['submit'])) {
        $comment = $_POST['comment'];
        $comment = SwearFilter($comment);
        echo $comment;
    }
    ?>

That will swap all whitelisted words for placeholders to hide them from the censoring, then restore them afterwards.

DeanWhitehouse Posted December 29, 2008

You could just use str_ireplace(). It saves on all that code, e.g.

    <?php
    $bad_sentence = "bad bad words here";
    $bad_words = array("bad");
    echo str_ireplace($bad_words, "*****", $bad_sentence);
    ?>

A better example:

    <?php
    function SwearFilter($str) {
        $swearwords = array("arse", "ass", "bitch");
        $replacement = "<span class='badwords'>[censored]</span>";
        return str_ireplace($swearwords, $replacement, $str);
    }

    if (isset($_POST['submit'])) {
        $comment = $_POST['comment'];
        $comment = SwearFilter($comment);
        echo $comment;
    }
    ?>

HoTDaWg Posted December 29, 2008

here is my interpretation lol

    <?php
    print <<<html
    <form method="post">
    <input type="text" name="sentence">
    <input type="submit">
    </form>
    html;

    if (isset($_POST['sentence'])) {
        $swears = array("fuck","shit","cunt","motherfucker","ass","bitch");
        $sentence = $_POST['sentence'];
        $replacement = "<b> [censored] </b>";
        foreach ($swears as $word) {
            $sentence = ereg_replace($word, $replacement, $sentence);
        }
        print $sentence;
    }
    ?>

just something i whipped up out of curiosity lol

DeanWhitehouse Posted December 29, 2008

Can I ask why people like to solve simple problems with overcomplicated solutions? Mine works as well as any of the others and is shorter and simpler.

HoTDaWg Posted December 29, 2008

i just wanted to swear haha

corbin Posted December 29, 2008

blade, your code doesn't use word boundaries, like someone could with regex. It would also censor harassment, assigned ... and so on.

DeanWhitehouse Posted December 29, 2008

OK, then you could just do this. It will only remove a word if it stands on its own, so you won't need a list of allowed words.

    <?php
    function SwearFilter($str) {
        $swearwords = array("arse", "ass", "bitch");
        // The replacement keeps the spaces that surround the matched word
        $replacement = " <span class='badwords'>[censored]</span> ";
        // Pad each list entry with spaces so only standalone words match
        $padded = array();
        foreach ($swearwords as $swearword) {
            $padded[] = " " . $swearword . " ";
        }
        return str_ireplace($padded, $replacement, $str);
    }

    if (isset($_POST['submit'])) {
        $comment = $_POST['comment'];
        $comment = SwearFilter($comment);
        echo $comment;
    }
    ?>

corbin Posted December 29, 2008

You're an ass. (Sorry if that insulted you. Meant it in a funny way.)

Chappers Posted December 29, 2008

Thanks everyone, plenty to think about there and try. The main reason I liked it grabbing words like 'ass' wherever they were in a word was because I didn't then have to add every possible combination to the filter list, like 'dumbass', 'asshole' and so on. Of course, it then grabs things like class, assign, assistant... so that'd mean making a permitted word list containing every possible word containing ass that's alright to let through. Either way, it's a task.

Of course, if someone is set upon swearing, they'll get around it with things like 'a sshole' and so on, but I'm hoping just to limit the number of opportunistic swearers who can't be bothered trying to outsmart the filter. It's either that or create a word list hundreds or thousands of words long, trying to consider every permutation of a word like 'b itch', 'bi tch', 'b i t c h' and so on. So I think just trying to limit it is the only viable solution.

I may try implementing the permitted words filter and just add words as they come up that are getting blocked but shouldn't be. Thanks again for so much help and some great suggestions!

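The plan in this last post (match swear words anywhere inside a word, but let a permitted list through) can be sketched in a single pass with `preg_replace_callback()`. The function name `filter_with_whitelist()` and both word lists are illustrative assumptions. Note one difference from the original filter: this censors the whole containing word (all of 'dumbass'), not just the matching part.

```php
<?php
// Substring matching with a permitted-words list, done in a single pass.
// filter_with_whitelist() and both word lists are illustrative assumptions.
function filter_with_whitelist($str, array $swearwords, array $permitted) {
    $replacement = "<span class='badwords'>[censored]</span>";
    $quoted = array();
    foreach ($swearwords as $word) {
        $quoted[] = preg_quote($word, '/');
    }
    // Match every whole word that contains a listed swear word anywhere.
    $pattern = '/\w*(?:' . implode('|', $quoted) . ')\w*/i';
    $permitted = array_map('strtolower', $permitted);

    return preg_replace_callback($pattern, function ($match) use ($permitted, $replacement) {
        // Permitted words pass through untouched; everything else is censored.
        if (in_array(strtolower($match[0]), $permitted, true)) {
            return $match[0];
        }
        return $replacement;
    }, $str);
}

$permitted = array("class", "assign", "assistant");
echo filter_with_whitelist("The class assistant is a dumbass.", array("arse", "ass", "bitch"), $permitted), "\n";
```

Since the input is scanned only once, the `class` attribute in the inserted span is never re-examined, and new false positives can be handled by adding them to `$permitted` as they turn up.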