Hi everyone,

 

I wonder if someone could help with my swear filter, please. I've written it myself but have now reached the extent of my limited abilities...

 

This is a shortened version of the code:

<form method='post' action='<?php echo htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES); ?>'>
<table cellpadding='0' cellspacing='0' border='0'>
<tr><td><textarea name='comment' rows='4' cols='80'></textarea></td></tr>
<tr><td colspan='2' align='center'><input class='submitbutton' type='submit' name='submit' value='Post'></td></tr>
</table>
</form>

<?php
function SwearFilter($str) {
$swearwords = array("arse", "ass", "bitch");
$replacement = "<span class='badwords'>[censored]</span>";
foreach ($swearwords as $swearword) {
$str = eregi_replace($swearword, $replacement, $str);
}
return $str;
}

if (isset($_POST['submit'])) {
$comment = $_POST['comment'];
$comment = SwearFilter($comment);
echo $comment;
}
?>

The trouble is that a word from the swear list is matched even when it sits inside another word, so "class" becomes "cl[censored]". I wasn't overly bothered about that until I found a side effect. If an earlier word in the list matches (say "arse"), the string has <span class='badwords'>[censored]</span> inserted into it. When the loop moves on to the next word, "ass", it finds a match in the "class" attribute of the span it has just inserted and replaces that too, which mangles the markup, so "<span class='badwords'>[censored]</span>" ends up displayed as "[censored]='badwords'>[censored]</span>".

 

I realise I could simply put "ass" first in the swear list, so that it is checked before any span has been inserted and only "ass" in the submitted text gets censored rather than my "span class...". What would be even better is a list of permitted words. The other option is to make sure only whole words matching those in my list get caught, but I'd rather have a permitted words list if that's possible?

 

Thanks for any help or advice!

https://forums.phpfreaks.com/topic/138678-swear-filter-help-pls/

Try replacing this:

$str = eregi_replace($swearword, $replacement, $str);

with this:

$str = preg_replace("/\b".preg_quote($swearword, "/")."\b/i", $replacement, $str);

 

That should mean it only matches a word with a word boundary before and after it (such as a space, newline, hyphen, comma, etc.)
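
To illustrate the word-boundary idea, here is a minimal sketch assuming PHP's PCRE functions are available. The function name filterWholeWords is made up for this example and is not from the thread; preg_quote() is there only so a list entry containing regex characters cannot break the pattern.

```php
<?php
// Hypothetical helper: censors only whole-word matches, case-insensitively.
function filterWholeWords($str, array $swearwords, $replacement) {
    foreach ($swearwords as $swearword) {
        // \b matches the position between a word character and a non-word character.
        $pattern = "/\\b" . preg_quote($swearword, "/") . "\\b/i";
        $str = preg_replace($pattern, $replacement, $str);
    }
    return $str;
}

$words = array("arse", "ass", "bitch");
echo filterWholeWords("My class has an ass in it", $words, "[censored]");
// prints: My class has an [censored] in it
```

Because "ass" inside "class" has a letter in front of it, there is no boundary there and the word is left alone. The trade-off is that compounds such as "dumbass" are no longer caught unless they are added to the list.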

Thanks for that. Before I use it, is there no way to use a kind of 'permitted' list of words instead, so any words in the permitted list, whether they match parameters of the swearword list or not, get ignored?

 

Another issue I've just come across is that although the text in the comment field seems to be stored in the MySQL database with line breaks where they should be, upon retrieving the text from the database and echoing it onto the page, all the line breaks have gone and the whole text is just one long, long sentence. How might I get around that?

 

Thanks again.

For the line breaks: run the text through nl2br() before echoing it to the page.
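
A small sketch of that, assuming the comment is plain text coming back from the database. The helper name formatComment is hypothetical, and the htmlspecialchars() call is an extra step not mentioned above, added so that user-typed markup is displayed as text rather than interpreted.

```php
<?php
// Hypothetical helper: prepares stored comment text for HTML output.
function formatComment($text) {
    // Escape first, then turn each newline into "<br />" followed by the newline.
    return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}

echo formatComment("First line\nSecond line");
```

If the swear filter inserts its own span, it would have to run after the escaping step, otherwise the span itself would be escaped too.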

If you want a permitted words list, what I would do is something like this (untested)

 

<?php
function SwearFilter($str) {
// Permitted words and the placeholder tokens that temporarily stand in for them.
$cleanwords = array("class", "assistant", "massive");
$cleanreps = array("REP***1", "REP***2", "REP***3");
// Hide the permitted words so the swear pass cannot match inside them.
$str = str_ireplace($cleanwords, $cleanreps, $str);
$swearwords = array("arse", "ass", "bitch");
$replacement = "<span class='badwords'>[censored]</span>";
// One regex pass, so "ass" cannot match inside the class attribute just inserted.
$str = preg_replace("/" . implode("|", $swearwords) . "/i", $replacement, $str);
// Put the permitted words back (they come back in lower case).
$str = str_replace($cleanreps, $cleanwords, $str);
return $str;
}

if (isset($_POST['submit'])) {
$comment = $_POST['comment'];
$comment = SwearFilter($comment);
echo $comment;
}
?>

 

That will mark all whitelisted words to hide them from the censoring.
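
Another way to get a permitted list without placeholders, sketched here as an assumption rather than anything tested in this thread: match every run of letters that contains a swear fragment, then decide per word in a callback. The name filterWithPermitted is hypothetical, and the closure needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper: censors any word containing a swear fragment,
// unless the whole word is on the permitted list.
function filterWithPermitted($str, array $swearwords, array $permitted, $replacement) {
    $quoted = array();
    foreach ($swearwords as $word) {
        $quoted[] = preg_quote($word, "/");
    }
    // Match a whole run of letters that contains one of the fragments.
    $pattern = "/[a-z]*(?:" . implode("|", $quoted) . ")[a-z]*/i";
    $permitted = array_map('strtolower', $permitted);

    return preg_replace_callback($pattern, function ($m) use ($permitted, $replacement) {
        // Leave permitted words exactly as typed; censor everything else.
        return in_array(strtolower($m[0]), $permitted, true) ? $m[0] : $replacement;
    }, $str);
}

echo filterWithPermitted(
    "The class met a dumbass",
    array("arse", "ass", "bitch"),
    array("class", "assistant", "massive"),
    "[censored]"
);
// prints: The class met a [censored]
```

Unlike the placeholder version, this replaces the whole word rather than just the fragment, and a permitted word keeps its original capitalisation. Variants such as "classes" would still need their own entry in the permitted list.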

You could just use str_ireplace();

 

Saves on all that code :)

 

e.g.

 

<?php
$bad_sentence = "bad bad words here";
$bad_words = array("bad");
echo str_ireplace($bad_words,"*****",$bad_sentence);
?>

 

A better example

 

<?php
function SwearFilter($str)
{
  $swearwords = array("arse", "ass", "bitch");
  $replacement = "<span class='badwords'>[censored]</span>";

return str_ireplace($swearwords,$replacement,$str);
}

if (isset($_POST['submit'])) 
{
  $comment = $_POST['comment'];
  $comment = SwearFilter($comment);

  echo $comment;
}
?>
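
One caveat with passing an array to str_ireplace(): it works through the search words one after another, so a later word can still match inside text inserted for an earlier one (here, "ass" inside the class attribute of the span). A single regex pass avoids that. This is only a sketch, and the function name filterSinglePass is made up for the example.

```php
<?php
// Hypothetical helper: one regex pass over the input, so inserted markup is never re-scanned.
function filterSinglePass($str, array $swearwords, $replacement) {
    $quoted = array();
    foreach ($swearwords as $word) {
        $quoted[] = preg_quote($word, "/");
    }
    // Build one alternation such as /arse|ass|bitch/i from the word list.
    $pattern = "/" . implode("|", $quoted) . "/i";
    return preg_replace($pattern, $replacement, $str);
}

$replacement = "<span class='badwords'>[censored]</span>";
echo filterSinglePass("what an arse", array("arse", "ass", "bitch"), $replacement);
// prints: what an <span class='badwords'>[censored]</span>
```

It still matches fragments inside longer words in the submitted text, so "class" typed by a user becomes "cl[censored]"; it only stops the filter from damaging its own markup.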

here is my interpretation lol

<?php
print <<<html
<form method="post">
<input type="text" name="sentence">
<input type="submit">
</form>
html;
if (isset($_POST['sentence']))
{
$swears = array("fuck","shit","cunt","motherfucker","ass","bitch");
$sentence = $_POST['sentence'];
$replacement = "<b> [censored] </b>";
foreach ($swears as $word)
{
	// ereg_replace() was removed in PHP 7; str_replace() does the same case-sensitive plain-text swap
	$sentence = str_replace($word,$replacement,$sentence);
}
print $sentence;
}
?>

just something i whipped up out of curiosity lol

 

 

Ok, then you could just do this. It only replaces a word when it stands on its own with a space either side, so you won't need a list of allowed words:

<?php
function SwearFilter($str)
{
  $swearwords = array("arse", "ass", "bitch");
  $replacement = "<span class='badwords'>[censored]</span>";

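  // Pad each word with spaces so only stand-alone words match
  // (words at the very start or end of the text, or next to punctuation, are missed).
  $padded = array();
  foreach ($swearwords as $word) {
    $padded[] = " ".$word." ";
  }
  return str_ireplace($padded, " ".$replacement." ", $str);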
}

if (isset($_POST['submit'])) 
{
  $comment = $_POST['comment'];
  $comment = SwearFilter($comment);

  echo $comment;
}
?>

Thanks everyone, plenty to think about there and try. The main reason I liked it grabbing words like 'ass' wherever they were in a word was that I didn't then have to add every possible combination to the filter list, like 'dumbass', 'asshole' and so on. Of course, it then grabs things like class, assign, assistant... so that'd mean making a permitted word list containing every possible word containing 'ass' that's alright to let through. Either way, it's a task.

Of course, if someone is set upon swearing, they'll get around it with things like 'a sshole' and so on, but I'm hoping just to limit the number of opportunistic swearers who can't be bothered trying to outsmart the filter. It's either that or create a word list hundreds or thousands of words long, trying to consider every permutation of a word like 'b itch', 'bi tch', 'b i t c h' and so on. So I think just trying to limit it is the only viable solution. I may try implementing the permitted words filter and just add words as they come up that are getting blocked but shouldn't be.

 

Thanks again for so much help and some great suggestions!
