
bad word filter for guestbook


staples27


So I have a guestbook, and everything's working fine, but I'd like to include a filter that replaces any bad words posted by users. I've tried various tutorials but I just can't get them to work. Can anyone recommend some code or a tutorial that I could use? I want the bad words to come from a MySQL database table, which I have already set up.

 

The code for the guestbook is as follows:

 

<?php
// include the database configuration and
// open connection to database
include ('dbc.php'); 


if(isset($_POST['btnSign']))
{
// get the input from $_POST variable
// trim all input to remove extra spaces
$name    = trim($_POST['txtName']);
$email   = trim($_POST['txtEmail']);
$url     = trim($_POST['txtUrl']);
$message = trim($_POST['mtxMessage']);

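// escape all input before it goes into the query
// ( strip magic quotes first so nothing gets escaped twice )
if(get_magic_quotes_gpc())
{
	$name    = stripslashes($name);
	$email   = stripslashes($email);
	$url     = stripslashes($url);
	$message = stripslashes($message);
}
$name    = mysql_real_escape_string($name);
$email   = mysql_real_escape_string($email);
$url     = mysql_real_escape_string($url);
$message = mysql_real_escape_string($message);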

// if the visitor does not enter a url
// set $url to an empty string
if ($url == 'http://')
{
	$url = '';
}

// prepare the query string
$query = "INSERT INTO guestbook2 (name, email, url, message, entry_date) " .
         "VALUES ('$name', '$email', '$url', '$message', current_date)";

// execute the query to insert the input into the database
// if the query fails the script will terminate
mysql_query($query) or die('Error, query failed. ' . mysql_error());

// redirect to current page so if we click the refresh button 
// the form won't be resubmitted ( as that would make duplicate entries )

header('Location: ' . $_SERVER['REQUEST_URI']);

// force the script to quit. if we don't call exit the script may
// continue before the page is redirected
exit;
}
?>

<?php


// =======================
// Show guestbook entries
// =======================


// how many guestbook entries to show per page
$rowsPerPage = 10;

// by default we show first page
$pageNum = 1;

// if $_GET['page'] is defined, use the value as page number
// ( cast to int so it is safe to use in the LIMIT clause )
if(isset($_GET['page']))
{
$pageNum = max(1, (int) $_GET['page']);
}

// counting the offset ( where to start fetching the entries )
$offset = ($pageNum - 1) * $rowsPerPage;

// prepare the query string
$query = "SELECT id, name, email, url, message, DATE_FORMAT(entry_date, '%d.%m.%Y') ".
         "FROM guestbook2 ".
	 "ORDER BY id DESC ".            // using ORDER BY to show the most current entry first
	 "LIMIT $offset, $rowsPerPage";  // LIMIT is the core of paging

// execute the query 
$result = mysql_query($query) or die('Error, query failed. ' . mysql_error());

// if the guestbook is empty show a message
if(mysql_num_rows($result) == 0)
{
?>
<div id="empty">
The guestbook is empty, be first to sign it!</div>
<?php
}
else
{
// get all guestbook entries
while($row = mysql_fetch_array($result))
{
	// list() is a convenient way of assigning a list of variables
	// from the values of an array
	list($id, $name, $email, $url, $message, $date) = $row;

	// change all HTML special characters,
	// to prevent some nasty code injection
	$name    = htmlspecialchars($name);
	$email   = htmlspecialchars($email);
	$message = htmlspecialchars($message);

	// convert newline characters ( \n OR \r OR both ) to HTML break tag ( <br> )
	$message = nl2br($message);
?>

<div class="messagec">
<div class="userinfo">
Post By <a href="mailto:<?=$email;?>" class="email">

   <?=$name;?>
   </a> <br /><br />
   Date Posted: <?=$date;?><br /><br />
   
   
   <?php
   		// if the visitor input homepage url show it
	if($url != '')
	{   
		// escape the url, then make it clickable by formatting it as HTML link
		$url = htmlspecialchars($url, ENT_QUOTES);
		$url = "<a href='$url' target='_blank'>$url</a>";
?>
Website : <?=$url;?>
   <?php
	}
?></div>

   <div class="msg"><?=$message;?></div>




</div>

<?php
} // end while

// below is the code needed to show page numbers

// count how many rows we have in database
$query   = "SELECT COUNT(id) AS numrows FROM guestbook2";
$result  = mysql_query($query) or die('Error, query failed. ' . mysql_error());
$row     = mysql_fetch_array($result, MYSQL_ASSOC);
$numrows = $row['numrows'];

// how many pages we have when using paging?
$maxPage  = ceil($numrows/$rowsPerPage);
$nextLink = '';

// show the link to more pages ONLY IF there are 
// more than one page
if($maxPage > 1)
{
// this page's path ( escaped, since it is printed into the links )
$self     = htmlspecialchars($_SERVER['PHP_SELF']);

// we save each link in this array
$nextLink = array();

// create the link to browse from page 1 to page $maxPage
for($page = 1; $page <= $maxPage; $page++)
{
	$nextLink[] =  "<a href=\"$self?page=$page\">$page</a>";
}

// join all the link using implode() 
$nextLink = "Go to page : " . implode(' » ', $nextLink);
}

// close the database connection since
// we no longer need it
mysql_close();


?>
<div id="next">
   <?=$nextLink;?>
</div>
<?php
}
?>

<div id="mform">
<p>Enter a message.</p>
<form method="post" name="guestform" action="gbook.php">

<div class="fieldc"> <div class="fieldname"> Name * </div>

<div class="input"><input name="txtName" type="text" id="txtName" value="<?php if (isset($_SESSION['user'])) { echo htmlspecialchars($_SESSION['user']); } ?>" size="30" maxlength="30">
</div> </div>

<div class="fieldc"><div class="fieldname">Email</div>
  <div class="input"><input name="txtEmail" type="text" id="txtEmail" size="30" maxlength="50"></div> </div>
  
<div class="fieldc"><div class="fieldname">Website URL</div>
<div class="input"><input name="txtUrl" type="text" id="txtUrl" value="http://" size="30" maxlength="50"></div> </div>

<div id="fieldmc">
<div class="fieldname">Message</div>
<div id="minput"><textarea name="mtxMessage" cols="30" rows="5" id="mtxMessage"></textarea>
</div>

<div id="sub"> <input name="btnSign" type="submit" id="btnSign" value="Sign Guestbook" onClick="return checkForm();"></div> </div>

</form>
</div>

 

Would appreciate all help, thanks!


You can achieve this by using str_replace() (see the PHP manual for info).

Simply create a while loop to step through all your bad words and use str_replace() to replace them.

e.g.

$qh = mysql_query("SELECT * FROM badwords WHERE active='1'");
while($retd = mysql_fetch_array($qh)){
$message = str_replace($retd['badword'],"****",$message);
}
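
A minimal sketch of how that loop could be wrapped up for the guestbook, assuming the badwords table has a badword column as in the query above. The helper name censorMessage() is made up for illustration, and a hard-coded array stands in for the rows fetched from the table so the snippet runs without a database. It uses str_ireplace(), the case-insensitive sibling of str_replace(), so that differently capitalised spellings are caught too.

```php
<?php
// Hypothetical helper: replace every listed word with asterisks.
// str_ireplace() is the case-insensitive version of str_replace().
function censorMessage($message, array $badWords)
{
    foreach ($badWords as $word) {
        // skip empty rows, there is nothing to search for
        if ($word === '') {
            continue;
        }
        $message = str_ireplace($word, '****', $message);
    }
    return $message;
}

// In the guestbook this list would be filled from the badwords table, e.g.
//   while ($retd = mysql_fetch_array($qh)) { $badWords[] = $retd['badword']; }
// A hard-coded list stands in for the table here (assumption).
$badWords = array('darn', 'heck');

// Run the filter on the raw message before it is escaped and inserted.
$message = censorMessage('Well DARN, what the heck!', $badWords);
echo $message; // Well ****, what the ****!
```

In the guestbook script, the call would probably sit right after the trim() lines, before the message is escaped for the INSERT query.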

 

Stuie


<?php

$badWords = array(
'badWord1',
'badWord2',
'badWord3',
'badWord4',
'badWord5'
);

$cleanReplacements = array();

foreach ($badWords as $word) {

     $cleanReplacements[] = '****';

}

$messageContent = str_replace($badWords, $cleanReplacements, $messageContent);

?>

 

That is a very rudimentary 'bad word filter' but it shows the basic principles.  Using str_replace will work in a lot of instances, but it is a blunt tool: it is case-sensitive, and it replaces the text wherever it occurs, even in the middle of a longer word.

 

If you're looking to create a type of filter that will catch words like 'ass' on their own but leave words like 'passport' alone, then you need to start learning something called Regular Expressions.
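
To make the difference concrete, here is a small sketch comparing the two approaches on a made-up sample string. str_replace() matches the text anywhere and is case-sensitive, while a regular expression with \b (word boundary) and the i flag matches whole words in any letter case.

```php
<?php
$text = 'Show your passport, you Ass.';

// str_replace() is case-sensitive and matches inside longer words:
// it mangles "passport" and misses the capitalised "Ass".
$plain = str_replace('ass', '***', $text);
echo $plain, "\n"; // Show your p***port, you Ass.

// A regular expression with \b (word boundary) and the i flag
// matches "ass" only as a whole word, in any letter case.
$regex = preg_replace('/\bass\b/i', '***', $text);
echo $regex, "\n"; // Show your passport, you ***.
```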


Here is a word filter that I made, it works well: http://beta.phpsnips.com/snippet.php?id=42

 

The Little Guy, there's only one problem with the snippet you proposed... it will cause 'The Clbuttic Mistake'.

 

Case in point... I modified that snippet by adding the word ass to the mix, with the replacement butt (which is the whole illustrative point of the Clbuttic Mistake), and by changing the string being checked:

 

function censorWords($text){

$find = array(
	'/damn/i',
	'/shit/i',
	'/fuck/i',
	'/ass/i'
);
$replace = array(
	'dang',
	'shoot',
	'frick',
	'butt',
);
return preg_replace($find,$replace,$text);
}
$text = 'That asshole is an assassin!';
echo censorWords($text);

 

Do you see the problem? It prints 'That butthole is an buttbuttin!'. If going this route, I would consider making use of regex word boundaries as an extra precautionary measure. When dealing with words, one must be careful about partial versus full word replacement; careless patterns can really botch things. Sometimes partial words can reasonably be filtered out, but extra care must be exercised as to which portions of which words get affected.
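
One possible way to apply word boundaries to a snippet like the one above, as a rough sketch rather than a finished filter. The function name censorWholeWords() and the word => replacement map are invented for this example. Each word is passed through preg_quote() and wrapped in \b, so only whole words are replaced.

```php
<?php
// Hypothetical variant of censorWords(): the word list is a
// word => replacement map, and every pattern is wrapped in \b
// so only whole words are replaced.
function censorWholeWords($text, array $map)
{
    $find = array();
    foreach (array_keys($map) as $word) {
        // preg_quote() escapes any regex metacharacters in the word
        $find[] = '/\b' . preg_quote($word, '/') . '\b/i';
    }
    return preg_replace($find, array_values($map), $text);
}

$map = array('ass' => 'butt', 'asshole' => 'butthole');

echo censorWholeWords('That asshole is an assassin!', $map);
// That butthole is an assassin!
```

Because whole words are matched, longer variants such as 'asshole' need their own entry in the map; 'assassin' is left untouched.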


I modified it some...

 

<?php
function censorWords($text){
$find = array(
	'/damn\s/i',
	'/shit\s/i',
	'/fuck\s/i',
	'/ass\s/i',
	'/asshole\s/i'
);
$replace = array(
	'dang',
	'shoot',
	'frick',
	'butt',
	'butthole'
);
return preg_replace($find,$replace,$text);
}
$text = 'That asshole is an assass!';
echo censorWords($text);
?>


Instead of using the whitespace shorthand character class as you have done, you should consider using word boundaries (\b).

The problem with, say, '/ass\s/i' is that it will only catch 'ass' if it is followed by a whitespace character. But what happens if 'ass' is the last word in a sentence?

 

'You're such an ass!'

 

Guess what? Your pattern won't find and replace that, because it is looking for ass followed by a space specifically.
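
A quick sketch of the two patterns side by side, using two made-up sample sentences. Besides missing the end-of-sentence case, \s also consumes the whitespace it matches, so the space after the word disappears from the output; \b only asserts a position and consumes nothing.

```php
<?php
$samples = array("You're such an ass!", 'An ass walked by.');
$results = array();

foreach ($samples as $text) {
    // \s requires (and consumes) a whitespace character after the word
    $withSpace = preg_replace('/ass\s/i', 'butt', $text);
    // \b only asserts a position, it consumes nothing
    $withBoundary = preg_replace('/\bass\b/i', 'butt', $text);

    $results[] = $withSpace . ' | ' . $withBoundary;
    echo $withSpace, ' | ', $withBoundary, "\n";
}
// You're such an ass! | You're such an butt!
// An buttwalked by. | An butt walked by.
```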

I dug up my regex from my contact page and posted the gist of it here (this code is about two years old, so I should probably revise it at some point):

 

$text = preg_replace(array('#(\b)?fuck(?:ers?)?(?(1)\b)#i', '#\bbitch(?:y|es)?\b#i', '#\bbastard\b#i', '#(\b)?asshole(?(1)\b)#i', '#(\b)?whore(?(1)\b)#i', '#(\b)?cock(?(1)\b)#i',
			'#\banal\b#i', '#\banus\b#i', '#(\b)?cunt(?(1)\b)#i', '#\bpenis\b#i', '#\bshit(?:s|ty|tier|tiest)?\b#i', '#\bslut\b#i','#(\b)?dicks?(?(1)\b)#i', '#\bpuss(?:y|ies?)\b#i', '#\bcum\b#i',
			'#\bfag\b#i', '#\bfaggot\b#i', '#\bass\b#i', '#\bbullshit\b#i', '#\bprick\b#i', '#\btit(?:s|ty|ties?)?\b#i', '#\basswipes?\b#i'), '*', $text, -1 , $profanityCount);

 

In my contact page's case, I don't replace one word with another, but rather replace the offender outright with an asterisk. Basically, what I've done here is look for complete words (or in some cases partial words) and replace them when found (in the case of a partial match, it only replaces the offending part, not the complete word).

 

It is by no means a perfect solution, I'll be the first to admit it. While I did test it out *somewhat*, I could have spent some more time on it (so there will in all likelihood be some words I didn't even think about, let alone unintentional traps for partial bad words, hence not bulletproof). But you'll get the idea.

You'll notice that most words start and end with \b. That is a word boundary in regex: it looks at a position between characters and checks that there is a word character on one side and not on the other (you can read about that in the PCRE part of the manual). So stuff like 'ass!' will be found and replaced with '*!'. You'll also notice some words like bitch contain additional alternations like (?:y|es) to cover stuff like bitchy or bitches, for example.

Some words have captured conditionals like (\b)?whore(?(1)\b). What this basically does is treat it either as a full word on its own, or check to see if 'whore' is part of another word like 'whoreface!', in which case it will still find and replace it, and the result would become '*face!'.
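
A cut-down sketch of the same idea with just two patterns and a made-up test sentence, to show the optional alternation and the replacement counter in action. The fourth argument of preg_replace() is the limit (-1 means no limit) and the fifth receives the total number of replacements made.

```php
<?php
$patterns = array(
    '#\bbitch(?:y|es)?\b#i',  // bitch, bitchy, bitches
    '#\bass\b#i',             // whole word only
);

$text = "Don't be bitchy, you ass! Passing grades only.";

// The fifth argument receives the number of replacements made;
// -1 for the fourth argument means "no limit".
$clean = preg_replace($patterns, '*', $text, -1, $profanityCount);

echo $clean, "\n";          // Don't be *, you *! Passing grades only.
echo $profanityCount, "\n"; // 2
```

The counter could be used, for instance, to reject a message outright when it goes above some threshold, though that is only one option.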

 

Again, this is older code, and certainly not bulletproof, but it may offer some more flexibility / insight into building something that will catch things that patterns like '/badword\s/i' wouldn't.

 

EDIT - I'm well aware of 'dick': if someone named Dick signs his name at the end of his message, it will get replaced. He should use Richard instead!
