
PHP Bad-Words Filter


php_tom


Hey, everyone. I've been working on Aquarium, a PHP bad-words filter.

Check out the site at http://aquarium-filter.sourceforge.net

On that page, you can try out the filter, or download the source and check that out.

 

I realize there are other text filters out there. My reasons for making this one were:

  1. I can never find a bad-words file to use, so I finally made one. It's included in the source (encrypted so no one can read it easily). Maybe others will find this useful.

  2. I was trying to make something that filters words similar to bad words, not just the bad words themselves. So, for example, if 'badword' was in the bad-words list, it would be nice to have 'b@dword', 'bdword', and 'badwrd' filtered too.
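One way to catch near-misses like these is edit distance. The following is only a sketch of that idea using PHP's built-in levenshtein(); it is not how Aquarium does it, and isNearBadWord() and the one-word list are hypothetical names made up for illustration.

```php
<?php
// Hypothetical helper: true when $word is within $maxDistance
// single-character edits of any entry in $badWords.
function isNearBadWord(string $word, array $badWords, int $maxDistance = 1): bool
{
    $word = strtolower($word);
    foreach ($badWords as $bad) {
        // levenshtein() counts insertions, deletions and substitutions.
        if (levenshtein($word, strtolower($bad)) <= $maxDistance) {
            return true;
        }
    }
    return false;
}

$badWords = ['badword']; // placeholder list, same stand-in word as in the post

foreach (['b@dword', 'bdword', 'badwrd', 'keyboard'] as $candidate) {
    echo $candidate, ': ', isNearBadWord($candidate, $badWords) ? 'filtered' : 'ok', "\n";
}
```

Note that this compares each input word against every list entry, and that a distance of 1 will produce false positives on short words, so the threshold probably needs to depend on word length.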

 

I'd love to hear some feedback from you all on the engine. I realize that it does not filter words with whitespace in them, e.g. 'ba dw ord'; that's something I'm working on.

 

Try out the demo on the SourceForge page and post your comments here. If you find a bad word that doesn't get filtered, I'd like to know. Maybe you can run PHP's base64_encode() on it (so the forum admins don't get angry) and post it here. Thanks!


There was a small bug where capitalized words got through. It should be fixed now.

Two of the words you posted are still not filtered because they aren't in the library. One will certainly be added; the other I'm not sure about, since I could see innocent uses of the same word.

Maybe I'll add a 'filter strength' setting...

Anyway, thanks.


"Aquarium filtered 0 words in 0.01 seconds and found 0 bad words.

Stats: 0 words per second,

Warning: Division by zero in /home/groups/a/aq/aquarium-filter/htdocs/process.php on line 18

0% bad words.

 

Filtered Text:"

 

That's what I get when I enter <"
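The warning shows up when the input contains zero words, so the percentage is computed as 0 / 0. I haven't seen line 18 of process.php, so this is only a guess at the shape of the fix; filterStats() and its parameter names are hypothetical.

```php
<?php
// Hypothetical stats helper; the names are illustrative, not Aquarium's.
function filterStats(int $totalWords, int $badWords, float $seconds): array
{
    // Guard both divisions: empty input gives 0 words, and a very fast
    // run can report 0.0 seconds.
    $wordsPerSecond = $seconds > 0 ? $totalWords / $seconds : 0.0;
    $percentBad     = $totalWords > 0 ? 100 * $badWords / $totalWords : 0.0;

    return [
        'words_per_second' => $wordsPerSecond,
        'percent_bad'      => $percentBad,
    ];
}

// Same situation as the report: no words found in the input.
$stats = filterStats(0, 0, 0.01);
printf("%d words per second, %.0f%% bad words\n", $stats['words_per_second'], $stats['percent_bad']);
```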


Filtered Text:

****

fucktard

*****

 

But then I entered it like this:

 

****

****tard

*****

fuck

*****

 

And the previous word got filtered, but not the "simple" one.

 

Filtered Text:

****g

fuck

 

Filtered Text:

****g

****

**** fuck *****

 

I used the F-word in all of them; the last one had an "i" at the end, just as the first had a "g".
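It looks like a repeated word can slip through on its second appearance. Without knowing Aquarium's internals, here is a minimal sketch of masking that covers every occurrence in any letter case; maskWords() is a hypothetical name and 'badword' is a placeholder.

```php
<?php
// Hypothetical masking helper: replaces every listed word, in any
// letter case and at every position, with asterisks of the same length.
function maskWords(string $text, array $badWords): string
{
    // Quote each word so regex metacharacters in the list are harmless.
    $alternatives = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $badWords);

    // \b keeps the match on whole words; the i flag ignores case.
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')\b/i';

    // preg_replace_callback() visits every match, not just the first.
    return preg_replace_callback($pattern, function (array $m) {
        return str_repeat('*', strlen($m[0]));
    }, $text);
}

echo maskWords("badword Badword\nBADWORD again", ['badword']), "\n";
```

The \b anchors mean a word with an extra letter on the end is left alone. Dropping them would mask the listed word inside longer words too, at the cost of more false positives.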


  • 3 weeks later...

Um, how come you encrypted the bad words file instead of keeping it open for adjustments? One option could be a request form for adding a bad word: the request gets submitted for your approval, and once it's approved the word is added to the list, so the more you use it, the more it filters.


Hey, thanks for all the suggestions, guys. I'm working on a new version which can handle 1337 (leetspeak) and things like "ba dw or d" or "word1word2". It will also look at the context a word is in, e.g. words in the sentence "he's a badwording badword, I hate the badword!" would get filtered, but the sentence "Jesus rode into Jerusalem on an ass" would not (because of word strength and frequency).
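To give an idea of the 1337 and spacing part, here is a rough sketch, not the actual new version: the substitution table is a small illustrative sample, and normalizeForMatching() and containsBadWord() are hypothetical names.

```php
<?php
// Hypothetical normaliser: maps common leetspeak characters to letters
// and removes separators, so "b@ dw 0r d" compares like plain text.
function normalizeForMatching(string $text): string
{
    // Illustrative substitution table; a real one would be larger.
    $leet = ['@' => 'a', '4' => 'a', '3' => 'e', '1' => 'i',
             '0' => 'o', '5' => 's', '$' => 's', '7' => 't'];
    $text = strtr(strtolower($text), $leet);

    // Drop everything that is not a letter (spaces, underscores, asterisks...).
    return preg_replace('/[^a-z]+/', '', $text);
}

// Substring search over the collapsed text, so words split by spaces
// or glued to other words are still found.
function containsBadWord(string $text, array $badWords): bool
{
    $flat = normalizeForMatching($text);
    foreach ($badWords as $bad) {
        if (strpos($flat, $bad) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(containsBadWord('ba dw or d', ['badword']));
var_dump(containsBadWord('b@dw0rd!', ['badword']));
var_dump(containsBadWord('a bad word choice', ['badword']));
```

The last call also reports a match, because removing the spaces joins two innocent words into the listed one. That kind of false positive is why the context check matters.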

 

About the bad words file:

I keep it encrypted because I don't want someone to find a list of filthy language on my server in plain text. The 'encryption' (in case you haven't figured it out from the code) is simply base64_encode(base64_encode(theEntireFileAsAString)).

I'd like to make the filter smarter rather than the dictionary larger, because even though I'm using a hashtable-type lookup with the dictionary, more words in the dictionary will still slow down the algorithm...
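For anyone who wants to read or adjust the list, the round trip looks roughly like this. It is a sketch under assumptions: the one-word-per-line layout and the sample words are made up, and $lookup is just an illustrative name for the hashtable-type lookup.

```php
<?php
// Assumption: the word list is one word per line; the real layout may differ.
$plain   = "badword\nworseword\n";
$encoded = base64_encode(base64_encode($plain)); // what would sit on disk

// Reverse the two encoding passes, then key an array by word.
$decoded = base64_decode(base64_decode($encoded));
$words   = array_filter(array_map('trim', explode("\n", $decoded)), 'strlen');
$lookup  = array_fill_keys($words, true);

// isset() on an array key is a hash lookup, so its cost does not grow
// with the number of words in the list.
var_dump(isset($lookup['badword']));
var_dump(isset($lookup['keyboard']));
```

Base64 only obscures the text, since anyone can decode it. Exact lookups like this stay fast as the list grows; comparing each input word against every entry for similarity is where a larger dictionary costs more.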

 

Please keep the suggestions coming; input from the kind of people who might use this code is useful. Thanks!


Like I said on phphelp, you can filter bad words that easily; just think like you're the user.

You can do something like the following:

Shi_@

F_U_C_ then you know

A*S* then you know H*O*L* then you know

or something like a combination of numbers and letters.
