
Cleaning this string


TEENFRONT


Hey All,

 

I'm just in need of some regex help. I have a chat system, and I want to clean the string before outputting it to the browser. It's been working, but recently people have started flooding the chat with "blank messages" using special characters and/or things like this:

 

"&#10;", which, when copied and pasted into the chat box a few times, will flood the chat with a blank message.

 

I've just tried adding something to stop this from happening, and I think I'm halfway there. I have this:

 

$str = preg_replace("/[^a-z\d.,-@?*()_]/i",'',$str);

 

When I do this:

 

$str = "&#10;";
$str = preg_replace("/[^a-z\d.,-@?*()_]/i",'',$str);

 

It returns this

 

10;

 

Most concerning of the above is that it allows ;, although I don't list it as an allowed character???

 

So it's only cleaning half of it.

 

Basically, I only ever want to allow these characters:

 

a-z and A-Z (upper and lower case)

0-9

.,@()!*-_

 

How would I turn the above into a proper regex? Or does anyone else have anything to clean input for chat systems and only allow the basic characters?

 

Many thanks!


Everything is OK except that you're tripping over a common mistake with hyphens (-) in character classes ([...]). A similar issue came up in another thread today; see my reply there.

 

After reading that, note that the range ,-@ includes (amongst others) the semicolon character.
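To make that concrete, here is a small sketch comparing the original pattern with one where the hyphen is moved to the end of the class, where PCRE treats it as a literal `-` (escaping it as `\-` works too). The allowed set follows the list above (letters, digits and `.,@()!*-_`); the helper name `clean_chat` and the sample strings are illustrative assumptions, not code from this thread.

```php
<?php
// Original pattern: ",-@" inside [...] is a RANGE from ',' (0x2C) to '@' (0x40),
// so it also lets through / : ; < = > and the digits.
$buggy = '/[^a-z\d.,-@?*()_]/i';

// Fixed pattern: a hyphen placed last in the class is a literal '-'.
// Allowed: letters (case-insensitive via /i), digits, and . , @ ( ) ! * _ -
$fixed = '/[^a-z\d.,@()!*_-]/i';

// Hypothetical helper: remove every character that is NOT in the allowed class.
function clean_chat(string $str, string $pattern): string
{
    // preg_replace returns null on error; fall back to an empty string.
    return preg_replace($pattern, '', $str) ?? '';
}

echo clean_chat('&#10;', $buggy), "\n"; // 10;
echo clean_chat('&#10;', $fixed), "\n"; // 10
```

With the fixed class the semicolon is stripped, leaving only the digits, which is exactly the leftover "10" discussed next in the thread.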


Ah man, so simple lol. Thank you very much.

 

Just tested it and it works. Great stuff.

 

Quick addition, if I may.

 

It still returns "10" if I pass it the annoying blank-space character "&#10;" ... which obviously is correct, but how do I get it to not show anything if this is found in the string? Or to remove the entire "&#10;"?

 

So if the string was " hello everybody &#10; &#10; &#10; &#10; hello again", how would I get it to just return "hello everybody hello again" instead of "hello everybody 10 10 10 10 hello again"?

 

Many thanks!

 

 


You will probably want to convert the HTML entities (&...;) into actual characters (&#10; is the newline character); then your regex will work with them without any changes. One way to do this conversion is to use html_entity_decode. Also, chances are that the values are not being submitted as &#10; but as newlines (i.e. the character you get by pressing the Return or Enter key), so depending on how your script is written, converting the entities back might not be necessary.
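A minimal sketch of that approach, assuming a message may arrive either as the literal entity text `&#10;` or as real newline characters. It decodes entities with `html_entity_decode`, collapses any run of whitespace (including newlines) into a single space, then applies the whitelist with the hyphen placed last. Allowing a plain space in the whitelist, rejecting messages that end up empty, and the helper name `clean_chat_message` are assumptions for illustration, not something stated in the thread.

```php
<?php
// Hypothetical helper: decode entities, normalise whitespace, apply the whitelist.
function clean_chat_message(string $str): string
{
    // Turn entities such as "&#10;" into real characters (here, a newline).
    $str = html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse any run of whitespace (spaces, tabs, CR/LF) into one space.
    $str = preg_replace('/\s+/', ' ', $str) ?? '';

    // Keep only the allowed characters. Assumption: a plain space is allowed;
    // the hyphen is last so it is a literal, not a range.
    $str = preg_replace('/[^a-z\d .,@()!*_-]/i', '', $str) ?? '';

    // Strip leading/trailing spaces left over from the steps above.
    return trim($str);
}

$msg = clean_chat_message(" hello everybody &#10; &#10; &#10; &#10; hello again");
if ($msg === '') {
    // Nothing left after cleaning: treat it as a blank message and drop it.
    echo "blank message rejected\n";
} else {
    echo $msg, "\n"; // hello everybody hello again
}
```

Because the whitespace is collapsed after decoding, the same function copes with raw newlines submitted from the text box, and a message made only of `&#10;` entities cleans down to an empty string that can simply be discarded.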

 

Edits: forum software does not seem to like my typing the HTML entities out...hmm.

