TEENFRONT Posted April 1, 2010

Hey all,

I'm in need of some regex help. I have a chat system in which I want to clean the string before outputting it to the browser. It has been working, but recently people have started flooding the chat with "blank messages" using special characters and/or things like "&#10;", which, when copied and pasted into the chat box a few times, floods the chat with a blank message.

I've just tried adding something to stop this from happening, and I think I'm halfway there. I have this:

$str = preg_replace("/[^a-z\d.,-@?*()_]/i",'',$str);

When I do this:

$str = "&#10;";
$str = preg_replace("/[^a-z\d.,-@?*()_]/i",'',$str);

it returns this:

10;

Most concerning is that it allows ";" although I never listed it as an allowed character, so it is only cleaning half of the input.

Basically, I only ever want to allow these characters:

a-z and A-Z (upper and lower case)
0-9
.,@()!*-_

How would I turn the above into a proper regex? Or does anyone have something else to clean input for chat systems and only allow the basic characters?

Many thanks!
salathe Posted April 1, 2010

Everything is OK except that you're tripping over a common mistake with hyphens (-) in character classes ([...]). A similar issue occurred in another thread today; see my reply there. After reading that, note that the range ,-@ includes (amongst others) the semicolon character.
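To make the range problem concrete, here is a minimal sketch comparing the pattern from the question with one possible fix. Moving the hyphen to the end of the class is only one way to make it literal (escaping it as \- is another). The `$fixed` pattern and the variable names are illustrative assumptions, not something posted in the thread, and "!" is included only because the question's allowed list mentions it.

```php
<?php
// Pattern from the question: ",-@" is read as a range from "," (0x2C)
// to "@" (0x40), which also covers ";" (0x3B), ":", "<", "=", ">" and "/".
$broken = '/[^a-z\d.,-@?*()_]/i';

// One possible fix (an assumption about the intended set): list the
// characters individually and put the hyphen last so it is a literal.
$fixed = '/[^a-z\d.,@()!*_-]/i';

$input = '&#10;';

// "&" and "#" are stripped by both patterns; ";" survives only the broken one.
$brokenResult = preg_replace($broken, '', $input); // "10;"
$fixedResult  = preg_replace($fixed, '', $input);  // "10"

echo $brokenResult, "\n", $fixedResult, "\n";
```

With the hyphen treated as a literal, the semicolon is no longer let through, which matches the "10" result reported in the follow-up post.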
TEENFRONT Posted April 1, 2010

Ah man, so simple, lol. Thank you very much. Just tested and it works. Great stuff.

Quick addition, if I may: it still returns "10" if I pass it the annoying blank-space character "&#10;", which obviously is correct, but how do I get it to not show anything if this is found in the string, i.e. remove the entire "&#10;"? So if the string was "hello everybody &#10; &#10; &#10; &#10; hello again", how would I get it to return "hello everybody hello again" instead of "hello everybody 10 10 10 10 hello again"?

Many thanks!
salathe Posted April 1, 2010

You will probably want to convert the HTML entities (&...;) into actual characters (&#10; is the newline character); then your regex will work with them without any changes. One way to do this conversion is to use html_entity_decode.

Also, the chances are that the values are not being submitted as &#10; but as newlines (i.e. the character you get by pressing the carriage return or enter keys), so depending on how your script is written, conversion back to non-entities might not be necessary.

Edit: the forum software does not seem to like my typing the HTML entities out... hmm.
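The decode-then-filter idea can be sketched roughly as follows. The helper name `clean_chat_message` is hypothetical, and the whitelist is an assumption: it uses the hyphen-last form of the character class and also allows a plain space so that words stay separated, which the pattern in the question did not do.

```php
<?php
// Hypothetical helper; the thread itself only names html_entity_decode
// and preg_replace.
function clean_chat_message(string $str): string
{
    // Turn entities such as "&#10;" into the characters they stand for
    // (here, a newline) so the whitelist below can see and strip them.
    $decoded = html_entity_decode($str, ENT_QUOTES, 'UTF-8');

    // Remove everything outside the allowed set. The space is an assumed
    // addition; the trailing hyphen is a literal, not a range.
    return preg_replace('/[^a-z\d.,@()!*_ -]/i', '', $decoded);
}

echo clean_chat_message('hello&#10;&#10; everybody'), "\n";
```

A message made up only of "&#10;" entities comes back as an empty string under this sketch, so the caller could skip posting it.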
TEENFRONT Posted April 1, 2010

Ah yes, yet again, mate, thank you very much. I added the HTML entities check and it now works as it should, because I already clean the \r return character. Thanks, man. Wonderful help.