billo Posted September 17, 2009

I have an online form where my users paste long text that they have written in MS Word. The text needs to be "cleaned" of any non-standard characters that find their way in. I must allow some things, such as accented characters (e.g. é and ê), but any unprintable characters need to be weeded out. I have the following statement, which mostly works:

if(preg_match_all("/[^a-z0-9 \$_.,'()?!:é\/;\&\#…\%\–]/i",$fldVal,$invalid))

However, when I looked at data that had been successfully posted to the database, I found this text when viewing it in an editable textbox via phpMyAdmin:

deep–fried eggplant with miso &#8206;sauce

I think phpMyAdmin is converting a strange (Arabic, perhaps?) character into its HTML special character code just for display purposes, and the actual single character is sitting in the database. My preg statement doesn't seem to detect it, though.

So all I am looking to do is check the text for valid/normal characters and reject it if it contains anything that you don't see on a standard keyboard (with a few exceptions, like those accented characters).

I'd appreciate any help or advice you might offer.
Handy PHP Posted September 18, 2009

Well, the regular expression you are looking for is this:

(&#[0-9]+;)

which should match &#12345;

However, this:

(&#[^;]+;)

should match &#almost_anything;

Hope this helps,
Handy PHP
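For illustration, here is a rough sketch of how the first pattern above (written out with its closing `;`) might be applied in PHP. It assumes the unwanted characters reach the script as literal numeric character references such as `&#8206;`, which the thread does not confirm. The function name `find_numeric_entities` and the sample string are made up for this example and are not from the original posts.

```php
<?php
// Hypothetical helper (not from the thread): returns every literal numeric
// character reference, such as "&#8206;", found in $text.
function find_numeric_entities(string $text): array
{
    // "&#", then one or more digits, then a closing semicolon.
    preg_match_all('/&#[0-9]+;/', $text, $matches);
    return $matches[0];
}

// Made-up sample modelled on the text quoted in the question.
$sample = 'deep-fried eggplant with miso &#8206;sauce';

$found = find_numeric_entities($sample);
if ($found) {
    // Either reject the submission here, or strip the references before saving.
    $cleaned = preg_replace('/&#[0-9]+;/', '', $sample);
}
```

The broader second pattern, `&#[^;]+;`, could be swapped in the same way; it would also catch hexadecimal references such as `&#x200E;`, at the cost of matching almost anything between `&#` and the next semicolon.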