spookztar, posted October 16, 2007
Hi guys, I'm trying to create a piece of regex to validate the filenames of users' multimedia files. What symbols do I need to ban to be on the safe side? So far, I have:
preg_match('/^.{1,80}(\.[[:alpha:]]{1,5})$/', $filename)
I need to somehow implement banning of spaces and slashes, but how, and what else? I assume it's at least something like this:
[^[:blank:]|/]
Bye.
effigy, posted October 16, 2007
What character set are you using?
spookztar (author), posted October 16, 2007
I suppose that will vary, as different users will be running this application on different servers.
effigy, posted October 16, 2007
You're not funneling them all through one <meta> tag? And if you expect it to vary, I'm assuming you're using UTF-8?
spookztar (author), posted October 16, 2007
I have a meta tag in the document itself saying:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
I assume that's what you're fishing for?
effigy, posted October 17, 2007
In that case I would use Unicode properties, unless you're going to be exclusive rather than inclusive (which is probably the case).
spookztar wrote: "What symbols do I need to ban to be on the safe side?"
I would assume whichever characters your operating system deems improper: Filenames.
spookztar (author), posted October 18, 2007
OK. So "/" and spaces are bad in UNIX filenames. Anything else?
effigy, posted October 18, 2007
Actually, anything is valid except a / (and the NUL byte), but using certain characters may make working in the shell a cumbersome task. According to "The Complete Unix Reference" (1999), the following should be avoided:
! # & ( ) ' " ; | < > @ $ ^ { } * ? \ (space) (tab) (backspace)
spookztar (author), posted October 18, 2007
So to make a symbol ban-list, I would do something like this?
[^[:blank:]\.|!|#|&|(|)|'|"|;|\||<|>|@|\$|\^|\{|\}|\*|\?|\]
An OR (|) between each undesired symbol, backslashing all regex special characters?
effigy, posted October 18, 2007
Not quite. PCRE only accepts the [:...:] POSIX classes inside a bracket expression, so [[:blank:]] is legal there, but \s is the shorter spelling. Also, when you're within a character class, OR is implied, so each | is just a literal pipe.
<pre>
<?php
// Characters the reference above says to avoid, plus space, tab and backspace.
$chars = array(
    '!', '#', '&', '(', ')', "'", '"', ';', '|', '<', '>', '@', '$',
    '^', '{', '}', '*', '?', '\\', ' ', "\t", pack('C', 0x08)
);
foreach ($chars as $char) {
    // Inside [...], \b means backspace and \s covers space and tab.
    // htmlspecialchars() keeps < > & " from breaking the HTML output.
    echo htmlspecialchars($char), ' => ',
        preg_match('~[\s\b/!#&()\'";|<>@$^{}*?\\\\]~', $char) ? 'Bad' : 'OK';
    echo '<br>';
}
?>
</pre>
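As a rough sketch, the ban-list above could be wrapped in a reusable function. The name has_shell_unfriendly_chars() is hypothetical and not from the thread; the character list is the one quoted from the reference, plus the slash. It uses the POSIX class [:space:], which PCRE accepts inside a bracket expression, and \x08 for backspace.

```php
<?php
// Hypothetical helper (not from the thread): returns true when $name contains
// a slash, any whitespace, a backspace, or one of the shell-unfriendly
// characters listed above.
function has_shell_unfriendly_chars(string $name): bool
{
    // Inside [...] every member is an alternative, so no "|" separators are
    // needed; the single "|" below is the literal pipe being banned.
    // The four backslashes in this single-quoted string become one escaped
    // backslash in the regex.
    return preg_match('~[[:space:]\x08/!#&()\'";|<>@$^{}*?\\\\]~', $name) === 1;
}

var_dump(has_shell_unfriendly_chars('song-01.mp3')); // bool(false)
var_dump(has_shell_unfriendly_chars('my song.mp3')); // bool(true)
var_dump(has_shell_unfriendly_chars('a|b.mp3'));     // bool(true)
```

Comparing against 1 rather than relying on truthiness keeps a preg_match() error (which returns false) from being read as "no bad characters found" by mistake elsewhere in the code.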
kratsg, posted October 19, 2007
Why not just do it backwards? Search for anything that isn't allowed, i.e.:
$pattern = "/[^a-zA-Z0-9]/";
If that pattern matches any part of the filename, it contains unacceptable characters :-D (it matches everything except a-zA-Z0-9). If that makes sense? This way, you don't have to worry about Unicode settings, since you're allowing alphanumeric characters only (which I prefer; you could also allow the underscore, I believe).
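A minimal sketch of this allowlist idea, assuming a hypothetical helper named has_disallowed_chars(). One detail worth noting: the negated class should not carry a * quantifier, because a pattern that can match zero characters succeeds on every input.

```php
<?php
// Hypothetical helper (not from the thread): true when $name contains any
// character outside a-z, A-Z and 0-9.
function has_disallowed_chars(string $name): bool
{
    // One offending character is enough, so no quantifier is needed.
    return preg_match('/[^a-zA-Z0-9]/', $name) === 1;
}

var_dump(has_disallowed_chars('track01'));  // bool(false)
var_dump(has_disallowed_chars('track 01')); // bool(true)

// With "*", the pattern matches the empty string at offset 0, so it
// reports a match even for a clean name.
var_dump(preg_match('/[^a-zA-Z0-9]*/', 'track01')); // int(1)
```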
effigy, posted October 19, 2007
Because you'll miss characters like -, ß, :, ½, and +.
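If non-ASCII letters such as ß should be accepted, the Unicode properties mentioned earlier in the thread are one option. The sketch below is only an illustration: the function name is hypothetical, the set of extra punctuation (_ . + -) is an arbitrary choice, and it assumes the filename arrives as UTF-8, which is not the case for the ISO-8859-1 page described above unless the input is converted first.

```php
<?php
// Hypothetical helper (not from the thread). Assumes $name is UTF-8.
function is_safe_unicode_name(string $name): bool
{
    // \p{L} = any letter, \p{N} = any number (this includes fractions like ½).
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8, and
    // \z anchors at the very end (unlike "$", it does not allow a trailing newline).
    return preg_match('/^[\p{L}\p{N}_.+-]{1,80}\z/u', $name) === 1;
}

var_dump(is_safe_unicode_name("Stra\xC3\x9Fe.mp3")); // "Straße.mp3" in UTF-8 bytes
var_dump(is_safe_unicode_name('a:b.mp3'));            // colon is not in the class
```

Invalid UTF-8 input makes preg_match() return false under the u modifier, so such names are rejected by the strict comparison as well.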
spookztar (author), posted October 19, 2007
I think a check with an if such as this:
if (!preg_match('/^[A-Za-z0-9_-]{1,80}(\.[A-Za-z]{1,5})$/D', $filename)) die('Your filename contains forbidden characters');
just might have to do then... (The ^ and $ anchors force the whole filename to match, and the D modifier stops $ from accepting a trailing newline.) I'm also considering banning certain extensions by using substr(). Apart from .js and .jse, what other extensions would be wise to ban on a Linux server?
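One possible alternative to banning extensions one by one is to accept only a known list. The sketch below is an assumption-laden illustration, not an answer from the thread: the function name and the extension list are hypothetical, and the list would need to match whatever media types the application really handles.

```php
<?php
// Hypothetical allowlist (not from the thread); adjust to the media types
// the application is meant to accept.
const ALLOWED_EXTENSIONS = ['mp3', 'ogg', 'jpg', 'png', 'gif'];

// Hypothetical helper: checks the name's characters, then its extension.
function is_acceptable_upload_name(string $name): bool
{
    // ^ and \z anchor the pattern so the whole name must match,
    // not just some substring of it.
    if (preg_match('/^[A-Za-z0-9_-]{1,80}\.[A-Za-z0-9]{1,5}\z/', $name) !== 1) {
        return false;
    }
    // pathinfo() returns the text after the last dot; compare case-insensitively.
    $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    return in_array($ext, ALLOWED_EXTENSIONS, true);
}

var_dump(is_acceptable_upload_name('song.mp3'));          // bool(true)
var_dump(is_acceptable_upload_name('evil.js'));           // bool(false)
var_dump(is_acceptable_upload_name('../etc/passwd.mp3')); // bool(false)
```

With an allowlist, an extension nobody thought to ban is rejected by default, which is the main reason to prefer it over a growing ban-list.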