spookztar Posted October 16, 2007
Hi guys, I'm trying to create a piece of regex to validate the filenames of users' multimedia files. What symbols do I need to ban to be on the safe side? So far, I have:
preg_match('/^.{1,80}(\.[[:alpha:]]{1,5})$/', $filename)
I need to somehow implement banning of spaces and slashes, but how, and what else? I assume it's at least something like this...
[^[:blank:]|/]
Bye,
Link to comment https://forums.phpfreaks.com/topic/73544-banning-symbols/
effigy Posted October 16, 2007
What character set are you using?
spookztar Posted October 16, 2007
I suppose that will vary, as different users will be running this application on different servers.
effigy Posted October 16, 2007
You're not funneling them all through one <meta> tag? And if you expect it to vary, I'm assuming you're using UTF-8?
spookztar Posted October 16, 2007
I have a meta tag in the document itself saying:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
I assume that's what you're fishing for?
effigy Posted October 17, 2007
In that case I would use Unicode properties, unless you're going to be exclusive rather than inclusive (which is probably the case).
Quote: "What symbols do I need to ban to be on the safe side?"
I would assume whichever characters your operating system deems improper: Filenames.
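For readers unfamiliar with the Unicode properties mentioned above, here is a minimal sketch of an inclusive check built on them. The function name `has_only_letters_and_digits` is hypothetical, and the sketch assumes the filename is valid UTF-8; ISO-8859-1 input, as in this thread, would need converting first (for example with `mb_convert_encoding`, if the mbstring extension is available).

```php
<?php
// Hypothetical helper: accept only Unicode letters, digits, underscore and hyphen.
// Assumption: $name is valid UTF-8. With the /u modifier, preg_match() returns
// false on invalid UTF-8, which this function also treats as "not acceptable".
function has_only_letters_and_digits(string $name): bool
{
    // \p{L} = any letter, \p{N} = any number, \z = absolute end of the subject.
    return preg_match('/^[\p{L}\p{N}_-]+\z/u', $name) === 1;
}
```

Under those assumptions, a name such as "Stra\u{00DF}e_01" (which contains ß) should pass, while 'a/b' or 'my file' should not.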
spookztar Posted October 18, 2007
Ok. So "/" and spaces are bad in UNIX filenames. Anything else?
effigy Posted October 18, 2007
Actually, anything is valid except a / (and the NUL byte), but using certain characters may make working in the shell a cumbersome task. According to "The Complete Unix Reference" (1999), the following should be avoided:
! # & ( ) ' " ; | < > @ $ ^ { } * ? \ (space) (tab) (backspace)
spookztar Posted October 18, 2007
So to make a symbol ban-list, I would do something like this?
[^[:blank:]\.|!|#|&|(|)|'|"|;|\||<|>|@|\$|\^|\{|\}|\*|\?|\]
An OR (|) between each undesired symbol, backslashing all regex special characters?
effigy Posted October 18, 2007
Not quite. When you're within a character class, OR is implied, so each | is just a literal pipe. PCRE does accept POSIX classes such as [[:blank:]] inside a character class, but the pattern below uses \s, which covers all whitespace.
<pre>
<?php
// Characters to test, including space, tab and backspace (0x08).
$chars = array(
    '!', '#', '&', '(', ')', "'", '"', ';', '|', '<', '>', '@', '$', '^',
    '{', '}', '*', '?', '\\', ' ', "\t", pack('C', 0x08)
);
foreach ($chars as $char) {
    // Inside a character class, \b means backspace, not a word boundary.
    echo "$char => ",
        preg_match('~[\s\b/!#&()\'";|<>@$^{}*?\\\]~', $char) ? 'Bad' : 'OK';
    echo '<br>';
}
?>
</pre>
kratsg Posted October 19, 2007
Why not just do it backwards? Search for everything except what's allowed, i.e.:
$pattern = "/[^a-zA-Z0-9]/";
If that pattern matches any part of the filename, it contains unacceptable characters :-D (it matches everything except a-zA-Z0-9). If that makes sense? This way, you don't have to worry about Unicode settings, since you're allowing alphanumeric characters only (which I prefer; you could also allow the underscore, I believe).
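The "search for everything except what's allowed" idea can be sketched as a small helper. The function name `has_disallowed_chars` is hypothetical, and the underscore is included as the optional extra mentioned in the post. Note that the negated class carries no `*` quantifier: a pattern that permits zero repetitions matches every string, including perfectly clean ones.

```php
<?php
// Hypothetical helper: true when $name contains at least one character
// outside the allowed set (ASCII letters, digits, underscore).
function has_disallowed_chars(string $name): bool
{
    // A single negated class is enough; preg_match() stops at the first hit.
    return preg_match('/[^a-zA-Z0-9_]/', $name) === 1;
}
```

With this sketch, 'my_song01' would be accepted and 'my song' rejected. An empty string contains no disallowed characters, so a separate length check is still needed.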
effigy Posted October 19, 2007
Because you'll miss characters like -, ß, :, ½, and +.
spookztar Posted October 19, 2007
I think doing a check with an if such as this:
if (!preg_match('/^[A-Za-z0-9_-]{1,80}(\.[A-Za-z]{1,5})$/', $filename)) die('Your filename contains forbidden characters');
just might have to do then... I'm also considering banning certain extensions by using substr(). Apart from .js and .jse, what other extensions would be wise to ban on a Linux server?
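A minimal sketch of how the character check and the extension ban could be combined. The function name `is_acceptable_filename` and its default block list are assumptions for illustration (only .js and .jse come from the thread), and `pathinfo()` is swapped in for the `substr()` approach mentioned above. The pattern is anchored at both ends, since an unanchored pattern would accept any filename that merely contains a valid-looking substring.

```php
<?php
// Hypothetical helper; the default block list holds only the two extensions
// named in the thread and should be extended to suit the server.
function is_acceptable_filename(string $filename, array $bannedExtensions = ['js', 'jse']): bool
{
    // Anchored: 1-80 safe characters, one dot, then a 1-5 letter extension.
    // \z (rather than $) also rejects a trailing newline.
    if (preg_match('/^[A-Za-z0-9_-]{1,80}\.[A-Za-z]{1,5}\z/', $filename) !== 1) {
        return false;
    }
    // Compare the extension case-insensitively against the block list.
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return !in_array($extension, $bannedExtensions, true);
}
```

With this sketch, 'holiday_clip.mpeg' would pass, while 'my clip.mpeg', 'SCRIPT.JS' and '../etc/passwd' would be rejected. An allow list of the media extensions you actually expect may be easier to reason about than a block list.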