NotionCommotion
Posted December 16, 2021

When a file is uploaded, $_FILES is populated with the name, type, and size (which are all provided by the browser, in the request body rather than the headers, right?) as well as tmp_name and error (which are presumably set by PHP).

If the browser-provided size differs from what filesize() reports, should I care, or just go with filesize()?

What about the same question for the MIME type? Some file types result in false positives, such as the ones below, and I want to accept those as valid. But should I reject files whose types really are different? For detecting these multiple valid MIME types, is there a PHP function, or any good Composer (or other) packages?

Also, I am thinking I should never bother saving the browser-provided MIME type, because it depends on the individual browser and/or operating system the user happened to be using at the time. Agree?

    // Compare the extension, the browser-reported type, and what fileinfo detects.
    printf('extention: %s type: %s (provided) %s (fileinfo) FILEINFO_EXTENSION: %s<br>'.PHP_EOL,
        pathinfo($_FILES['expenseFile']['name'], PATHINFO_EXTENSION),
        $_FILES['expenseFile']['type'],
        (new \finfo(FILEINFO_MIME_TYPE))->file($_FILES['expenseFile']['tmp_name']),
        (new \finfo(FILEINFO_EXTENSION))->file($_FILES['expenseFile']['tmp_name'])
    );

extention: csv type: application/vnd.ms-excel (provided) application/csv (fileinfo) FILEINFO_EXTENSION: ???
extention: gz type: application/x-gzip (provided) application/gzip (fileinfo) FILEINFO_EXTENSION: ???
extention: js type: text/javascript (provided) text/plain (fileinfo) FILEINFO_EXTENSION: ???
extention: css type: text/css (provided) text/plain (fileinfo) FILEINFO_EXTENSION: ???
extention: yaml type: application/octet-stream (provided) text/plain (fileinfo) FILEINFO_EXTENSION: ???
extention: ini type: application/octet-stream (provided) text/plain (fileinfo) FILEINFO_EXTENSION: ???

There is also the issue of file extensions matching the actual file type; I wish to reject files where they do not match. finfo's FILEINFO_EXTENSION constant provides an answer for some types, but very few, at least with my version of the magic.mime database. Any good approaches or third-party packages that can manage this?

extention: ods type: application/vnd.oasis.opendocument.spreadsheet (provided) application/vnd.oasis.opendocument.spreadsheet (fileinfo) FILEINFO_EXTENSION: ods
extention: png type: image/png (provided) image/png (fileinfo) FILEINFO_EXTENSION: png
extention: jpg type: image/jpeg (provided) image/jpeg (fileinfo) FILEINFO_EXTENSION: jpeg/jpg/jpe/jfif

Thanks!

Link: https://forums.phpfreaks.com/topic/314325-validating-file-uploads/
requinix
Posted December 16, 2021

4 hours ago, NotionCommotion said:
"If browser provided size is different than what filesize() reports, should I care or just go with filesize()?"

It can't be: the size is not just the size of the file but the amount of content that the browser sent to the server. If this did not match what the request actually had, then there would have been problems.

4 hours ago, NotionCommotion said:
"What about similar question but for mime type?"

That's the big one. MIME type detection is naive and optimistic: it assumes that if the file has a few bytes in a certain location, then the entire file is that one type. It won't be able to detect files with mixed content (think PHP code buried in the middle of some HTML), files using containers (OpenDocument files are ZIP archives), or many types of text file formats. It can accurately detect audio and video data as well as "unique" binary formats. That's where you have to step in with some specific knowledge to make decisions.

4 hours ago, NotionCommotion said:
"Some file types result in false positives such as the following and I will want to accept those as being valid, but should I reject them as being invalid if they are actually different?"

The detected types are correct, they're just not what you expected or wanted.

Windows particularly tends to identify files by extension, then equate those extensions with MIME types according to whatever software is installed. For example, having Office/Excel will tell the system that .csv files are vnd.ms-excel because... well, because that's what it's been doing for a very long time, but the point is that a Windows browser will happily report vnd.ms-excel because that's what it knows the file as. That's especially useful for text files. Linux too will frequently deem a file a certain type according to the extension and only use MIME detection as a fallback.

And I agree with that. It's a huge pain to try to deduce the MIME type or the correct file extension just from the contents. So don't do that. Instead, in the general case, validate that the MIME type you detect is consistent with the extension, and optionally with the reported MIME type.

(That's the general case. For more specific cases, like when you only want to support images, sometimes it can be done reliably with only MIME types.)

And above all else, if you want to store arbitrary files, install a virus scanner or two.

4 hours ago, NotionCommotion said:
"Also, I am thinking I should never bother saving the browser provided mime type because it is based on the individual browser and/or operating system the user happened to be using at the time, agree?"

Mostly disagree. While you should assume the client is malicious, in the real world that's very often not the case, and throwing away data because it might be incorrect is hurting yourself.

4 hours ago, NotionCommotion said:
"There is also the issue of having file extensions that matches the actual file type and I wish to reject those that do not. finfo's FILEINFO_EXTENSION constant provides solutions for some but very few at least with my version of magic.mime database. Any good approaches or 3rd party packages that can manage this?
extention: ods type: application/vnd.oasis.opendocument.spreadsheet (provided) application/vnd.oasis.opendocument.spreadsheet (fileinfo) FILEINFO_EXTENSION: ods
extention: png type: image/png (provided) image/png (fileinfo) FILEINFO_EXTENSION: png
extention: jpg type: image/jpeg (provided) image/jpeg (fileinfo) FILEINFO_EXTENSION: jpeg/jpg/jpe/jfif"

But how do you know it does not match? It's easy to pick examples like images, but what about HTML with some PHP code buried in the middle? You'll receive a .php extension but detection will say it's .htm/html.
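A minimal sketch of the consistency check described above: look up the extension in an allowlist and accept the upload only if the detected type (and, optionally, the browser-reported type) appears under that extension. The ACCEPTED_TYPES table and the isConsistent() function are hypothetical names made up for illustration. The table is seeded mostly from the pairs reported earlier in this thread and would need to be curated per application.

```php
<?php
declare(strict_types=1);

// Hypothetical allowlist: extension => MIME types treated as consistent with it.
// Seeded mostly from the pairs reported in this thread; not a complete database.
const ACCEPTED_TYPES = [
    'csv' => ['text/csv', 'application/csv', 'text/plain', 'application/vnd.ms-excel'],
    'gz'  => ['application/gzip', 'application/x-gzip'],
    'js'  => ['text/javascript', 'text/plain'],
    'css' => ['text/css', 'text/plain'],
    'png' => ['image/png'],
    'jpg' => ['image/jpeg'],
];

/**
 * Returns true when the detected type (and the reported type, if given)
 * is listed for the file's extension. Unknown extensions are rejected.
 */
function isConsistent(string $fileName, string $detectedType, ?string $reportedType = null): bool
{
    $extension = strtolower(pathinfo($fileName, PATHINFO_EXTENSION));
    $accepted = ACCEPTED_TYPES[$extension] ?? null;

    if ($accepted === null) {
        return false; // extension is not on the allowlist at all
    }
    if (!in_array($detectedType, $accepted, true)) {
        return false; // content does not look like anything this extension allows
    }

    // The reported type is optional; when present it must also be acceptable.
    return $reportedType === null || in_array($reportedType, $accepted, true);
}
```

In an upload handler the detected type would come from the call already shown in the first post, (new \finfo(FILEINFO_MIME_TYPE))->file($_FILES['expenseFile']['tmp_name']), and the reported type from $_FILES['expenseFile']['type']. Note that a .php upload is rejected here simply because the extension is not listed, regardless of what detection says.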
NotionCommotion (Author)
Posted December 17, 2021

2 hours ago, requinix said:
"It can't be: the size is not just the size of the file but the amount of content that the browser sent to the server. If this did not match what the request actually had then there would have been problems."

There is both the Content-Length in the request header and the size value in $_FILES. Aren't they two separate things?

2 hours ago, requinix said:
"That's the big one. MIME type detection is naive and optimistic: it assumes that if the file has a few bytes in a certain location then the entire file is that one type. It won't be able to detect files with mixed content (think PHP code buried in the middle of some HTML) or files using containers (OpenDocument files are ZIP archives) or many types of text file formats. It can accurately detect audio and video data as well as "unique" binary formats.
"The detected types are correct, they're just not what you expected or wanted.
"And I agree with that. It's a huge pain to try to deduce MIME type or the correct file extension just from the contents. So don't do that. Instead, in the general case, validate that the MIME type you detect is consistent with the extension - and optionally with the reported MIME type. (That's the general case. For more specific cases, like you only want to support images, sometimes it can be done reliably with only MIME types.)
"But how do you know it does not match? It's easy to pick examples like images, but what about HTML with some PHP code buried in the middle? You'll receive a .php extension but detection will say it's .htm/html."

My purpose is to allow a user (organization) to limit the types of files outside users can upload, based on the software the user/organization has. Almost everyone has software for PDFs, various images, various Microsoft documents, etc., but there are also file types such as AutoCAD, various BIM formats, and others. ZIP archives will need to be supported to allow OpenDocument files, and they add some complexity as they contain other files, but I suppose they can be opened and inspected prior to saving.

Regarding validating that the detected MIME type is consistent with the extension, this seems like a common need, and I would expect some de facto standard open-source package to exist, but I haven't found it. Will have to give this one more thought...

2 hours ago, requinix said:
"Mostly disagree. While you should assume the client is malicious, in the real world that's very often not the case, and throwing away data because it might be incorrect is hurting yourself."

I guess I can store it, but I don't know what to do with it. When later providing the file for download, would I want to use this value or the detected value? What if two identical files were uploaded with different clients and were given different MIME types? Would I return them with different MIME types?
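One way to inspect an OpenDocument-style ZIP package before saving it is to read the media type the package declares about itself. The sketch below assumes the package follows the OpenDocument convention of storing a "mimetype" entry first in the archive and uncompressed, so it can be read straight from the first ZIP local file header without the zip extension. readPackageMimetype() is a hypothetical helper name, and a real validator would also need to examine the remaining entries.

```php
<?php
declare(strict_types=1);

/**
 * Returns the media type declared by an OpenDocument-style package, or null.
 * Assumption: the "mimetype" entry is the first ZIP entry and is stored
 * uncompressed. Anything else yields null rather than a guess.
 */
function readPackageMimetype(string $path): ?string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }

    try {
        // A ZIP local file header starts with 30 bytes of fixed-size fields.
        $header = fread($handle, 30);
        if ($header === false || strlen($header) < 30) {
            return null;
        }

        $fields = unpack(
            'Vsignature/vversion/vflags/vmethod/vtime/vdate/Vcrc/'
            . 'VcompressedSize/Vsize/vnameLength/vextraLength',
            $header
        );

        // 0x04034b50 is the local file header signature; method 0 means "stored".
        if ($fields['signature'] !== 0x04034b50 || $fields['method'] !== 0) {
            return null;
        }
        // Guard against empty or implausibly long values before reading.
        if ($fields['nameLength'] !== 8 || $fields['size'] < 1 || $fields['size'] > 255) {
            return null;
        }
        if (fread($handle, 8) !== 'mimetype') {
            return null;
        }
        if ($fields['extraLength'] > 0) {
            fseek($handle, $fields['extraLength'], SEEK_CUR); // skip the extra field
        }

        $type = fread($handle, $fields['size']);
        return $type === false ? null : $type;
    } finally {
        fclose($handle);
    }
}
```

For an .ods upload, the returned string could then be compared with application/vnd.oasis.opendocument.spreadsheet, the type both the browser and fileinfo reported in the first post. A plain ZIP without that leading entry returns null and would have to be handled by a separate rule.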
requinix
Posted December 17, 2021

24 minutes ago, NotionCommotion said:
"There is both the Content-Length in the request header and the size value in $_FILES. Aren't they two separate things?"

The Content-Length in the request header (if there even is one) does not describe the file. It describes the entire request. Take a look at how multipart/form-data requests are structured and that might help explain what's going on.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST

24 minutes ago, NotionCommotion said:
"Regarding validating that the detected MIME type is consistent with the extension, seems like this is a common need and there would be some de facto standard opensource package but I haven't found it."

Could very well be. But these things are also frequently dependent upon the application itself. Maybe what you need is not so much a library but a curated database you can read.

24 minutes ago, NotionCommotion said:
"Guess I can store it but don't know what to do with it. When later providing the file for download, would I want to use this value or the detected value?"

Assuming you validated that the provided type was correct, because if not then you shouldn't be storing it at all, then you would use it instead of whatever type you tried to guess it was.

32 minutes ago, NotionCommotion said:
"What if two identical files were uploaded but with different clients and were given different MIME types? Would I return them with different MIME types?"

Sure. Why would it matter if they were different?
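To make the structure concrete, here is a small sketch that builds a single-file multipart/form-data body by hand and compares its length with the length of the file content alone. The boundary string, field name, and file content are made up for illustration, and buildMultipartBody() is a hypothetical helper. The part headers carry the file name and the browser's idea of the type, but no size field, which is consistent with the size in $_FILES reflecting the bytes PHP actually received for that part.

```php
<?php
declare(strict_types=1);

/**
 * Builds a multipart/form-data body containing one file part.
 * All argument values used below are illustrative only.
 */
function buildMultipartBody(
    string $boundary,
    string $field,
    string $fileName,
    string $type,
    string $contents
): string {
    return "--{$boundary}\r\n"
        // Part headers: this is where the browser states the name and type.
        . "Content-Disposition: form-data; name=\"{$field}\"; filename=\"{$fileName}\"\r\n"
        . "Content-Type: {$type}\r\n"
        . "\r\n"
        // Part body: the raw file bytes.
        . $contents . "\r\n"
        // Closing delimiter for the whole request body.
        . "--{$boundary}--\r\n";
}

$contents = "a,b,c\n1,2,3\n";
$body = buildMultipartBody('boundary123', 'expenseFile', 'data.csv', 'text/csv', $contents);

$contentLength = strlen($body);     // what a Content-Length header would describe
$fileSize      = strlen($contents); // what the file part itself contains

printf("Content-Length: %d, file size: %d\n", $contentLength, $fileSize);
```

The request-level length is always larger than the file because it also counts the boundaries and part headers, and it grows further with every additional form field or file in the same request.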
NotionCommotion (Author)
Posted December 19, 2021

On 12/16/2021 at 5:49 PM, requinix said:
"Maybe what you need is not so much a library but a curated database you can read."

Yes, I think so. Any suggestions on where to find one?

On 12/16/2021 at 5:49 PM, requinix said:
"Assuming you validated that the provided type was correct, because if not then you shouldn't be storing it at all, then you would use it instead of whatever type you tried to guess it was.
"Sure. Why would it matter if they were different?"

Thank you. I was originally thinking differently, but now fully agree.
requinix
Posted December 21, 2021

On 12/19/2021 at 6:49 AM, NotionCommotion said:
"Yes, I think so. Any suggestions on where to find one?"

No clue.
NotionCommotion (Author)
Posted December 31, 2021

On 12/16/2021 at 5:49 PM, requinix said:
"Maybe what you need is not so much a library but a curated database you can read."

On 12/19/2021 at 6:49 AM, NotionCommotion said:
"Yes, I think so. Any suggestions on where to find one?"

On 12/20/2021 at 4:46 PM, requinix said:
"No clue."

While not curated, I suppose I could build my own using https://www.iana.org/assignments/media-types/media-types.xhtml as a reference, or perhaps https://github.com/jshttp/mime-db will be a better starting point. It seems, however, that this would be a fairly common need in PHP applications, and that there would be some Composer package which would be easier to maintain.
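As a rough sketch of the "build my own" idea, the function below inverts a JSON document shaped like mime-db's db.json (media type => details, with an optional "extensions" list) into an extension => media types lookup. The inline JSON is a hand-written excerpt in that shape, not the real data, and buildExtensionMap() is a hypothetical name. Browser quirks such as application/vnd.ms-excel for .csv would still have to be added by hand.

```php
<?php
declare(strict_types=1);

/**
 * Inverts a mime-db style document into extension => list of media types.
 * Entries without an "extensions" key are skipped.
 *
 * @return array<string, list<string>>
 */
function buildExtensionMap(string $json): array
{
    $db = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    $map = [];
    foreach ($db as $type => $info) {
        foreach ($info['extensions'] ?? [] as $extension) {
            $map[strtolower((string) $extension)][] = $type;
        }
    }

    return $map;
}

// Hand-written sample in the shape of db.json; illustrative values only.
$sample = <<<'JSON'
{
    "text/csv":         {"source": "iana", "extensions": ["csv"]},
    "image/jpeg":       {"source": "iana", "extensions": ["jpeg", "jpg", "jpe"]},
    "application/gzip": {"source": "iana", "extensions": ["gz"]},
    "application/example-without-extensions": {"source": "iana"}
}
JSON;

$extensionMap = buildExtensionMap($sample);
```

With the real file this could be fed from file_get_contents() on a downloaded copy, and the resulting map used as the starting point for an application-specific allowlist.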