
Document upload security?


SteamingAlong


Hi there,

I have a form that lets users upload .doc and .pdf files. I was hoping I wouldn't have to go this route, but it has to be done. For the upload, I have done the following:

  1. check that its extension is doc or pdf
  2. check its MIME type via finfo_open
  3. check its size
  4. rename the file completely, adding the proper extension
  5. move the file outside the root folder
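
The five checks above could look roughly like this in PHP. It's a minimal sketch, assuming the fileinfo extension is enabled; the MIME lists, the 5 MB limit, and the names validate_upload() and store_upload() are my own placeholders, not the code from the post. It uses the object form of fileinfo (new finfo), which does the same job as finfo_open.

```php
<?php
// Placeholder whitelist: extension => MIME types libmagic may report for it.
const ALLOWED_TYPES = [
    'pdf' => ['application/pdf'],
    // libmagic may report old Word files under either of these types
    'doc' => ['application/msword', 'application/vnd.ms-office'],
];
const MAX_BYTES = 5 * 1024 * 1024; // hypothetical 5 MB limit

// Steps 1-3. Returns an error message, or null if the file passes every check.
function validate_upload(string $originalName, string $tmpPath, int $size): ?string
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    if (!isset(ALLOWED_TYPES[$ext])) {
        return 'extension not allowed';                  // step 1
    }
    // Detect the type from the file's contents, not from the browser's claim.
    $mime = (new finfo(FILEINFO_MIME_TYPE))->file($tmpPath);
    if (!in_array($mime, ALLOWED_TYPES[$ext], true)) {
        return 'MIME type does not match extension';     // step 2
    }
    if ($size <= 0 || $size > MAX_BYTES) {
        return 'file too large or empty';                // step 3
    }
    return null;
}

// Steps 4-5: random name with the extension we chose, stored outside the web root.
function store_upload(string $originalName, string $tmpPath, string $storageDir): string
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    $name = bin2hex(random_bytes(16)) . '.' . $ext;
    if (!move_uploaded_file($tmpPath, $storageDir . '/' . $name)) {
        throw new RuntimeException('could not move uploaded file');
    }
    return $name;
}
```

In the form handler you would call validate_upload($f['name'], $f['tmp_name'], $f['size']) on $f = $_FILES['your_field'] after checking $f['error'] === UPLOAD_ERR_OK, and only call store_upload() when it returns null.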

The file is then displayed back to the user by reading it with file_get_contents, using the following code:

<?php
// $doc_1 is coming from the database
$fullpath = '/home/myfolder/';
// Accept only positive integers; filter_var() returns false for anything else
$doc_2 = filter_var($doc_1, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
if ($doc_2 === false) {
	header("Location: /myerror/");
	die;
} else {
	$doc_path = $fullpath.'documents/mydocument'.$doc_2.'.doc';
	// realpath() returns false if the file does not exist
	$doc_path_realpath = realpath($doc_path);
	if ($doc_path_realpath !== false && is_readable($doc_path_realpath)) {
		$doc_data = file_get_contents($doc_path_realpath);
	} else {
		header("Location: /myerror/");
		die;
	}
}
?>

Is there a more secure way to display this back to the user, or is this enough?


You may also want to repeat the same checks (extension, MIME type) before outputting the contents of files stored on the server. This adds a bit of extra security in case files that were validated earlier (and are now stored on your server) are later modified or replaced by third parties.
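
A minimal sketch of re-checking a stored file right before sending it, assuming files live in one storage directory outside the web root. The names check_stored_file() and send_stored_file() and the type lists are hypothetical. It also refuses paths that resolve outside that directory, and sends the file with X-Content-Type-Options: nosniff so the browser does not second-guess the declared type.

```php
<?php
// Hypothetical type lists; keep them in sync with whatever the upload step allows.
const STORED_TYPES = [
    'pdf' => ['application/pdf'],
    'doc' => ['application/msword', 'application/vnd.ms-office'],
];

// Returns [real path, detected MIME type], or null if the file must not be served.
function check_stored_file(string $storageDir, string $fileName): ?array
{
    $base = realpath($storageDir);
    $path = realpath($storageDir . '/' . $fileName);
    // Missing files and paths that resolve outside the storage directory are refused.
    if ($base === false || $path === false || strpos($path, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    $mime = (new finfo(FILEINFO_MIME_TYPE))->file($path);
    // Same extension/MIME agreement check as on upload.
    if (!isset(STORED_TYPES[$ext]) || !in_array($mime, STORED_TYPES[$ext], true)) {
        return null;
    }
    return [$path, $mime];
}

// Sends the file as a download with a fixed, generic file name.
function send_stored_file(string $path, string $mime): void
{
    header('Content-Type: ' . $mime);
    header('X-Content-Type-Options: nosniff');
    header('Content-Disposition: attachment; filename="document.' . pathinfo($path, PATHINFO_EXTENSION) . '"');
    header('Content-Length: ' . (string) filesize($path));
    readfile($path);
}
```

Using attachment makes the browser download the file rather than render it inline; if the PDFs are shown in an embedded viewer, inline is the alternative, at the cost of the browser's PDF reader handling the content directly.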


That is interesting, phpmillion. I never thought of that check, but you are right: it is useful in case a file is modified without my knowing. I will add that in. Thanks for the heads up.

 

 

If the user has to be a "registered user", do you have checks in place for that? And for CSRF, etc.?

 

dalecosp, yes, I have those additional checks in place on the form, such as token validation for both the id and the value of the hidden input.
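
For anyone reading later, a per-session CSRF token of the kind described here might look like the sketch below. It assumes the token is kept in the session and posted back in a hidden input; csrf_token() and csrf_valid() are hypothetical names, and they take the session array as a parameter (pass $_SESSION in a real page) so they can be tested without starting a session.

```php
<?php
// Creates the token once per session and returns it.
function csrf_token(array &$session): string
{
    if (empty($session['csrf_token'])) {
        $session['csrf_token'] = bin2hex(random_bytes(32));
    }
    return $session['csrf_token'];
}

// Constant-time comparison of the posted value against the session token.
function csrf_valid(array $session, $submitted): bool
{
    return is_string($submitted)
        && isset($session['csrf_token'])
        && hash_equals($session['csrf_token'], $submitted);
}
```

In the form, the hidden input's value would be htmlspecialchars(csrf_token($_SESSION)), and the upload handler would reject the request unless csrf_valid($_SESSION, $_POST['csrf_token'] ?? null) is true.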

 

Apart from those two answers, I was hoping there would be something I could do before the uploaded file is stored, such as a virus scan, to stop any vulnerability from reaching the server. With an image, you can do this by making a resampled copy, which strips anything hidden in the original, and renaming it. I was hoping there would be something similar for a doc or pdf. That is where I am stuck, as I feel what I have is not enough.


I don't think PHP has any other options for checking file security. However, you can always send the file (before it's stored on your server) to VirusTotal using their API: https://developers.virustotal.com/v2.0/reference
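
A rough sketch of the VirusTotal route, assuming the public v2 API linked above, its file/report endpoint, and the curl extension; the helper names are hypothetical. It looks a report up by the file's SHA-256 hash, so only files VirusTotal has already seen will have a report, and brand-new files would have to be submitted for scanning first. Check the current API docs and rate limits before relying on it.

```php
<?php
// Endpoint as given in the v2 reference; verify against the current docs.
const VT_REPORT_URL = 'https://www.virustotal.com/vtapi/v2/file/report';

// Asks for an existing report by SHA-256, so the file itself is not uploaded.
function vt_fetch_report(string $apiKey, string $filePath): array
{
    $query = http_build_query([
        'apikey'   => $apiKey,
        'resource' => hash_file('sha256', $filePath),
    ]);
    $ch = curl_init(VT_REPORT_URL . '?' . $query);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $data = is_string($body) ? json_decode($body, true) : null;
    return is_array($data) ? $data : [];
}

// true = no engine flagged the file, false = at least one did,
// null = no finished report (response_code other than 1).
function vt_verdict(array $report): ?bool
{
    if (($report['response_code'] ?? 0) !== 1) {
        return null;
    }
    return (int) ($report['positives'] ?? 0) === 0;
}
```

Treating null (no report yet) as "not verified" rather than "clean" is the safer default.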

 

If your server has something like ClamAV installed, it also helps. ClamAV has a PHP extension too, but I think it only works with PHP 5.x


You could call the ClamAV CLI version via system(), etc.
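
Calling the ClamAV command-line scanner could look like this sketch. It assumes clamscan is installed and on the PATH (clamdscan, which uses the running daemon, should behave the same way) and that exec() is not disabled. clamscan exits with 0 when no virus is found and 1 when one is; anything else is treated as an error. The helper names are hypothetical.

```php
<?php
// Maps clamscan's exit code to a status string.
function clam_status(int $exitCode): string
{
    switch ($exitCode) {
        case 0:
            return 'clean';
        case 1:
            return 'infected';
        default:
            return 'error'; // scanner error, or clamscan not found (shell exit 127)
    }
}

// Scans one file; the path is shell-escaped before it reaches the command line.
function clam_scan(string $path): string
{
    exec('clamscan --no-summary ' . escapeshellarg($path) . ' 2>&1', $output, $exitCode);
    return clam_status($exitCode);
}
```

Anything other than 'clean' should be treated as a rejection, so a missing or broken scanner never lets a file through by accident.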


Word doc and PDF files are definitely problematic because they can (by design) contain embedded code and/or links to external resources. I don't think any of the measures above would protect against such malicious content. Some virus scanners may catch some of it, but definitely not all. For example, a virus scanner is not going to check every hyperlink in a document to see whether it points to a malicious site.
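
To see which PDFs carry this kind of active content, a crude keyword scan in the spirit of Didier Stevens' pdfid tool can flag the obvious cases. This is a sketch and not a sanitiser: the keyword list is my own selection, and names written with #xx hex escapes or hidden inside compressed object streams will slip past it, so an empty result means "nothing obvious", not "safe".

```php
<?php
// PDF name objects commonly associated with scripts, auto-actions, links and attachments.
const RISKY_PDF_NAMES = [
    '/JavaScript', '/JS', '/OpenAction', '/AA', '/Launch',
    '/EmbeddedFile', '/URI', '/SubmitForm', '/RichMedia',
];

// Returns the risky names found in the raw PDF bytes, in list order.
function risky_pdf_names(string $pdfBytes): array
{
    $found = [];
    foreach (RISKY_PDF_NAMES as $name) {
        // The name must end at a PDF delimiter, so "/JS" does not match "/JSomething".
        $pattern = '~' . preg_quote($name, '~') . '(?=[\s()<>\[\]{}/%]|$)~';
        if (preg_match($pattern, $pdfBytes) === 1) {
            $found[] = $name;
        }
    }
    return $found;
}
```

Usage would be risky_pdf_names(file_get_contents($tmpPath)) on the uploaded file, rejecting or flagging it for review when the result is non-empty.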

 

One "possible" option to be 100% certain would be to use a plug-in or COM object (or even something you can call from the command line) to "create" PDF files. When someone uploads a doc or PDF file, send the document to that process with parameters that ensure it creates a non-interactive PDF (no links, no fillable forms, etc.). You should be left with a simple, flat PDF that contains no code. The only problem might be licensing. Many years ago I worked on a web application where we used a server-side component to generate PDF files from PostScript code, and it was quite expensive. If this is for non-commercial purposes, you may be able to find something free or cheap.
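
One free command-line tool for the "create a PDF" step with .doc uploads is LibreOffice in headless mode; that is my suggestion, not necessarily what was meant above. A sketch, assuming soffice is installed and on the PATH: LibreOffice writes the PDF into the output directory under the input's base name. As far as I know it may keep hyperlinks, so on its own this only gets every upload into one PDF pipeline and would still need a flattening step afterwards.

```php
<?php
// Builds the LibreOffice conversion command with shell-escaped paths.
function soffice_to_pdf_command(string $inputPath, string $outDir): string
{
    return 'soffice --headless --convert-to pdf --outdir '
        . escapeshellarg($outDir) . ' ' . escapeshellarg($inputPath);
}

// Runs the conversion; returns the new PDF's path, or null on failure.
function convert_doc_to_pdf(string $inputPath, string $outDir): ?string
{
    exec(soffice_to_pdf_command($inputPath, $outDir) . ' 2>&1', $output, $exitCode);
    // Output name: input base name with a .pdf extension, inside $outDir.
    $pdf = $outDir . '/' . pathinfo($inputPath, PATHINFO_FILENAME) . '.pdf';
    return ($exitCode === 0 && is_file($pdf)) ? $pdf : null;
}
```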


This was my biggest worry, because I haven't found anything that actually makes these uploaded documents secure.

 

 

As for your idea of creating a flat PDF, are you thinking of something like a TCPDF plugin? I already use that for a PDF viewer. The upload renames the file and moves it outside the root folder; then, to display it, the viewer uses TCPDI and loads the file via file_get_contents. Thank you for the guidance; I needed that. I will see if I can create something that reads the PDF document and deletes all the dodgy content, if possible. That way, I'd feel happier knowing there is some validation inside the PDFs.

 

On another note, I remember seeing that it's possible to convert a PDF to HTML and vice versa. Wouldn't that be wiser, since you could then validate the HTML and delete the dodgy tags before converting it back to a PDF? I may be talking out of my arse here :-\


Actually, correct me if I am wrong, but converting each PDF page to an image and then converting the images back into a PDF seems very secure to me. Is it possible to do this in PHP so that the original uploaded file never gets stored on the server? That way, it would be 100% secure.
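
Something close to this can be done with Ghostscript rather than PHP alone. A sketch, assuming a recent Ghostscript (9.22 or later, as far as I know) that ships the pdfimage24 device, which renders every page to an image and writes a PDF containing only those images. It works straight from PHP's temporary upload file, so the original is never moved into storage; PHP also deletes unmoved temp uploads at the end of the request. The 150 dpi default and the helper names are my own choices. Note that Ghostscript itself still parses the untrusted file, so keep it patched.

```php
<?php
// Builds the Ghostscript command: rasterise each page, write an image-only PDF.
function flatten_pdf_command(string $inputPath, string $outputPath, int $dpi = 150): string
{
    return 'gs -dSAFER -sDEVICE=pdfimage24 -r' . $dpi
        . ' -o ' . escapeshellarg($outputPath) . ' ' . escapeshellarg($inputPath);
}

// Flattens the uploaded temp file into $outputPath, then discards the original.
function flatten_uploaded_pdf(string $tmpPath, string $outputPath): bool
{
    exec(flatten_pdf_command($tmpPath, $outputPath) . ' 2>&1', $output, $exitCode);
    @unlink($tmpPath); // the original upload is never kept, whatever the result
    return $exitCode === 0 && is_file($outputPath);
}
```

A typical call would be flatten_uploaded_pdf($_FILES['your_field']['tmp_name'], '/home/myfolder/documents/' . $newName) after the extension and MIME checks pass.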


That would work (and is close to what I was proposing), but it would create a different problem: a user would not be able to select text or search within the document, since there is no longer any "text".

What I was specifically suggesting was, in essence, "printing" the document with less "fidelity". For example, think about printing a color document on a B&W printer: you lose the information about the colors in the document. So you could print/create a new PDF using the original document as the source and make sure the process is configured not to create links or other rich content, i.e., it just creates a PDF with text and images. I would assume that many PDF generators (especially free or low-cost ones) cannot even create PDFs with rich content.


I would store the document with a "verified" flag in your database table, initially unset. Your query should not return unverified files. Then use an asynchronous process to scan the file for viruses, as suggested previously.
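
A sketch of the verified-flag idea with PDO. The table and column names are hypothetical, and the scanner is passed in as a callable so any of the scanners mentioned in this thread could be plugged in. The page query only ever returns verified rows, while a separate worker (run from cron, for example) processes the pending ones.

```php
<?php
// Hypothetical schema: every upload starts with verified = 0.
function create_documents_table(PDO $db): void
{
    $db->exec('CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        stored_name TEXT NOT NULL,
        verified INTEGER NOT NULL DEFAULT 0
    )');
}

// Page-side lookup: unverified documents are simply not found.
function find_document(PDO $db, int $id): ?array
{
    $stmt = $db->prepare('SELECT id, stored_name FROM documents WHERE id = ? AND verified = 1');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// Worker: scan every pending file; mark clean ones verified, drop the rest.
// A real worker should also delete or quarantine the rejected file on disk.
function verify_pending(PDO $db, callable $isClean): void
{
    $pending = $db->query('SELECT id, stored_name FROM documents WHERE verified = 0')
        ->fetchAll(PDO::FETCH_ASSOC);
    foreach ($pending as $row) {
        if ($isClean($row['stored_name'])) {
            $db->prepare('UPDATE documents SET verified = 1 WHERE id = ?')->execute([$row['id']]);
        } else {
            $db->prepare('DELETE FROM documents WHERE id = ?')->execute([$row['id']]);
        }
    }
}
```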


Regarding the problem of losing selectable text: that is all OK, because all that is required is for the document to be secure, viewable (even if each page is just an image of the text) and printable. Text search would be useful, but it is not a requirement ahead of security, which is very important in my case. I will look into how I can turn a PDF into another, cleaner PDF before I begin converting it to images. Thanks a million.

 

 

The verified-flag approach is a nice option, but as psycho said, some files can still go undetected. I'd love not to have that worry, as I could do without sleepless nights thinking of solutions for something that isn't fully fixed. You may think I am a freak, but I just can't put something online that's not 100% safe while trusting my users' actions. That leaves me with two options: either convert the PDF pages to images and then convert those images back into a PDF, or do it psycho's way.


Archived

This topic is now archived and is closed to further replies.
