
Database or files: which is better for lots of little documents?


jhsachs


My client's application creates three small documents for each user transaction. One is 200 to 300 KB; the other two are about 3 KB each. They will accumulate at a rate of about 5000/year and will have to be retained for several years, maybe forever.

 

From the application's point of view these documents are files. The simplest thing I can do is store them all in one directory. At a small cost in added complexity I can distribute them over multiple directories, or I can store them in a database table and copy them to files only when they are needed (which will be rare).
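The "multiple directories" option can be sketched roughly as follows. This is only an illustration in Python, assuming each document is identified by an integer row id; the function name `sharded_path`, the bucket size of 1000, and the naming scheme are hypothetical choices, not anything the application already does.

```python
from pathlib import Path


def sharded_path(base_dir: str, row_id: int, suffix: str, per_dir: int = 1000) -> Path:
    """Build base_dir/<bucket>/<row_id>_<suffix>, where bucket = row_id // per_dir.

    Hypothetical layout: with per_dir=1000, no directory ever holds
    more than 1000 ids' worth of files.
    """
    if row_id < 0:
        raise ValueError("row_id must be non-negative")
    bucket = row_id // per_dir
    # Zero-padding keeps directory listings sorted in numeric order.
    return Path(base_dir) / f"{bucket:04d}" / f"{row_id:08d}_{suffix}"


# Example: where the document for row 12345 would live.
example = sharded_path("/data/docs", 12345, "receipt.pdf")
```

Deriving the directory from the id means the path can always be recomputed and never has to be stored anywhere.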

 

The server runs Linux. The database, if I use it, is MySQL.

 

For people who have dealt with large numbers of small documents: what's the best type of storage to use in this situation from the standpoint of reliability, performance, capacity, and any other factors you consider important?


That's a valid concern, but not the determining factor here, for a couple of reasons.

 

One is that I can protect the documents by setting their directory's permissions to 700. In fact, I need to do that even if I store the documents in a database, because some of them can't be used unless they are copied to files at least temporarily.

 

Another reason is that they're low-value targets. The information in them has no monetary or competitive value. It could be used to commit fraud, but the benefit would be small, the consequences would be large if the perpetrator was caught, and the misuse would be detectable and traceable in every case. Thus security is not an overriding concern.

 

 


Hi

 

If they are stored as files then you can store them outside of the web accessible directories.

 

If you just need to return the files to a user then you should just be able to read the file straight out with a suitable header.

 

Storing them in files entails coming up with unique names.

 

I think I would be tempted with both ways, but probably the db is the easiest way.

 

All the best

 

Keith


If they are stored as files then you can store them outside of the web accessible directories.

 

The site is hosted on a GoDaddy shared server, and it doesn't appear to be configured that way. (I know what you mean; I've used other servers that were.) I believe that 700 permissions will give me adequate protection.

 

For the future, though, I'd be interested to know whether storing the files outside the web root is really more secure, and if so, how.

 

If you just need to return the files to a user then you should just be able to read the file straight out with a suitable header.

 

True. I need to let users download the files, and I didn't realize PHP let me do that directly from a database, but I see it does.
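As a rough sketch of the store-in-a-table, serve-with-headers idea: the example below is in Python and uses the standard library's `sqlite3` as a stand-in for MySQL, so it shows the shape of the approach and is not the PHP/MySQL code itself. The table name `documents`, its columns, and the function names are all hypothetical.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store (stand-in for a MySQL table with a BLOB column)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " id INTEGER PRIMARY KEY,"
        " filename TEXT NOT NULL,"
        " mime_type TEXT NOT NULL,"
        " body BLOB NOT NULL)"
    )
    return conn


def save_document(conn: sqlite3.Connection, filename: str, mime_type: str, body: bytes) -> int:
    """Insert one document and return its row id."""
    cur = conn.execute(
        "INSERT INTO documents (filename, mime_type, body) VALUES (?, ?, ?)",
        (filename, mime_type, body),
    )
    conn.commit()
    return cur.lastrowid


def build_download(conn: sqlite3.Connection, doc_id: int):
    """Return (headers, body) for a download response; raise KeyError if the id is unknown."""
    row = conn.execute(
        "SELECT filename, mime_type, body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    filename, mime_type, body = row
    headers = {
        "Content-Type": mime_type,
        # Assumes the stored filename contains no quotes or control characters.
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(len(body)),
    }
    return headers, bytes(body)
```

The web layer would send the returned headers and then write the body to the response, so no temporary file is needed for a plain download.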

 

Storing them in files entails coming up with unique names.

 

True, but not a problem. They're generated from information in a table row, and the table's primary index is a natural basis for a unique set of filenames.

 

I think I would be tempted with both ways, but probably the db is the easiest way.

 

I'm inclining that way too, simply because databases are designed to hold large numbers of small objects, and file systems aren't. I understand that Linux is pretty good at it, but still... why use a file to drive a screw when a screwdriver is available? (The pun was unintentional.)


Storing files in BLOBs is not a great solution in most cases, but if you're only talking about 5k entries a year, it really doesn't matter. In 10 years you'd only have 50,000 rows. These days, many people would probably use a NoSQL database instead. The one that immediately comes to mind for me is Membase. You might wonder who is using Membase. One company many people have heard of is Zynga, which uses it to store lots of small files associated with its games.
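To put rough numbers on that, here is a back-of-the-envelope calculation in Python. It assumes 5000 transactions a year, each producing one document of about 250 KB (the midpoint of the 200 to 300 KB range) and two of about 3 KB; if "5000/year" actually counts documents, the totals are smaller. The constant and function names are only for illustration.

```python
TRANSACTIONS_PER_YEAR = 5000      # assumption: one "entry" per transaction
LARGE_DOC_KB = 250                # assumed midpoint of 200-300 KB
SMALL_DOC_KB = 3
SMALL_DOCS_PER_TRANSACTION = 2


def rows_after(years: int, rows_per_transaction: int = 1) -> int:
    """Row count after the given number of years."""
    return TRANSACTIONS_PER_YEAR * years * rows_per_transaction


def storage_kb_after(years: int) -> int:
    """Total document payload in KB, ignoring database or filesystem overhead."""
    per_transaction = LARGE_DOC_KB + SMALL_DOCS_PER_TRANSACTION * SMALL_DOC_KB
    return TRANSACTIONS_PER_YEAR * years * per_transaction


if __name__ == "__main__":
    for years in (1, 10):
        gib = storage_kb_after(years) / (1024 * 1024)
        print(f"{years:>2} years: {rows_after(years)} rows, about {gib:.1f} GiB")
```

With one row per document instead of one per transaction, pass `rows_per_transaction=3`; either way the table stays small by database standards.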

 

Membase is extremely fast, scales across multiple servers, and is a distributed key-value store built for quickly storing and retrieving small objects. For example, you could have 2 or 3 Membase servers and configure them to replicate, which would ensure that if one server went down you'd have a backup.

 

With that said, if security matters at all, this application shouldn't be running on a GoDaddy shared server in the first place. Setting aside the impracticality of 700 on a web-accessible file: if the file is available in web space, it doesn't really matter what the permissions are. It is already public. If, however, these are files that should only be delivered to certain people who authenticate, then you really do have to store them outside the web root or in a database to have any sort of basic security. That should be doable even on a shared server, where your web root is typically going to be /home/username/public_html or something similar. Your home directory, /home/username, can have other directories underneath it where you can store the files, so that they are not under the web root.

 

 

 

 


Interesting stuff, and it may be useful in the future. It's not useful now because the hosting service doesn't support it, and won't let us add it unless we subscribe to a much more expensive dedicated server, which we would then have to pay them to maintain. Again, the magnitude of the problem doesn't justify that type of solution.


Why outside the web root?

 

I considered it adequate to set the directory's permissions to 700, and in this case I have no choice, because the host has no directories outside the web root. For future work on servers that do, however, I'd like to understand if/why storing files outside the web root is more secure.


Hi

 

If they are outside the web root then there is no way that a user can just request them to be served by the server.

 

As far as I know, using 700 will still give the web server access to the files, and it is the web server that would read them when a user requests them.

 

All the best

 

Keith


Hi

 

If it is outside the web directories then it can only be reached via a script that presents the file, and you can code the script to provide the protection you want.

 

If it is in the web-accessible directories then they just need to type the URL with the file name on the end of it.
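A minimal sketch of such a gatekeeper script's core check, written in Python for illustration (the same logic would apply in PHP). The function name `resolve_download` and the `user_may_read` flag are hypothetical; how the user is authenticated is left to the application.

```python
from pathlib import Path


def resolve_download(storage_dir: Path, requested_name: str, user_may_read: bool) -> Path:
    """Return the real path of a requested document, or raise.

    storage_dir is assumed to live outside the web root, so the only
    way to reach a file in it is through this check.
    """
    if not user_may_read:
        raise PermissionError("user is not allowed to read this document")
    base = storage_dir.resolve()
    candidate = (base / requested_name).resolve()
    # Reject names such as "../secret" or absolute paths that would
    # escape the storage directory.
    if base not in candidate.parents:
        raise ValueError("requested path is outside the storage directory")
    if not candidate.is_file():
        raise FileNotFoundError(requested_name)
    return candidate
```

The caller would then read the returned path and send it to the user with suitable headers. A file in a web-accessible directory gets none of these checks, because the web server hands it out directly.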

 

All the best

 

Keith

