
Saving files for users


NotionCommotion


I have a bunch of users in a database (id, name, etc).

 

I have a bunch of documents which belong to users (id, filename, users_id, etc), and expect 500 or fewer per user.

 

The documents will be renamed to the document_id on disk. X-Sendfile (since they are stored under the document root) will be used to retrieve them, and a header will be used to return them under their original name.
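A rough sketch of the response headers this implies, in Python purely for illustration. It assumes the web server honours the `X-Sendfile` header (for example Apache with mod_xsendfile). The function name `download_headers` and the sample stored path are made up for this example.

```python
def download_headers(stored_path: str, original_name: str) -> dict:
    """Build headers that let the web server stream the stored file
    while the browser saves it under the original filename."""
    # Drop characters that would break the quoted filename parameter.
    safe_name = original_name
    for bad in ('"', "\\", "\r", "\n"):
        safe_name = safe_name.replace(bad, "")
    return {
        # The server reads this header and sends the file itself.
        "X-Sendfile": stored_path,
        "Content-Type": "application/octet-stream",
        # This header restores the original name on download.
        "Content-Disposition": f'attachment; filename="{safe_name}"',
    }


# Hypothetical stored path: the file is named after its document ID.
headers = download_headers("/bla/bla/user_files/5", "blabla.pdf")
```

The application only emits headers; the file bytes never pass through the script.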

 

Is it recommended to make a separate folder for each user and store each individual user's documents in that folder, or create one folder for all documents?

 

If I go with the one-folder approach, I will need some method of keeping the total files per folder below some reasonable limit (1,000?).  My thought is to estimate the maximum potential number of documents and create subfolders under the main document folder.  I will likely hash the ID and use the first character for the first-level subfolder, the second character for a subfolder inside that, and continue for as many levels as needed to accommodate the maximum potential documents (if there are 1,000,000 potential documents, then three levels of hex characters give 16^3 = 4,096 leaf folders, or about 244 documents per folder on average).

 

Please provide the rationale for one approach over the other.

 

Thank you


Naturally, putting all the files in a single folder isn't that great.

 

I'd go with more than one character from the hash per level. Like two. At three levels that's about 16M leaf directories (256^3), so 1M users with 500 files each averages only ~30 files per leaf.
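The arithmetic behind both estimates can be checked with a few lines of Python. This is only a sketch: it assumes a hexadecimal hash (16 possible values per character) and a perfectly even spread, and the function name `avg_files_per_leaf` is invented here.

```python
def avg_files_per_leaf(total_files: int, chars_per_level: int,
                       levels: int, alphabet: int = 16) -> float:
    """Average files per leaf directory, assuming an even hash spread."""
    # Each level consumes chars_per_level hex characters of the hash.
    leaves = alphabet ** (chars_per_level * levels)
    return total_files / leaves


# One character per level, three levels, 1,000,000 documents.
one_char = avg_files_per_leaf(1_000_000, chars_per_level=1, levels=3)

# Two characters per level, three levels, 1M users with 500 files each.
two_char = avg_files_per_leaf(1_000_000 * 500, chars_per_level=2, levels=3)

print(round(one_char, 1), round(two_char, 1))  # prints: 244.1 29.8
```

Real hashes only approximate an even spread, so individual leaves will sit somewhat above or below these averages.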

 

Remember that you can change the folder scheme at any point too (if you don't mind leaving the files where they are now). If you store the full path to the file, rather than just the hash, then it doesn't really matter where they are.



So, you would not create a separate folder for each user, correct?  Why or why not?

 

Two characters will result in 256 sub-folders per folder instead of 16. I guess that is better, and I agree.

 

I actually wasn't planning on storing the hash, just the first four fields below, plus the fifth if the full path were saved.  Saving the full path just seems anti-normalized; however, maybe it makes sense.

id: 5
name: blabla.pdf
users_id: 27
date_uploaded: 2014-11-15 13:59:59
full_path_to_file: /bla/bla/user_files/e4/da/3b7fbbce2345d7772b0674a318d5
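For illustration, the sample path above looks like the MD5 of the ID split into two 2-character directories plus the remainder. That reading is an assumption, as are the function name `document_path` and its parameters. A sketch in Python of deriving the path from the ID alone:

```python
import hashlib
from pathlib import PurePosixPath


def document_path(doc_id: int, root: str = "/bla/bla/user_files",
                  levels: int = 2, width: int = 2) -> str:
    """Derive a storage path from the document ID (assumed scheme:
    MD5 hex digest, `levels` directories of `width` characters each,
    remainder of the digest as the filename)."""
    digest = hashlib.md5(str(doc_id).encode("ascii")).hexdigest()
    cut = levels * width
    # Leading characters become nested directory names.
    dirs = [digest[i:i + width] for i in range(0, cut, width)]
    return str(PurePosixPath(root, *dirs, digest[cut:]))


print(document_path(5))
# prints: /bla/bla/user_files/e4/da/3b7fbbce2345d7772b0674a318d5
```

Because the path is a pure function of the ID, it does not have to be stored as long as the scheme never changes.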

You could create a separate folder for each user. But you'd still have to partition them into additional directories.

It's not my first choice simply because that's generally not how I do it - I like even spreads and users don't upload evenly.

 


1. Perfect normalization is not always the best thing in software development. Some redundancy helps.

2. It's only redundant as long as you can compute the path based on other data. If you change the naming scheme and don't move the files to match then it's no longer redundant.
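Point 2 can be sketched as a lookup where a stored path, if present, wins over the computed one. This is a hypothetical illustration in Python: `resolve_path`, the row layout, and the `derive` callback are assumptions, with field names borrowed from the example record in this thread.

```python
from typing import Callable


def resolve_path(row: dict, derive: Callable[[int], str]) -> str:
    """Return the stored path when the row has one, else compute it.

    Files written under an older naming scheme keep working as long
    as their rows carry the path they were actually saved to.
    """
    stored = row.get("full_path_to_file")
    return stored if stored else derive(row["id"])


def current_scheme(doc_id: int) -> str:
    # Placeholder for whatever scheme is in use today (hypothetical).
    return f"/bla/bla/user_files/{doc_id}"


old_row = {"id": 5, "full_path_to_file": "/bla/bla/old_files/5"}
new_row = {"id": 6, "full_path_to_file": None}

print(resolve_path(old_row, current_scheme))  # prints: /bla/bla/old_files/5
print(resolve_path(new_row, current_scheme))  # prints: /bla/bla/user_files/6
```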

 

Another option is splitting on upload date: Y/m/d. Maybe /h too. It's less arbitrary than the hash.
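A date-based layout might look like the following Python sketch. The function name `date_path`, the root directory, and the choice to use the document ID as the filename are assumptions for illustration. The timestamp is the `date_uploaded` value from the example record.

```python
from datetime import datetime


def date_path(uploaded: datetime, doc_id: int,
              root: str = "/bla/bla/user_files",
              by_hour: bool = False) -> str:
    """Build a Y/m/d (optionally Y/m/d/h) directory path for a document."""
    fmt = "%Y/%m/%d/%H" if by_hour else "%Y/%m/%d"
    return f"{root}/{uploaded.strftime(fmt)}/{doc_id}"


uploaded = datetime.strptime("2014-11-15 13:59:59", "%Y-%m-%d %H:%M:%S")

print(date_path(uploaded, 5))
# prints: /bla/bla/user_files/2014/11/15/5
print(date_path(uploaded, 5, by_hour=True))
# prints: /bla/bla/user_files/2014/11/15/13/5
```

Unlike the hash, this spread follows upload volume, so a busy day produces a fuller directory than a quiet one.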



Thanks requinix,

 

In my original post, I said fewer than 500 documents per user.  But what if I am wrong?  I will change to something that spreads things out evenly.

 

For now, I won't save the full path.  If I ever need to, the paths can be derived from the ID.

 

Thanks for your help
