NotionCommotion Posted November 13, 2014

I have a bunch of users in a database (id, name, etc.) and a bunch of documents that belong to users (id, filename, users_id, etc.); I expect 500 or fewer documents per user. Each document will be renamed to its document_id, X-Sendfile will be used to retrieve it (since the files are stored under the document root), and a header will be used to return it under its original name.

Is it recommended to make a separate folder for each user and store each user's documents in that folder, or to create one folder for all documents? If I go with the one-folder approach, I will need some method for keeping the total files per folder below some reasonable limit (1,000?). My thought is to estimate the maximum potential number of documents and create subfolders under the main document folder. I will likely hash the ID and use the first character of the hash for the first subfolder, the second character for a second subfolder inside the first, and continue as many levels as needed to accommodate the maximum potential number of documents (with 1,000,000 potential documents, three levels will keep the maximum per folder under 244).

Please provide a rationale for one approach over the other. Thank you.
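The one-character-per-level scheme described above can be sketched as follows (Python used for illustration; the base directory and the choice of MD5 are assumptions, not from the post):

```python
import hashlib
import os

def storage_path(document_id, base="/var/app/user_files", levels=3):
    """One hex character of the document ID's hash per directory level;
    the rest of the hash becomes the filename."""
    digest = hashlib.md5(str(document_id).encode()).hexdigest()
    parts = list(digest[:levels])   # e.g. ['e', '4', 'd'] for document ID 5
    return os.path.join(base, *parts, digest[levels:])

print(storage_path(5))
# /var/app/user_files/e/4/d/a3b7fbbce2345d7772b0674a318d5
```

Three levels of single hex characters give 16**3 = 4,096 leaf directories, so 1,000,000 documents works out to at most ~244 per folder, matching the estimate above.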
requinix Posted November 14, 2014

Naturally, putting all the files in a single folder isn't that great. I'd go with more than one character from the hash per level. Like two. At three levels that's 16M leaf directories, and 1M users with 500 files each is only ~31 files per leaf.

Remember that you can change the folder scheme at any point, too (if you don't mind leaving the existing files where they are). If you store the full path to each file, rather than just the hash, then it doesn't really matter where they live.
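The two-characters-per-level suggestion, and the arithmetic behind it, can be sketched like this (illustrative Python; the base directory and MD5 are assumptions):

```python
import hashlib
import os

def sharded_path(document_id, base="/var/app/user_files", levels=3, chars=2):
    """Two hex characters of the hash per directory level instead of one."""
    digest = hashlib.md5(str(document_id).encode()).hexdigest()
    parts = [digest[i * chars:(i + 1) * chars] for i in range(levels)]
    return os.path.join(base, *parts, digest[levels * chars:])

# Three levels of two hex characters give 256**3 (~16.7M) leaf directories,
# so 1M users * 500 files each spreads to roughly 30 files per leaf.
leaves = 256 ** 3
print(sharded_path(5))
print(round(1_000_000 * 500 / leaves))
```

With two levels instead of three, document ID 5 lands at `/var/app/user_files/e4/da/3b7fbbce2345d7772b0674a318d5`, which is the example path that appears later in the thread.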
NotionCommotion Posted November 14, 2014 (Author)

Quoting requinix: "Naturally, putting all the files in a single folder isn't that great. I'd go with more than one character from the hash. [...] If you store the full path to the file, rather than just the hash, then it doesn't really matter where they are."

So, you would not create a separate folder for each user, correct? Why or why not?

Two characters per level will result in 256 subfolders per folder instead of 16; I guess this is better, and I agree. I actually wasn't planning on storing the hash, just the following four fields, plus a fifth if the full path is saved. Saving the full path just seems anti-normalized, but maybe it makes sense:

id: 5
name: blabla.pdf
users_id: 27
date_uploaded: 2014-11-15 13:59:59
full_path_to_file: /bla/bla/user_files/e4/da/3b7fbbce2345d7772b0674a318d5
requinix Posted November 14, 2014

"So, you would not create a separate folder for each user, correct? Why or why not?"

Could. But you'd still have to partition them into additional directories. It's not my first choice simply because that's generally not how I do it: I like even spreads, and users don't upload evenly.

"I actually wasn't planning on storing the hash, just the following four fields [...] Saving the full path just seems anti-normalized, but maybe it makes sense."

1. Perfect normalization is not always the best thing in software development. Some redundancy helps.
2. It's only redundant as long as you can compute the path from other data. If you change the naming scheme and don't move the files to match, then it's no longer redundant.

Another option is splitting on upload date: Y/m/d, maybe /H too. It's less arbitrary than the hash.
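The date-based alternative (Y/m/d, optionally with an hour level) might look like this (illustrative Python; the base directory and naming the file after its ID are assumptions):

```python
import os
from datetime import datetime

def date_path(uploaded_at, document_id, base="/var/app/user_files"):
    """Partition by upload date (Y/m/d) rather than by a hash of the ID."""
    return os.path.join(base, uploaded_at.strftime("%Y/%m/%d"), str(document_id))

print(date_path(datetime(2014, 11, 15, 13, 59, 59), 5))
# /var/app/user_files/2014/11/15/5
```

Note the trade-off: folder sizes now track upload volume per day rather than being statistically even, but the layout is human-readable and trivially derivable from the date_uploaded column.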
NotionCommotion Posted November 14, 2014 (Author)

Thanks, requinix. In my original post I said fewer than 500 documents per user, but what if I am wrong? I will change to something that spreads things out evenly. For now, I won't save the full path; if I ever need to, the paths can be derived from the ID.

Thanks for your help.
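Deriving the path from the ID at request time, and returning the file under its original name, could be sketched as follows (illustrative Python; the header assembly is framework-agnostic, the `X-Sendfile` header name assumes Apache's mod_xsendfile, and the base directory and two-level scheme are assumptions):

```python
import hashlib
import os

def download_headers(document_id, original_name, base="/var/app/user_files"):
    """Derive the on-disk path from the document ID, hand delivery off to the
    web server via X-Sendfile, and restore the original filename for the client."""
    digest = hashlib.md5(str(document_id).encode()).hexdigest()
    path = os.path.join(base, digest[:2], digest[2:4], digest[4:])
    return {
        "X-Sendfile": path,
        "Content-Disposition": 'attachment; filename="%s"' % original_name,
    }

print(download_headers(5, "blabla.pdf")["X-Sendfile"])
# /var/app/user_files/e4/da/3b7fbbce2345d7772b0674a318d5
```

Nothing but the ID and original filename need to be stored for this to work, which is exactly why the full_path_to_file column can be skipped as long as the naming scheme never changes.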