
Proper Directory Structure And How To Loop Through It


MockY


I'm building software in which the user creates various PDF documents (invoices, forms, etc.). I need a way to store these documents properly, and my initial thought is to store all of a customer's PDF files in a directory named after the customer number. I'd also like to future-proof this: if the structure allows growth to more than a million customers, I'm sure I won't reach the limit.

 

But how should I go about storing this properly? I'm padding every customer number with leading zeros, so customer 1 becomes 00000001 and customer 405 becomes 00000405. This is where I'm stuck. Should I use /0/00/000/01 and store it there? If so, what do I do with customer 405?

As you can tell, I need some guidance.
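One common way to answer the "what about customer 405" question is to split the zero-padded number into fixed-width chunks and use each chunk as one directory level. The sketch below assumes 8-digit padding and 2-digit chunks; the function name `paddedChunkPath()` is hypothetical. With 2-digit chunks, each level holds at most 100 entries (00 to 99), and customers 1 and 405 land in different directories (00/00/00/01 and 00/00/04/05).

```php
<?php
// Hypothetical helper: split a zero-padded customer number into
// fixed-width chunks, one directory level per chunk.
function paddedChunkPath(int $customerId, int $width = 8, int $chunk = 2): string
{
    // e.g. 405 -> "00000405"
    $padded = str_pad((string) $customerId, $width, '0', STR_PAD_LEFT);
    // "00000405" -> ["00", "00", "04", "05"]
    $parts = str_split($padded, $chunk);
    return implode('/', $parts);
}

echo paddedChunkPath(1), "\n";   // 00/00/00/01
echo paddedChunkPath(405), "\n"; // 00/00/04/05
```

Eight digits split this way give four levels and room for up to 100 million customers, with no directory ever holding more than 100 entries.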

 

Once I figure this out, how should I loop through the folders to check whether they already exist and create them if they don't? I played around with this a bit, but I ended up with way too much code, and I also started at the wrong end. I need to settle on a directory structure before attempting the code, but what I came up with looks something like this:

 

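$customer_id = '1002';

// Pad to 8 digits, e.g. 1002 -> 00001002
$end_directory = str_pad($customer_id, 8, '0', STR_PAD_LEFT);
// Number of leading zeros the padding added
$zeros = 8 - strlen($customer_id);
$base = 'customer_files/';
$structure = '';
// Add one nested level per leading zero: 0/00/000/...
for ($i = 1; $i <= $zeros; $i++) {
    $structure .= str_repeat('0', $i) . '/';
    if (!is_dir($base . $structure)) {
        // recursive = true also creates customer_files/ if it is missing
        mkdir($base . $structure, 0755, true);
    }
}
// Final directory named after the padded customer number
if (!is_dir($base . $structure . $end_directory)) {
    mkdir($base . $structure . $end_directory, 0755, true);
}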
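On the "loop through the folders and create missing ones" part: PHP's `mkdir()` accepts a third `$recursive` argument that creates every missing parent in one call, so the per-level loop is not strictly needed. A minimal sketch, assuming the path has already been computed by whatever scheme you choose; `ensureDirectory()` is a hypothetical name, and the temp-directory base is only for illustration.

```php
<?php
// Hypothetical helper: make sure a nested directory exists.
// mkdir() with $recursive = true creates all missing parent levels.
function ensureDirectory(string $path, int $mode = 0755): bool
{
    // Already there: nothing to do.
    if (is_dir($path)) {
        return true;
    }
    // Another process could create the directory between is_dir() and
    // mkdir(); re-checking is_dir() treats that case as success.
    return mkdir($path, $mode, true) || is_dir($path);
}

// Illustration only: the 0/00/000/0000/00001002 layout for customer 1002.
$base = sys_get_temp_dir() . '/customer_files';
ensureDirectory($base . '/0/00/000/0000/00001002');
```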


The way I typically store them is with a numbering system based on the auto-increment ID. Something like this...

 

$increment = 100;
$customerID = 1205;

$int = intval($customerID/$increment);

//the folder will now be 1200_1299
$folder = $int*$increment."_".(($int+1)*$increment-1);

 

You can layer this if you expect tens of thousands of customers, or change the increment to store more than 100 customers in a single folder. Otherwise, you just check whether the folder exists and then store your file in it with whatever naming convention you want, like customer_1205. It's just one of many options.
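The single-level scheme can be wrapped in a small function so that it is reused everywhere a path is needed. This is a sketch; the name `bucketFolder()` is hypothetical, and `intdiv()` (PHP 7+) stands in for `intval($customerID/$increment)`, which gives the same result for non-negative IDs.

```php
<?php
// Hypothetical helper: the range folder an ID belongs to.
function bucketFolder(int $customerId, int $increment = 100): string
{
    // Integer division finds which bucket the ID falls into.
    $bucket = intdiv($customerId, $increment);
    $start = $bucket * $increment;
    $end = ($bucket + 1) * $increment - 1;
    return $start . '_' . $end;
}

// Range folder plus the customer_<id> naming convention.
$path = 'customer_files/' . bucketFolder(1205) . '/customer_1205';
echo $path, "\n"; // customer_files/1200_1299/customer_1205
```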

Edited by akphidelt2007

Thanks for your time/reply.

 

Interesting way of handling this. However, if I want to future-proof this by expecting 1 million customers, that would give me 10,000 subdirectories in that one directory. Like you said, the number will decrease with a larger increment, but what if I want just 10 or 20 subdirectories per directory? How would I do this without ending up with a huge number of subdirectories in a single root? You mentioned using layers if I expect tens of thousands of customers; could you elaborate on that, please?


I've never stored more than 10,000 items, so there may be other experts here who can suggest solutions for handling that much data. But say you wanted only 100 files in a single subdirectory. Then you just add more increments, for example:

 

//10000 is layer 1, 1000 is layer 2, 100 is layer 3
$increments = Array(10000,1000,100);
$customerID = 1035;

//store path
$path = 'customer_files/';

//get the path
foreach($increments as $inc)
{
   $int = intval($customerID/$inc);

   $dir = $int*$inc."_".(($int+1)*$inc-1);

   $path .= $dir.'/';
}

//add on the customer
$path .= 'customer_'.$customerID.'/';

 

So for customer 1035 the path would be...

 

customer_files/0_9999/1000_1999/1000_1099/customer_1035/

 

So you would never have more than 100 customers in any subdirectory and it would be layered appropriately.
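The layered loop above can be packaged as a function that takes the increments as a parameter, so changing the layout only means changing the array. A sketch; `layeredPath()` and its `$root` parameter are hypothetical names.

```php
<?php
// Hypothetical helper: build a layered range path for any increments.
function layeredPath(int $customerId, array $increments, string $root = 'customer_files/'): string
{
    $path = $root;
    foreach ($increments as $inc) {
        $bucket = intdiv($customerId, $inc);
        // Range covered by this layer, e.g. 1000_1099
        $path .= ($bucket * $inc) . '_' . (($bucket + 1) * $inc - 1) . '/';
    }
    // Final directory for the customer itself
    return $path . 'customer_' . $customerId . '/';
}

echo layeredPath(1035, [10000, 1000, 100]), "\n";
// customer_files/0_9999/1000_1999/1000_1099/customer_1035/
```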


Oh wait, I see you only said 10 in a subdirectory. In that case just adjust the increments to whatever you want.

 

For example, if you want no more than 10 directories in a subdirectory and expect a million customers, the increments would be

 

$increments = Array(1000000,100000,10000,1000,100,10);

 

So customer id #35005 would be

 

customer_files/0_999999/0_99999/30000_39999/35000_35999/35000_35099/35000_35009/customer_35005/

 

I honestly don't think you need to go to that level but it's just showing the possibilities by simply adjusting your increments.
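Rather than writing the increments array by hand, it can be derived from the expected number of customers and the maximum number of entries per directory. This is a hypothetical helper (`buildIncrements()`), assuming a uniform fan-out at every layer; for 1,000,000 customers and 10 entries per directory it reproduces the array above.

```php
<?php
// Hypothetical helper: derive the increments from the expected number
// of customers and the maximum number of entries per directory.
function buildIncrements(int $maxCustomers, int $perDir): array
{
    // Smallest power of $perDir that covers $maxCustomers.
    $inc = 1;
    while ($inc < $maxCustomers) {
        $inc *= $perDir;
    }
    // Step down one layer at a time; the last layer spans $perDir IDs.
    $increments = [];
    while ($inc >= $perDir) {
        $increments[] = $inc;
        $inc = intdiv($inc, $perDir);
    }
    return $increments;
}

echo implode(', ', buildIncrements(1000000, 10)), "\n";
// 1000000, 100000, 10000, 1000, 100, 10
```

A larger `$perDir` gives fewer, wider layers; for example, 100 entries per directory with 10,000 customers yields just [10000, 100].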


This depends a lot on the file system used by the server, and the capabilities of both it and the OS.

I'm going to assume that it's using EXT3 running on an updated Linux kernel, in which case you should find this thread quite useful. The reply from Sean Reifschneider further down on that page is also quite enlightening.



Definitely a good read.


How are you storing the customer IDs? In a database? If so, you could probably make this easier by doing the following:

 

if(!file_exists('uploads/' . $customer_id)) {
mkdir('uploads/' . $customer_id, 0755);
}

 

This will create a directory for each customer, with no limit on how many there can be. As long as the customer exists with a valid ID, a directory will be created for that customer.

Edited by parkerj

akphidelt2007 - Many thanks for the help. This works very nicely, and it would be very easy for any other program to navigate the structure and quickly find what it needs. Gold star for this solution. Having 100 customers per directory is probably sufficient, especially after reading the serverfault.com thread.

 

Side note: I'm using EXT4 on the latest Linux kernel (Ubuntu 12.04). With that in mind, why would anyone run EXT3 on a modern server OS? I'm just curious whether there is any advantage to running EXT3 over EXT4.


Not really, but a lot of hosts are really slow to upgrade such things, hence the high probability that it's still in use.

 

I would also like to point out two potential flaws in akphidelt2007's solution: directory depth and uneven distribution. On the latter, I found this article very informative:

http://michaelandrews.typepad.com/the_technical_times/2009/10/creating-a-hashed-directory-structure.html
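For comparison, here is one common way to build a hashed layout. It is a swapped-in technique, not akphidelt2007's scheme, and not necessarily the exact method in the linked article. It hashes the ID and uses leading hex characters of the digest as directory names. With 2 hex characters per level, each level has at most 256 entries, and IDs spread roughly evenly across them regardless of gaps in the ID sequence. `hashedPath()` is a hypothetical name.

```php
<?php
// Hypothetical helper: hashed directory layout.
function hashedPath(int $customerId, int $levels = 2, string $root = 'customer_files/'): string
{
    $hash = md5((string) $customerId); // 32 hex characters
    $path = $root;
    for ($i = 0; $i < $levels; $i++) {
        // 2 hex characters per level, i.e. at most 256 directories
        $path .= substr($hash, $i * 2, 2) . '/';
    }
    return $path . 'customer_' . $customerId . '/';
}

echo hashedPath(1035), "\n";
```

The trade-off is that you can no longer find a customer's folder by browsing ranges; you have to recompute the hash from the ID, which is cheap but less convenient for manual inspection.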
