Guidance please



I have created a script that will upload and rename images.
Currently, the script gives each image a common name (myPhotoSample) and a timestamp.

Now, I want to replace the timestamp with a number so that each image is listed sequentially with more easily recognized values (1,2,3,etc).

My concern is the effect this could have if several uploads were coincidentally started simultaneously.

I doubt the script (or directory) would allow duplicate names to be rendered, so would I lose files because of overwriting?

If 3 uploads of 10 images each were all started at exactly midnight what could go wrong? How can I best resolve the risk and ensure that I get 30 complete files sequentially numbered?
 


using a timestamp also has this problem, since multiple concurrent uploads can all complete in the same second and attempt to use the same timestamp value.

the simplest, fool-proof, and 'atomic' way of doing this is to insert a row into a database table that has an auto-increment integer primary index column, get the last insert id from that query, then use the last insert id as part of the file name.
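A minimal sketch of that idea, assuming PDO with an in-memory SQLite database so it runs on its own. The table name `image`, its columns, and the helper `reserveFileName()` are made up for illustration; with MySQL the column would be declared `AUTO_INCREMENT` instead.

```php
<?php
declare(strict_types=1);

// Hypothetical table; only the auto-increment id column matters here.
function createImageTable(PDO $pdo): void
{
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS image (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            original_name TEXT NOT NULL
        )'
    );
}

// Insert one row per uploaded file and build the file name from the new id.
function reserveFileName(PDO $pdo, int $userId, string $originalName): string
{
    $stmt = $pdo->prepare('INSERT INTO image (user_id, original_name) VALUES (?, ?)');
    $stmt->execute([$userId, $originalName]);

    // The database hands out each id once, even when inserts run concurrently.
    $id = (int) $pdo->lastInsertId();

    return sprintf('myPhotoSample_%d.jpg', $id);
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createImageTable($pdo);

$first = reserveFileName($pdo, 1, 'beach.jpg');
$second = reserveFileName($pdo, 2, 'beach.jpg');
echo $first, PHP_EOL, $second, PHP_EOL;
```

Two users uploading a file with the same original name still end up with two different names on disk, because the number comes from the row id rather than from the clock.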


@gw1500se can you give me more info on how PHP handles this automatically?  After upload, every file needs to be "moved" to the destination directory, so I am trying to avoid doing double work.

@mac_gyver. Thank you for pointing out the timestamp issue. I am trying to minimize database involvement.

If I record a users name and other random info when they upload, it will involve a single row in a table. Are you suggesting that I repeat this info and add a row for every image that the user uploads?  Won't that add a significant storage problem?


@gw1500se can you give me more info on how PHP handles this automatically?  After upload, every file needs to be "moved".  Are you saying that my concern is actually a non-issue because PHP will 'magically' eliminate the potential for conflict? Even without using a db safety net?

7 minutes ago, phppup said:

Are you suggesting that I repeat this info and add a row for every image that the user uploads?

yes. a row per uploaded file. the only repeated information would be the user's id, relating the row back to the user it belongs with.

databases are for storing data. each file that gets uploaded has who (user id), what (title, description), when (datetime), where (it's customary to record the ip address of the user for each piece of data that gets stored from them), and possibly some other why information associated with it.

10 minutes ago, phppup said:

Won't that add a significant storage problem?

what do you consider to be a significant storage problem? with today's server hardware, large database tables start at 5-10 million rows.


@mac_gyver I suppose that size ought to be enough, to start. LoL

Taken a step further, if persons Anne, Bill, and Charlie began their uploads ten minutes apart, then image numbers 1 thru 10 would be Anne's images, 11 - 20 would belong to Bill, and 21 - 30 would be from Charlie.

If they begin simultaneously, what is the likelihood that the same result is achieved?

How likely is it that the end result would be a shuffling resulting in Anne's first, Bill's first, Charlie's first, Anne's second, etc? Or worse?

Will using a db  eliminate  a situation like this, or simply make it easier to unscramble the result?


the computer doesn't care what the actual image numbers/filenames end up being, why do you think it will be a problem? you would just query to get any person's list of image data/filenames matching any condition you want.

1 hour ago, phppup said:

Will using a db  eliminate  a situation like this, or simply make it easier to unscramble the result?

Using a DB allows you to store meta-data separately from the physical file, so the name you actually give the physical file becomes irrelevant.

In my systems all the files are stored with a random name, generated using something like bin2hex(random_bytes(8)); which results in a name like f1d47394c0bf8d9d.   That is the name of the file on disk and means there is very little chance of a name collision (though I check anyway for one).

That random name is then stored in the database along side whatever the name of the original upload was ($_FILES['blah']['name']) and the original mime type ($_FILES['blah']['type']) so when the user wants to view their files I can show it in their original name and download it with the original type.  As a result every user can have their own me.jpg or whatever without any conflicts on the server.

If you'd rather have your own naming convention rather than preserving the user's original name then all you do is generate and save your custom name rather than the original, everything else is the same.

 

 


@mac_gyver @kicken I guess I should elaborate somewhat, by clarifying that I want all the uploads in one single folder.

For example, I go to a concert and stand at center of the audience. Linda is on the left aisle and Ricky on the right. We each take 20 photos that we all upload to a single folder.

They will all be stored and renamed as "Concert" followed by the sequence number (ie. Concert_1, etc).

My only real concern (at this point) is an attempt to avoid sequence conflict or shuffling of images.

Ideally, I would want the total of 60 photos to be numbered so that the first 20, middle 20, and last 20 can be attributed to each individual person with a limited amount of mish-mash, interloping, or shuffling involved (even if we all begin the upload process at the exact same moment).


@phppup, it doesn't matter how the files are stored on disk unless your users are just loading up that directory and viewing the files directly.

Store them on disk however you want and just track which files belong to which user in your database then present them to the users that way.

You can't really sequence them how you want unless you know ahead of time how many photos each user will be uploading.  Say you decide to just +10 for each user so Linda is 1-10, You're 11-20 and Ricky is 21-30.  So everyone uploads one photo and now you have Concert_1.jpg, Concert_11.jpg, Concert_21.jpg like you want.  But then Linda uploads 20 photos not 10?  You have a problem, your files are no longer grouped nicely.

Forget about trying to match the physical storage to some sort of order like that. Just store them.  Handle the sorting/ordering in the DB.
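To make "handle the ordering in the DB" concrete, here is a rough sketch, again assuming PDO with in-memory SQLite. The `user_file` table layout, the sample rows, and `listGrouped()` are illustrative assumptions. The rows are inserted in a shuffled order to mimic simultaneous uploads, and the grouping and numbering happen only when the list is read back.

```php
<?php
declare(strict_types=1);

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec(
    'CREATE TABLE user_file (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_name TEXT NOT NULL,
        disk_name TEXT NOT NULL
    )'
);

// Simulate interleaved uploads: the rows arrive in a shuffled order.
$insert = $pdo->prepare('INSERT INTO user_file (user_name, disk_name) VALUES (?, ?)');
foreach ([['Linda', 'a1'], ['Ricky', 'b1'], ['Linda', 'a2'], ['Ricky', 'b2']] as $row) {
    $insert->execute($row);
}

// Group by user, then by upload order, and number the rows for display.
function listGrouped(PDO $pdo): array
{
    $rows = $pdo
        ->query('SELECT user_name, disk_name FROM user_file ORDER BY user_name, id')
        ->fetchAll(PDO::FETCH_ASSOC);

    $listing = [];
    foreach ($rows as $i => $row) {
        $listing[] = sprintf(
            'Concert_%d (%s, stored as %s)',
            $i + 1,
            $row['user_name'],
            $row['disk_name']
        );
    }
    return $listing;
}

$listing = listGrouped($pdo);
print_r($listing);
```

Linda's files come out as Concert_1 and Concert_2 and Ricky's as Concert_3 and Concert_4, no matter how the uploads interleaved. The trade-off is that these display numbers are computed at read time, so they shift when someone uploads more files later.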


@kicken My example was only meant to indicate that the uploaded images for each person would be sequentially numbered as a full batch without interruption regardless of quantity.

If Linda uploaded 4, her four files would remain together, if Ricky uploaded 132, all 132 would run the next sequence of numbers.

So getting back to my original question, how will PHP handle an instance where two (or more) uploads occur at exactly the same time and create the same file name (which is possible even when using a timestamp or random number)?

[I mean, I haven't actually tried it, but what will happen in a directory if you run a PHP script to rename each file TEST. Will an appended message be created by default (ie.  Copy1, Copy 2)?]

What are the pros and cons between storing the image as a file in the directory versus in the DB table?

Thanks for all the info.

 

20 minutes ago, phppup said:

how will PHP handle an instance where two (or more) uploads occur at exactly the same time and create the same file name (which is possible even when using a timestamp or random number)?

PHP uploads the files into a temporary directory with a random name, so there's no problem.  It's entirely up to you what happens after that.

If you want to keep them around long term, you have to move them from that temporary directory to your own storage location.  If you just do a simple copy/move to the same name then one of them will be lost.

22 minutes ago, phppup said:

What are the pros and cons between storing the image as a file in the directory versus in the DB table?

To be clear, you still store the files in a directory, not the DB.  You just track what file you stored where using the DB.  That allows you to mostly ignore the problem of clashing file names by either naming the file randomly or by using the auto-generated ID of the database record as the file's name.  It also enables you to store additional meta data about the file (who owns it, when it was uploaded, tags, # of downloads, etc) that can be used for additional features.

26 minutes ago, phppup said:

My example was only meant to indicate that the uploaded images for each person would be sequentially numbered

Like I said, just do that then if you want

19 hours ago, kicken said:

If you'd rather have your own naming convention rather than preserving the user's original name then all you do is generate and save your custom name

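$storageFolder = './Uploads/';
$nameTemplate = 'Content_%d.jpg';
$counter = getStartingNumber(); // your own helper: the next number for this user
// $uploadedFiles: the temporary paths of the uploads, e.g. from $_FILES[...]['tmp_name']
foreach ($uploadedFiles as $tmpPath){
    $diskName = bin2hex(random_bytes(8));               // random name used on disk
    $friendlyName = sprintf($nameTemplate, $counter++); // name shown to the user

    // move_uploaded_file() only accepts files that really came from an upload
    move_uploaded_file($tmpPath, $storageFolder.$diskName);
    // INSERT INTO user_file (UserId, DiskName, FriendlyName) VALUES (?, ?, ?)
    // bound to $userId, $diskName, $friendlyName with a prepared statement
}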

Every user then would have their own Content_1.jpg ... Content_n.jpg without having to worry about file clashes.

 

If you really don't want to use a DB, then the best thing to do would be to have a separate folder for each user, ie ./Uploads/$userId/ then you can still give each user their own Content_$number.jpg file in sequence.

If you absolutely have to store everything in one directory then you need to have a lock file.  Your script would obtain a lock on that lock file before processing the uploads.  Then create your file using fopen($fileName, 'x'); to avoid any potential conflict and copy the data over.
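A minimal sketch of the lock-file approach, under a few assumptions: the lock file name `.upload.lock`, the helper `storeSequentially()`, and the temporary directory used in the demo are all made up for illustration, and in a real upload handler the sources would be the `tmp_name` entries from `$_FILES`. The lock is held for the whole batch, so one person's files get a contiguous run of numbers.

```php
<?php
declare(strict_types=1);

// Store each source file as "{$prefix}_N.jpg" in $dir while holding an
// exclusive lock, and return the list of file names that were created.
function storeSequentially(string $dir, string $prefix, array $sources): array
{
    // 'c' opens the lock file, creating it if needed, without truncating it.
    $lock = fopen($dir . '/.upload.lock', 'c');
    if ($lock === false || !flock($lock, LOCK_EX)) {
        throw new RuntimeException('Could not obtain the upload lock');
    }

    $created = [];
    try {
        // Find the highest number already used in the directory.
        $next = 1;
        foreach (glob($dir . '/' . $prefix . '_*.jpg') ?: [] as $existing) {
            if (preg_match('/_(\d+)\.jpg$/', $existing, $m)) {
                $next = max($next, (int) $m[1] + 1);
            }
        }

        foreach ($sources as $source) {
            $name = sprintf('%s_%d.jpg', $prefix, $next++);
            // Mode 'x' fails instead of overwriting when the file exists.
            $out = fopen($dir . '/' . $name, 'x');
            $in = fopen($source, 'rb');
            if ($out === false || $in === false) {
                throw new RuntimeException("Could not store $name");
            }
            stream_copy_to_stream($in, $out);
            fclose($in);
            fclose($out);
            $created[] = $name;
        }
    } finally {
        // Release the lock so the next waiting upload can proceed.
        flock($lock, LOCK_UN);
        fclose($lock);
    }
    return $created;
}

// Demo in a throwaway directory with a fake source file.
$dir = sys_get_temp_dir() . '/concert_' . bin2hex(random_bytes(4));
mkdir($dir);
$src = $dir . '/source.tmp';
file_put_contents($src, 'fake image data');

$linda = storeSequentially($dir, 'Concert', [$src, $src]);
$ricky = storeSequentially($dir, 'Concert', [$src, $src, $src]);
print_r([$linda, $ricky]);
```

A second upload that starts at the same moment simply waits in `flock()` until the first batch is done, then continues numbering from where it left off. The cost is that uploads into that directory are processed one batch at a time.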

 


@kicken I'm beginning to see your point. And thank you for your patience.

One last thing, since you mention the temporary directory with random file naming: Assuming the same scenarios, are random names somehow discarded after use?

Is there a possibility of a random name being repeated?

If I used a naming convention of timestamp and sequence number and temp_name could a conflict STILL occur (albeit a slim chance) when multiple uploads occurred simultaneously?

PS: I am realizing the benefit of using a db, but the addition will take some re-tooling.

Thanks again for the help.

35 minutes ago, phppup said:

One last thing, since you mention the temporary directory with random file naming: Assuming the same scenarios, are random names somehow discarded after use?

Is there a possibility of a random name being repeated?

PHP initially saves the upload to the temporary folder with a random name (in the form of phpXXXX.tmp).  Once the script ends, these files are deleted.  AFAIK PHP won't re-use the name of a file that already exists, however, so a file won't get overwritten while you're trying to handle it.

As far as when you generate a random file name to save the file permanently, re-use is something you need to consider.  Something like what I showed above is unlikely to generate a duplicate name, however you can guard against it if you want by checking and generating a new one.

do {
	$name = bin2hex(random_bytes(8));
} while (file_exists($name));

Technically even that is subject to a race condition, but the chance is so small I don't worry about it.
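If that remaining race does matter to you, one way to close it is to let the file system do the check and the creation in a single step with fopen() mode 'x', which fails when the file already exists. A small sketch, where `createUniqueFile()` and the retry limit are illustrative choices:

```php
<?php
declare(strict_types=1);

// Create a new, empty file with a random name in $dir and return its path.
function createUniqueFile(string $dir, int $maxAttempts = 10): string
{
    for ($i = 0; $i < $maxAttempts; $i++) {
        $path = $dir . '/' . bin2hex(random_bytes(8));
        // 'x' creates the file only if it does not exist yet, so two
        // requests can never both succeed for the same name. The @ hides
        // the warning PHP raises when the name is already taken.
        $handle = @fopen($path, 'x');
        if ($handle !== false) {
            fclose($handle);
            return $path;
        }
    }
    throw new RuntimeException('Could not create a unique file name');
}

$dir = sys_get_temp_dir();
$a = createUniqueFile($dir);
$b = createUniqueFile($dir);
echo basename($a), PHP_EOL, basename($b), PHP_EOL;
```

The returned path already belongs to your request, so you can then write the upload's data into it without worrying about another request picking the same name.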

Adding additional uniqueness on top of the random name will certainly help if that's something you want to do.  For example you might use a format of $userName-$randomBytes so each user can have their own set of randomness.  That would also allow you to reduce the number of random bytes.  Creating a directory per user would similarly reduce the chance of conflict, and would also help prevent having tons of files in a single directory in case you ever need to inspect the directories manually.

It basically just boils down to what you feel like implementing and how good is "good enough" for you.  My go to these days is to have a simple table that associates files on disk with data in the database and using purely random names that are spread out by the first two letters of the name (ie f1d47394c0bf8d9d gets stored in f1/f1d47394c0bf8d9d).
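The "spread out by the first two letters" layout could look roughly like this. The helper name `shardedPath()`, the directory permissions, and the temporary base directory are assumptions for the sake of a runnable example.

```php
<?php
declare(strict_types=1);

// Return the full path for $diskName inside a sub-directory named after its
// first two characters, creating that sub-directory when it is missing.
function shardedPath(string $baseDir, string $diskName): string
{
    $subDir = rtrim($baseDir, '/') . '/' . substr($diskName, 0, 2);
    // The second is_dir() covers another request creating it at the same time.
    if (!is_dir($subDir) && !mkdir($subDir, 0775, true) && !is_dir($subDir)) {
        throw new RuntimeException("Could not create $subDir");
    }
    return $subDir . '/' . $diskName;
}

$base = sys_get_temp_dir() . '/uploads_' . bin2hex(random_bytes(4));
$path = shardedPath($base, 'f1d47394c0bf8d9d');
echo $path, PHP_EOL;
```

With 16 hex characters per name, the first two characters give up to 256 sub-directories, which keeps any single directory from growing too large.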

 


I think you have to answer some questions that perhaps you have not even thought through:

When Linda uploads 4 pictures, followed by John uploading 20 pictures, while at the same time Fred is uploading 40 pictures, how does the system know the difference between these 3 users?  Most gallery systems just deal with the date the picture was taken in terms of keeping them together in groups, as they naturally can store them based on the date/time the picture was taken.  Obviously you can have multiple pictures taken at the same date/time, but you really only need to add a small randomly generated piece to the name to be able to store multiples, and 2 pictures (in a small gallery) being taken at the exact same second is usually an edge case.

Databases have thought through a lot of the issues with concurrency (multiple users trying to manipulate the database at the same moment while perhaps others are trying to read out information) as well as ways to provide fast access to a subset of the information.

With that said, you could also develop this entirely without a database by using a meta file.  Xml, yaml, ini and json are all popular file formats that can be read and written to via php.  

Without a database, you could store a meta file of the same name, only with a different extension, to store the information about the file (original filename, user etc) which could then be read to service display of the gallery or whatever you are engineering.  A big advantage of that is that those files will be very small compared to the actual images and can act (as a database row would also act) as a standin for the actual images when you are displaying or sorting the gallery. 
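A rough sketch of such a sidecar meta file using JSON. The field names (`original_name`, `user`) and the helpers `writeMeta()` and `readMeta()` are assumptions for illustration only.

```php
<?php
declare(strict_types=1);

// Write the meta data next to the image: same base name, .json extension.
function writeMeta(string $imagePath, array $meta): string
{
    $info = pathinfo($imagePath);
    $metaPath = $info['dirname'] . '/' . $info['filename'] . '.json';
    // LOCK_EX keeps two writers from interleaving their output.
    file_put_contents($metaPath, json_encode($meta, JSON_PRETTY_PRINT), LOCK_EX);
    return $metaPath;
}

// Read the meta data back as an associative array.
function readMeta(string $metaPath): array
{
    return json_decode(file_get_contents($metaPath), true);
}

$dir = sys_get_temp_dir() . '/gallery_' . bin2hex(random_bytes(4));
mkdir($dir);

$metaPath = writeMeta($dir . '/Concert_1.jpg', [
    'original_name' => 'IMG_0042.jpg',
    'user' => 'Linda',
]);
$meta = readMeta($metaPath);
echo $meta['user'], ' uploaded ', $meta['original_name'], PHP_EOL;
```

Listing the gallery then only needs to read the small .json files, not the images themselves.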

The other thing to keep in mind about images is that php also can read the exif data from an image, so that opens up some interesting features that might let you introspect the information from the picture.  For example, the DateTimeOriginal timestamp recorded by the camera or phone when the photo was taken (the FileDateTime value that php reports is only the file's modification time on the server).
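A small sketch of reading the capture time, assuming the exif extension is installed and the photo actually carries a DateTimeOriginal tag (many edited or re-saved images do not). The helper names `exifDateToTime()` and `photoTakenAt()` are hypothetical.

```php
<?php
declare(strict_types=1);

// EXIF stores dates as "YYYY:MM:DD HH:MM:SS"; returns null if unparsable.
function exifDateToTime(string $exifDate): ?DateTimeImmutable
{
    $parsed = DateTimeImmutable::createFromFormat('Y:m:d H:i:s', $exifDate);
    return $parsed === false ? null : $parsed;
}

// Return the time the photo was taken, or null when it cannot be determined.
function photoTakenAt(string $path): ?DateTimeImmutable
{
    if (!function_exists('exif_read_data')) {
        return null; // exif extension not available
    }
    // @ hides the warning raised for files without readable EXIF data.
    $exif = @exif_read_data($path);
    if ($exif === false || empty($exif['DateTimeOriginal'])) {
        return null;
    }
    return exifDateToTime((string) $exif['DateTimeOriginal']);
}

$taken = exifDateToTime('2023:07:14 21:05:33');
echo $taken->format('Y-m-d H:i:s'), PHP_EOL;
```

The resulting value could be stored in the database row or meta file and used to sort the gallery by when each picture was actually taken.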

