arianhojat Posted February 27, 2007
I want to make sure no duplicate photographs are uploaded to my site. Can I run md5() on a photograph's data and store that value in the database? Then, when another user uploads an image, I can run md5() on it and compare the result to the hashes already in the database. If they match, I delete the second picture. My friend said an md5 of an image is pretty much unique. Basically, I want to ensure another user doesn't submit the same photographs, so if you know another way, or know this method won't work, let me know. PS: I might be getting hundreds of photographs a day, so I can't check each one manually against all the others. Thanks in advance!
magnetica Posted February 27, 2007
Yeah, md5 is very unique. But your best bet is to generate a GUID (Globally Unique Identifier). The chances of getting two identical GUIDs are the same as two people having exactly the same DNA.
Orio Posted February 27, 2007
Yes, you can. It would be easier to first save the photo, fetch its md5 using md5_file(), and then check whether that hash already exists (I assume you use a database that maps all the photos?). If it does, delete the new file. If it doesn't, insert it into the database (including the md5 for future use). Orio.
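The save, hash, check, then delete-or-insert flow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `registerUpload()` is a hypothetical helper, and the `$knownHashes` array stands in for whatever database table actually maps hashes to photos.

```php
<?php
// Hypothetical helper, not part of PHP: $knownHashes stands in for the
// database table that maps md5 hashes to already-stored photos.
function registerUpload(string $path, array &$knownHashes): bool
{
    $hash = md5_file($path);              // hash the file contents, not the name
    if ($hash === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (isset($knownHashes[$hash])) {
        unlink($path);                    // same bytes already stored: drop the new copy
        return false;
    }
    $knownHashes[$hash] = $path;          // first time seen: remember it for future checks
    return true;
}

// Demo with two temporary files holding identical bytes.
$known = [];
$a = tempnam(sys_get_temp_dir(), 'img');
$b = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($a, "fake image bytes");
file_put_contents($b, "fake image bytes");

$firstKept  = registerUpload($a, $known); // new content, so it is kept
$secondKept = registerUpload($b, $known); // duplicate content, so it is deleted
$secondStillOnDisk = file_exists($b);

unlink($a);                               // clean up the demo file
```

With a real database, a unique index on the hash column would presumably let the database reject duplicates itself, instead of relying on a separate lookup before the insert.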
magnetica Posted February 27, 2007
But to save having to check, use a GUID.
arianhojat Posted February 27, 2007 (Author)
Hello all, thanks for the replies. Some quick questions: what exactly is a GUID? Isn't the md5 hash going to become the unique identifier in the system, or are you talking about something else? P.S. Is it faster to use md5_file() on the filename versus supplying the file's string data to md5()? I see both return a 32-character hex number. For example:

    $filename = "/uploads/something.jpeg";
    $handle = fopen($filename, "r");
    $contents = fread($handle, filesize($filename));
    $hash = md5($contents); // not even sure if this is how you would do it on the photo's data if I were to use md5() itself
    fclose($handle);

    // versus

    $filename = "/uploads/something.jpeg";
    $hash = md5_file($filename);
magnetica Posted February 27, 2007
Right, are you just encrypting the actual image name (image.gif) or the whole path to the file (images/image.gif)? If the whole path, then forget I said anything. But it seemed you were a little worried about whether md5 would produce a duplicate at some point. A GUID, as I said, is a Globally Unique Identifier, and the chances of a GUID producing a duplicate are about 1 in 1000000 billion. http://en.wikipedia.org/wiki/Globally_Unique_Identifier
arianhojat Posted February 27, 2007 (Author)
I am not doing either. md5 encrypts the actual data in the file in my code examples, not the name of the file, correct?
btherl Posted February 28, 2007
Yes, your code encrypts the actual data. Strictly speaking, it creates a "hash" of the data. md5() is appropriate for this (although it won't detect similar images, of course). A GUID isn't appropriate, as it's randomly generated rather than derived from the file itself. As for which is faster, run each a thousand times and measure it.
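A minimal sketch of the "run each a thousand times and measure it" suggestion, assuming a temporary file of random bytes as a stand-in for an uploaded photo (the 256 KB size and the variable names are illustrative choices, not anything from the thread). It reads the file with file_get_contents() rather than fopen()/fread(), which should be equivalent for this comparison.

```php
<?php
// Stand-in for an uploaded photo: a temp file filled with random bytes.
$file = tempnam(sys_get_temp_dir(), 'bench');
file_put_contents($file, random_bytes(256 * 1024));

$runs = 1000;

// Variant 1: read the whole file into a string, then hash the string.
$start = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    $viaString = md5(file_get_contents($file));
}
$stringSeconds = microtime(true) - $start;

// Variant 2: let md5_file() read and hash the file itself.
$start = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    $viaFile = md5_file($file);
}
$fileSeconds = microtime(true) - $start;

printf("md5(file_get_contents()): %.4f s for %d runs\n", $stringSeconds, $runs);
printf("md5_file():               %.4f s for %d runs\n", $fileSeconds, $runs);

unlink($file); // clean up the temp file
```

Both variants hash the same bytes, so they return the same 32-character hex string; only the timings differ, and those will depend on file size, disk caching, and the PHP build.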
arianhojat Posted February 28, 2007 (Author)
Why is md5_file() so quick? I changed 1 px in an image and its file size is exactly the same, yet the hash is totally different. I guess I was expecting this from reading about the function itself, but processing the data seems really quick (almost like magic, haha).
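The one-pixel observation can be reproduced without a real image. The sketch below is only an illustration: the byte strings are made-up stand-ins for image data, and the variable names are hypothetical. It changes a single byte while keeping the length identical, then compares the two hashes. As for speed, MD5 makes a single pass over the input in 64-byte blocks using simple integer operations, so the cost grows linearly with file size.

```php
<?php
// Made-up stand-in for image data: 3000 bytes of a repeating pattern.
$original = str_repeat("\x10\x20\x30", 1000);

// Copy it and change exactly one byte, like editing a single pixel.
$modified = $original;
$modified[1500] = "\x31";

$hashOriginal = md5($original);
$hashModified = md5($modified);

// Count how many of the 32 hex digits differ between the two hashes.
$differing = 0;
for ($i = 0; $i < 32; $i++) {
    if ($hashOriginal[$i] !== $hashModified[$i]) {
        $differing++;
    }
}

printf("lengths: %d vs %d bytes\n", strlen($original), strlen($modified));
printf("original: %s\nmodified: %s\n", $hashOriginal, $hashModified);
printf("hex digits that differ: %d of 32\n", $differing);
```

This is also why md5 only catches byte-for-byte duplicates: a resized or re-saved copy of the same photograph produces a completely different hash.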