Marl
Posted April 2, 2006

I have a website with about 15,000 images, and I get about 20-30 new ones every day. Duplicates are becoming a problem, and right now I'm looking for a way to alert the uploading user that his/her image may already be in the database.

First: I have a SHA checksum on all the images, so finding an exact duplicate is not a problem. The problem arises when the image has been scaled, saved an extra time (JPG), or something like that.

After searching for a solution I found out that ImageMagick has a "compare" function, but running compare on 15,000+ images every time a user uploads a new image is not an option :/

Another method I was thinking about was taking 20-30 "test pixels" from each image, saving their colors, and trying to match them against every new image. However, this would only work if the images are the same size.

The last solution I have been thinking about is calculating some sort of "average color" of a picture, but I fear that it wouldn't be very reliable and would either return far too many matches or only exact copies.

How would you solve this problem? Isn't there some kind of standard solution?

Link: https://forums.phpfreaks.com/topic/6394-finding-duplicate-images/
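One way to make the "test pixels" and "average color" ideas independent of image size is to average brightness over a fixed grid of cells instead of sampling absolute pixel positions. The sketch below uses a swapped-in technique usually called an average hash, which nobody in the thread proposed. It is written in Python rather than PHP so it runs without GD, and it assumes the image has already been decoded into a grayscale grid (a list of rows of 0-255 integers). The names `grid_signature`, `average_hash` and `hamming` are hypothetical.

```python
def grid_signature(pixels, grid=8):
    """Average a grayscale image over a grid x grid layout of cells.

    `pixels` is a list of rows, each a list of 0-255 ints (an assumed
    in-memory format; decoding the image file is out of scope here).
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(grid):
        y0 = gy * height // grid
        y1 = max((gy + 1) * height // grid, y0 + 1)  # never an empty cell
        for gx in range(grid):
            x0 = gx * width // grid
            x1 = max((gx + 1) * width // grid, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels, grid=8):
    """Pack one bit per cell: 1 if the cell is brighter than the mean cell."""
    cells = grid_signature(pixels, grid)
    mean = sum(cells) / len(cells)
    value = 0
    for cell in cells:
        value = (value << 1) | (1 if cell > mean else 0)
    return value


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

The hash of each stored image could be kept next to its SHA checksum, and a new upload flagged when its hash is within a small Hamming distance of an existing one. The threshold is something to tune on real data; this sketch does not settle how many false matches it would produce.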
Desdinova
Posted April 2, 2006

Well, for starters you could of course limit your needs. Instead of wanting to make sure you catch ALL duplicates, just find some, or a lot. For that you could use the methods you described yourself. And maybe you should also think about file size.
litebearer
Posted April 2, 2006

Sunday morning, daylight saving time, lost another hour of sleep, just took my meds, not happy with the bald spot that keeps growing on the back of my head (but at least it's not being replaced by ones on my back or in my ears). Anyway, how about...

a separate table:

|imageID|cksum value of image|cksum value of thumb|

with an index on the cksum values. When a new image is uploaded, search the table for matches?

Lite...
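The checksum table suggested above could look roughly like this. It is a minimal sketch using Python's built-in `sqlite3` and `hashlib` modules, not the poster's actual schema: the table name `image_checksums`, the column names, the use of SHA-1, and the helper functions are all assumptions made for illustration.

```python
import hashlib
import sqlite3


def open_index(path=":memory:"):
    """Create the (hypothetical) checksum table with an index on each checksum."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS image_checksums (
               image_id  INTEGER PRIMARY KEY,
               image_sum TEXT NOT NULL,
               thumb_sum TEXT NOT NULL
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_image_sum ON image_checksums (image_sum)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_thumb_sum ON image_checksums (thumb_sum)"
    )
    return conn


def checksum(data):
    """Hex SHA-1 of raw bytes (the thread only says 'SHA'; SHA-1 is assumed)."""
    return hashlib.sha1(data).hexdigest()


def add_image(conn, image_id, image_bytes, thumb_bytes):
    """Record the checksums of an image and of its thumbnail."""
    conn.execute(
        "INSERT INTO image_checksums (image_id, image_sum, thumb_sum) VALUES (?, ?, ?)",
        (image_id, checksum(image_bytes), checksum(thumb_bytes)),
    )
    conn.commit()


def find_matches(conn, image_bytes, thumb_bytes):
    """Return ids of stored images whose image or thumbnail checksum matches."""
    rows = conn.execute(
        "SELECT image_id FROM image_checksums "
        "WHERE image_sum = ? OR thumb_sum = ? ORDER BY image_id",
        (checksum(image_bytes), checksum(thumb_bytes)),
    ).fetchall()
    return [row[0] for row in rows]
```

A checksum only matches byte-identical data, so the thumbnail column helps only if a rescaled or re-saved copy happens to produce exactly the same thumbnail bytes. Whether that holds depends on the thumbnailer, and the thread does not confirm it.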
redbullmarky
Posted April 2, 2006

I've been looking at ways of doing a similar thing. There were two cases where an image match would cause issues, and I had a solution in mind for each:

1. The image has been cropped compared to the existing image, or it is the larger, uncropped version of the existing image. Even though it's a lengthy process, I thought of taking a few "sample" lines from the smaller of the two images using imagecolorat (GD library), then going through each line of the larger image (also with imagecolorat) looking for a match.

2. The image has been resized. Scale the smaller picture, either plain or resampled (using the GD library again), so it's the same size as the larger picture, then run a similar check to the one in point 1.

OK, so it's not going to be perfect, but if it cuts down on even 10 or 20% of duplicates, it's a start, and a smaller problem than what you have now. I've tested number 1 on a few images and found it to be reasonably successful at what it does.

To be honest, none of these methods is going to search 15,000 images very fast. All I can suggest is that you let users upload whatever picture they choose, but use a function like this to "prune" your files yourself and tidy things up a bit.
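The sample-line check from point 1 can be sketched as follows. This is an illustration under stated assumptions, not the poster's code: it is in Python rather than PHP, both images are assumed to be already decoded into grids of integer pixel values (the kind of value imagecolorat would return per pixel), and `rows_match`, `find_crop`, `sample_rows` and `tolerance` are hypothetical names.

```python
def rows_match(a, b, tolerance=0):
    """True if two equal-length pixel rows agree within `tolerance` per pixel."""
    return all(abs(p - q) <= tolerance for p, q in zip(a, b))


def find_crop(small, large, sample_rows=3, tolerance=0):
    """Look for `small` inside `large` by comparing a few sample rows.

    Both images are lists of rows of ints. Returns the (x, y) offset of the
    first position where every sampled row matches, or None if there is none.
    Only the sampled rows are compared, so a hit is a candidate, not a proof.
    """
    sh, sw = len(small), len(small[0])
    lh, lw = len(large), len(large[0])
    if sh > lh or sw > lw:
        return None

    # Pick evenly spaced row indices from the smaller image.
    count = max(sample_rows, 1)
    picks = sorted({i * (sh - 1) // max(count - 1, 1) for i in range(count)})

    # Slide the smaller image over every possible offset in the larger one.
    for y in range(lh - sh + 1):
        for x in range(lw - sw + 1):
            if all(
                rows_match(small[r], large[y + r][x:x + sw], tolerance)
                for r in picks
            ):
                return (x, y)
    return None
```

A nonzero `tolerance` is one way to cope with JPEG re-saves, where pixel values drift slightly. For the resized case in point 2, the smaller image would first be scaled to the larger one's dimensions and then checked the same way. The nested loop is slow on large images, which fits the suggestion to run this as an offline pruning job rather than at upload time.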