lefthand Posted April 1, 2010

There is some information on indexing libpuzzle vectors in their README:
http://download.pureftpd.org/pub/pure-ftpd/misc/libpuzzle/doc/README
http://libpuzzle.pureftpd.org/project/libpuzzle/php

However, I don't understand it. Has anyone done this? The problem is part PHP and part MySQL. What I need help with is how to split the vectors into words and which parts to put where in the database. Help would be appreciated.

Link to comment: https://forums.phpfreaks.com/topic/197264-libpuzzle-vectors-in-database/
andrewgauger Posted April 1, 2010

So after all the compilation help, the documentation gives

    $signature = puzzle_fill_cvec_from_file($filename);

as the way to use the library to get a picture's signature. Are you able to do this and echo the results?

Next, get two signatures and compare them:

    $signature1 = puzzle_fill_cvec_from_file($filename);
    $signature2 = puzzle_fill_cvec_from_file($filename2);
    $d = puzzle_vector_normalized_distance($signature1, $signature2);
    echo $d;

Echo that result. Once you have signatures, set up a database as the directions describe:

    CREATE TABLE signatures (sig_id int auto_increment primary key, signature char(544), pic_id int);
    CREATE TABLE pic (pic_id bigint auto_increment primary key, filename varchar(255));
    CREATE TABLE words (words char(10), sig_id int);

Your $signature values go into the signatures table, with pic_id being the key returned from an insert into pic with the filename.

Words are generated by sliding a 10-character window over the signature. With a 544-character signature the start positions run from 0 to 544 - 10 = 534:

    for ($i = 0; $i <= strlen($signature) - 10; $i++) {
        $words = substr($signature, $i, 10);
        // INSERT $words INTO words, with sig_id referencing the signatures row
    }

Let me know how it goes. This is an interesting module; let me know of a practical purpose if you can.
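A minimal sketch of the word-splitting step, assuming the signature is the string returned by puzzle_fill_cvec_from_file(). The helper name signature_words() is hypothetical. K (word length) and N (maximum number of words) default to the README's suggested K = 10 and N = 100; keeping each word's start position makes the compound (word + position) index possible later.

```php
<?php
// Hypothetical helper: cut a signature into overlapping words of $k bytes.
// Returns a list of [position, word] pairs, at most $n of them.
function signature_words(string $signature, int $k = 10, int $n = 100): array
{
    $words = [];
    // The last valid start position is strlen - k, so a 544-byte signature
    // yields at most 535 words when $n is large enough.
    $last = strlen($signature) - $k;
    for ($pos = 0; $pos <= $last && $pos < $n; $pos++) {
        $words[] = [$pos, substr($signature, $pos, $k)];
    }
    return $words;
}

// Usage with the README's alphabet example (K = 10, first 3 words):
foreach (signature_words('abcdefghijklmnopqrstuvwxyz', 10, 3) as [$pos, $word]) {
    echo "$pos: $word\n";
}
```

The $n cap reflects the README's suggestion of N = 100 words per signature rather than every possible position; raise it if you want all of them.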
lefthand (Author) Posted April 1, 2010

andrewgauger wrote: "as the way to use the library to get pictures. Are you able to do this and echo the results?"

Yes. Thanks for the loop, it helped a bunch! Though to be correct I had to change it to:

    $words[] = substr($signature, $i, 10);

andrewgauger wrote: "CREATE TABLE words (words char(10), sig_id int);"

My trouble now is that this should contain "pos_and_word". Where do I put the position? And when that is done, how does one sort the table to put similar pictures next to each other?

andrewgauger wrote: "let me know how it goes, this is an interesting module, let me know of a practical purpose if you could."

A big database of images downloaded by different people. I wish to remove "duplicates" that do not share the same SHA-1 hash, to save space. And it's fun working with databases.
andrewgauger Posted April 1, 2010

OK, I didn't think the position was important, but you should extend the words table with a pos column and store the value of $i in it. Also, your modification turns $words into an array, so it no longer holds a single value you can insert into the table. $words was meant to be a single 10-character value that is inserted into the table and then overwritten on the next iteration. The MySQL statement should be:

    INSERT INTO words (words, pos, sig_id) VALUES ('$words', $i, $sig_id)

BUT forget about the words. They are for searching internal elements of the picture for similarity. You can simply compare the entire signatures and require a close overall match, because you are looking for identical pictures, not pictures that are 60% similar in the upper right corner, which is what the words implement.
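One way to keep position and word in a single indexed column, like the README's pos_and_word, is to concatenate them into one key. This is a sketch under assumptions: the pos_and_word() helper, the "pos:hex" key format and the column layout in the comment are hypothetical. Hex-encoding is used only because the words are raw signature bytes that may not survive a plain CHAR column.

```php
<?php
// Hypothetical helper: build a compound key from a word and its position,
// so one indexed column (e.g. words.pos_and_word) holds both.
function pos_and_word(int $pos, string $word): string
{
    // The word is raw signature bytes, so hex-encode it to keep the key printable.
    return $pos . ':' . bin2hex($word);
}

// Hypothetical insert, with both values bound as parameters:
//   INSERT INTO words (pos_and_word, sig_id) VALUES (?, ?)
// binding pos_and_word($i, $words) and $sig_id.

echo pos_and_word(3, 'ab'), "\n"; // 3:6162
```

Because the position is part of the key, the same bytes found at two different positions produce two different keys, which is what a (word + position) index needs.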
lefthand (Author) Posted April 1, 2010

Hmmm, I'm heavily loaded on vin rouge right now, but I'll reply anyway.

From the README: "Then, index your vector with a compound index of (word + position). Even with millions of images, K = 10 and N = 100 should be enough to have very little entries sharing the same index."

I was sort of hoping to have MySQL sort the output so that it would produce a list of similar images. What I wish to do is something like tineye.com...

Your input has nonetheless been very useful, Andrew. Thanks! I'll reread it when sober.
lefthand (Author) Posted April 1, 2010

I sort of did it. I indexed pos and word:

    SELECT DISTINCT sha1 FROM puzzle_words USE INDEX(pos_and_word)

So far I'm unable to tell if it works. With 10,000 pictures you get a table with 10,000,000 rows (using K=100). I'll try it on a smaller sample with a lot of duplicates (not sharing the same SHA-1 hash).
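A SELECT DISTINCT on its own only lists images; to surface likely duplicates you need to count how many pos_and_word keys two images share. A hedged sketch, assuming rows of (sha1, pos_and_word) as in the puzzle_words table above. The self-join in the comment is a hypothetical query, and candidate_pairs() is a hypothetical helper that does the same counting in memory so it can be tried without a database. Pairs sharing keys are only candidates; each should still be confirmed with puzzle_vector_normalized_distance().

```php
<?php
// Hypothetical SQL equivalent (self-join on the indexed column):
//   SELECT a.sha1 AS a, b.sha1 AS b, COUNT(*) AS shared
//   FROM puzzle_words a
//   JOIN puzzle_words b ON a.pos_and_word = b.pos_and_word AND a.sha1 < b.sha1
//   GROUP BY a.sha1, b.sha1
//   ORDER BY shared DESC;

// $rows: list of [sha1, pos_and_word].
// Returns "sha1a|sha1b" => number of shared keys, most shared first.
function candidate_pairs(array $rows): array
{
    // Group image ids by key (what the index on pos_and_word lets MySQL do).
    $buckets = [];
    foreach ($rows as [$sha1, $key]) {
        $buckets[$key][$sha1] = true; // a set, so repeated rows count once
    }
    $pairs = [];
    foreach ($buckets as $ids) {
        $ids = array_keys($ids);
        sort($ids, SORT_STRING); // mirrors a.sha1 < b.sha1 in the SQL
        $n = count($ids);
        for ($i = 0; $i < $n; $i++) {
            for ($j = $i + 1; $j < $n; $j++) {
                $pair = $ids[$i] . '|' . $ids[$j];
                $pairs[$pair] = ($pairs[$pair] ?? 0) + 1;
            }
        }
    }
    arsort($pairs); // highest shared-key count first
    return $pairs;
}
```

Ordering by the shared count is about as close as a single sort gets to "similar pictures next to each other"; the real similarity check still happens on the full signatures.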
andrewgauger Posted April 2, 2010

Oh, you would actually just compile a database of signatures. Then do a nested foreach, so you'd:

    //TODO assign $signature[]=SQL select of all signatures
    $signatures2=$signature;
    foreach $signatures as $sig{
        foreach $signatures2 as $sig2{
            if ($d = puzzle_vector_normalized_distance($sig, $sig2)>.99){
                echo "match: $sig and $sig2";
            }
        }
    }

Savvy?
andrewgauger Posted April 3, 2010

Oops, I made an error in that code. Corrected:

    // TODO: fill $signatures with an SQL select of all signatures
    $signatures2 = $signatures;
    foreach ($signatures as $sig) {
        foreach ($signatures2 as $sig2) {
            // puzzle_vector_normalized_distance() returns a distance (0 = identical),
            // so smaller means more similar; this loop also compares each signature
            // with itself and visits every pair twice
            if (puzzle_vector_normalized_distance($sig, $sig2) < 0.6) {
                echo "match: $sig and $sig2\n";
            }
        }
    }

Also, this example would output the actual signatures. You'd better create a class that holds each signature with ->id and ->sig properties, so you can print the ids and compute the normalized distance on the ->sig values.
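Following the suggestion to keep ids next to signatures, here is a minimal sketch. The class PicSignature and the function find_matches() are hypothetical names. The distance function is passed in so the loop can be tested without the extension; with libpuzzle loaded you would pass 'puzzle_vector_normalized_distance'. The default cutoff of 0.6 is assumed to match libpuzzle's PUZZLE_CVEC_SIMILARITY_THRESHOLD; a stricter (smaller) value suits near-identical duplicates.

```php
<?php
// Hypothetical container pairing a picture id with its libpuzzle signature.
class PicSignature
{
    public $id;
    public $sig;

    public function __construct(int $id, string $sig)
    {
        $this->id = $id;
        $this->sig = $sig;
    }
}

// Compare every pair exactly once (j > i, never an item with itself) and
// return [id, id, distance] for each pair closer than $threshold.
// $distance returns a distance, so smaller means more similar.
function find_matches(array $items, callable $distance, float $threshold = 0.6): array
{
    $items = array_values($items); // ensure 0-based indexes for the loops
    $matches = [];
    $n = count($items);
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $d = $distance($items[$i]->sig, $items[$j]->sig);
            if ($d < $threshold) {
                $matches[] = [$items[$i]->id, $items[$j]->id, $d];
            }
        }
    }
    return $matches;
}

// With the extension loaded (assumption):
// $matches = find_matches($items, 'puzzle_vector_normalized_distance', 0.2);
```

Starting the inner loop at $i + 1 halves the work and removes the self-matches the plain nested foreach reports, though it is still quadratic in the number of pictures.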
orcanoid Posted July 5, 2010

Hi! Can you post an example SQL query showing how to find images similar to my main image? Do I have to SELECT all records with matching words, or...? How do I use this database structure? Thanks!
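For a single main image, the README's approach is two steps: look up rows whose pos_and_word matches one of the main image's keys, then confirm each candidate with the real distance. A hedged in-memory sketch: similar_to(), the array shapes, and the table and column names in the SQL comment are hypothetical. $distance would be puzzle_vector_normalized_distance with the extension loaded, and the 0.6 default is an assumed cutoff.

```php
<?php
// Hypothetical SQL for the candidate step (IN list = the main image's keys):
//   SELECT sig_id, COUNT(*) AS shared FROM words
//   WHERE pos_and_word IN (...) GROUP BY sig_id ORDER BY shared DESC;

// $keys:  the main image's pos_and_word keys.
// $index: pos_and_word => list of sig_ids (what the words table stores).
// $sigs:  sig_id => signature.
// Returns sig_id => distance for confirmed matches, closest first.
function similar_to(string $mainSig, array $keys, array $index, array $sigs,
                    callable $distance, float $threshold = 0.6): array
{
    // Candidate step: any sig_id sharing at least one key with the main image.
    $candidates = [];
    foreach ($keys as $key) {
        foreach ($index[$key] ?? [] as $sigId) {
            $candidates[$sigId] = true;
        }
    }
    // Verification step: keep candidates whose full-signature distance is small.
    $result = [];
    foreach (array_keys($candidates) as $sigId) {
        $d = $distance($mainSig, $sigs[$sigId]);
        if ($d < $threshold) {
            $result[$sigId] = $d;
        }
    }
    asort($result); // smallest distance (most similar) first, keys preserved
    return $result;
}
```

Only the candidates are compared with the full signatures, which is what keeps the lookup cheap on a large table; an image that shares no key with the main image is never examined.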
This topic is now archived and is closed to further replies.