stubarny Posted August 27, 2012 Share Posted August 27, 2012 Hi All, Thanks for your help on the other collaberative filtering thread. I'd just like to post a similar problem I had a while back that I gave up on - but I think in hindsight it may be possible to solve? Let's say we have a website where users post "submissions" and then can vote whether they like or dislike each other's submissions by submitting "submission_votes". I'd like to create a list of recommended posts by following the below logic: 1. find all the submissions that I like 2. find all the users who like at least 1 submission that I like 3. find all the submissions that are liked by users who like at least 1 submission that I like 4. order the returned submissions by the number users who like them (by "users" I mean "users who like at least 1 submission that I like") The tricky bit(?) is that users can vote on a submission more than once (maybe they disliked it at first, but now like it) and therefore only the latest vote should be used (for each combination of submission_vote_account_number and submission_vote_submission_number). Please find the table structures below: submissions: submission_number (unique index, autoincrementing) submission_content submission_account_number (foreign key --> field "account_number" in table "accounts") submission_timestamp submission_votes: submission_vote_index (unique index, autoincrementing) submission_vote_account_number (foreign key --> field "account_number" in table "accounts") submission_vote_submission_number (foreign key --> field "submission_number" in table "submissions") submission_vote_content (1 or 0 - 1 being like, 0 being dislike) submission_vote_timestamp accounts: account_number (unique index, autoincrementing) account_username Thanks, Stu Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 28, 2012 Author Share Posted August 28, 2012 Assuming my account number is 1: Step1: " SELECT submission_vote_submission_number WHERE submission_vote_account_number = '1' " but this is finding all votes where I "liked" a submission, how do I change this so that it only looks at my last vote for each submission? Quote Link to comment Share on other sites More sharing options...
jazzman1 Posted August 28, 2012 Share Posted August 28, 2012 Assuming my account number is 1: Step1: " SELECT submission_vote_submission_number WHERE submission_vote_account_number = '1' " but this is finding all votes where I "liked" a submission, how do I change this so that it only looks at my last vote for each submission? Just add one more statement to compare current unixtimestamp to last date record. Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 28, 2012 Author Share Posted August 28, 2012 thanks but I need to find my last vote for each submission that I've voted on... I think the code below will group by submission_vote_submission_number and then order by timestamp, but how do I ignore all but the most recent record for each value of submission_vote_submission_number? Step1: "SELECT submission_vote_submission_number WHERE submission_vote_account_number = '1' GROUP BY submission_vote_submission_number, submission_vote_timestamp DESC" Thanks, Stu Quote Link to comment Share on other sites More sharing options...
jazzman1 Posted August 28, 2012 Share Posted August 28, 2012 Try, SELECT submission_vote_submission_number FROM `submission_votes` WHERE submission_vote_account_number = '1' GROUP BY `submission_vote_submission_number` HAVING `submission_vote_timestamp`= MAX(`submission_vote_timestamp`); Why are you using only one table? Use both, join them and make the query much more flexible. Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 28, 2012 Author Share Posted August 28, 2012 thanks I'll try that code Why are you using only one table? Use both, join them and make the query much more flexible. step 1 is to just find which submissions I currently "like". so in steps 2-4 I'll probably be joining tables (and asking you for help ) Thanks, Stu Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 28, 2012 Author Share Posted August 28, 2012 Thank you - I have step 1 (find all the submissions that I like) working. I just needed to add submission_vote_timestamp into the first line and " AND (submission_vote_content = '1')" into the 2nd line: SELECT submission_vote_submission_number, submission_vote_timestamp FROM `submission_votes` WHERE ((submission_vote_account_number = '$_SESSION_account_number') AND (submission_vote_content = '1')) GROUP BY `submission_vote_submission_number` HAVING `submission_vote_timestamp`= MAX(`submission_vote_timestamp`) Now for step 2: "find all the users who like at least 1 submission that I like".... Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 28, 2012 Author Share Posted August 28, 2012 I give up, I've spent 3 hours on step 2 but I can't do it. I would post my workings but I don't want to confuse people. Does anyone understand what I'm trying to do? Quote Link to comment Share on other sites More sharing options...
xyph Posted August 28, 2012 Share Posted August 28, 2012 You can always use sub-queries to filter your results. Step 1 - Straightforward SELECT entryid FROM votes WHERE userid = 1 AND value = 1 Step 2 - The JOIN is the query above. The main query selects DISTINCT userids that have a matching entryid to the ones within the JOIN SELECT DISTINCT userid FROM votes JOIN ( SELECT entryid FROM votes WHERE userid = 1 AND value = 1 ) as sub USING(entryid) WHERE userid <> 1 Step 3-4 - Take our previous query, and use it as another join. The innermost JOIN grabs the entries that match the given userid, the other one uses that list to get a list of userids that also have the entryids, and the final, outer query uses that list to grab entryids by those userids, grouping, counting and ordering them. SELECT entryid, COUNT(entryid) as total FROM votes JOIN ( SELECT DISTINCT userid FROM votes JOIN ( SELECT entryid FROM votes WHERE userid = 1 AND value = 1 ) as sub1 USING(entryid) WHERE userid <> 1 ) as sub2 USING(userid) GROUP BY entryid ORDER BY total DESC I'm using the following structure/data. CREATE TABLE IF NOT EXISTS `entries` ( `id` int(11) NOT NULL AUTO_INCREMENT, `userid` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=18 ; INSERT INTO `entries` (`id`, `userid`) VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 4), (7, 4), (8, 4), (9, 5), (10, 5), (11, 6), (12, 7), (13, 7), (14, 7), (15, 7), (16, 7), (17, ; CREATE TABLE IF NOT EXISTS `users` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(10) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=9 ; INSERT INTO `users` (`id`, `name`) VALUES (1, 'matt'), (2, 'bob'), (3, 'joe'), (4, 'tim'), (5, 'carl'), (6, 'john'), (7, 'foo'), (8, 'bar'); CREATE TABLE IF NOT EXISTS `votes` ( `userid` int(11) NOT NULL, `entryid` int(11) NOT NULL, `value` tinyint(1) NOT NULL DEFAULT '1', UNIQUE KEY `userid` (`userid`,`entryid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; INSERT INTO `votes` (`userid`, `entryid`, `value`) VALUES (1, 3, 1), (1, 7, 1), (1, 10, 1), (1, 14, 1), (1, 16, 1), (2, 5, 1), (2, 7, 1), (2, 9, 1), (2, 14, 1), (2, 15, 1), (3, 13, 1), (4, 2, 1), (4, 3, 1), (4, 9, 1), (4, 10, 1), (4, 13, 1), (4, 14, 1), (5, 3, 1), (5, 4, 1), (5, 5, 1), (5, 7, 1), (5, 11, 1), (6, 9, 1), (7, 1, 1), (7, 2, 1), (7, 5, 1), (7, 7, 1), (7, 9, 1), (7, 10, 1), (8, 1, 1), (8, 4, 1), (8, 16, 1); Quote Link to comment Share on other sites More sharing options...
Barand Posted August 28, 2012 Share Posted August 28, 2012 As you are always going to be using the latest votes for this exercise I'd suggest creating a temporary table "latest_votes" which will contain just the latest votes by each voter for each submission and use this instead of the "submission_votes" table. First CREATE TEMPORARY TABLE latest_votes LIKE submission_votes; then INSERT INTO latest_votes SELECT s.* FROM submission_votes s INNER JOIN ( SELECT sv_account_number, sv_submission_number, MAX(sv_timestamp) as latest FROM submission_votes GROUP BY sv_account_number, sv_submission_number ) as x ON s.sv_account_number = x.sv_account_number AND s.sv_submission_number = x.sv_submission_number AND s.sv_timestamp = x.latest; Note: You will have to edit the query to use your column names. Using your column names in queries was worse than writing my Last Will and Testament Quote Link to comment Share on other sites More sharing options...
xyph Posted August 28, 2012 Share Posted August 28, 2012 I think a better solution to this would be to NOT store multiple votes per user. Make a single unique index on the (userid,submissionid) columns. INSERT INTO submission_votes SET userid = $userid, submissionid = $submissionid, content = $likeOrNot ON DUPLICATE KEY UPDATE content = $likeOrNot Quote Link to comment Share on other sites More sharing options...
Barand Posted August 29, 2012 Share Posted August 29, 2012 That would make his life easier (and ours) Quote Link to comment Share on other sites More sharing options...
xyph Posted August 29, 2012 Share Posted August 29, 2012 Yeah, I completely skimmed over that requirement of the query. Quote Link to comment Share on other sites More sharing options...
Barand Posted August 29, 2012 Share Posted August 29, 2012 Once you have the latest votes (whichever path you take) the rest of your problem is virtually identical to to your previous issue of "those who bought your items also bought ..." so you should be able to work it out. Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 30, 2012 Author Share Posted August 30, 2012 Ok thank you, I'll yield and use single votes for each submission/user combination. I've added the missing quotes incase someone googles this: INSERT INTO submission_votes SET userid = '$userid', submissionid = '$submissionid', content = '$likeOrNot' ON DUPLICATE KEY UPDATE content = '$likeOrNot' I'll try steps 2-4 tomorrow. Thank you :-) Stu Quote Link to comment Share on other sites More sharing options...
xyph Posted August 30, 2012 Share Posted August 30, 2012 If they're all integers, you don't need and shouldn't use quotes. If you don't use quotes, real_escape_string doesn't work though, so you should instead check using ctype_digit if( !ctype_digit((string)$varToCheck) ) { // bad input } If you don't need to explicitly check for valid input, you can simply cast it as an integer Don't forget to make sure that anything that shouldn't be 0, isn't Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 30, 2012 Author Share Posted August 30, 2012 Sorry I'm back I tried this code from the other thread but the result isn't quite right: $query_recommendations=" SELECT submission_vote_submission_number, COUNT(*) as recommendation_subtotal FROM submission_votes INNER JOIN ( SELECT submission_vote_account_number FROM submission_votes INNER JOIN ( SELECT submission_vote_submission_number FROM submission_votes WHERE submission_vote_account_number = 1209 ) as X ON submission_votes.submission_vote_submission_number = X.submission_vote_submission_number WHERE submission_vote_account_number <> 1209 ) as Y ON submission_votes.submission_vote_account_number = Y.submission_vote_account_number GROUP BY submission_vote_submission_number ORDER BY recommendation_subtotal DESC; "; I assume the 2 lines below that I edited are horribly wrong? as X ON submission_votes.submission_vote_submission_number = X.submission_vote_submission_number as Y ON submission_votes.submission_vote_account_number = Y.submission_vote_account_number (It returned the following result arrays. The submission_vote_submission_number values are wrong and the count is just returning the number of records?): Array ( [0] => 1070 [submission_vote_submission_number] => 1070 [1] => 7 [recommendation_subtotal] => 7 ) Array ( [0] => 1071 [submission_vote_submission_number] => 1071 [1] => 7 [recommendation_subtotal] => 7 ) Array ( [0] => 1072 [submission_vote_submission_number] => 1072 [1] => 7 [recommendation_subtotal] => 7 ) Array ( [0] => 1073 [submission_vote_submission_number] => 1073 [1] => 7 [recommendation_subtotal] => 7 ) Array ( [0] => 1074 [submission_vote_submission_number] => 1074 [1] => 7 [recommendation_subtotal] => 7 ) Array ( [0] => 1075 [submission_vote_submission_number] => 1075 [1] => 7 [recommendation_subtotal] => 7 ) Thanks, Stu Quote Link to comment Share on other sites More sharing options...
Barand Posted August 31, 2012 Share Posted August 31, 2012 There needs to be something in there to check for who "likes" the submissions Quote Link to comment Share on other sites More sharing options...
Barand Posted August 31, 2012 Share Posted August 31, 2012 I tried this against my test data SELECT sv_submission_number, COUNT(*) as recommendation_subtotal FROM submission_votes INNER JOIN ( SELECT DISTINCT sv_account_number FROM submission_votes INNER JOIN ( SELECT DISTINCT sv_submission_number FROM submission_votes WHERE sv_account_number = 1 AND sv_content = 1 ) as X ON submission_votes.sv_submission_number = X.sv_submission_number WHERE sv_account_number <> 1 ) as Y ON submission_votes.sv_account_number = Y.sv_account_number WHERE sv_content = 1 GROUP BY sv_submission_number ORDER BY recommendation_subtotal DESC; Quote Link to comment Share on other sites More sharing options...
stubarny Posted September 2, 2012 Author Share Posted September 2, 2012 Thank you Barand! :'( Here is the finished code: $query_recommendations=" SELECT submission_votes.submission_vote_submission_number, COUNT(*) as recommendation_subtotal FROM submission_votes INNER JOIN ( SELECT submission_vote_account_number, submission_votes.submission_vote_submission_number FROM submission_votes INNER JOIN ( SELECT submission_vote_submission_number FROM submission_votes WHERE (submission_vote_account_number = 1200 AND submission_vote_content = 1) ) as X ON submission_votes.submission_vote_submission_number = X.submission_vote_submission_number WHERE (submission_votes.submission_vote_account_number <> 1200 AND submission_votes.submission_vote_content = 1) ) as Y ON submission_votes.submission_vote_account_number = Y.submission_vote_account_number WHERE submission_votes.submission_vote_content = 1 GROUP BY submission_votes.submission_vote_submission_number ORDER BY recommendation_subtotal DESC; "; which gives the following result: nrows_recommendations is 4 Array ( [0] => 1069 [submission_vote_submission_number] => 1069 [1] => 3 [recommendation_subtotal] => 3 ) Array ( [0] => 1074 [submission_vote_submission_number] => 1074 [1] => 3 [recommendation_subtotal] => 3 ) Array ( [0] => 1073 [submission_vote_submission_number] => 1073 [1] => 2 [recommendation_subtotal] => 2 ) Array ( [0] => 1070 [submission_vote_submission_number] => 1070 [1] => 1 [recommendation_subtotal] => 1 ) (it doesn't use SELECT DISTINCT submission_vote_account_number but I'm fine with this because the above code gives more weighting to accounts that have more in common with the user). Out of curiosity I have an academic question.... .... if I comment out the group by statement in the penultimate line: $query_recommendations=" SELECT submission_votes.submission_vote_submission_number, COUNT(*) as recommendation_subtotal FROM submission_votes INNER JOIN ( SELECT submission_vote_account_number, submission_votes.submission_vote_submission_number FROM submission_votes INNER JOIN ( SELECT submission_vote_submission_number FROM submission_votes WHERE (submission_vote_account_number = 1200 AND submission_vote_content = 1) ) as X ON submission_votes.submission_vote_submission_number = X.submission_vote_submission_number WHERE (submission_votes.submission_vote_account_number <> 1200 AND submission_votes.submission_vote_content = 1) ) as Y ON submission_votes.submission_vote_account_number = Y.submission_vote_account_number WHERE submission_votes.submission_vote_content = 1 # GROUP BY submission_votes.submission_vote_submission_number ORDER BY recommendation_subtotal DESC; "; Then I get the following result: nrows_recommendations is 1 Array ( [0] => 1069 [submission_vote_submission_number] => 1069 [1] => 9 [recommendation_subtotal] => 9 ) Why do I only get 1 row of data but a subtotal count of 9? Thanks, Stu Quote Link to comment Share on other sites More sharing options...
Barand Posted September 2, 2012 Share Posted September 2, 2012 Because you are using an aggregation function (ie COUNT() ) without a group by clause, it aggregates the whole set of data Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.