Jump to content

collaberative filtering - post 'likes' and 'dislikes'


stubarny

Recommended Posts

Hi All,

 

Thanks for your help on the other collaberative filtering thread. I'd just like to post a similar problem I had a while back that I gave up on - but I think in hindsight it may be possible to solve?

 

Let's say we have a website where users post "submissions" and then can vote whether they like or dislike each other's submissions by submitting "submission_votes".

 

I'd like to create a list of recommended posts by following the below logic:

1. find all the submissions that I like

2. find all the users who like at least 1 submission that I like

3. find all the submissions that are liked by users who like at least 1 submission that I like

4. order the returned submissions by the number users who like them (by "users" I mean "users who like at least 1 submission that I like")

 

The tricky bit(?) is that users can vote on a submission more than once (maybe they disliked it at first, but now like it) and therefore only the latest vote should be used (for each combination of submission_vote_account_number and submission_vote_submission_number).

 

 

Please find the table structures below:

 

submissions:

submission_number (unique index, autoincrementing)

submission_content

submission_account_number  (foreign key --> field "account_number" in table "accounts")

submission_timestamp

 

submission_votes:

submission_vote_index (unique index, autoincrementing)

submission_vote_account_number (foreign key --> field "account_number" in table "accounts")

submission_vote_submission_number  (foreign key --> field "submission_number" in table "submissions")

submission_vote_content (1 or 0 - 1 being like, 0 being dislike)

submission_vote_timestamp

 

accounts:

account_number (unique index, autoincrementing)

account_username

 

 

Thanks,

 

Stu

Link to comment
Share on other sites

Assuming my account number is 1:

 

Step1: " SELECT submission_vote_submission_number WHERE submission_vote_account_number = '1' "

 

but this is finding all votes where I "liked" a submission, how do I change this so that it only looks at my last vote for each submission?

Link to comment
Share on other sites

Assuming my account number is 1:

 

Step1: " SELECT submission_vote_submission_number WHERE submission_vote_account_number = '1' "

 

but this is finding all votes where I "liked" a submission, how do I change this so that it only looks at my last vote for each submission?

 

Just add one more statement to compare current unixtimestamp to last date record.

Link to comment
Share on other sites

thanks but I need to find my last vote for each submission that I've voted on...

 

I think the code below will group by submission_vote_submission_number and then order by timestamp, but how do I ignore all but the most recent record for each value of submission_vote_submission_number?

 

Step1: "SELECT submission_vote_submission_number WHERE submission_vote_account_number = '1'  GROUP BY submission_vote_submission_number, submission_vote_timestamp DESC"

 

Thanks,

 

Stu

Link to comment
Share on other sites

Try,

 

SELECT submission_vote_submission_number FROM `submission_votes` WHERE submission_vote_account_number = '1'  GROUP BY `submission_vote_submission_number` HAVING `submission_vote_timestamp`= MAX(`submission_vote_timestamp`);

 

Why are you using only one table? Use both, join them and make the query much more flexible.

 

Link to comment
Share on other sites

thanks I'll try that code

 

Why are you using only one table? Use both, join them and make the query much more flexible.

 

step 1 is to just find which submissions I currently "like". so in steps 2-4 I'll probably be joining tables (and asking you for help  :D)

 

Thanks,

 

Stu

Link to comment
Share on other sites

Thank you - I have step 1 (find all the submissions that I like) working. I just needed to add submission_vote_timestamp into the first line and " AND (submission_vote_content = '1')" into the 2nd line:

 

SELECT submission_vote_submission_number, submission_vote_timestamp FROM `submission_votes`

WHERE ((submission_vote_account_number = '$_SESSION_account_number') AND (submission_vote_content = '1')) 

GROUP BY `submission_vote_submission_number`

HAVING `submission_vote_timestamp`= MAX(`submission_vote_timestamp`)

 

Now for step 2: "find all the users who like at least 1 submission that I like"....

Link to comment
Share on other sites

You can always use sub-queries to filter your results.

 

Step 1 - Straightforward

SELECT entryid FROM votes
WHERE userid = 1 AND value = 1

 

Step 2 - The JOIN is the query above. The main query selects DISTINCT userids that have a matching entryid to the ones within the JOIN

SELECT DISTINCT userid
FROM votes
JOIN (
  SELECT entryid FROM votes
  WHERE userid = 1 AND value = 1
) as sub USING(entryid)
WHERE userid <> 1

 

Step 3-4 - Take our previous query, and use it as another join. The innermost JOIN grabs the entries that match the given userid, the other one uses that list to get a list of userids that also have the entryids, and the final, outer query uses that list to grab entryids by those userids, grouping, counting and ordering them.

SELECT entryid, COUNT(entryid) as total
FROM votes
JOIN (
  SELECT DISTINCT userid
  FROM votes
  JOIN (
    SELECT entryid FROM votes
    WHERE userid = 1 AND value = 1
  ) as sub1 USING(entryid)
  WHERE userid <> 1
) as sub2 USING(userid)
GROUP BY entryid
ORDER BY total DESC

 

I'm using the following structure/data.

CREATE TABLE IF NOT EXISTS `entries` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userid` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=18 ;

INSERT INTO `entries` (`id`, `userid`) VALUES
(1, 1),
(2, 1),
(3, 2),
(4, 2),
(5, 3),
(6, 4),
(7, 4),
(8, 4),
(9, 5),
(10, 5),
(11, 6),
(12, 7),
(13, 7),
(14, 7),
(15, 7),
(16, 7),
(17, ;

CREATE TABLE IF NOT EXISTS `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=9 ;

INSERT INTO `users` (`id`, `name`) VALUES
(1, 'matt'),
(2, 'bob'),
(3, 'joe'),
(4, 'tim'),
(5, 'carl'),
(6, 'john'),
(7, 'foo'),
(8, 'bar');

CREATE TABLE IF NOT EXISTS `votes` (
  `userid` int(11) NOT NULL,
  `entryid` int(11) NOT NULL,
  `value` tinyint(1) NOT NULL DEFAULT '1',
  UNIQUE KEY `userid` (`userid`,`entryid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `votes` (`userid`, `entryid`, `value`) VALUES
(1, 3, 1),
(1, 7, 1),
(1, 10, 1),
(1, 14, 1),
(1, 16, 1),
(2, 5, 1),
(2, 7, 1),
(2, 9, 1),
(2, 14, 1),
(2, 15, 1),
(3, 13, 1),
(4, 2, 1),
(4, 3, 1),
(4, 9, 1),
(4, 10, 1),
(4, 13, 1),
(4, 14, 1),
(5, 3, 1),
(5, 4, 1),
(5, 5, 1),
(5, 7, 1),
(5, 11, 1),
(6, 9, 1),
(7, 1, 1),
(7, 2, 1),
(7, 5, 1),
(7, 7, 1),
(7, 9, 1),
(7, 10, 1),
(8, 1, 1),
(8, 4, 1),
(8, 16, 1);

Link to comment
Share on other sites

As you are always going to be using the latest votes for this exercise I'd suggest creating a temporary table "latest_votes" which will contain just the latest votes by each voter for each submission and use this instead of the "submission_votes" table.

 

First

CREATE TEMPORARY TABLE latest_votes LIKE submission_votes;

 

then

INSERT INTO latest_votes
SELECT s.* 
FROM submission_votes s
    INNER JOIN
    (
    SELECT sv_account_number, sv_submission_number, MAX(sv_timestamp) as latest
    FROM submission_votes
    GROUP BY sv_account_number, sv_submission_number
    ) as x 
    ON s.sv_account_number = x.sv_account_number
        AND s.sv_submission_number = x.sv_submission_number
        AND s.sv_timestamp = x.latest;

 

Note: You will have to edit the query to use your column names. Using your column names in queries was worse than writing my Last Will and Testament

Link to comment
Share on other sites

I think a better solution to this would be to NOT store multiple votes per user. Make a single unique index on the (userid,submissionid) columns.

 

INSERT INTO submission_votes SET userid = $userid, submissionid = $submissionid, content = $likeOrNot

ON DUPLICATE KEY UPDATE content = $likeOrNot

Link to comment
Share on other sites

Ok thank you, I'll yield and use single votes for each submission/user combination.

 

I've added the missing quotes incase someone googles this:

 

INSERT INTO submission_votes SET userid = '$userid', submissionid = '$submissionid', content = '$likeOrNot'

ON DUPLICATE KEY UPDATE content = '$likeOrNot'

 

I'll try steps 2-4 tomorrow.

 

Thank you :-)

 

Stu

Link to comment
Share on other sites

If they're all integers, you don't need and shouldn't use quotes.

 

If you don't use quotes, real_escape_string doesn't work though, so you should instead check using ctype_digit

if( !ctype_digit((string)$varToCheck) ) {
   // bad input
}

 

If you don't need to explicitly check for valid input, you can simply cast it as an integer

 

Don't forget to make sure that anything that shouldn't be 0, isn't :)

Link to comment
Share on other sites

Sorry I'm back  :D

 

I tried this code from the other thread but the result isn't quite right:

 


$query_recommendations="

SELECT submission_vote_submission_number, COUNT(*) as recommendation_subtotal
FROM submission_votes
INNER JOIN (
    SELECT submission_vote_account_number
    FROM submission_votes
    INNER JOIN (
        SELECT submission_vote_submission_number FROM submission_votes WHERE submission_vote_account_number = 1209
        ) as X ON submission_votes.submission_vote_submission_number = X.submission_vote_submission_number
    WHERE submission_vote_account_number <> 1209
    ) as Y ON submission_votes.submission_vote_account_number = Y.submission_vote_account_number
GROUP BY submission_vote_submission_number
ORDER BY recommendation_subtotal DESC;
";

 

I assume the 2 lines below that I edited are horribly wrong?

as X ON submission_votes.submission_vote_submission_number = X.submission_vote_submission_number

as Y ON submission_votes.submission_vote_account_number = Y.submission_vote_account_number

 

(It returned the following result arrays. The submission_vote_submission_number values are wrong and the count is just returning the number of

records?):

 

Array ( [0] => 1070 [submission_vote_submission_number] => 1070 [1] => 7 [recommendation_subtotal] => 7 ) 
Array ( [0] => 1071 [submission_vote_submission_number] => 1071 [1] => 7 [recommendation_subtotal] => 7 ) 
Array ( [0] => 1072 [submission_vote_submission_number] => 1072 [1] => 7 [recommendation_subtotal] => 7 ) 
Array ( [0] => 1073 [submission_vote_submission_number] => 1073 [1] => 7 [recommendation_subtotal] => 7 ) 
Array ( [0] => 1074 [submission_vote_submission_number] => 1074 [1] => 7 [recommendation_subtotal] => 7 ) 
Array ( [0] => 1075 [submission_vote_submission_number] => 1075 [1] => 7 [recommendation_subtotal] => 7 ) 

 

Thanks,

 

Stu

Link to comment
Share on other sites

I tried this against my test data

 

SELECT sv_submission_number, COUNT(*) as recommendation_subtotal
FROM submission_votes
INNER JOIN (
    SELECT DISTINCT sv_account_number
    FROM submission_votes
    INNER JOIN (
        SELECT DISTINCT sv_submission_number FROM submission_votes 
        WHERE sv_account_number = 1
        AND sv_content = 1
        ) as X ON submission_votes.sv_submission_number = X.sv_submission_number
    WHERE sv_account_number <> 1
    ) as Y ON submission_votes.sv_account_number = Y.sv_account_number
WHERE sv_content = 1  
GROUP BY sv_submission_number
ORDER BY recommendation_subtotal DESC;

Link to comment
Share on other sites

Thank you Barand!  :'(

 

Here is the finished code:

 

	$query_recommendations="

	SELECT submission_votes.submission_vote_submission_number, COUNT(*) as recommendation_subtotal
	FROM submission_votes
	INNER JOIN (
	    SELECT submission_vote_account_number, submission_votes.submission_vote_submission_number
	    FROM submission_votes
	    INNER JOIN (
	        SELECT submission_vote_submission_number FROM submission_votes WHERE (submission_vote_account_number = 1200 AND submission_vote_content = 1)
	        ) as X ON submission_votes.submission_vote_submission_number = X.submission_vote_submission_number
	    WHERE (submission_votes.submission_vote_account_number <> 1200 AND submission_votes.submission_vote_content = 1)
	    ) as Y ON submission_votes.submission_vote_account_number = Y.submission_vote_account_number
	    WHERE submission_votes.submission_vote_content = 1
	GROUP BY submission_votes.submission_vote_submission_number
	ORDER BY recommendation_subtotal DESC;
";

 

which gives the following result:

 

nrows_recommendations is 4

Array ( [0] => 1069 [submission_vote_submission_number] => 1069 [1] => 3 [recommendation_subtotal] => 3 )

Array ( [0] => 1074 [submission_vote_submission_number] => 1074 [1] => 3 [recommendation_subtotal] => 3 )

Array ( [0] => 1073 [submission_vote_submission_number] => 1073 [1] => 2 [recommendation_subtotal] => 2 )

Array ( [0] => 1070 [submission_vote_submission_number] => 1070 [1] => 1 [recommendation_subtotal] => 1 )

 

(it doesn't use SELECT DISTINCT submission_vote_account_number but I'm fine with this because the above code gives more weighting to accounts that have more in common with the user).

 

Out of curiosity I have an academic question....  :D

 

....  if I comment out the group by statement in the penultimate line:

 

	$query_recommendations="

	SELECT submission_votes.submission_vote_submission_number, COUNT(*) as recommendation_subtotal
	FROM submission_votes
	INNER JOIN (
	    SELECT submission_vote_account_number, submission_votes.submission_vote_submission_number
	    FROM submission_votes
	    INNER JOIN (
	        SELECT submission_vote_submission_number FROM submission_votes WHERE (submission_vote_account_number = 1200 AND submission_vote_content = 1)
	        ) as X ON submission_votes.submission_vote_submission_number = X.submission_vote_submission_number
	    WHERE (submission_votes.submission_vote_account_number <> 1200 AND submission_votes.submission_vote_content = 1)
	    ) as Y ON submission_votes.submission_vote_account_number = Y.submission_vote_account_number
	    WHERE submission_votes.submission_vote_content = 1
#	GROUP BY submission_votes.submission_vote_submission_number
	ORDER BY recommendation_subtotal DESC;
";

 

Then I get the following result:

 

nrows_recommendations is 1

Array ( [0] => 1069 [submission_vote_submission_number] => 1069 [1] => 9 [recommendation_subtotal] => 9 ) 

 

Why do I only get 1 row of data but a subtotal count of 9?  :)

 

Thanks,

 

Stu

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.