Jump to content

Collaberative Filtering


stubarny

Recommended Posts

Hi,

 

I'd like to create a "people who liked the products you bought also bought..." tool

 

For example please see below a list of users and purchased products,

 

User | Product

David | Lawn Mower

David | Greenhouse

Sam | Greenhouse

Sam | Paving slab

Geoff | Lawn Mower

Geoff | BBQ

Peter | Greenhouse

Peter | BBQ

Tim | BBQ

Tim | Chainsaw

 

Say my user is "David" - I'd like to identify the people who bought at least 1 identical item as David (i.e. everyone except Tim), and return a count of the products these people bought (sorted by the count field):

Product | Count

Greenhouse | 2

BBQ | 2

Paving slab | 1

Lawn Mower | 1

 

Please could you tell me how to do this? Can it be done in one SQL query?

 

Thanks,

 

Stu

 

 

Link to comment
Share on other sites

With the data given

 

SELECT u.product, COUNT(*) as tot
FROM userprod u
INNER JOIN (
    SELECT user 
    FROM userprod
    INNER JOIN (
        SELECT product FROM userprod WHERE user='David'
        ) as X using (product)
    WHERE user <> 'David'
    ) as Y USING (user)
GROUP BY product
ORDER BY tot DESC

Link to comment
Share on other sites

Thank you, I see your point about the foreign ID's - how would I code for the below structure?

 

User_ID | Username

1 | David

2 | Sam

3 | Geoff

4 | Peter

5 | Tim

 

Product_ID | ProductName

1 | Lawn Mower

2 | Greenhouse

3 | Paving slab

4 | BBQ

5 | Chainsaw

 

Purchase_ID | User_ID | Product_ID

1 | 1 | 1

2 | 1 | 2

3 | 2 | 2

4 | 2 | 3

5 | 3 | 1

6 | 3 | 4

7 | 4 | 2

8 | 4 | 4

9 | 5 | 4

10 | 5 | 5

 

Output table:

 

Subtotal_ID | Product_ID | Count

1 | 2 | 2

2 | 4 | 2

3 | 3 | 1

4 | 1 | 1

 

Thanks,

 

Stu

Link to comment
Share on other sites

I created an "alsobought" table for the output (count is reserved word so don't use as col name)

You will prob create a temp table - if you really want a table

 

CREATE TABLE `alsobought` (
  `subtotal_id` int(11) NOT NULL AUTO_INCREMENT,
  `product_id` int(11) DEFAULT NULL,
  `product_count` int(11) DEFAULT NULL,
  PRIMARY KEY (`subtotal_id`)
) 

 

Query:

INSERT INTO alsobought (product_id, product_count)
SELECT p.product_id, COUNT(*) as count
FROM purchase p
INNER JOIN (
    SELECT user_id 
    FROM purchase
    INNER JOIN (
        SELECT product_id FROM purchase WHERE user_id = 1
        ) as X using (product_id)
    WHERE user_id <> 1
    ) as Y USING (user_id)
GROUP BY product_id
ORDER BY count DESC

 

output from alsobought table

+-------------+------------+---------------+

| subtotal_id | product_id | product_count |

+-------------+------------+---------------+

|          1 |          2 |            2 |

|          2 |          4 |            2 |

|          3 |          3 |            1 |

|          4 |          1 |            1 |

+-------------+------------+---------------+

Link to comment
Share on other sites

If you want you retrieve product's name and count all of them except Tim's products, there's my suggest.

Tip: Tim's id is 5.

SELECT `pr`.`name` , COUNT( * ) AS tot
FROM `product` AS `pr`
WHERE `pr`.`product_id`
IN (SELECT `p`.`product_id`
FROM `purchase` AS `p`
WHERE `p`.`user_id` <> 5
)
GROUP BY `pr`.`name`
ORDER BY tot DESC

Link to comment
Share on other sites

Hello, thank you for your patience with me I promise I'm not being deliberately inept :-)

 

I'm afraid I'm getting confused by your solutions because I've used the same field names in different tables, please could you allow me to have another try where I have the table name at the start of each field name? I've also put an underscore in any field name that is a foreign key, and just to clarify the output doesn't have to be a table - just a normal SQL results array:

 

users

userID | userUsername

1 | David

2 | Sam

3 | Geoff

4 | Peter

5 | Tim

 

products

productID | productName

1 | Lawn Mower

2 | Greenhouse

3 | Paving slab

4 | BBQ

5 | Chainsaw

 

purchases

purchaseID | purchase_userID | purchase_productID

1 | 1 | 1

2 | 1 | 2

3 | 2 | 2

4 | 2 | 3

5 | 3 | 1

6 | 3 | 4

7 | 4 | 2

8 | 4 | 4

9 | 5 | 4

10 | 5 | 5

 

SQL query output:

recommendationID | recommendation_productID | recommendationSubtotal

1 | 2 | 2

2 | 4 | 2

3 | 3 | 1

4 | 1 | 1

 

 

Please could you tell me if the below code is correct (sorry I've removed aliases because I find it hard to remember which tables they relate to):

 

SELECT 'products'.'productID' As recommendation_productID, COUNT(*) AS recommendationSubtotal

FROM 'products'

WHERE 'products'.'productID'

IN (SELECT 'purchases'.'purchaseID'

FROM 'purchases'

WHERE 'purchases'.'purchase_UserID' <> 5

)

GROUP BY 'products'.'productName'

ORDER BY recommendationSubtotal DESC

 

 

Many thanks,

 

Stu

Link to comment
Share on other sites

Ok thanks Barand, please could you tell me if this is correct (I think I just needed to remove the first line of the sql query to stop it trying to insert into a table?):

 

SELECT products.product_id As recommendation_productID, COUNT(*) As recommendationSubtotal

FROM purchases

INNER JOIN (

    SELECT purchaseUserID

    FROM purchases

    INNER JOIN (

        SELECT purchase_productID FROM purchases WHERE purchase_userId = 1

        ) as X using (product_id)

    WHERE purchase_UserID <> 1

    ) as Y USING (purchase_userID)

GROUP BY productID

ORDER BY recommendationSubtotal DESC

Link to comment
Share on other sites

i fixed the table and column names for you

SELECT purchases.purchase_productID, COUNT(*) as recommendationSubtotal
FROM purchases
INNER JOIN (
    SELECT purchase_UserID
    FROM purchases
    INNER JOIN (
        SELECT purchase_productID FROM purchases WHERE purchase_userID = 1
        ) as X using (purchase_productID)
    WHERE purchase_userID <> 1
    ) as Y USING (purchase_userID)
GROUP BY purchase_productID
ORDER BY recommendationSubtotal DESC;

Link to comment
Share on other sites

@stubarny, may be the Barand's example is a little bit complex for you, take a look at mine:

You have 3 tables - users, purchases, products

 

USERS:

+----+-------+

| id | name  |

+----+-------+

|  1 | David |

|  2 | Sam  |

|  3 | Geoff |

|  4 | Peter |

|  5 | Tim  |

+----+-------+

 

PURCHASES

+----+-----------------+--------------------+

| id | purchase_userID | purchase_productID |

+----+-----------------+--------------------+

|  1 |              1 |                  1 |

|  2 |              1 |                  2 |

|  3 |              2 |                  2 |

|  4 |              2 |                  3 |

|  5 |              3 |                  1 |

|  6 |              3 |                  4 |

|  7 |              4 |                  2 |

|  8 |              4 |                  4 |

|  9 |              5 |                  4 |

| 10 |              5 |                  5 |

+----+-----------------+--------------------+

 

PRODUCTS

+----+-------------+

| id | productName |

+----+-------------+

|  1 | Lawn Mower  |

|  2 | Greenhouse  |

|  3 | Paving slab |

|  4 | BBQ        |

|  5 | Chainsaw    |

+----+-------------+

 

Let us say, we want to retrieve all purchase_productID and count them from all users which id is no 5.

Our query is gonna be like this:

 

SELECT `p`.`id` , `pr`.`productName` , COUNT( `p`.`purchase_productID` ) AS `total`
FROM `purchases` AS `p`
JOIN `products` AS `pr` ON `pr`.`id` = `p`.`purchase_productID`
WHERE `p`.`purchase_userID`
IN (
1,2,3,4
)
GROUP BY `p`.`purchase_productID`
ORDER BY `p`.`id` DESC;

 

If you want the result IN (1,2,3,4) to be  dynamic, you could use a subquery like this:

 

SELECT `p`.`id` , `pr`.`productName` , COUNT( `p`.`purchase_productID` ) AS `total`
FROM `purchases` AS `p`
JOIN `products` AS `pr` ON `pr`.`id` = `p`.`purchase_productID`
WHERE `p`.`purchase_userID`
IN (
SELECT `u`.`id`
FROM `users` AS `u`
WHERE `u`.`id` <>5
)
GROUP BY `p`.`purchase_productID`
ORDER BY `p`.`id` DESC;

 

RESULT:

+----+-------------+-------+

| id | productName | total |

+----+-------------+-------+

|  6 | BBQ        |    2 |

|  4 | Paving slab |    1 |

|  2 | Greenhouse  |    3 |

|  1 | Lawn Mower  |    2 |

+----+-------------+-------+

 

If you want to see what purchases David was made (him id is 1) :

SELECT `p`.`id` , `pr`.`productName` , COUNT( `p`.`purchase_productID` ) AS `total`
FROM `purchases` AS `p`
JOIN `products` AS `pr` ON `pr`.`id` = `p`.`purchase_productID`
WHERE `p`.`purchase_userID`
IN (
SELECT `u`.`id`
FROM `users` AS `u`
WHERE `u`.`id` = 1
)
GROUP BY `p`.`purchase_productID`
ORDER BY `p`.`id` DESC;

 

RESULT:

+----+-------------+-------+

| id | productName | total |

+----+-------------+-------+

|  2 | Greenhouse  |    1 |

|  1 | Lawn Mower  |    1 |

+----+-------------+-------+

 

 

Link to comment
Share on other sites

It's a simple standard many-to-many relationship.

+------------+        +------------+        +-----------+

|  user      |        | purchase  |        | product  |

+------------+        +------------+        +-----------+

|  userID    | --+    | id        |  +--- | productID |

|  name      |  +--< | userID    |  |    | descrip  |

+------------+        | productID  | >-+    +-----------+

                      +------------+

 

How much more modelling can you do on this?

Link to comment
Share on other sites

So every purchase can only have one product, and only one instance of this product..? That's what your table structure says, which I suspect isn't quite correct. ;)

The example used in the video is in fact exactly the same that you're trying to set up here, which is why I dug it up. I'm quite certain you'll find it useful, and hopefully informative as well. :)

 

Link to comment
Share on other sites

So every purchase can only have one product, and only one instance of this product..? That's what your table structure says, which I suspect isn't quite correct. ;)

The example used in the video is in fact exactly the same that you're trying to set up here, which is why I dug it up. I'm quite certain you'll find it useful, and hopefully informative as well. :)

 

 

You could always just have a transactionID go along in the purchases table, tying multiple purchases to one overall transaction.

 

Gotta watch out over-normalizing though. Can lead to performance issues.

Link to comment
Share on other sites

There are users and there are products.

 

@Christian,

 

A user can purchase many products.

 

A product can be purchased by many users.

 

To resolve this condition we introduce an intermediate table with foreign keys - one for the user and one for the product. This is called a many-to-many relationship. When you have been doing this for a while you'll come across many instances of these.

 

 

@xyph

Agreed, but these are the columns relevant to the problem in hand - ie users who bought the products you bought also bought what?

Link to comment
Share on other sites

Barand: Trust me, I've been doing it for a while. Quite a long while too, in fact. :P

My point was that your design violates the first normal form, in that if you support multiple items per order, then both the customer ID and order ID has to be repeated for each item ordered. A user can have many orders, which can consist of many items. One item can also be found in many orders, so the many-many relationship isn't between users and items, but items and orders.

What you've done is to mix together items ordered and orders. So you need to split the "purchase" table into two tables, "orders" and "order_items", just like shown in the video. (I hope you are watching/have watched them by now, as I really don't want to repeat everything said in them.)

 

Though, thanks for attempting to share your knowledge. :)

 

xyph: Performance issues from 3rd NF isn't much of a concern for most applications, and if you do find yourself in that situation then you have a hell of a lot of traffic. A problem best solved with proper caching. ;)

Link to comment
Share on other sites

I think you two are trying to solve different problems. I was simply trying to merge them.

 

@ChristianF - I never said this example was over-normalized, simply that it can and does happen. Some data just doesn't need to be normalized, and you can absorb performance loss on generating reports to gain performance on more common queries.

Link to comment
Share on other sites

Christian It is not my design, it is the data given by the OP. I'm well aware there would be a table of orders and order_items in the real world. This is the data necessary to solve his problem, viz. "People who purchased your items also purchased ...."

 

I've been developing for over 40 years and designing and using relational databases professionally for over 25 years so don't patronise me and try to teach me to suck eggs.

Link to comment
Share on other sites

Thanks everyone,

 

Please may I ask what these 2 bold parts of the SQL query below do?:

 

SELECT purchases.purchase_productID, COUNT(*) as recommendationSubtotal

FROM purchases

INNER JOIN (

    SELECT purchase_UserID

    FROM purchases

    INNER JOIN (

        SELECT purchase_productID FROM purchases WHERE purchase_userID = 1

        ) as X using (purchase_productID)

    WHERE purchase_userID <> 1

    ) as Y USING (purchase_userID)

GROUP BY purchase_productID

ORDER BY recommendationSubtotal DESC;

 

I assume they create aliased tables called X and Y (which I guess I don't technically need in my code?)? But please could you tell me what the "using (purchase_productID)" and "using (purchase_userID)" does?

 

Thanks,

 

Stu

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.