stubarny Posted August 26, 2012 Share Posted August 26, 2012 Hi, I'd like to create a "people who liked the products you bought also bought..." tool For example please see below a list of users and purchased products, User | Product David | Lawn Mower David | Greenhouse Sam | Greenhouse Sam | Paving slab Geoff | Lawn Mower Geoff | BBQ Peter | Greenhouse Peter | BBQ Tim | BBQ Tim | Chainsaw Say my user is "David" - I'd like to identify the people who bought at least 1 identical item as David (i.e. everyone except Tim), and return a count of the products these people bought (sorted by the count field): Product | Count Greenhouse | 2 BBQ | 2 Paving slab | 1 Lawn Mower | 1 Please could you tell me how to do this? Can it be done in one SQL query? Thanks, Stu Quote Link to comment Share on other sites More sharing options...
jazzman1 Posted August 26, 2012 Share Posted August 26, 2012 You can do this, but tell us more about tables structure. Do they have IDs, a foreign key, etc.... Quote Link to comment Share on other sites More sharing options...
Barand Posted August 26, 2012 Share Posted August 26, 2012 With the data given SELECT u.product, COUNT(*) as tot FROM userprod u INNER JOIN ( SELECT user FROM userprod INNER JOIN ( SELECT product FROM userprod WHERE user='David' ) as X using (product) WHERE user <> 'David' ) as Y USING (user) GROUP BY product ORDER BY tot DESC Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 26, 2012 Author Share Posted August 26, 2012 Thank you, I see your point about the foreign ID's - how would I code for the below structure? User_ID | Username 1 | David 2 | Sam 3 | Geoff 4 | Peter 5 | Tim Product_ID | ProductName 1 | Lawn Mower 2 | Greenhouse 3 | Paving slab 4 | BBQ 5 | Chainsaw Purchase_ID | User_ID | Product_ID 1 | 1 | 1 2 | 1 | 2 3 | 2 | 2 4 | 2 | 3 5 | 3 | 1 6 | 3 | 4 7 | 4 | 2 8 | 4 | 4 9 | 5 | 4 10 | 5 | 5 Output table: Subtotal_ID | Product_ID | Count 1 | 2 | 2 2 | 4 | 2 3 | 3 | 1 4 | 1 | 1 Thanks, Stu Quote Link to comment Share on other sites More sharing options...
Barand Posted August 26, 2012 Share Posted August 26, 2012 Substitute your names for mine and add join to product table. Half the information gets you half the solution. Quote Link to comment Share on other sites More sharing options...
Barand Posted August 26, 2012 Share Posted August 26, 2012 I created an "alsobought" table for the output (count is reserved word so don't use as col name) You will prob create a temp table - if you really want a table CREATE TABLE `alsobought` ( `subtotal_id` int(11) NOT NULL AUTO_INCREMENT, `product_id` int(11) DEFAULT NULL, `product_count` int(11) DEFAULT NULL, PRIMARY KEY (`subtotal_id`) ) Query: INSERT INTO alsobought (product_id, product_count) SELECT p.product_id, COUNT(*) as count FROM purchase p INNER JOIN ( SELECT user_id FROM purchase INNER JOIN ( SELECT product_id FROM purchase WHERE user_id = 1 ) as X using (product_id) WHERE user_id <> 1 ) as Y USING (user_id) GROUP BY product_id ORDER BY count DESC output from alsobought table +-------------+------------+---------------+ | subtotal_id | product_id | product_count | +-------------+------------+---------------+ | 1 | 2 | 2 | | 2 | 4 | 2 | | 3 | 3 | 1 | | 4 | 1 | 1 | +-------------+------------+---------------+ Quote Link to comment Share on other sites More sharing options...
jazzman1 Posted August 26, 2012 Share Posted August 26, 2012 If you want you retrieve product's name and count all of them except Tim's products, there's my suggest. Tip: Tim's id is 5. SELECT `pr`.`name` , COUNT( * ) AS tot FROM `product` AS `pr` WHERE `pr`.`product_id` IN (SELECT `p`.`product_id` FROM `purchase` AS `p` WHERE `p`.`user_id` <> 5 ) GROUP BY `pr`.`name` ORDER BY tot DESC Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 26, 2012 Author Share Posted August 26, 2012 Hello, thank you for your patience with me I promise I'm not being deliberately inept :-) I'm afraid I'm getting confused by your solutions because I've used the same field names in different tables, please could you allow me to have another try where I have the table name at the start of each field name? I've also put an underscore in any field name that is a foreign key, and just to clarify the output doesn't have to be a table - just a normal SQL results array: users userID | userUsername 1 | David 2 | Sam 3 | Geoff 4 | Peter 5 | Tim products productID | productName 1 | Lawn Mower 2 | Greenhouse 3 | Paving slab 4 | BBQ 5 | Chainsaw purchases purchaseID | purchase_userID | purchase_productID 1 | 1 | 1 2 | 1 | 2 3 | 2 | 2 4 | 2 | 3 5 | 3 | 1 6 | 3 | 4 7 | 4 | 2 8 | 4 | 4 9 | 5 | 4 10 | 5 | 5 SQL query output: recommendationID | recommendation_productID | recommendationSubtotal 1 | 2 | 2 2 | 4 | 2 3 | 3 | 1 4 | 1 | 1 Please could you tell me if the below code is correct (sorry I've removed aliases because I find it hard to remember which tables they relate to): SELECT 'products'.'productID' As recommendation_productID, COUNT(*) AS recommendationSubtotal FROM 'products' WHERE 'products'.'productID' IN (SELECT 'purchases'.'purchaseID' FROM 'purchases' WHERE 'purchases'.'purchase_UserID' <> 5 ) GROUP BY 'products'.'productName' ORDER BY recommendationSubtotal DESC Many thanks, Stu Quote Link to comment Share on other sites More sharing options...
Barand Posted August 26, 2012 Share Posted August 26, 2012 If your problem is "Give me all products I've ever sold that were not bought by Tim" then go with Jazzman's solution. But I don't think that was your original question. Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 26, 2012 Author Share Posted August 26, 2012 Ok thanks Barand, please could you tell me if this is correct (I think I just needed to remove the first line of the sql query to stop it trying to insert into a table?): SELECT products.product_id As recommendation_productID, COUNT(*) As recommendationSubtotal FROM purchases INNER JOIN ( SELECT purchaseUserID FROM purchases INNER JOIN ( SELECT purchase_productID FROM purchases WHERE purchase_userId = 1 ) as X using (product_id) WHERE purchase_UserID <> 1 ) as Y USING (purchase_userID) GROUP BY productID ORDER BY recommendationSubtotal DESC Quote Link to comment Share on other sites More sharing options...
Barand Posted August 26, 2012 Share Posted August 26, 2012 i fixed the table and column names for you SELECT purchases.purchase_productID, COUNT(*) as recommendationSubtotal FROM purchases INNER JOIN ( SELECT purchase_UserID FROM purchases INNER JOIN ( SELECT purchase_productID FROM purchases WHERE purchase_userID = 1 ) as X using (purchase_productID) WHERE purchase_userID <> 1 ) as Y USING (purchase_userID) GROUP BY purchase_productID ORDER BY recommendationSubtotal DESC; Quote Link to comment Share on other sites More sharing options...
jazzman1 Posted August 26, 2012 Share Posted August 26, 2012 @stubarny, may be the Barand's example is a little bit complex for you, take a look at mine: You have 3 tables - users, purchases, products USERS: +----+-------+ | id | name | +----+-------+ | 1 | David | | 2 | Sam | | 3 | Geoff | | 4 | Peter | | 5 | Tim | +----+-------+ PURCHASES +----+-----------------+--------------------+ | id | purchase_userID | purchase_productID | +----+-----------------+--------------------+ | 1 | 1 | 1 | | 2 | 1 | 2 | | 3 | 2 | 2 | | 4 | 2 | 3 | | 5 | 3 | 1 | | 6 | 3 | 4 | | 7 | 4 | 2 | | 8 | 4 | 4 | | 9 | 5 | 4 | | 10 | 5 | 5 | +----+-----------------+--------------------+ PRODUCTS +----+-------------+ | id | productName | +----+-------------+ | 1 | Lawn Mower | | 2 | Greenhouse | | 3 | Paving slab | | 4 | BBQ | | 5 | Chainsaw | +----+-------------+ Let us say, we want to retrieve all purchase_productID and count them from all users which id is no 5. Our query is gonna be like this: SELECT `p`.`id` , `pr`.`productName` , COUNT( `p`.`purchase_productID` ) AS `total` FROM `purchases` AS `p` JOIN `products` AS `pr` ON `pr`.`id` = `p`.`purchase_productID` WHERE `p`.`purchase_userID` IN ( 1,2,3,4 ) GROUP BY `p`.`purchase_productID` ORDER BY `p`.`id` DESC; If you want the result IN (1,2,3,4) to be dynamic, you could use a subquery like this: SELECT `p`.`id` , `pr`.`productName` , COUNT( `p`.`purchase_productID` ) AS `total` FROM `purchases` AS `p` JOIN `products` AS `pr` ON `pr`.`id` = `p`.`purchase_productID` WHERE `p`.`purchase_userID` IN ( SELECT `u`.`id` FROM `users` AS `u` WHERE `u`.`id` <>5 ) GROUP BY `p`.`purchase_productID` ORDER BY `p`.`id` DESC; RESULT: +----+-------------+-------+ | id | productName | total | +----+-------------+-------+ | 6 | BBQ | 2 | | 4 | Paving slab | 1 | | 2 | Greenhouse | 3 | | 1 | Lawn Mower | 2 | +----+-------------+-------+ If you want to see what purchases David was made (him id is 1) : SELECT `p`.`id` , `pr`.`productName` , COUNT( `p`.`purchase_productID` ) AS `total` FROM `purchases` AS `p` JOIN `products` AS `pr` ON `pr`.`id` = `p`.`purchase_productID` WHERE `p`.`purchase_userID` IN ( SELECT `u`.`id` FROM `users` AS `u` WHERE `u`.`id` = 1 ) GROUP BY `p`.`purchase_productID` ORDER BY `p`.`id` DESC; RESULT: +----+-------------+-------+ | id | productName | total | +----+-------------+-------+ | 2 | Greenhouse | 1 | | 1 | Lawn Mower | 1 | +----+-------------+-------+ Quote Link to comment Share on other sites More sharing options...
Christian F. Posted August 26, 2012 Share Posted August 26, 2012 A little bit on the side of your question, but I really think you should watch these . By not having a proper model you're making things harder for yourself than what they need to be, so I strongly recommend restructuring your database as done in those videos. Quote Link to comment Share on other sites More sharing options...
Barand Posted August 26, 2012 Share Posted August 26, 2012 It's a simple standard many-to-many relationship. +------------+ +------------+ +-----------+ | user | | purchase | | product | +------------+ +------------+ +-----------+ | userID | --+ | id | +--- | productID | | name | +--< | userID | | | descrip | +------------+ | productID | >-+ +-----------+ +------------+ How much more modelling can you do on this? Quote Link to comment Share on other sites More sharing options...
Christian F. Posted August 26, 2012 Share Posted August 26, 2012 So every purchase can only have one product, and only one instance of this product..? That's what your table structure says, which I suspect isn't quite correct. The example used in the video is in fact exactly the same that you're trying to set up here, which is why I dug it up. I'm quite certain you'll find it useful, and hopefully informative as well. Quote Link to comment Share on other sites More sharing options...
xyph Posted August 26, 2012 Share Posted August 26, 2012 So every purchase can only have one product, and only one instance of this product..? That's what your table structure says, which I suspect isn't quite correct. The example used in the video is in fact exactly the same that you're trying to set up here, which is why I dug it up. I'm quite certain you'll find it useful, and hopefully informative as well. You could always just have a transactionID go along in the purchases table, tying multiple purchases to one overall transaction. Gotta watch out over-normalizing though. Can lead to performance issues. Quote Link to comment Share on other sites More sharing options...
Barand Posted August 26, 2012 Share Posted August 26, 2012 There are users and there are products. @Christian, A user can purchase many products. A product can be purchased by many users. To resolve this condition we introduce an intermediate table with foreign keys - one for the user and one for the product. This is called a many-to-many relationship. When you have been doing this for a while you'll come across many instances of these. @xyph Agreed, but these are the columns relevant to the problem in hand - ie users who bought the products you bought also bought what? Quote Link to comment Share on other sites More sharing options...
Christian F. Posted August 26, 2012 Share Posted August 26, 2012 Barand: Trust me, I've been doing it for a while. Quite a long while too, in fact. My point was that your design violates the first normal form, in that if you support multiple items per order, then both the customer ID and order ID has to be repeated for each item ordered. A user can have many orders, which can consist of many items. One item can also be found in many orders, so the many-many relationship isn't between users and items, but items and orders. What you've done is to mix together items ordered and orders. So you need to split the "purchase" table into two tables, "orders" and "order_items", just like shown in the video. (I hope you are watching/have watched them by now, as I really don't want to repeat everything said in them.) Though, thanks for attempting to share your knowledge. xyph: Performance issues from 3rd NF isn't much of a concern for most applications, and if you do find yourself in that situation then you have a hell of a lot of traffic. A problem best solved with proper caching. Quote Link to comment Share on other sites More sharing options...
xyph Posted August 26, 2012 Share Posted August 26, 2012 I think you two are trying to solve different problems. I was simply trying to merge them. @ChristianF - I never said this example was over-normalized, simply that it can and does happen. Some data just doesn't need to be normalized, and you can absorb performance loss on generating reports to gain performance on more common queries. Quote Link to comment Share on other sites More sharing options...
Barand Posted August 26, 2012 Share Posted August 26, 2012 Christian It is not my design, it is the data given by the OP. I'm well aware there would be a table of orders and order_items in the real world. This is the data necessary to solve his problem, viz. "People who purchased your items also purchased ...." I've been developing for over 40 years and designing and using relational databases professionally for over 25 years so don't patronise me and try to teach me to suck eggs. Quote Link to comment Share on other sites More sharing options...
Christian F. Posted August 26, 2012 Share Posted August 26, 2012 Oh, sorry about that. Got you two mixed up. Well, then the OP should indeed be watching those videos. Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 27, 2012 Author Share Posted August 27, 2012 Thanks everyone, Please may I ask what these 2 bold parts of the SQL query below do?: SELECT purchases.purchase_productID, COUNT(*) as recommendationSubtotal FROM purchases INNER JOIN ( SELECT purchase_UserID FROM purchases INNER JOIN ( SELECT purchase_productID FROM purchases WHERE purchase_userID = 1 ) as X using (purchase_productID) WHERE purchase_userID <> 1 ) as Y USING (purchase_userID) GROUP BY purchase_productID ORDER BY recommendationSubtotal DESC; I assume they create aliased tables called X and Y (which I guess I don't technically need in my code?)? But please could you tell me what the "using (purchase_productID)" and "using (purchase_userID)" does? Thanks, Stu Quote Link to comment Share on other sites More sharing options...
xyph Posted August 27, 2012 Share Posted August 27, 2012 USING(column) is the same as ON right_table.column = left_table.column Quote Link to comment Share on other sites More sharing options...
stubarny Posted August 27, 2012 Author Share Posted August 27, 2012 Thank you Quote Link to comment Share on other sites More sharing options...
Barand Posted August 27, 2012 Share Posted August 27, 2012 I assume they create aliased tables called X and Y (which I guess I don't technically need in my code?)? Yes you do. Subqueries used as tables (as these are) require aliases. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.