davestewart Posted February 17, 2008 Share Posted February 17, 2008 Hi all, This looks like a long post, but it isn't - I've just entered the code / results in to make it easier for people to help me - hopefully! Anyway - I'm a new to intermediate MySQL user, and I'm mainly there with JOINs, but I need to know if my GROUP BY is a good or bad solution. I'm joining 3 tables, slowly working from left to right with the appropriate ids. I'm building a photographers' agent's site, and I'm basically finding working out what images belong to what collections by a certain photographer. My SQL is like so: [pre] SELECT cs.photographer_id, # collections photographer_id cs.collection_id, # collections collection_id cs.name as collection_name, # collections collection_name im.name as `image_name`, # images image_name cm.image_id # images image_id FROM images im LEFT JOIN `collections` cs ON im.photographer_id = cs.photographer_id LEFT JOIN `collections_map` cm ON cs.collection_id = cm.collection_id WHERE cs.photographer_id = 1 #GROUP BY collection_id, image_id[/pre] 1 - The first table lists the image data, such as names and ids 2 - The second table lists the collection data, such as the names and ids 3 - the third table lists the relationships between the collection_ids and the image_ids Once I get to the end, I have all the correct data, but there's a lot of duplication, with some rows obviously not making sense. [pre] photographer_id collection_id collection_name image_name image_id 1 1 Collection 1 IMAGE 1 1 1 1 Collection 1 IMAGE 2 1 1 1 Collection 1 IMAGE 3 1 1 1 Collection 1 IMAGE 1 2 1 1 Collection 1 IMAGE 2 2 1 1 Collection 1 IMAGE 3 2 1 1 Collection 1 IMAGE 1 3 1 1 Collection 1 IMAGE 2 3 1 1 Collection 1 IMAGE 3 3 1 3 Collection 2 IMAGE 1 1 1 3 Collection 2 IMAGE 2 1 1 3 Collection 2 IMAGE 3 1 1 3 Collection 2 IMAGE 1 2 1 3 Collection 2 IMAGE 2 2 1 3 Collection 2 IMAGE 3 2 [/pre] I'm then using GROUP BY on the values I'm interested in to reduce the dataset to only the rows I want. [pre] photographer_id collection_id collection_name image_name image_id 1 1 Collection 1 IMAGE 1 1 1 1 Collection 1 IMAGE 1 2 1 1 Collection 1 IMAGE 1 3 1 3 Collection 2 IMAGE 1 1 1 3 Collection 2 IMAGE 1 2 [/pre] It works, but is this right? I can't help but feel GROUP By is not supposed to be used like this, but as I said, I'm still not 100% on joins. image_ids are NOT globally unique (they are to each photographer) so I can't search for images by a certain photographer, then reduce them if they're in a collection Thanks! Dave Quote Link to comment Share on other sites More sharing options...
NL_Rosko Posted February 17, 2008 Share Posted February 17, 2008 when using left join u will see all the coloms of the left from the join even if there is right now relation the group by function is this case will not give the required result. use inner join, when a number in table a is found in table b it is joined. test youre query by using 1 id, and then continue the build. this is basic sql Quote Link to comment Share on other sites More sharing options...
davestewart Posted February 17, 2008 Author Share Posted February 17, 2008 Actually, the GROUP BY clause does give the right result for image_id, but not image_name. I suspect something is not right, but as I'm not trained in SQL (it's one of those things some of us have to pick up through necessity) I don't know how to put my finger on it. > this is basic sql I've taken the time and trouble give specific information, and to have it as well-presented as possible, so to make it easier for people to reply. There's a few points I've not covered before, but retorts like that aren't really helpful. > coloms > youre > now > is this case This is basic spelling and grammar. Quote Link to comment Share on other sites More sharing options...
davestewart Posted February 17, 2008 Author Share Posted February 17, 2008 The KEY thing in my original post (that I probably didn't make clear enough from the start!) is that the image_id in the images table is NOT unique. It's only unique when taken in conjunction with the photographer_id, ie [pre] p_id i_id 1 1 1 2 1 3 2 1 2 2 2 3 [/pre] This is why I'm finding it so confusing to do the join, and why I was attempting to working left to right (gathering more and more data along the way) then trying to "narrow it down" at the end. Seems odd now, but at the time it made sense. If I make the image_ids unique, then yes, it's a simple case of: [pre] SELECT cs.collection_id, cs.name, im.image_id, im.image_num, im.file_name FROM collections_map cm JOIN collections cs ON cm.collection_id = cs.collection_id JOIN images im ON cm.image_id = im.image_id WHERE cs.photographer_id = 2 ORDER BY collection_id, file_name [/pre] So - back to my original questioning: is it possible / sensible to JOIN tables where you need to specify two keys like this, or should I just accept that it's just the way to go to have unique ids for easy joining? Thanks, Dave Quote Link to comment Share on other sites More sharing options...
aschk Posted February 18, 2008 Share Posted February 18, 2008 I think what I should probably ask is "can you explain what you are expecting" (e.g. a count of all images in each collection for each photographer) and give a sample dataset for it (i.e. what you want). Quote Link to comment Share on other sites More sharing options...
davestewart Posted February 18, 2008 Author Share Posted February 18, 2008 Hi Aschk, I'm expecting a table that shows the images within each collection, which will look like... [pre]collection_id name image_id image_num file_name 1 Collection 1 1 1 IMAGE 1.jpg 1 Collection 1 2 2 IMAGE 2.jpg 1 Collection 1 3 3 IMAGE 3.jpg 3 Collection 2 1 1 IMAGE 1.jpg 3 Collection 2 2 2 IMAGE 2.jpg[/pre] ...which I get if I use a globally unique id within the collections_map and images tables. However, I want to do the join (if possible) using photographer_id AND image_num (a photographer-specific unique id for each image). I've created a very simple database structure (`photographers_test` - 4 tables) and only the necessary sample data to test it, which you can down here if you want to quickly play: http://www.keyframesandcode.com/temp/photographers_test.sql (I've updated collections_map to include the image_num column.) As a test, if you run the previous SQL query, you will get back the correct rows. To make it obvious, I have made photographer 1's filenames UPPERCASE and photographer 2's filenames lowercase, so the filename column should be all UPPERCASE or lowercase depending on the photographer_id. Thanks for looking! Dave Quote Link to comment Share on other sites More sharing options...
aschk Posted February 18, 2008 Share Posted February 18, 2008 Ok, so i don't see why you're GROUPing at all. It seems a simple SELECT with JOINs is giving you what you want (as in your original post). I'm wondering if you're misunderstanding what GROUP BY does, because it seems to me you're not using any GROUP'ing functions (such as COUNT, SUM, AVG). Let me have a play with the data in an attempt to understand what it is you're REALLY looking for. Quote Link to comment Share on other sites More sharing options...
aschk Posted February 18, 2008 Share Posted February 18, 2008 Ok, just wanting to break down the data structure 1 second. 1) Each photo is taken by a photographer, thus the image is related to the photographer by id 2) Each image can be part of 1 OR MORE collections, hence collections consist of images (1 collection can have MANY images, but 1 image can also be part of MANY collections). 3) Each collection is related to a photographer thus 1 photographer can have MANY collections. It's number 3 that i'm lost with, are you saying that people can put together collections but not necessarily have been the person taking the photo? i.e. person who took the image is unrelated to the collection. So Joe Bloggs takes photo1.jpg , but Samwise Gangee has photo1.jpg is HIS collection? Quote Link to comment Share on other sites More sharing options...
aschk Posted February 18, 2008 Share Posted February 18, 2008 Going by your first post, (IN WORDS) you want this. For a particular photographer (1) you want all the collections he has, and their related images? e.g. SELECT p.name, c.name, i.file_name FROM collections c JOIN collections_map cm ON c.collection_id = cm.collection_id JOIN images i ON cm.image_id = i.image_id JOIN photographers p ON c.photographer_id = p.photographer_id WHERE p.photographer_id = 1; Which gives name | name | file_name ======================================= Dave Stewart | Collection 1 | IMAGE 1.jpg Dave Stewart | Collection 1 | IMAGE 2.jpg Dave Stewart | Collection 1 | IMAGE 3.jpg Dave Stewart | Collection 2 | IMAGE 1.jpg Dave Stewart | Collection 2 | IMAGE 2.jpg n.b. what i don't like at the minute is that photographers can have collections that appear to contain images they didn't take. Quote Link to comment Share on other sites More sharing options...
davestewart Posted February 18, 2008 Author Share Posted February 18, 2008 Hey again aschk, Thanks so much for your input. Yes, I understand that I was misusing GROUP! It's tasks like this which will help me see where NOT to use something. Your understanding of points 1 and 2 is correct! As to point 3: A collection owned by a single photographer will only EVER have images from that photographer. This is an application constraint rather than a database constraint. So in your terminology: A Joe Bloggs photo will only ever be in a Joe Bloggs collection, and will never be in a Samwise Gangee collection. Are you clear on the photographer_id / image_num correlation in `images`? Basically the combination of the two makes it a primary key. Looking forward to your reply! Quote Link to comment Share on other sites More sharing options...
aschk Posted February 18, 2008 Share Posted February 18, 2008 Would appear i have been modifying my post while you were typing. See the above query. However now given the information you have supplied I believe your database structure to be wrong. 1) photographers can have MANY collections 2) each collection contains MANY photos Thus you don't need the link between photographs and image because all that is taken care of by the collections. As a summation: Photographer -> HAS MANY -> collections Collection -> HAS MANY -> photos Photo -> CAN BE PART OF MANY -> Collections. Quote Link to comment Share on other sites More sharing options...
davestewart Posted February 18, 2008 Author Share Posted February 18, 2008 (You replied whilst I was typing... so I'll check your reply) OK, more info: What I would like to do would be to do-away with the image_id field, as in theory there is no need for it, as photographs can be identified by the photographer_id/image_num pairing. So the new SELECT query will need to reference image_num ONLY and not image_id (which hopefully we can get rid of!) I added both image_id and image_num to the collections_map table for the convenience reasons, so we can test new SELECT queries against both columns. Hopefully that makes things clearer. I wish I could sum this up in a more concise way! Sorry for the massive effort Dave Quote Link to comment Share on other sites More sharing options...
aschk Posted February 18, 2008 Share Posted February 18, 2008 Hmm on 2nd thoughts maybe the images photographer reference has a use in a denormalisation situation. note: In answer to your last post: using a composite key is more difficult and using individual ids for each photo isn't a problem really as they ARE unique. Quote Link to comment Share on other sites More sharing options...
davestewart Posted February 18, 2008 Author Share Posted February 18, 2008 > Thus you don't need the link between photographs and image because all that is taken care of by the collections. Are you saying I don't need the photographer_id in the images table? Actually, I do, as an image might not be part of ANY collections! For example, a photographer might - upload 100 images - create 5 collections - assign 80 images to the 5 collections (with as much overlap as he likes) The 20 remaining images still need to know which photographer they belong to. And if any collections are deleted (and their corresponding collections_map data) those images won't belong to ANY collections, and would have been orphaned. Put me straight if I don't understand your question / statement. Quote Link to comment Share on other sites More sharing options...
davestewart Posted February 18, 2008 Author Share Posted February 18, 2008 > using a composite key is more difficult and using individual ids for each photo isn't a problem really as they ARE unique OK, I geddit. In that case, the new SELECT query should reference the photographer_id and image_num ONLY and NOT the globally unique image_id (which hopefully we can get rid of, as the first two can be thought of as globally unique when thought of as a pair) Make sense? Perhaps it's not possible / too much effort! I'm getting out of my depth here... Quote Link to comment Share on other sites More sharing options...
aschk Posted February 18, 2008 Share Posted February 18, 2008 you're perfectly correct with your requirement for the photographer_id being in the images. A requirement that I didn't think of was that an image might not be assigned to a collection. My order of working was: 1) Make user 2) User make collection 3) User make image and assign to collection But what is real world is that a user can create an image and NOT assign it to a collection. So collections are just a handy grouping of images that a photographer might have. Now knowing this, did my SQL work for you? Is it the case that you want to know of all the images that a particular photographer created and all the collections that those images belong to? (this query is slightly different ) Quote Link to comment Share on other sites More sharing options...
davestewart Posted February 18, 2008 Author Share Posted February 18, 2008 Hey there, Yeah - your SELECT worked just fine, but if you check back - I had already built the same statement and, as you have proved as well, this works. However, it works on the premise that each image has a globally unique image_id, in addition to it's photographer-unique image_num (a job requirement). I want to get of the globally unique id and use both the photographer_id and the image_num to the same effect. Quote Link to comment Share on other sites More sharing options...
davestewart Posted February 18, 2008 Author Share Posted February 18, 2008 Crikey - I've done it! I think... I just added 2 WHERE clauses. Can you check my logic? [pre] SELECT cs.photographer_id, cs.collection_id, cs.name, im.image_id, im.image_num, im.file_name FROM collections_map cm JOIN collections cs ON cm.collection_id = cs.collection_id JOIN images im ON cm.image_num = im.image_num WHERE cs.photographer_id = 2 AND im.photographer_id = cs.photographer_id AND im.image_num = cm.image_num ORDER BY collection_id, file_name[/pre] Which results in this dataset: [pre] photographer_id collection_id name image_id image_num file_name 2 2 Collection A 4 1 image 1.jpg 2 2 Collection A 5 2 image 2.jpg 2 2 Collection A 6 3 image 3.jpg[/pre] Quote Link to comment Share on other sites More sharing options...
aschk Posted February 18, 2008 Share Posted February 18, 2008 Yup, looks kosher. The additional AND statements however are really part of the JOIN (and the 2nd one is m00t because it's identical to the JOIN condition), thus: SELECT cs.photographer_id, cs.collection_id, cs.name, im.image_id, im.image_num, im.file_name FROM collections_map cm JOIN collections cs ON cm.collection_id = cs.collection_id JOIN images im ON cm.image_num = im.image_num AND im.photographer_id = cs.photographer_id WHERE cs.photographer_id = 2 ORDER BY collection_id, file_name Quote Link to comment Share on other sites More sharing options...
davestewart Posted February 18, 2008 Author Share Posted February 18, 2008 OK - I'm really starting to get this JOINS thing now. So I could have written the whole thing using WHERE clauses in theta style, but this ANSI style (have I got that right!?) requires the extra WHERE clause to complete the requirements of the set-theory stuff. Update: Ah - I see you added the WHERE clause to the JOIN. I didn't realise you could do that! Quote Link to comment Share on other sites More sharing options...
aschk Posted February 18, 2008 Share Posted February 18, 2008 Heh, i think you lost me there. Basically it's best to supply the clause on the JOIN rather than in the WHERE because WHERE is a filtering that occurs after the JOIN (i believe). Looking back though it seems my query that I supplied early on gives the same answer as the last one you posted (minus a couple of columns). Also, I would strip out the image_num from collections_map and images as I don't believe this has a use. Quote Link to comment Share on other sites More sharing options...
aschk Posted February 18, 2008 Share Posted February 18, 2008 Ah right up with theta/ANSI thing now (after a bit of googling). Yes. Quote Link to comment Share on other sites More sharing options...
davestewart Posted February 18, 2008 Author Share Posted February 18, 2008 Hey aschk, Actually image_num is a photographer-unique image identifier that each photographer can use to reference their own images. So Dave Stewart might say "I'll have image_nums 1, 2 and 10" as opposed to "I'll have image_ids 1021, 1021 and 1030". It's more the case that I don't need the (redundant) globally-unique image_id, as there will NEVER be a time when an image needs to be referenced outside the context of a photographer. It will always be (for example) photographer 7 and image 10, rather than just image 1030. Anyway - thanks for all the input. It's very, very much appreciated! Still not sure how I could have asked the question in a more succinct way (let me know if you think of anything), but glad we got there in the end. Cheers!! Dave Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.