Jump to content

Join question


davestewart

Recommended Posts

Hi all,

This looks like a long post, but it isn't - I've just entered the code / results in to make it easier for people to help me - hopefully!

 

Anyway - I'm a new to intermediate MySQL user, and I'm mainly there with JOINs, but I need to know if my GROUP BY is a good or bad solution.

 

I'm joining 3 tables, slowly working from left to right with the appropriate ids. I'm building a photographers' agent's site, and I'm basically finding working out what images belong to what collections by a certain photographer.

 

My SQL is like so:

[pre]

SELECT 

cs.photographer_id,   # collections photographer_id

cs.collection_id,     # collections collection_id

cs.name as collection_name, # collections collection_name

im.name as `image_name`,     # images image_name

cm.image_id           # images image_id

 

FROM images im

 

LEFT JOIN `collections` cs

ON im.photographer_id = cs.photographer_id

 

LEFT JOIN `collections_map` cm

ON cs.collection_id = cm.collection_id

 

WHERE cs.photographer_id = 1

 

#GROUP BY collection_id, image_id[/pre]

 

1 - The first table lists the image data, such as names and ids

2 - The second table lists the collection data, such as the names and ids

3 - the third table lists the relationships between the collection_ids and the image_ids

 

Once I get to the end, I have all the correct data, but there's a lot of duplication, with some rows obviously not making sense.

[pre]

photographer_id collection_id collection_name image_name image_id

1 1 Collection 1 IMAGE 1 1

1 1 Collection 1 IMAGE 2 1

1 1 Collection 1 IMAGE 3 1

1 1 Collection 1 IMAGE 1 2

1 1 Collection 1 IMAGE 2 2

1 1 Collection 1 IMAGE 3 2

1 1 Collection 1 IMAGE 1 3

1 1 Collection 1 IMAGE 2 3

1 1 Collection 1 IMAGE 3 3

1 3 Collection 2 IMAGE 1 1

1 3 Collection 2 IMAGE 2 1

1 3 Collection 2 IMAGE 3 1

1 3 Collection 2 IMAGE 1 2

1 3 Collection 2 IMAGE 2 2

1 3 Collection 2 IMAGE 3 2

[/pre]

 

I'm then using GROUP BY on the values I'm interested in to reduce the dataset to only the rows I want.

 

[pre]

photographer_id collection_id collection_name image_name image_id

1 1 Collection 1 IMAGE 1 1

1 1 Collection 1 IMAGE 1 2

1 1 Collection 1 IMAGE 1 3

1 3 Collection 2 IMAGE 1 1

1 3 Collection 2 IMAGE 1 2

[/pre]

 

It works, but is this right? I can't help but feel GROUP By is not supposed to be used like this, but as I said, I'm still not 100% on joins.

 

image_ids are NOT globally unique (they are to each photographer) so I can't search for images by a certain photographer, then reduce them if they're in a collection

 

Thanks!

Dave

Link to comment
Share on other sites

when using left join u will see all the coloms of the left from the join even if there is right now relation

the group by function is this case will not give the required result.

 

use inner join, when a number in table a is found in table b it is joined.

 

test youre query by using 1 id, and then continue the build.

 

this is basic sql

Link to comment
Share on other sites

Actually, the GROUP BY clause does give the right result for image_id, but not image_name.

 

I suspect something is not right, but as I'm not trained in SQL (it's one of those things some of us have to pick up through necessity) I don't know how to put my finger on it.

 

> this is basic sql

 

I've taken the time and trouble give specific information, and to have it as well-presented as possible, so to make it easier for people to reply. There's a few points I've not covered before, but retorts like that aren't really helpful.

 

> coloms

> youre

> now

> is this case

 

This is basic spelling and grammar.

Link to comment
Share on other sites

The KEY thing in my original post (that I probably didn't make clear enough from the start!) is that the image_id in the images table is NOT unique.

 

It's only unique when taken in conjunction with the photographer_id, ie

[pre]

p_id i_id

1 1

1 2

1 3

2 1

2 2

2 3

[/pre]

 

This is why I'm finding it so confusing to do the join, and why I was attempting to working left to right (gathering more and more data along the way) then trying to "narrow it down" at the end. Seems odd now, but at the time it made sense.

 

If I make the image_ids unique, then yes, it's a simple case of:

[pre] SELECT

cs.collection_id,

cs.name,

im.image_id,

im.image_num,

im.file_name

 

FROM collections_map cm

 

JOIN collections cs

ON cm.collection_id = cs.collection_id

 

JOIN images im

ON cm.image_id = im.image_id

 

WHERE cs.photographer_id = 2

ORDER BY collection_id, file_name

[/pre]

 

So - back to my original questioning: is it possible / sensible to JOIN tables where you need to specify two keys like this, or should I just accept that it's just the way to go to have unique ids for easy joining?

 

Thanks,

Dave

 

 

Link to comment
Share on other sites

Hi Aschk,

 

I'm expecting a table that shows the images within each collection, which will look like...

 

[pre]collection_id name image_id image_num file_name

1 Collection 1 1 1 IMAGE 1.jpg

1 Collection 1 2 2 IMAGE 2.jpg

1 Collection 1 3 3 IMAGE 3.jpg

3 Collection 2 1 1 IMAGE 1.jpg

3 Collection 2 2 2 IMAGE 2.jpg[/pre]

 

...which I get if I use a globally unique id within the collections_map and images tables.

 

However, I want to do the join (if possible) using photographer_id AND image_num (a photographer-specific unique id for each image).

 

I've created a very simple database structure (`photographers_test` - 4 tables) and only the necessary sample data to test it, which you can down here if you want to quickly play:

 

http://www.keyframesandcode.com/temp/photographers_test.sql

 

(I've updated collections_map to include the image_num column.)

 

As a test, if you run the previous SQL query, you will get back the correct rows. To make it obvious, I have made photographer 1's filenames UPPERCASE and photographer 2's filenames lowercase, so the filename column should be all UPPERCASE or lowercase depending on the photographer_id.

 

Thanks for looking!

Dave

Link to comment
Share on other sites

Ok, so i don't see why you're GROUPing at all.

It seems a simple SELECT with JOINs is giving you what you want (as in your original post).

I'm wondering if you're misunderstanding what GROUP BY does, because it seems to me you're not using any GROUP'ing functions (such as COUNT, SUM, AVG).

 

Let me have a play with the data in an attempt to understand what it is you're REALLY looking for.

Link to comment
Share on other sites

Ok, just wanting to break down the data structure 1 second.

 

1) Each photo is taken by a photographer, thus the image is related to the photographer by id

2) Each image can be part of 1 OR MORE collections, hence collections consist of images (1 collection can have MANY images, but 1 image can also be part of MANY collections).

3) Each collection is related to a photographer thus 1 photographer can have MANY collections.

 

It's number 3 that i'm lost with, are you saying that people can put together collections but not necessarily have been the person taking the photo? i.e. person who took the image is unrelated to the collection.

So Joe Bloggs takes photo1.jpg , but Samwise Gangee has photo1.jpg is HIS collection?

Link to comment
Share on other sites

Going by your first post, (IN WORDS) you want this.

For a particular photographer (1) you want all the collections he has, and their related images?

e.g.

SELECT p.name, c.name, i.file_name
FROM collections c
JOIN collections_map cm ON c.collection_id = cm.collection_id
JOIN images i ON cm.image_id = i.image_id
JOIN photographers p ON c.photographer_id = p.photographer_id
WHERE p.photographer_id = 1;

 

Which gives

name         | name         | file_name
=======================================
Dave Stewart | Collection 1 | IMAGE 1.jpg
Dave Stewart | Collection 1 | IMAGE 2.jpg
Dave Stewart | Collection 1 | IMAGE 3.jpg
Dave Stewart | Collection 2 | IMAGE 1.jpg
Dave Stewart | Collection 2 | IMAGE 2.jpg

 

n.b. what i don't like at the minute is that photographers can have collections that appear to contain images they didn't take.

 

 

Link to comment
Share on other sites

Hey again aschk,

Thanks so much for your input.

 

Yes, I understand that I was misusing GROUP! It's tasks like this which will help me see where NOT to use something.

 

Your understanding of points 1 and 2 is correct!

 

As to point 3:

A collection owned by a single photographer will only EVER have images from that photographer.

This is an application constraint rather than a database constraint.

 

So in your terminology: A Joe Bloggs photo will only ever be in a Joe Bloggs collection, and will never be in a Samwise Gangee collection.

 

Are you clear on the photographer_id / image_num correlation in `images`?

Basically the combination of the two makes it a primary key.

 

Looking forward to your reply!

:D

Link to comment
Share on other sites

Would appear i have been modifying my post while you were typing. See the above query.

However now given the information you have supplied I believe your database structure to be wrong.

 

1) photographers can have MANY collections

2) each collection contains MANY photos

 

Thus you don't need the link between photographs and image because all that is taken care of by the collections.

 

As a summation:

Photographer -> HAS MANY -> collections

Collection -> HAS MANY -> photos

Photo -> CAN BE PART OF MANY -> Collections.

Link to comment
Share on other sites

(You replied whilst I was typing... so I'll check your reply)

 

OK, more info:

 

What I would like to do would be to do-away with the image_id field, as in theory there is no need for it, as photographs can be identified by the photographer_id/image_num pairing.

 

So the new SELECT query will need to reference image_num ONLY and not image_id (which hopefully we can get rid of!)

 

I added both image_id and image_num to the collections_map table for the convenience reasons, so we can test new SELECT queries against both columns.

 

Hopefully that makes things clearer. I wish I could sum this up in a more concise way! Sorry for the massive effort :)

 

Dave

Link to comment
Share on other sites

Hmm on 2nd thoughts maybe the images photographer reference has a use in a denormalisation situation.

 

note: In answer to your last post: using a composite key is more difficult and using individual ids for each photo isn't a problem really as they ARE unique.

Link to comment
Share on other sites

> Thus you don't need the link between photographs and image because all that is taken care of by the collections.

 

Are you saying I don't need the photographer_id in the images table?

 

Actually, I do, as an image might not be part of ANY collections! For example, a photographer might

 

- upload 100 images

- create 5 collections

- assign 80 images to the 5 collections (with as much overlap as he likes)

 

The 20 remaining images still need to know which photographer they belong to.

And if any collections are deleted (and their corresponding collections_map data) those images won't belong to ANY collections, and would have been orphaned.

 

Put me straight if I don't understand your question / statement.

Link to comment
Share on other sites

> using a composite key is more difficult and using individual ids for each photo isn't a problem really as they ARE unique

 

OK, I geddit.

 

In that case, the new SELECT query should reference the photographer_id and image_num ONLY and NOT the globally unique image_id (which hopefully we can get rid of, as the first two can be thought of as globally unique when thought of as a pair)

 

Make sense? Perhaps it's not possible / too much effort! I'm getting out of my depth here...

Link to comment
Share on other sites

you're perfectly correct with your requirement for the photographer_id being in the images.

A requirement that I didn't think of was that an image might not be assigned to a collection.

My order of working was:

1) Make user

2) User make collection

3) User make image and assign to collection

 

But what is real world is that a user can create an image and NOT assign it to a collection. So collections are just a handy grouping of images that a photographer might have. Now knowing this, did my SQL work for you?

Is it the case that you want to know of all the images that a particular photographer created and all the collections that those images belong to? (this query is slightly different ;))

Link to comment
Share on other sites

Hey there,

Yeah - your SELECT worked just fine, but if you check back - I had already built the same statement and, as you have proved as well, this works.

 

However, it works on the premise that each image has a globally unique image_id, in addition to it's photographer-unique image_num (a job requirement). I want to get of the globally unique id and use both the photographer_id and the image_num to the same effect.

 

Link to comment
Share on other sites

Crikey - I've done it! I think...

 

I just added 2 WHERE clauses. Can you check my logic?

 


[pre] SELECT

cs.photographer_id,

cs.collection_id,

cs.name,

im.image_id,

im.image_num,

im.file_name

 

FROM collections_map cm

 

JOIN collections cs

ON cm.collection_id = cs.collection_id

 

JOIN images im

ON cm.image_num = im.image_num

 

WHERE

cs.photographer_id = 2

AND im.photographer_id = cs.photographer_id

AND im.image_num = cm.image_num

 

ORDER BY

collection_id,

file_name[/pre]


Which results in this dataset:

 

[pre] photographer_id collection_id name image_id image_num file_name

2 2 Collection A 4 1 image 1.jpg

2 2 Collection A 5 2 image 2.jpg

2 2 Collection A 6 3 image 3.jpg[/pre]

Link to comment
Share on other sites

Yup, looks kosher. The additional AND statements however are really part of the JOIN (and the 2nd one is m00t because it's identical to the JOIN condition), thus:

 

   SELECT
      cs.photographer_id,
      cs.collection_id,
      cs.name,
      im.image_id,
      im.image_num,
      im.file_name
   
   FROM collections_map cm
   
   JOIN collections cs
   ON cm.collection_id = cs.collection_id
   
   JOIN images im
   ON cm.image_num = im.image_num AND im.photographer_id = cs.photographer_id
   
   WHERE
      cs.photographer_id = 2      
   
   ORDER BY
      collection_id, 
      file_name

Link to comment
Share on other sites

OK - I'm really starting to get this JOINS thing now.

 

So I could have written the whole thing using WHERE clauses in theta style, but this ANSI style (have I got that right!?) requires the extra WHERE clause to complete the requirements of the set-theory stuff.

 

Update: Ah - I see you added the WHERE clause to the JOIN. I didn't realise you could do that!

Link to comment
Share on other sites

Heh, i think you lost me there.  :o

 

Basically it's best to supply the clause on the JOIN rather than in the WHERE because WHERE is a filtering that occurs after the JOIN (i believe).

 

Looking back though it seems my query that I supplied early on gives the same answer as the last one you posted (minus a couple of columns).

 

Also, I would strip out the image_num from collections_map and images as I don't believe this has a use.

Link to comment
Share on other sites

Hey aschk,

 

Actually image_num is a photographer-unique image identifier that each photographer can use to reference their own images.

 

So Dave Stewart might say "I'll have image_nums 1, 2 and 10" as opposed to "I'll have image_ids 1021, 1021 and 1030".

 

It's more the case that I don't need the (redundant) globally-unique image_id, as there will NEVER be a time when an image needs to be referenced outside the context of a photographer. It will always be (for example) photographer 7 and image 10, rather than just image 1030.

 

Anyway - thanks for all the input. It's very, very much appreciated!

 

Still not sure how I could have asked the question in a more succinct way (let me know if you think of anything), but glad we got there in the end.

 

Cheers!!

Dave

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.