yeago

Multitable query syntax

CREATE TEMPORARY TABLE Results SELECT
(attractiveness + originality + variety + video_quality)/4 as rating,
sites.*, categories.site_id, count( categories.category ) as relev
FROM categories, sites, ratings
WHERE categories.site_id = sites.id AND sites.id = ratings.site_id AND ($includes)
GROUP BY site_id, id

 

 

This query takes a site, counts the categories it matches (to determine relevance), and then goes to the ratings table and averages the rating fields, giving a number (to determine ranking).

 

Problem: sites that haven't been rated do not appear. Yet I must include WHERE sites.id = ratings.site_id, or else I get very strange results, such as the same ratings appearing for every site.


You need to use a LEFT JOIN. With the multitable select above, you're only including sites that have matching rows in every joined table. With a LEFT JOIN, sites without a match are kept and the columns from the other table come back as NULL, which should be fine for the COUNT() function, since COUNT(column) skips NULLs.

 

Try the following (UNTESTED):

 

CREATE TEMPORARY TABLE Results SELECT
(attractiveness + originality + variety + video_quality)/4 as rating,
sites.*, categories.site_id, count( categories.category ) as relev
FROM sites
LEFT JOIN categories ON ( categories.site_id = sites.id )
LEFT JOIN ratings ON ( ratings.site_id = sites.id )
WHERE $includes
GROUP BY site_id, id
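
To see the difference between the two join styles in isolation, here is a minimal sketch. The table and column names follow the thread, but the sample rows and the url column are made up for illustration, and it is meant to be run in a scratch database.

```sql
-- Hypothetical sample data: site 2 has not been rated yet.
CREATE TABLE sites (id INT PRIMARY KEY, url VARCHAR(100));
CREATE TABLE ratings (site_id INT, attractiveness INT, originality INT,
                      variety INT, video_quality INT);

INSERT INTO sites VALUES (1, 'a.example'), (2, 'b.example');
INSERT INTO ratings VALUES (1, 4, 3, 5, 4);

-- Comma join with a matching WHERE (an inner join): only site 1 is returned.
SELECT s.id, r.attractiveness
FROM sites s, ratings r
WHERE r.site_id = s.id
ORDER BY s.id;

-- LEFT JOIN: both sites are returned; site 2 has NULL in the ratings columns.
SELECT s.id, r.attractiveness
FROM sites s
LEFT JOIN ratings r ON r.site_id = s.id
ORDER BY s.id;

-- COUNT(column) skips NULLs, so the unrated site gets a count of 0.
SELECT s.id, COUNT(r.site_id) AS n_ratings
FROM sites s
LEFT JOIN ratings r ON r.site_id = s.id
GROUP BY s.id
ORDER BY s.id;
```

Note that any WHERE condition on a column of the left-joined table will drop the NULL rows again, so filters on that table usually belong in the ON clause.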


Perfect except for one thing (which I didn't account for when I posted).

 

I can't simply take the ratings and divide them by 4 because a site may have been rated by more than one person.

 

How can I take the ratings (attractiveness, originality, etc...) and divide them by the number of rating entries in which a given site id appears?

 

Instead of:

 

(a + b + c)/4

 

I want

 

(a + b + c)/4 / (number of times the site has been reviewed)


Sounds like you want the average rating; try AVG(....)/4 instead. BTW, you don't need to GROUP BY id at the end, since by definition, it's unique.


QUOTE (fenway @ Jan 3 2006, 09:39 AM)

Sounds like you want the average rating; try AVG(....)/4 instead. BTW, you don't need to GROUP BY id at the end, since by definition, it's unique.

 

I tried that....

AVG(attractiveness + watchability + originality + variety)/4 as rating

 

Sites that have this rating:

 

site id blah blah blah blah blah

6 3 3 3 3 3

 

Return rating=6!

 

 


Now that doesn't make any sense... try running the AVG() function on a few rows, and you'll see how it's supposed to work. I don't know specifically why you're getting a different output. Remember, this is an average across all of the sites' rating, so I don't understand why you're posting just a single site rating record in your example.


QUOTE (fenway @ Jan 3 2006, 06:28 PM)

Now that doesn't make any sense... try running the AVG() function on a few rows, and you'll see how it's supposed to work. I don't know specifically why you're getting a different output. Remember, this is an average across all of the sites' rating, so I don't understand why you're posting just a single site rating record in your example.

 

I need this:

 

Site name, url, description, relevancy (number of times it appears in 'categories'), rating (average rating divided by the number of times it appears in 'ratings')

Share this post


Link to post
Share on other sites

First, I meant to write AVG( ( .... / 4 ) ), with the division _inside_ the average function. Second, if that doesn't produce the desired output, just post an example of a site_id with multiple ratings -- just those rows -- and I will show you what I mean.
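
As a rough illustration of what AVG() does with the per-row score here, consider the sketch below. The rows are invented for the example, and the division uses 4.0 so that the result is not truncated in databases that do integer division.

```sql
-- Hypothetical ratings: site 6 was rated twice, site 7 once.
CREATE TABLE ratings (site_id INT, attractiveness INT, originality INT,
                      variety INT, video_quality INT);

INSERT INTO ratings VALUES
  (6, 3, 3, 3, 3),
  (6, 5, 4, 5, 4),
  (7, 2, 2, 2, 2);

-- Each row's four scores are averaged first, then AVG() averages those
-- per-row values across all rating rows of the same site.
SELECT site_id,
       AVG((attractiveness + originality + variety + video_quality) / 4.0) AS rating
FROM ratings
GROUP BY site_id
ORDER BY site_id;
-- site 6: (3.0 + 4.5) / 2 = 3.75
-- site 7: 2.0
```

Because averaging is linear, AVG(x)/4 and AVG(x/4) should give the same number, so moving the division inside the function is a matter of readability rather than a change in the result.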


QUOTE (fenway @ Jan 4 2006, 06:44 AM)

First, I meant to write AVG( ( .... / 4 ) ), with the division _inside_ the average function. Second, if that doesn't produce the desired output, just post an example of a site_id with multiple ratings -- just those rows -- and I will show you what I mean.

 

Didn't change anything.

 

example:

 

site_id, usability, features, etc

4, 5, 3, 2
4, 4, 1, 2
4, 5, 2, 2

Should be

(site_id) 4, 4.67, 2, 2


That's different than before -- you were combining all of the score categories and averaging over them. Now you want an average _within_ each category. The following would generate the desired output given the rows you posted:

 

SELECT AVG(usability), AVG(features), AVG(etc) FROM some_table GROUP BY site_id

 

Does this help?
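
For anyone who wants to check this against the three rows posted above, a self-contained version might look like the following. The table name some_table and the column name etc are placeholders taken from the posts, not a real schema.

```sql
-- The three rating rows posted above for site_id 4.
CREATE TABLE some_table (site_id INT, usability INT, features INT, etc INT);

INSERT INTO some_table VALUES
  (4, 5, 3, 2),
  (4, 4, 1, 2),
  (4, 5, 2, 2);

-- One output row per site, with each score column averaged on its own.
SELECT site_id,
       AVG(usability) AS usability,
       AVG(features)  AS features,
       AVG(etc)       AS etc
FROM some_table
GROUP BY site_id;
-- usability: (5 + 4 + 5) / 3 = 14/3, about 4.67
-- features:  (3 + 1 + 2) / 3 = 2
-- etc:       (2 + 2 + 2) / 3 = 2
```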


I have changed the way I'm going to do it. I'm just going to have users rate sites from 1 to 5, with no separate categories.

 

I have a problem, however.

 

I find that sites that have been rated more than once are receiving double 'relev' from count(categories.category)

 

Also, I still need to do all this within one query (because I am using MySQL to sort them) so I need to get the average rating in one shot, and then call it 'rating' within my temp table.
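
A likely cause of the doubled count is that joining both categories and ratings to sites in the same query pairs every category row with every rating row of a site. The sketch below reproduces that with made-up rows and then shows one possible way around it, which is to aggregate each table in its own subquery before joining. The url and description columns and the aliases n and avg_rating are assumptions for the example, and subqueries in FROM need MySQL 4.1 or later.

```sql
-- Hypothetical data: site 1 has two categories and two ratings,
-- site 2 has one category and no rating.
CREATE TABLE sites (id INT PRIMARY KEY, url VARCHAR(100), description VARCHAR(255));
CREATE TABLE categories (site_id INT, category VARCHAR(50));
CREATE TABLE ratings (site_id INT, rating INT);

INSERT INTO sites VALUES (1, 'a.example', 'first'), (2, 'b.example', 'second');
INSERT INTO categories VALUES (1, 'news'), (1, 'video'), (2, 'news');
INSERT INTO ratings VALUES (1, 4), (1, 2);

-- Both joins at once: site 1 produces 2 x 2 = 4 joined rows, so relev is 4
-- instead of 2. The average is still 3, since every rating repeats equally.
SELECT s.id, COUNT(c.category) AS relev, AVG(r.rating) AS rating
FROM sites s
LEFT JOIN categories c ON c.site_id = s.id
LEFT JOIN ratings r ON r.site_id = s.id
GROUP BY s.id
ORDER BY s.id;

-- Aggregating each table first gives one row per site on both sides,
-- so nothing is multiplied: site 1 gets relev 2 and rating 3,
-- site 2 gets relev 1 and a NULL rating.
SELECT s.id, s.url, s.description,
       COALESCE(c.n, 0) AS relev,
       r.avg_rating     AS rating
FROM sites s
LEFT JOIN (SELECT site_id, COUNT(*) AS n
           FROM categories GROUP BY site_id) c ON c.site_id = s.id
LEFT JOIN (SELECT site_id, AVG(rating) AS avg_rating
           FROM ratings GROUP BY site_id) r ON r.site_id = s.id
ORDER BY relev DESC, rating DESC;
```

If subqueries are not available, COUNT(DISTINCT c.category) in the first query is another option for the count, assuming a site never lists the same category twice.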


Now that you've changed the question, I have no idea what the problem is anymore. Post your current query -- because I don't see how your COUNT() could be wrong if you're GROUPing BY site_id.


QUOTE (fenway @ Jan 6 2006, 05:39 AM)

Now that you've changed the question, I have no idea what the problem is anymore. Post your current query -- because I don't see how your COUNT() could be wrong if you're GROUPing BY site_id.

 

Ok here goes:

 

Table 'sites' (id, url, description)

Table 'categories' (site_id, category) <-- site_id will appear twice if the site has been placed into two different categories

Table 'ratings' (site_id, rating) <-- site_id will appear twice if the site has been rated twice

 

Main condition: I must get this done in one fell swoop using a temporary table, because the results must be sortable and I'm choosing to sort them in MySQL rather than with PHP's array sorting functions. Any comments on the wisdom of this?

 

What I need:

 

I need a table that looks like

 

id, url, description, count(categories) as relev, avg(rating) as rating

CREATE TEMPORARY TABLE Results SELECT
s.*, avg(r.rating) as rating
FROM sites as s
LEFT JOIN ratings as r on (r.site_id = s.id)
WHERE r.site_id = s.id
GROUP BY site_id

 

Problem with this:

 

Sites that have been rated twice are simply having their ratings added together, not averaged. Sites that have been rated once are having their ratings doubled.

 

 


Where is the part of the query that deals with the categories table?


QUOTE (fenway @ Jan 6 2006, 11:22 PM)

Where is the part of the query that deals with the categories table?

 

Well, actually there are two queries, one that deals with them and one that doesn't, considering that the user may not have specified categories to add or exclude from search.

 

Here it is anyway:

 

I removed reference to the rating for now since I can't get it right.

 

CREATE TEMPORARY TABLE Results SELECT
s.*, count( c.category ) as relev
FROM sites as s
LEFT JOIN categories as c on (c.site_id = s.id)
WHERE ($includes) and c.site_id = s.id
GROUP BY s.id, c.site_id


QUOTE

Problem with this:

 

Sites that have been rated twice are simply having their ratings added together, not averaged. Sites that have been rated once are having their ratings doubled.

 

That's impossible. Something else is going on... there's no way that this query:

 

SELECT s.id,AVG(r.rating) FROM sites AS s LEFT JOIN ratings AS r ON( r.site_id = s.id ) WHERE r.site_id = s.id GROUP BY s.id

 

could possibly be doing what you've described. Try this query on two new tables, one with a few site IDs, and the other with 3 ratings for one of them and just 1 for the other. You'll see that in this simple scenario, everything works as expected.
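
A sketch of that test, with invented rating values and stripped-down tables, could look like this. It also shows a side effect of the WHERE clause: it removes the NULL rows that the LEFT JOIN would otherwise keep, so unrated sites disappear.

```sql
-- Hypothetical test tables: site 1 has three ratings, site 2 has one,
-- site 3 has none.
CREATE TABLE sites (id INT PRIMARY KEY);
CREATE TABLE ratings (site_id INT, rating INT);

INSERT INTO sites VALUES (1), (2), (3);
INSERT INTO ratings VALUES (1, 5), (1, 3), (1, 1), (2, 4);

-- The query from the post: site 1 averages (5 + 3 + 1) / 3 = 3, site 2 is 4.
-- Site 3 is dropped, because r.site_id is NULL for it and the WHERE
-- comparison against NULL is not true.
SELECT s.id, AVG(r.rating) AS rating
FROM sites AS s
LEFT JOIN ratings AS r ON (r.site_id = s.id)
WHERE r.site_id = s.id
GROUP BY s.id
ORDER BY s.id;

-- Without the WHERE clause the averages are the same, and site 3 is kept
-- with a NULL rating.
SELECT s.id, AVG(r.rating) AS rating
FROM sites AS s
LEFT JOIN ratings AS r ON (r.site_id = s.id)
GROUP BY s.id
ORDER BY s.id;
```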

