yeago Posted December 25, 2005 Share Posted December 25, 2005 CREATE TEMPORARY TABLE Results SELECT (attractiveness + originality + variety + video_quality)/4 as rating, sites. * , categories.site_id, count( categories.category ) as relev FROM categories, sites, ratings WHERE categories.site_id = sites.id AND sites.id = ratings.site_id and ($includes) GROUP BY site_id,id"; This query takes a site, gets the number of categories it matches (to determine relevance) and then goes to the ratings table and averages the fields, giving a number (to determing ranking). Problem: Sites that haven't been ranked are not appearing. Yet I must say where sites.id = ratings.site_id or else I get very strange results, such as all ratings appearing the same for all sites. Quote Link to comment Share on other sites More sharing options...
fenway Posted December 26, 2005 Share Posted December 26, 2005 You need to use a LEFT JOIN -- with the multitable select above, you're only including records that have matching rows in both tables. For the non-matching rows, you'll get NULLs, which should be fine for the COUNT() function. Try the following (UNTESTED): CREATE TEMPORARY TABLE Results SELECT (attractiveness + originality + variety + video_quality)/4 as rating, sites. * , categories.site_id, count( categories.category ) as relev FROM sites LEFT JOIN categories ON ( categories.site_id = sites.id ) LEFT JOIN ratings ON ( ratings.site_id = sites.id ) WHERE $includes GROUP BY site_id,id"; Quote Link to comment Share on other sites More sharing options...
yeago Posted January 3, 2006 Author Share Posted January 3, 2006 Perfect except for one thing (which I didn't account for when I posted). I can't simply take the ratings and divide them by 4 because a site may have been rated by more than one person. How can I take the ratings (attractiveness, originality, etc...) and divide them by the number of entries which a given id appears? Instead of: (a + b + c)/4 I want (a+b+c)/4/*number of times site has been reviewed* Quote Link to comment Share on other sites More sharing options...
fenway Posted January 3, 2006 Share Posted January 3, 2006 Sounds like you want the average rating; try AVG(....)/4 instead. BTW, you don't need to GROUP BY id at the end, since by definition, it's unique. Quote Link to comment Share on other sites More sharing options...
yeago Posted January 3, 2006 Author Share Posted January 3, 2006 [!--quoteo(post=332698:date=Jan 3 2006, 09:39 AM:name=fenway)--][div class=\'quotetop\']QUOTE(fenway @ Jan 3 2006, 09:39 AM) 332698[/snapback][/div][div class=\'quotemain\'][!--quotec--] Sounds like you want the average rating; try AVG(....)/4 instead. BTW, you don't need to GROUP BY id at the end, since by definition, it's unique. I tried that.... AVG(attractiveness + watchability + originality + variety)/4 as rating Sites that have this rating: site id blah blah blah blah blah 6 3 3 3 3 3 Return rating=6! Quote Link to comment Share on other sites More sharing options...
fenway Posted January 3, 2006 Share Posted January 3, 2006 Now that doesn't make any sense... try running the AVG() function on a few rows, and you'll see how it's supposed to work. I don't know specifically why you're getting a different output. Remember, this is an average across all of the sites' rating, so I don't understand why you're posting just a single site rating record in your example. Quote Link to comment Share on other sites More sharing options...
yeago Posted January 4, 2006 Author Share Posted January 4, 2006 [!--quoteo(post=332804:date=Jan 3 2006, 06:28 PM:name=fenway)--][div class=\'quotetop\']QUOTE(fenway @ Jan 3 2006, 06:28 PM) 332804[/snapback][/div][div class=\'quotemain\'][!--quotec--] Now that doesn't make any sense... try running the AVG() function on a few rows, and you'll see how it's supposed to work. I don't know specifically why you're getting a different output. Remember, this is an average across all of the sites' rating, so I don't understand why you're posting just a single site rating record in your example. I need this: Site name, url, description, relevancy (number of times it appears in 'categories') rating (average rating divided by the number of times it appears in 'ratings') Quote Link to comment Share on other sites More sharing options...
fenway Posted January 4, 2006 Share Posted January 4, 2006 First, I meant to write AVG( ( .... / 4 ) ), with the division _inside_ the average function. Second, if that doesn't produce the desired output, just post an example of a site_id with multiple ratings -- just those rows -- and I will show you what I mean. Quote Link to comment Share on other sites More sharing options...
yeago Posted January 4, 2006 Author Share Posted January 4, 2006 [!--quoteo(post=333046:date=Jan 4 2006, 06:44 AM:name=fenway)--][div class=\'quotetop\']QUOTE(fenway @ Jan 4 2006, 06:44 AM) 333046[/snapback][/div][div class=\'quotemain\'][!--quotec--] First, I meant to write AVG( ( .... / 4 ) ), with the division _inside_ the average function. Second, if that doesn't produce the desired output, just post an example of a site_id with multiple ratings -- just those rows -- and I will show you what I mean. Didn't change anything. example: site_id,usability,features, etc 4, 5, 3, 2 4, 4, 1, 2 4, 5, 2, 2 Should be (site_id) 4, 4.75, 2, 2 Quote Link to comment Share on other sites More sharing options...
fenway Posted January 4, 2006 Share Posted January 4, 2006 That's different than before -- you were combining all of the score categories and average over them. Now you want an average _within_ each category. The following would general the desired output given the rows you posted: SELECT AVG(usability), AVG(features), AVG(etc) FROM some_table GROUP BY site_id Does this help? Quote Link to comment Share on other sites More sharing options...
yeago Posted January 6, 2006 Author Share Posted January 6, 2006 I have changed the way I'm going to do it. I'm just going to have users rate sites by 1-5, no different categories. I have a problem, however. I find that sites that have been rated more than once are receiving double 'relev' from count('category.categories') Also, I still need to do all this within one query (because I am using MySQL to sort them) so I need to get the average rating in one shot, and then call it 'rating' within my temp table. Quote Link to comment Share on other sites More sharing options...
fenway Posted January 6, 2006 Share Posted January 6, 2006 Now that you've changed the question, I have no idea what the problem is anymore. Post your current query -- because I don't see how your COUNT() could be wrong if you're GROUPing BY site_id. Quote Link to comment Share on other sites More sharing options...
yeago Posted January 6, 2006 Author Share Posted January 6, 2006 [!--quoteo(post=333812:date=Jan 6 2006, 05:39 AM:name=fenway)--][div class=\'quotetop\']QUOTE(fenway @ Jan 6 2006, 05:39 AM) 333812[/snapback][/div][div class=\'quotemain\'][!--quotec--] Now that you've changed the question, I have no idea what the problem is anymore. Post your current query -- because I don't see how your COUNT() could be wrong if you're GROUPing BY site_id. Ok here goes: Table 'sites' (id, url, description) Table 'categories' (site_id, category) <-- site_id will appear twice if site has been placed into two diff categories Table 'ratings' (site_id, rating) <-- site_id will appear twice if the site has been rated twice Main condition: I must get this done in one fell swoop using a temporary table because results must be sortable and I'm choosing MySQL rather than php Array_sort() to sort them. Comment on wisdom of this? What I need: I need a table that looks like id, url, description, count(categories) as relev, avg(rating) as rating CREATE TEMPORARY TABLE Results select s.*,avg(r.rating) as rating from sites as s LEFT JOIN ratings as r on (r.site_id = s.id) where r.site_id = s.id group by site_id Problem with this: Sites that have been rated twice are simply having their ratings added together, not averaged. Sites that have been rated once are having their ratings doubled. Quote Link to comment Share on other sites More sharing options...
fenway Posted January 6, 2006 Share Posted January 6, 2006 Where is the part of the query that deal with the categories table? Quote Link to comment Share on other sites More sharing options...
yeago Posted January 6, 2006 Author Share Posted January 6, 2006 [!--quoteo(post=334148:date=Jan 6 2006, 11:22 PM:name=fenway)--][div class=\'quotetop\']QUOTE(fenway @ Jan 6 2006, 11:22 PM) 334148[/snapback][/div][div class=\'quotemain\'][!--quotec--] Where is the part of the query that deal with the categories table? Well, actually there are two queries, one that deals with them and one that doesn't, considering that the user may not have specified categories to add or exclude from search. Here it is anyway: I removed reference to the rating for now since I can't get it right. CREATE TEMPORARY TABLE Results SELECT s.*, count( c.category ) as relev FROM sites as s LEFT JOIN categories as c on (c.site_id = s.id) WHERE ($includes) and c.site_id = s.id GROUP BY s.id,c.site_id"; Quote Link to comment Share on other sites More sharing options...
fenway Posted January 7, 2006 Share Posted January 7, 2006 [!--quoteo--][div class=\'quotetop\']QUOTE[/div][div class=\'quotemain\'][!--quotec--]Problem with this: Sites that have been rated twice are simply having their ratings added together, not averaged. Sites that have been rated once are having their ratings doubled. That's impossible. Something else is going on... there's no way that this query: SELECT s.id,AVG(r.rating) FROM sites AS s LEFT JOIN ratings AS r ON( r.site_id = s.id ) WHERE r.site_id = s.id GROUP BY s.id could possibly be doing what you've described. Try this query on two new tables, one with a few site IDs, and the other with 3 ratings for one of them and just 1 for the other. You'll see that in this simple scenario, everything works as expected. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.