kfreak

Standard Deviance and Outlier help

Basically, I'm just trying to change my current SQL statement, which computes average ratings, so that it does the same thing but omits outliers. Here's my original query:

 

 

 

"SELECT a.*, AVG(b.rating) as rating FROM rand_thoughts a, ratings b WHERE b.rid=a.id GROUP BY b.rid ORDER BY rating DESC"

 

 

It works fine, but say you have this set of votes (on a scale of 1-10):

7

7

7

8

6

1

 

The outlier is obviously the 1, and it would skew the average. I remembered learning in math class how to use the IQR to flag outliers once you've found the 25th and 75th percentiles (I was so shocked to be using something I actually learned that I'll have to tell my math teacher). The problem is the SQL statement. This is what I've come up with:

 

"SELECT a.id, b.rating FROM rand_thoughts a, ratings b WHERE b.rid=a.id && (b.rating > (AVG(b.rating)-STD(b.rating))) && (b.rating < (AVG(b.rating)+STD(b.rating) * .675)) GROUP BY b.rid"

 

 

I'm pretty sure the logic works, but the query gives me an error. This is what mysql_error() returns:

 

Invalid use of group function

 

I've searched Google and all the programming forums I know of; that's partly how I came up with using the standard deviation to check for outliers.

 

Anyway, any help or suggestions would be great.

 

Thanks.
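
For reference, the IQR rule mentioned in the post can be sketched in a few lines of Python against the six votes listed above. This is only an illustration under stated assumptions: the function name `iqr_filter`, the conventional 1.5 multiplier, and the "inclusive" quartile method are choices made here, not something from the thread. Quartile conventions differ, and on a sample this small a different convention can move the fences enough to change which votes are dropped.

```python
import statistics


def iqr_filter(votes, k=1.5):
    """Split votes into (kept, dropped) using the k * IQR fence rule.

    k=1.5 and method="inclusive" are assumptions for this sketch.
    """
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _median, q3 = statistics.quantiles(votes, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in votes if low <= v <= high]
    dropped = [v for v in votes if not (low <= v <= high)]
    return kept, dropped


if __name__ == "__main__":
    votes = [7, 7, 7, 8, 6, 1]
    kept, dropped = iqr_filter(votes)
    print("raw average:    ", statistics.fmean(votes))  # 6.0
    print("dropped votes:  ", dropped)                  # [1]
    print("trimmed average:", statistics.fmean(kept))   # 7.0
```

With these settings Q1 is 6.25 and Q3 is 7, so the fences sit at 5.125 and 8.125 and only the 1 falls outside them.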


You cannot group by something you haven't selected.

 

Try ... GROUP BY a.id ...


Thanks for the reply.

 

I fixed what you suggested. I think I only had it that way because I was trying all sorts of variations to get it to work. I've managed to get this working with 2 separate queries, but I'd rather not do that unless I have to. This is what I'm working with now:

 

"SELECT a.id, b.rid, b.rating FROM rand_thoughts a, ratings b WHERE b.rid=a.id && (b.rating > (AVG(b.rating)-STD(b.rating))) && (b.rating < (AVG(b.rating)+STD(b.rating) * .675)) GROUP BY b.rid"

 

I still get the same error.


You're selecting where a.id = b.rid, so if they are always the same, why select both?

 

SELECT a.id, b.rating FROM rand_thoughts a, ratings b WHERE b.rid=a.id

GROUP BY a.id, b.rating

HAVING (b.rating > (AVG(b.rating)-STD(b.rating))) && (b.rating < (AVG(b.rating)+STD(b.rating) * .675))
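
Some background that may help here: "Invalid use of group function" is what MySQL reports when an aggregate such as AVG() or STD() appears in a WHERE clause, because WHERE filters individual rows before any grouping happens. Also, grouping by both a.id and b.rating leaves a single distinct rating in each group, so inside HAVING the AVG() equals that rating and STD() is 0. One common way around both problems is to compute each item's mean and standard deviation in a derived table and join it back to the individual votes. The sketch below tries that shape from Python against an in-memory SQLite database rather than MySQL. SQLite has no STD(), so a population standard deviation aggregate is registered by hand. The helper names, the sample rows for item 2, and the symmetric band of `k` standard deviations are all assumptions made for this example, not the thread's exact cutoffs.

```python
import sqlite3
import statistics


class PopulationStd:
    """Hand-rolled stand-in for MySQL's STD() (population std deviation)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return statistics.pstdev(self.values) if self.values else None


def build_demo_db():
    """Create the two tables from the thread with hypothetical sample votes."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("STD", 1, PopulationStd)
    conn.execute("CREATE TABLE rand_thoughts (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE ratings (rid INTEGER, rating INTEGER)")
    conn.executemany("INSERT INTO rand_thoughts (id) VALUES (?)", [(1,), (2,)])
    votes = [(1, v) for v in (7, 7, 7, 8, 6, 1)] + [(2, v) for v in (5, 5, 6, 4)]
    conn.executemany("INSERT INTO ratings (rid, rating) VALUES (?, ?)", votes)
    return conn


def trimmed_averages(conn, k=1.0):
    """Average each item's votes, ignoring votes k or more std devs from its mean."""
    query = """
        SELECT a.id, AVG(b.rating) AS avg_rating
        FROM rand_thoughts a
        JOIN ratings b ON b.rid = a.id
        JOIN (SELECT rid, AVG(rating) AS mean, STD(rating) AS sd
              FROM ratings
              GROUP BY rid) s ON s.rid = b.rid
        WHERE b.rating > s.mean - ? * s.sd
          AND b.rating < s.mean + ? * s.sd
        GROUP BY a.id
        ORDER BY avg_rating DESC
    """
    return conn.execute(query, (k, k)).fetchall()


if __name__ == "__main__":
    print(trimmed_averages(build_demo_db()))  # [(1, 7.0), (2, 5.0)]
```

The derived-table join should carry over to MySQL, where STD() is built in, though that has not been verified against the poster's schema. One caveat of the strict comparisons: an item with a single vote, or with all votes identical, has a standard deviation of 0, so every one of its votes is filtered out and the item disappears from the result.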
