#### Archived

This topic is now archived and is closed to further replies.

# Standard Deviance and Outlier help

## Recommended Posts

Basically I\'m just trying to switch my current sql statement which finds averages to do the same, but omit outliers. Here\'s my first one:

`"SELECT a.*, AVG(b.rating) as rating FROM rand_thoughts a, ratings b WHERE b.rid=a.id GROUP BY b.rid ORDER BY rating DESC"`

Works fine, but say you have the set of votes (from 1-10)

7

7

7

8

6

1

The outlier would obviously be one, and it would throw the voting. So I remembered finding the IQR in math class (I was so shocked that I was using something I actually learned, I\'ll have to tell my math teacher) to calculate the outliers after you\'ve found the 25th and 75th quartiles. But the problem is with the sql statement, this is what I\'ve come up with.

`"SELECT a.id, b.rating FROM rand_thoughts a, ratings b WHERE b.rid=a.id && (b.rating > (AVG(b.rating)-STD(b.rating))) && (b.rating < (AVG(b.rating)+STD(b.rating) * .675)) GROUP BY b.rid"`

I\'m pretty sure the logic works, but my problem is it gives me an error. This is the error from mysql_error():

`Invalid use of group function`

I\'ve searched google and all the programming forums that I know of, that\'s part of how I initially came up with finding the standard deviance to check for outliers.

Anyway, any help or suggestions would be great.

Thanks.

##### Share on other sites

You cannot group by something you haven\'t selected.

Try ... GROUP BY a.id ...

##### Share on other sites

I fixed what you suggested. I think I just did that while trying all the different ways trying to get it to work. I\'ve managed to get this to work in 2 seperate queries but I really hate to do that if I have to. This is what I\'m working with now.

`"SELECT a.id, b.rid, b.rating FROM rand_thoughts a, ratings b WHERE b.rid=a.id && (b.rating > (AVG(b.rating)-STD(b.rating))) && (b.rating < (AVG(b.rating)+STD(b.rating) * .675)) GROUP BY b.rid"`

I still get the same error.

##### Share on other sites

You\'re selecting where a.id = b.rid therefore, if they are always the same, why select both?

SELECT a.id, b.rating FROM rand_thoughts a, ratings b WHERE b.rid=a.id

GROUP BY a.id, b.rating

HAVING (b.rating > (AVG(b.rating)-STD(b.rating))) && (b.rating < (AVG(b.rating)+STD(b.rating) * .675))