Jump to content

Why can't I GROUP BY emp_name only in this query? But why am I able to GROUP BY orderid alone?


polaryeti
Go to solution Solved by gizmola,

Recommended Posts

I'm on Windows 10, MySQL Workbench. I'm on MySQL 8.0.

Here's the dataset. northwind.sql.

https://pastebin.com/bMgjXvfT

Objective:

Write a query to get the order ID, customer's name, grand total of each order, and the name of the employee who handled each order. See below the expected first four rows.

Output should look like this.

image.thumb.png.e0d3f59fd08dbd9ea6bbc8eca022b427.png

This is the database schema diagram.

https://brucebauer.info/assets/ITEC3610/Northwind/Northwind-Sample-Database-Diagram.pdf

It's northwind.sql database.

This is my query.

SELECT 

    o.orderid,

    c.contactname,

    SUM(od.unitprice * od.quantity),

    CONCAT(e.lastname, ' ', e.firstname) AS emp_name

FROM

    orders o

        INNER JOIN

    customers c ON o.customerid = c.customerid

        INNER JOIN

    orderdetails od ON o.orderid = od.OrderID

        INNER JOIN

    employees e ON o.EmployeeID = e.EmployeeID

GROUP BY emp_name

ORDER BY orderid

LIMIT 4;

But it was producing an error.


Error Code: 1055. Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'northwind.o.OrderID' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

But if I order by orderid alone, it'll work, Why does it work? Should not it be producing the same error as above as not all non-aggregated query aren't present in group by clause ?

What's the computer science(Database concept) behind all this? Can anyone explain it? 

Link to comment
Share on other sites

An employee can have many orderid's.

When you group by employee you get one row per employee and, therefore, only one of the orderid values would be shown. As this could be from any of the many orders its specific value is meaningless.

On the other hand, gouping by orderid means there will be a unique employee.

In standared SQL, the rule is enforced that only columns in the group by clause, or that are being aggregated, can be selected.

Link to comment
Share on other sites

7 minutes ago, polaryeti said:

So that means we should always GROUP BY Primary  key?

And what erroneous logic path leads you to that conclusion?

 

8 minutes ago, polaryeti said:

Why is such rule enforced in  SQL? What's the benefit of such rule?

As stated above - to prevent the inclusion of meaningless values in results.

 

https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html

Link to comment
Share on other sites

Quote

And what erroneous logic path leads you to that conclusion?

Because you said you group by orderid...What's the rule of thumb for GROUP BY? Or an intituition of what I should be using in GROUP BY? Can you elaborate?

Edited by polaryeti
Link to comment
Share on other sites

You group by whatever you want to aggregate by.

If you want total monthly sales, say, then you'd have something like

SELECT YEAR(sales_date) as year
     , MONTH(sales_date) as month
     , SUM(sales_amount) as total
FROM sales
GROUP BY year, month

If you wanted customer totals

SELECT customer_id
     , SUM(sales_amount) as total
FROM sales
GROUP BY customer_id

Neither customer_id nor sales_date is the primary key of the sales table.

Link to comment
Share on other sites

26 minutes ago, kicken said:

That when you use GROUP BY, the only columns that can be in your SELECT clause are

  1. Also listed in your group by clause or
  2. Used with an aggregate function so that multiple values get condensed down the a single value.

 

But if I order by orderid alone, it'll work, Why does it work? Should not it be producing the same error as above as not all non-aggregated query aren't present in group by clause ? Why does it work? Why rule of thumb not work here?

Link to comment
Share on other sites

22 hours ago, polaryeti said:

But if I order by orderid alone, it'll work, Why does it work?

As far as I know, the standard allows for columns that are directly dependent on the group by columns to be used as well.  For example, since orders.OrderId is a primary key, any other column in the orders table could be selected as well since there will only be one value for any given order id.  The same would be true for customers.CustomerId and employees.EmployeeId.  The way you are doing the joins would ensure there's only a single values from those tables for any given order ID, so in theory you could select any columns from those tables as well.

In practice, whether a DB supports this type of analysis of the situation or not can vary.  MySQL must support it to some extent.  Microsoft's SQL server doesn't support it at all so you have to follow the rule of thumb describe above.  I'm not sure about other DB systems.   Since support varies, the safest thing to do is just follow the above rule of thumb and limit the select to your group by columns and aggregate functions.

 

Link to comment
Share on other sites

Step back from "what works" and think about the fundamental ideas.

  • A query returns a result set that can have 0 .. n rows. 
  • If your original query did not group by, what would the result set have?
    • Because you have join from order to order_details, and the relationship is such that each order can have one-to-many order_details, you can end up with multiple rows for an order.
      • It's clear from the original question, that the desired result is to have one row per order

This is where GROUP BY comes in.  Whatever column(s) you group by, will create exactly one row

The evaluation of the grouping is left to right, by which I mean that you can group by multiple values, but the grouping starts with the first column, and each subsequent column in your group by, is evaluated only after the first/preceding grouping is done.  For the question you started with, you don't need additional grouping other than orderid.  

You should be able to determine this, because the aggregation being done (SUM in this case) is being done on the order_detail price/quantity values.

The error you started with, is based on trying to order by emp_name, which is entirely different grouping. At the point that you GROUP the original result set by the names of the employees, you no longer are guaranteed to have one row for every order.  If an employee named "Bob" created 10 orders, there would again be one row for "Bob".

From the database standpoint, orderId is nonsensical.  Which of the 10 possible orderId values that were reduced to a single row, should be displayed with Bob's group?  

This is no different than if you were to GROUP By order_id, and try to include order_detailID.  It wouldn't make sense for the database to try and pick one order_detailID out of the set of related rows, so the database returns an error.

As Kicken pointed out, ANSI standard SQL has certain rules associated with it, but MySQL (and other relational databases which provide modality and configuration) will sometimes allow things that a particular standard doesn't.

MySQL, for example, has a mode that allowed you to specify in your select list, columns that weren't specified in the group by, or in aggregation functions.  The manual talks about the way this works, and the specific default mode that drives how mysql group by works.

In your query, it is the c.contactname value that mysql is allowing you to select even though that column is not part of the group by with 

However, if you consider what I previously wrote about grouping, you would be able to get the same result by adding that column to the group by:

GROUP BY orderid, c.contactname, empname

  

Link to comment
Share on other sites

  • 2 weeks later...

I'm still not clear about this topic. This is confusing as hell. How do I select a column to group by? What should be the properties of that column? Is it necessary for that column I choose to group by to functionally determine other columns presented in the select query except the aggregated column(eg, column where count is being used or sum is being used)?

Link to comment
Share on other sites

  • Solution
13 hours ago, polaryeti said:

I'm still not clear about this topic. This is confusing as hell. How do I select a column to group by? What should be the properties of that column? Is it necessary for that column I choose to group by to functionally determine other columns presented in the select query except the aggregated column(eg, column where count is being used or sum is being used)?

 

The column(s) you select will generate one row(group) per unique value of that column.  

Simple example of orders, assuming the order contained a column for the country_code of the customer that placed the order.

You want a result with the "Total value of orders by country".

So you would want to GROUP BY country_code.  

Your result set will then have 1 row for each country, but a SUM(amount), of course will provide you the total value of orders for that country

Let's say instead, you want a sum of orders by country, but you want the total for that country by year.  

SELECT country_code, YEAR(order_date) as year, SUM(amount)
FROM ORDERS
GROUP BY country_code, YEAR(order_date)
ORDER BY country_code, year

At that point you're going to get a row for every country_code/year combination, and the SUM(amount) is relative to that grouping.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.