March 29, 2016 at 7:34 am
I have a concept question on this code:
SELECT cust_name,
cust_state,
(SELECT Count(*)
FROM orders
WHERE order.cust_id=customers.cust_id) as orders
FROM customers
ORDER BY cust_name;
I understand how the Count function works with it counting the rows that are matching with regards to customers.cust_id with orders.customers_id. What concern me if what happens if there are many cust_names that are the same. How does SQL serve know to match up which cust_name with the correct cust_id in order to have the count function match the correct count with the customer_id?
March 29, 2016 at 8:17 am
SQL 2016 (3/29/2016)
I have a concept question on this code:
SELECT cust_name,
cust_state,
(SELECT Count(*)
FROM orders
WHERE order.cust_id=customers.cust_id) as orders
FROM customers
ORDER BY cust_name;
I understand how the Count function works with it counting the rows that are matching with regards to customers.cust_id with orders.customers_id. What concern me if what happens if there are many cust_names that are the same. How does SQL serve know to match up which cust_name with the correct cust_id in order to have the count function match the correct count with the customer_id?
It doesn't. It knows to connect them based on cust_id, because you tell it to. Presumably, cust_id is the primary key in the customers table, so it uniquely identifies only one customer. It doesn't matter how many customers have the same name, because it's not using the name field to make the connection.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
March 29, 2016 at 8:29 am
Got it thanks for the insights : )
March 29, 2016 at 8:36 am
Quick thought, there is no need for the subquery as this can written as a normal query
😎
SELECT
C.cust_id
,C.cust_name
,C.cust_state
,COUNT(*) AS ORDERS
FROM customers C
LEFT OUTER JOIN [order] O
ON C.cust_id = O.cust_id
GROUP BY C.cust_id
,C.cust_name
,C.cust_state
Edit: missing C.cust_id in group by
March 29, 2016 at 8:46 am
Eirikur Eiriksson (3/29/2016)
Quick thought, there is no need for the subquery as this can written as a normal query😎
SELECT
C.cust_id
,C.cust_name
,C.cust_state
,COUNT(*) AS ORDERS
FROM customers C
LEFT OUTER JOIN [order] O
ON C.cust_id = O.cust_id
GROUP BY C.cust_name
,C.cust_state
First, that will fail, because you didn't include C.cust_id in the GROUP BY or an aggregate function. Secondly, it will probably be more efficient to group the orders before performing the join, especially if there is an index on the orders table that starts with custid.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
March 29, 2016 at 9:00 am
drew.allen (3/29/2016)
Secondly, it will probably be more efficient to group the orders before performing the join, especially if there is an index on the orders table that starts with custid.
If C.Cust_ID is unique, then the optimiser knows it only has to aggregate the Orders table by the joining column, and the two will likely be the same in terms of performance.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 29, 2016 at 9:24 am
drew.allen (3/29/2016)
Eirikur Eiriksson (3/29/2016)
Quick thought, there is no need for the subquery as this can written as a normal query😎
SELECT
C.cust_id
,C.cust_name
,C.cust_state
,COUNT(*) AS ORDERS
FROM customers C
LEFT OUTER JOIN [order] O
ON C.cust_id = O.cust_id
GROUP BY C.cust_name
,C.cust_state
First, that will fail, because you didn't include C.cust_id in the GROUP BY or an aggregate function. Secondly, it will probably be more efficient to group the orders before performing the join, especially if there is an index on the orders table that starts with custid.
Drew
thanks, missed the cost_id column.
😎
Pretty certain that the performance is almost identical if the right indices are in place, certainly will if the cust_id column is unique.
March 30, 2016 at 12:40 am
I probably think this query would perform better.
SELECT cust_name,
cust_state,
(SELECT Count(*)
FROM orders
WHERE order.cust_id=customers.cust_id) as orders
FROM customers
ORDER BY cust_name;
Because the aggregate function would be executed before the physical join on customer table takes place.Means a distinct count along with customerid from the Orders table would be returned before being joined to the Customers table.This would drastically reduce the no of rows for the outer input of the join.
While the other query will have to first do a complete join on all the possible rows from the two tables and then sort this output(in case optimizer decides to use a stream aggregate) to perform an aggregate function to return the count value.
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
March 30, 2016 at 5:10 am
Frankly, I don't understand why anyone would be interested to compare performance between two queries that do not return the same results. At most one of them can be correct. I don't care about performance of incorrect queries.
March 31, 2016 at 9:35 am
Hugo Kornelis (3/30/2016)
Frankly, I don't understand why anyone would be interested to compare performance between two queries that do not return the same results. At most one of them can be correct. I don't care about performance of incorrect queries.
Good point -- it would appear that the original query would return Orders as 0 for any client with no orders but the suggested replacement query will surely return 1, would it not?
Nice catch.
- Les
March 31, 2016 at 7:28 pm
Good point -- it would appear that the original query would return Orders as 0 for any client with no orders but the suggested replacement query will surely return 1, would it not?
Yes and that's the only difference.The first query will return 0 for customers having no orders while the second one will return 1.The order count for the remaining records and the total no of rows returned by both the queries will always be the same, not a single record more or less.
Even if we fix the second query to fix the above problem.
SELECT
C.cust_name
,C.cust_state
,COUNT(O.cust_id) AS ORDERS
FROM customers C
LEFT OUTER JOIN [order] O
ON C.cust_id = O.cust_id
GROUP BY C.cust_name
,C.cust_state
The first query will always scale over it.
EDIT : Fixed a typo in the query
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply