Using Subqueries as Calculated Fields

Question

Using Subqueries as Calculated Fields

SQL 2016

SSC-Addicted

Points: 457
More actions
March 29, 2016 at 7:34 am

#305990

I have a concept question on this code:
SELECT cust_name,
cust_state,
(SELECT Count(*)
FROM orders
WHERE order.cust_id=customers.cust_id) as orders
FROM customers
ORDER BY cust_name;
I understand how the Count function works with it counting the rows that are matching with regards to customers.cust_id with orders.customers_id. What concern me if what happens if there are many cust_names that are the same. How does SQL serve know to match up which cust_name with the correct cust_id in order to have the count function match the correct count with the customer_id?

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

drew.allen SSC Guru Points: 77014 More actions · Answer 1

SQL 2016 (3/29/2016)
I have a concept question on this code:
SELECT cust_name,
cust_state,
(SELECT Count(*)
FROM orders
WHERE order.cust_id=customers.cust_id) as orders
FROM customers
ORDER BY cust_name;
I understand how the Count function works with it counting the rows that are matching with regards to customers.cust_id with orders.customers_id. What concern me if what happens if there are many cust_names that are the same. How does SQL serve know to match up which cust_name with the correct cust_id in order to have the count function match the correct count with the customer_id?

It doesn't. It knows to connect them based on cust_id, because you tell it to. Presumably, cust_id is the primary key in the customers table, so it uniquely identifies only one customer. It doesn't matter how many customers have the same name, because it's not using the name field to make the connection.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

SQL 2016 SSC-Addicted Points: 457 More actions · Answer 2

SQL 2016

SSC-Addicted

Points: 457

March 29, 2016 at 8:29 am

#1868258

Got it thanks for the insights : )

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 3

Quick thought, there is no need for the subquery as this can written as a normal query

😎

SELECT

C.cust_id

,C.cust_name

,C.cust_state

,COUNT(*) AS ORDERS

FROM customers C

LEFT OUTER JOIN [order] O

ON C.cust_id = O.cust_id

GROUP BY C.cust_id

,C.cust_name

,C.cust_state

Edit: missing C.cust_id in group by

drew.allen SSC Guru Points: 77014 More actions · Answer 4

Eirikur Eiriksson (3/29/2016)
Quick thought, there is no need for the subquery as this can written as a normal query
😎
SELECT
C.cust_id
,C.cust_name
,C.cust_state
,COUNT(*) AS ORDERS
FROM customers C
LEFT OUTER JOIN [order] O
ON C.cust_id = O.cust_id
GROUP BY C.cust_name
,C.cust_state

First, that will fail, because you didn't include C.cust_id in the GROUP BY or an aggregate function. Secondly, it will probably be more efficient to group the orders before performing the join, especially if there is an index on the orders table that starts with custid.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 5

drew.allen (3/29/2016)
Secondly, it will probably be more efficient to group the orders before performing the join, especially if there is an index on the orders table that starts with custid.

If C.Cust_ID is unique, then the optimiser knows it only has to aggregate the Orders table by the joining column, and the two will likely be the same in terms of performance.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 6

drew.allen (3/29/2016)
Eirikur Eiriksson (3/29/2016)
Quick thought, there is no need for the subquery as this can written as a normal query
😎
SELECT
C.cust_id
,C.cust_name
,C.cust_state
,COUNT(*) AS ORDERS
FROM customers C
LEFT OUTER JOIN [order] O
ON C.cust_id = O.cust_id
GROUP BY C.cust_name
,C.cust_state
First, that will fail, because you didn't include C.cust_id in the GROUP BY or an aggregate function. Secondly, it will probably be more efficient to group the orders before performing the join, especially if there is an index on the orders table that starts with custid.
Drew

thanks, missed the cost_id column.

😎

Pretty certain that the performance is almost identical if the right indices are in place, certainly will if the cust_id column is unique.

Sachin Nandanwar SSCertifiable Points: 7699 More actions · Answer 7

I probably think this query would perform better.

SELECT cust_name,

cust_state,

(SELECT Count(*)

FROM orders

WHERE order.cust_id=customers.cust_id) as orders

FROM customers

ORDER BY cust_name;

Because the aggregate function would be executed before the physical join on customer table takes place.Means a distinct count along with customerid from the Orders table would be returned before being joined to the Customers table.This would drastically reduce the no of rows for the outer input of the join.

While the other query will have to first do a complete join on all the possible rows from the two tables and then sort this output(in case optimizer decides to use a stream aggregate) to perform an aggregate function to return the count value.

--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 8

Frankly, I don't understand why anyone would be interested to compare performance between two queries that do not return the same results. At most one of them can be correct. I don't care about performance of incorrect queries.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

lnoland SSCommitted Points: 1783 More actions · Answer 9

Hugo Kornelis (3/30/2016)
Frankly, I don't understand why anyone would be interested to compare performance between two queries that do not return the same results. At most one of them can be correct. I don't care about performance of incorrect queries.

Good point -- it would appear that the original query would return Orders as 0 for any client with no orders but the suggested replacement query will surely return 1, would it not?

Nice catch.

- Les

Sachin Nandanwar SSCertifiable Points: 7699 More actions · Answer 10

Good point -- it would appear that the original query would return Orders as 0 for any client with no orders but the suggested replacement query will surely return 1, would it not?

Yes and that's the only difference.The first query will return 0 for customers having no orders while the second one will return 1.The order count for the remaining records and the total no of rows returned by both the queries will always be the same, not a single record more or less.

Even if we fix the second query to fix the above problem.

SELECT

C.cust_name

,C.cust_state

,COUNT(O.cust_id) AS ORDERS

FROM customers C

LEFT OUTER JOIN [order] O

ON C.cust_id = O.cust_id

GROUP BY C.cust_name

,C.cust_state

The first query will always scale over it.

EDIT : Fixed a typo in the query

--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server