Nested subqueries

Question

Post reply

Nested subqueries

gserdijn

Hall of Fame

Points: 3283
More actions
August 18, 2003 at 5:49 am

#160554

Hello SQL-gurus everywhere,
I was asked to extract some data from the order tables, something like the small Northwind example below.
Give the numver of orders before and after a certain date, and the difference between the results. The output should be 1 line per customer.
Some questions:
- is this an acceptable query? If not, could you give me a clue for a better solution
- is there an alternative way to determine the value of the difference column?
use Northwind
go
select customerID,
( select count(orderID) from orders
where orderdate < '1997-01-01'
and orders.customerID = customers.customerID ) AS data_1996,
( select count(orderID) from orders
where orderdate > '1997-01-01'
and orders.customerID = customers.customerID ) AS data_1997,
ABS( ( select count(orderID) from orders
where orderdate < '1997-01-01'
and orders.customerID = customers.customerID ) -
( select count(orderID) from orders
where orderdate > '1997-01-01'
and orders.customerID = customers.customerID ) ) AS diff
from customers
go
Thank for any input,
Gerry
Dutch Anti-RBAR League

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

brokenrulz SSCrazy Points: 2261 More actions · Answer 1

Try this out..

select customers.customerID,sum (case when orderdate < '1997-01-01' then 1 else 0 end) as a,

sum (case when orderdate > '1997-01-01' then 1 else 0 end) as b ,

abs(sum (case when orderdate < '1997-01-01' then 1 else 0 end)- sum (case when orderdate > '1997-01-01' then 1 else 0 end))

from customers,orders where

orders.customerID = customers.customerID

group by customers.customerid

This looks more readable and also the execution plan seems to suggest a better performance.

Relationships are like Banks.You Invest the Principal and then get the Interest.

He who knows others is learned but the wise one is one who knows himself.

gserdijn Hall of Fame Points: 3283 More actions · Answer 2

Thank you for the illumination, Brokenrulz.

The use of the CASE statement is just what I needed.

Incidentally I needed the data from an undelying table, in my example [Order Details].UnitPrice.

When I tested both solutions against my own database (>200.000 items), I discovered that the 'dirty' script with the nested subqueries actually was faster than the clean solution you provided. But as you said, the use of CASE results in better readability.

Once again, thank you for your input.

In appreciation,

Gerry.

Dutch Anti-RBAR League

NPeeters SSChampion Points: 12540 More actions · Answer 3

While you are testing for performance, try this. Wonder if it's better than CASE and the subqueries ...



SELECT M.customerID,

       IsNull(T96.NumOrders, 0) AS data_1996,

       IsNull(T97.NumOrders, 0) AS data_1997,

       IsNull(T97.NumOrders,0) - IsNull(T96.NumOrders,0) AS Diff

FROM Orders M

 LEFT OUTER JOIN

     (SELECT CustomerId, Count(*) as NumOrders

      FROM Orders

      WHERE orderdate < '1997-01-01'

      GROUP BY CustomerId) AS T96

 ON M.CustomerId = T96.CustomerID

 LEFT OUTER JOIN

     (SELECT CustomerId, Count(*) as NumOrders

      FROM Orders

      WHERE orderdate >= '1997-01-01'

      GROUP BY CustomerId) AS T97

 ON M.CustomerId = T97.CustomerID

Edited by - npeeters on 08/18/2003 5:45:04 PM

EdVassie SSC Guru Points: 60476 More actions · Answer 4

I would go for something similar to that given by brokenrulz:

SELECT

o.customerID

,SUM(CASE WHEN o.orderdate < '1997-01-01' THEN 1 ELSE 0 END) AS data_1996

,SUM(CASE WHEN o.orderdate >= '1997-01-01' THEN 1 ELSE 0 END) AS data_1997

,ABS(SUM(CASE WHEN o.orderdate < '1997-01-01' THEN 1 ELSE -1 END)) AS diff

FROM orders o

INNER JOIN customers c

ON o.customerID = c.customerID

GROUP BY o.customerID

Reasons for the differences...

a) All of the data to be presented can be obtained from orders. By forcing SQL to get some data from customers (customerID and the GROUP BY) we may be causing a little more work to be done.

b) Only 2 categories of order data are required. Therefore the diff derivation can be simplified, reducing the CPU cost slightly. I am assuming that although orders on 1/1/1997 may be unlikely they should be included in 1997 data rather than totally ignored.

All information provided is a personal opinion that may not match reality.

Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

gserdijn Hall of Fame Points: 3283 More actions · Answer 5

NPeeters and EdVassie,

Thank you both for your remarks.

NPeeters, you solution was almost identical as ours in that it results in a Cartesian product. That was the problem I was asked to fix, besides the slow performance. A distinct might solve the problem, but I will use outer joins only when forced 🙂

(Hm, might be a nice subject for a discussion)

EdVassie shows a nice way of determining the difference. By stating that I only need orderdata he demonstrates that my example was oversimplified. Of course I will need additional customerdata. Mea Culpa.

Once again, much appreciated

Gerry

Dutch Anti-RBAR League

NPeeters SSChampion Points: 12540 More actions · Answer 6

Stupid me.

You will have to add a group by to the first 'table' in the FROM like so



...

FROM (SELECT CustomerID FROM Orders GROUP BY CustomerID) M

  LEFT OUTER JOIN

...

That should solve the cartesian product stuff.

And obviously, you could use inner joins if you're sure that at least one record exists for each date range.