August 18, 2003 at 5:49 am
Hello SQL-gurus everywhere,
I was asked to extract some data from the order tables, something like the small Northwind example below.
Give the numver of orders before and after a certain date, and the difference between the results. The output should be 1 line per customer.
Some questions:
- is this an acceptable query? If not, could you give me a clue for a better solution
- is there an alternative way to determine the value of the difference column?
use Northwind
go
select customerID,
( select count(orderID) from orders
where orderdate < '1997-01-01'
and orders.customerID = customers.customerID ) AS data_1996,
( select count(orderID) from orders
where orderdate > '1997-01-01'
and orders.customerID = customers.customerID ) AS data_1997,
ABS( ( select count(orderID) from orders
where orderdate < '1997-01-01'
and orders.customerID = customers.customerID ) -
( select count(orderID) from orders
where orderdate > '1997-01-01'
and orders.customerID = customers.customerID ) ) AS diff
from customers
go
Thank for any input,
Gerry
August 18, 2003 at 6:05 am
Try this out..
select customers.customerID,sum (case when orderdate < '1997-01-01' then 1 else 0 end) as a,
sum (case when orderdate > '1997-01-01' then 1 else 0 end) as b ,
abs(sum (case when orderdate < '1997-01-01' then 1 else 0 end)- sum (case when orderdate > '1997-01-01' then 1 else 0 end))
from customers,orders where
orders.customerID = customers.customerID
group by customers.customerid
This looks more readable and also the execution plan seems to suggest a better performance.
Relationships are like Banks.You Invest the Principal and then get the Interest.
He who knows others is learned but the wise one is one who knows himself.
August 18, 2003 at 7:32 am
Thank you for the illumination, Brokenrulz.
The use of the CASE statement is just what I needed.
Incidentally I needed the data from an undelying table, in my example [Order Details].UnitPrice.
When I tested both solutions against my own database (>200.000 items), I discovered that the 'dirty' script with the nested subqueries actually was faster than the clean solution you provided. But as you said, the use of CASE results in better readability.
Once again, thank you for your input.
In appreciation,
Gerry.
August 18, 2003 at 9:26 am
While you are testing for performance, try this. Wonder if it's better than CASE and the subqueries ...
SELECT M.customerID,
IsNull(T96.NumOrders, 0) AS data_1996,
IsNull(T97.NumOrders, 0) AS data_1997,
IsNull(T97.NumOrders,0) - IsNull(T96.NumOrders,0) AS Diff
FROM Orders M
LEFT OUTER JOIN
(SELECT CustomerId, Count(*) as NumOrders
FROM Orders
WHERE orderdate < '1997-01-01'
GROUP BY CustomerId) AS T96
ON M.CustomerId = T96.CustomerID
LEFT OUTER JOIN
(SELECT CustomerId, Count(*) as NumOrders
FROM Orders
WHERE orderdate >= '1997-01-01'
GROUP BY CustomerId) AS T97
ON M.CustomerId = T97.CustomerID
Edited by - npeeters on 08/18/2003 5:45:04 PM
August 19, 2003 at 5:22 am
I would go for something similar to that given by brokenrulz:
SELECT
o.customerID
,SUM(CASE WHEN o.orderdate < '1997-01-01' THEN 1 ELSE 0 END) AS data_1996
,SUM(CASE WHEN o.orderdate >= '1997-01-01' THEN 1 ELSE 0 END) AS data_1997
,ABS(SUM(CASE WHEN o.orderdate < '1997-01-01' THEN 1 ELSE -1 END)) AS diff
FROM orders o
INNER JOIN customers c
ON o.customerID = c.customerID
GROUP BY o.customerID
Reasons for the differences...
a) All of the data to be presented can be obtained from orders. By forcing SQL to get some data from customers (customerID and the GROUP BY) we may be causing a little more work to be done.
b) Only 2 categories of order data are required. Therefore the diff derivation can be simplified, reducing the CPU cost slightly. I am assuming that although orders on 1/1/1997 may be unlikely they should be included in 1997 data rather than totally ignored.
All information provided is a personal opinion that may not match reality.
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
August 19, 2003 at 7:21 am
NPeeters and EdVassie,
Thank you both for your remarks.
NPeeters, you solution was almost identical as ours in that it results in a Cartesian product. That was the problem I was asked to fix, besides the slow performance. A distinct might solve the problem, but I will use outer joins only when forced 🙂
(Hm, might be a nice subject for a discussion)
EdVassie shows a nice way of determining the difference. By stating that I only need orderdata he demonstrates that my example was oversimplified. Of course I will need additional customerdata. Mea Culpa.
Once again, much appreciated
Gerry
August 19, 2003 at 8:50 am
Stupid me.
You will have to add a group by to the first 'table' in the FROM like so
...
FROM (SELECT CustomerID FROM Orders GROUP BY CustomerID) M
LEFT OUTER JOIN
...
That should solve the cartesian product stuff.
And obviously, you could use inner joins if you're sure that at least one record exists for each date range.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply