April 7, 2015 at 1:39 pm
I need to calculate the average number of days between customer orders.
my table has customer number and order number and the order date.
I would like to end with customer number last, order date and the average number of days between orders for each customer.
cust_idorder_numorder_date
HOLLGCAB 119482 02/27/2015
JILCO 119484 02/27/2015
KEY 119491 02/27/2015
TURNER 119496 02/27/2015
KEY 119499 02/27/2015
KEY 119500 02/27/2015
EASTSIDE 119486 02/27/2015
FE BELLA 119485 02/27/2015
EASTSIDE 119487 02/27/2015
TYCA 119488 02/27/2015
EASTSIDE 119489 02/27/2015
BOVE 119490 02/27/2015
HENRY 119492 02/27/2015
SUNLINE 119495 02/27/2015
BRUNO 119365 03/02/2015
April 7, 2015 at 1:44 pm
Quick suggestion, can you edit your post so the DDL and sample data is runnable? (create table and insert)
😎
April 7, 2015 at 2:02 pm
April 7, 2015 at 2:08 pm
pietlinden (4/7/2015)
Found an article at SQLAuthority that covers your options:Simulate LEAD() and LAG() without Using SQL Server 2012 Analytic Function[/url]
This is 2008 forum 😉
-- Itzik Ben-Gan 2001
April 7, 2015 at 2:12 pm
the results should be on a per customer id bases not between customers so each customer id will have its own average.
April 7, 2015 at 2:22 pm
Alan,
Yup, I noticed that... The link I posted is to an article by Pinal Dave simulating LAG/LEAD in 2008. Otherwise, I would have posted a query.
I guess he could also dig through Itzik's book on SQL 2008....
April 7, 2015 at 2:33 pm
Luis Cazares (4/7/2015)
Alan.B (4/7/2015)
pietlinden (4/7/2015)
Found an article at SQLAuthority that covers your options:Simulate LEAD() and LAG() without Using SQL Server 2012 Analytic Function[/url]
This is 2008 forum 😉
That's why it simulates LEAD and LAG instead of using them. 😉
I have smacked myself in the head to save everyone else the trouble. (not my day)...
Here's my solution (I have not looked at the link [obviously])
/* SAMPLE DATE */
-- create an orders table
DECLARE @orders TABLE (cust_id int,order_num varchar(36),order_date date);
-- note: don't need an orderid but I'm including one anyhow
INSERT INTO @orders VALUES
(1, newid(), '1/1/2014'),
(1, newid(), '1/6/2014'),
(1, newid(), '1/12/2014'),
(1, newid(), '1/15/2014'),
(2, newid(), '1/1/2014'),
(2, newid(), '1/10/2014'),
(2, newid(), '1/21/2014'),
(3, newid(), '1/1/2014'),
(3, newid(), '1/20/2014'),
(3, newid(), '1/30/2014');
--SELECT * FROM @orders;
/* SOLUTION */
WITH
orders AS -- add a rownumber to @orders
(
SELECT rn = ROW_NUMBER() over (partition by cust_id order by order_date), *
FROM @orders
),
prev_orders AS -- get previous orders for that customer
(
SELECT o1.*, order_date_prev = o2.order_date
FROM orders o1
JOIN orders o2
ON o1.rn = o2.rn+1 AND o1.cust_id = o2.cust_id
),
days_between AS -- calculate days between customers' orders
(
SELECT cust_id, Avg_days_between_orders = AVG(0.+DATEDIFF(DD,order_date_prev,order_date)) OVER (Partition by cust_id)
FROM prev_orders
)
-- The max technique should be better than doing a DISTINCT
SELECT cust_id, Avg_days_between_orders = MAX(Avg_days_between_orders)
FROM days_between
GROUP BY cust_id;
Edit: added 0.+ to datediff formula to deal with truncation.
-- Itzik Ben-Gan 2001
April 7, 2015 at 2:41 pm
pietlinden (4/7/2015)
Alan,Yup, I noticed that... The link I posted is to an article by Pinal Dave simulating LAG/LEAD in 2008. Otherwise, I would have posted a query.
I guess he could also dig through Itzik's book on SQL 2008....
I just finished reading the link you posted and it's actually very good. I need to play with some of the techniques posted and comparing them to the self-join technique I just posted. The "No Joins and No Analytic Functions" solution by Geri Reshefis looks like a really excellent solution.
-- Itzik Ben-Gan 2001
April 7, 2015 at 2:41 pm
Alan.B (4/7/2015)
I have smacked myself in the head to save everyone else the trouble. (not my day)...
Happy to help, need a HAND!:-P
😎
April 7, 2015 at 6:56 pm
Using Alan.B's table variable and sample data:
WITH orders AS -- add a rownumber to @orders
(
SELECT rn1=rn, rn2=rn+1, cust_id, order_date
FROM
(
SELECT rn = ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY order_date), *
FROM @orders
) a
)
SELECT cust_id, Avg_days_between_orders=AVG(0.+x)
FROM
(
SELECT cust_id, x=DATEDIFF(day, MIN(order_date), MAX(order_date))
FROM
(
SELECT cust_id, order_date, rn
FROM orders a
CROSS APPLY (VALUES(rn1),(rn2)) b(rn)
) a
GROUP BY cust_id, rn
HAVING COUNT(*) = 2
) a
GROUP BY cust_id;
On a heap, this is likely to perform a little better because it does a single table scan and one sort as opposed to 2 for Alan.B's solution. With Indexing however the self-join is likely to work better.
Alan.B - You did notice that your solution truncates the average number of days right?
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 7, 2015 at 8:32 pm
Alan.B - You did notice that your solution truncates the average number of days right?
Yep. I'm out of gas for today. I'll post a correct solution in the morning.
-- Itzik Ben-Gan 2001
April 8, 2015 at 5:48 am
Excuse my Homer Simpson moment there! Doh! :w00t:
SELECT cust_id
,Avg_days_between_orders=DATEDIFF(day, MIN(order_date), MAX(order_date))/(COUNT(*)-1.)
FROM @orders
GROUP BY cust_id
HAVING COUNT(*) > 1;
Edit: Added divide by zero protection (HAVING).
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 8, 2015 at 8:29 am
One more question I need to add the last order date to this and then add the AVg_days_between_orders to it so that I get Last_order_Date + Avg_days_between_orders
WITH orders
AS -- add a rownumber to @orders
(
(
SELECT rn = (
ROW_NUMBER() OVER (
PARTITION BY ompCustomerOrganizationID ORDER BY ompcreateddate
)
)
,s.ompSalesOrderID
,s.ompCustomerOrganizationID
,ompcreateddate
,convert(VARCHAR(10), ompcreateddate, 101) AS order_dat
FROM SalesOrders s
WHERE ompCreatedDate >= '01-01-15'
)
)
,prev_orders AS -- get previous orders for that customer
(
SELECT o1.*
,order_date_prev = o2.ompcreateddate
FROM orders o1
JOIN orders o2 ON o1.rn = o2.rn + 1
AND o1.ompCustomerOrganizationID = o2.ompCustomerOrganizationID
)
,days_between
AS -- calculate days between customers' orders
(
SELECT ompCustomerOrganizationID
,Avg_days_between_orders = AVG(DATEDIFF(DD, order_date_prev, ompcreateddate)) OVER (PARTITION BY ompCustomerOrganizationID)
FROM prev_orders
)
-- The max technique should be better than doing a DISTINCT
SELECT ompCustomerOrganizationID
,Avg_days_between_orders = MAX(Avg_days_between_orders)
FROM days_between
GROUP BY ompCustomerOrganizationID;
April 8, 2015 at 10:14 am
Kat - you probably want to look at Dwain's answer just above. You really do NOT need to calculate the # of days between EACH order (as you are doing), you just need the average.
Average is easiest computed as:
( # of day between the FIRST order in your timeframe and the LAST order in your timeframe) / (total number of order within the timeframe-1)
for any customer with more than one order in the timeframe. You'd also want to clear up with the business what they intend to do when a custom has NO orders or just 1 one, since the calucation you requested could return something they're not expecting.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply