Average number of days between orders

  • I need to calculate the average number of days between customer orders.

    my table has customer number and order number and the order date.

    I would like to end with customer number last, order date and the average number of days between orders for each customer.

    cust_idorder_numorder_date

    HOLLGCAB 119482 02/27/2015

    JILCO 119484 02/27/2015

    KEY 119491 02/27/2015

    TURNER 119496 02/27/2015

    KEY 119499 02/27/2015

    KEY 119500 02/27/2015

    EASTSIDE 119486 02/27/2015

    FE BELLA 119485 02/27/2015

    EASTSIDE 119487 02/27/2015

    TYCA 119488 02/27/2015

    EASTSIDE 119489 02/27/2015

    BOVE 119490 02/27/2015

    HENRY 119492 02/27/2015

    SUNLINE 119495 02/27/2015

    BRUNO 119365 03/02/2015

  • Quick suggestion, can you edit your post so the DDL and sample data is runnable? (create table and insert)

    😎

  • Found an article at SQLAuthority that covers your options:

    Simulate LEAD() and LAG() without Using SQL Server 2012 Analytic Function[/url]

  • pietlinden (4/7/2015)


    Found an article at SQLAuthority that covers your options:

    Simulate LEAD() and LAG() without Using SQL Server 2012 Analytic Function[/url]

    This is 2008 forum 😉

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • the results should be on a per customer id bases not between customers so each customer id will have its own average.

  • Alan,

    Yup, I noticed that... The link I posted is to an article by Pinal Dave simulating LAG/LEAD in 2008. Otherwise, I would have posted a query.

    I guess he could also dig through Itzik's book on SQL 2008....

  • Alan.B (4/7/2015)


    pietlinden (4/7/2015)


    Found an article at SQLAuthority that covers your options:

    Simulate LEAD() and LAG() without Using SQL Server 2012 Analytic Function[/url]

    This is 2008 forum 😉

    That's why it simulates LEAD and LAG instead of using them. 😉

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (4/7/2015)


    Alan.B (4/7/2015)


    pietlinden (4/7/2015)


    Found an article at SQLAuthority that covers your options:

    Simulate LEAD() and LAG() without Using SQL Server 2012 Analytic Function[/url]

    This is 2008 forum 😉

    That's why it simulates LEAD and LAG instead of using them. 😉

    I have smacked myself in the head to save everyone else the trouble. (not my day)...

    Here's my solution (I have not looked at the link [obviously])

    /* SAMPLE DATE */

    -- create an orders table

    DECLARE @orders TABLE (cust_id int,order_num varchar(36),order_date date);

    -- note: don't need an orderid but I'm including one anyhow

    INSERT INTO @orders VALUES

    (1, newid(), '1/1/2014'),

    (1, newid(), '1/6/2014'),

    (1, newid(), '1/12/2014'),

    (1, newid(), '1/15/2014'),

    (2, newid(), '1/1/2014'),

    (2, newid(), '1/10/2014'),

    (2, newid(), '1/21/2014'),

    (3, newid(), '1/1/2014'),

    (3, newid(), '1/20/2014'),

    (3, newid(), '1/30/2014');

    --SELECT * FROM @orders;

    /* SOLUTION */

    WITH

    orders AS -- add a rownumber to @orders

    (

    SELECT rn = ROW_NUMBER() over (partition by cust_id order by order_date), *

    FROM @orders

    ),

    prev_orders AS -- get previous orders for that customer

    (

    SELECT o1.*, order_date_prev = o2.order_date

    FROM orders o1

    JOIN orders o2

    ON o1.rn = o2.rn+1 AND o1.cust_id = o2.cust_id

    ),

    days_between AS -- calculate days between customers' orders

    (

    SELECT cust_id, Avg_days_between_orders = AVG(0.+DATEDIFF(DD,order_date_prev,order_date)) OVER (Partition by cust_id)

    FROM prev_orders

    )

    -- The max technique should be better than doing a DISTINCT

    SELECT cust_id, Avg_days_between_orders = MAX(Avg_days_between_orders)

    FROM days_between

    GROUP BY cust_id;

    Edit: added 0.+ to datediff formula to deal with truncation.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • pietlinden (4/7/2015)


    Alan,

    Yup, I noticed that... The link I posted is to an article by Pinal Dave simulating LAG/LEAD in 2008. Otherwise, I would have posted a query.

    I guess he could also dig through Itzik's book on SQL 2008....

    I just finished reading the link you posted and it's actually very good. I need to play with some of the techniques posted and comparing them to the self-join technique I just posted. The "No Joins and No Analytic Functions" solution by Geri Reshefis looks like a really excellent solution.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (4/7/2015)


    I have smacked myself in the head to save everyone else the trouble. (not my day)...

    Happy to help, need a HAND!:-P

    😎

  • Using Alan.B's table variable and sample data:

    WITH orders AS -- add a rownumber to @orders

    (

    SELECT rn1=rn, rn2=rn+1, cust_id, order_date

    FROM

    (

    SELECT rn = ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY order_date), *

    FROM @orders

    ) a

    )

    SELECT cust_id, Avg_days_between_orders=AVG(0.+x)

    FROM

    (

    SELECT cust_id, x=DATEDIFF(day, MIN(order_date), MAX(order_date))

    FROM

    (

    SELECT cust_id, order_date, rn

    FROM orders a

    CROSS APPLY (VALUES(rn1),(rn2)) b(rn)

    ) a

    GROUP BY cust_id, rn

    HAVING COUNT(*) = 2

    ) a

    GROUP BY cust_id;

    On a heap, this is likely to perform a little better because it does a single table scan and one sort as opposed to 2 for Alan.B's solution. With Indexing however the self-join is likely to work better.

    Alan.B - You did notice that your solution truncates the average number of days right?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Alan.B - You did notice that your solution truncates the average number of days right?

    Yep. I'm out of gas for today. I'll post a correct solution in the morning.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Excuse my Homer Simpson moment there! Doh! :w00t:

    SELECT cust_id

    ,Avg_days_between_orders=DATEDIFF(day, MIN(order_date), MAX(order_date))/(COUNT(*)-1.)

    FROM @orders

    GROUP BY cust_id

    HAVING COUNT(*) > 1;

    Edit: Added divide by zero protection (HAVING).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • One more question I need to add the last order date to this and then add the AVg_days_between_orders to it so that I get Last_order_Date + Avg_days_between_orders

    WITH orders

    AS -- add a rownumber to @orders

    (

    (

    SELECT rn = (

    ROW_NUMBER() OVER (

    PARTITION BY ompCustomerOrganizationID ORDER BY ompcreateddate

    )

    )

    ,s.ompSalesOrderID

    ,s.ompCustomerOrganizationID

    ,ompcreateddate

    ,convert(VARCHAR(10), ompcreateddate, 101) AS order_dat

    FROM SalesOrders s

    WHERE ompCreatedDate >= '01-01-15'

    )

    )

    ,prev_orders AS -- get previous orders for that customer

    (

    SELECT o1.*

    ,order_date_prev = o2.ompcreateddate

    FROM orders o1

    JOIN orders o2 ON o1.rn = o2.rn + 1

    AND o1.ompCustomerOrganizationID = o2.ompCustomerOrganizationID

    )

    ,days_between

    AS -- calculate days between customers' orders

    (

    SELECT ompCustomerOrganizationID

    ,Avg_days_between_orders = AVG(DATEDIFF(DD, order_date_prev, ompcreateddate)) OVER (PARTITION BY ompCustomerOrganizationID)

    FROM prev_orders

    )

    -- The max technique should be better than doing a DISTINCT

    SELECT ompCustomerOrganizationID

    ,Avg_days_between_orders = MAX(Avg_days_between_orders)

    FROM days_between

    GROUP BY ompCustomerOrganizationID;

  • Kat - you probably want to look at Dwain's answer just above. You really do NOT need to calculate the # of days between EACH order (as you are doing), you just need the average.

    Average is easiest computed as:

    ( # of day between the FIRST order in your timeframe and the LAST order in your timeframe) / (total number of order within the timeframe-1)

    for any customer with more than one order in the timeframe. You'd also want to clear up with the business what they intend to do when a custom has NO orders or just 1 one, since the calucation you requested could return something they're not expecting.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply