Join date ranges with data from 2 tables

  • Hello:
    I have 2 tables with data as below:

    create table #product(product_id int, product_name varchar(50), regular_price money, start_dt datetime)
    create table #product_promotions(product_id int, promo_price money, start_dt datetime, end_dt datetime)

    insert into #product (product_id, product_name, regular_price, start_dt)
    VALUES ( 1, 'Calculator', $50, '1/1/2010'),
        ( 2, 'Cabinet', $100, '1/1/2011' ),
        ( 3, 'Chair', $200, '1/1/2012');

    insert into #product_promotions(product_id, promo_price, start_dt, end_dt)
    values
    (1, $40, '5/1/2010', '5/31/2010'),
    (1, $30, '10/1/2011', '10/31/2011'),
    (1, $20, '1/1/2018', '1/31/2018')

    insert into #product_promotions(product_id, promo_price, start_dt, end_dt)
    values
    (2, $90, '5/1/2010', '5/31/2010'),
    (2, $80, '10/1/2011', '10/31/2011'),
    (2, $70, '1/1/2018', '1/31/2018')

    select * from #product
    select * from #product_promotions

    As you see there are products with regular price and every now and then there are promo prices for the products. Now, I need to get one table with all the prices for the product with the date ranges as shown in the result below.

    What's the best way to get the above data using, CTE or LEAD (if that's better). I'm using SQL Server 2016.

  • The method needed is to first generate a list of dates and prices, and then group those values by product and the fact that the price changed between a pair of dates.   The latter can be represented by subtracting a ROW_NUMBER() from another wider ROW_NUMBER() to develop a group value.   Not sure if I have it set up perfectly, but it does appear to work.  Would highly recommend this be thoroughly tested.
    CREATE TABLE #product (
        product_id int,
        product_name varchar(50),
        regular_price money,
        start_dt datetime
    );
    CREATE TABLE #product_promotions (
        product_id int,
        promo_price money,
        start_dt datetime,
        end_dt datetime
    );
    INSERT INTO #product (product_id, product_name, regular_price, start_dt)
        VALUES    ( 1, 'Calculator', $50, '1/1/2010'),
                ( 2, 'Cabinet', $100, '1/1/2011' ),
                ( 3, 'Chair', $200, '1/1/2012');

    INSERT INTO #product_promotions (product_id, promo_price, start_dt, end_dt)
        VALUES    (1, $40, '5/1/2010', '5/31/2010'),
                (1, $30, '10/1/2011', '10/31/2011'),
                (1, $20, '1/1/2018', '1/31/2018'),
                (2, $90, '5/1/2010', '5/31/2010'),
                (2, $80, '10/1/2011', '10/31/2011'),
                (2, $70, '1/1/2018', '1/31/2018');

    DECLARE @MIN_DATE AS date = (
        SELECT MIN(S.start_dt)
        FROM (
                SELECT start_dt
                FROM #product
                UNION ALL
                SELECT start_dt
                FROM #product_promotions
            ) AS S
        );

    DECLARE @MAX_DATE AS date = (
        SELECT MAX(SE.start_dt)
        FROM (
                SELECT start_dt
                FROM #product
                UNION ALL
                SELECT start_dt
                FROM #product_promotions
                UNION ALL
                SELECT end_dt AS start_dt
                FROM #product_promotions
                UNION ALL
                SELECT GETDATE() AS start_dt
            ) AS SE
        );

    WITH E1 AS (

        SELECT 1 AS N UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ),
        TALLY AS (

            SELECT TOP(DATEDIFF(day, @MIN_DATE, @MAX_DATE) + 1)
                ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1 AS NumDays
            FROM E1 AS A
            CROSS APPLY E1 AS B
            CROSS APPLY E1 AS C
            CROSS APPLY E1 AS D
    ),
        ALL_DATES AS (

            SELECT DATEADD(day, T.NumDays, @MIN_DATE) AS THE_DATE
            FROM TALLY AS T
    ),
        CURRENT_PRICES AS (

            SELECT
                AD.THE_DATE,
                P.product_id,
                P.product_name,
                CASE
                    WHEN PP.product_id IS NOT NULL THEN PP.promo_price
                    ELSE P.regular_price
                END AS CurrentPrice,
                ROW_NUMBER() OVER(PARTITION BY P.product_id, P.product_name ORDER BY AD.THE_DATE) -
                    ROW_NUMBER() OVER(
                        PARTITION BY P.product_id, P.product_name,
                            CASE
                                WHEN PP.product_id IS NULL THEN 1
                                ELSE 0
                            END
                        ORDER BY AD.THE_DATE) AS GrpNum
            FROM #product AS P
            INNER JOIN ALL_DATES AS AD
                ON P.start_dt <= AD.THE_DATE
            LEFT OUTER JOIN #product_promotions AS PP
                ON P.product_id = PP.product_id
                AND AD.THE_DATE BETWEEN PP.start_dt AND PP.end_dt
            --ORDER BY
            --    P.product_id,
            --    AD.THE_DATE;
    )
    SELECT
        CP.product_id,
        CP.product_name,
        CP.CurrentPrice,
        MIN(CP.THE_DATE) AS MIN_DATE,
        MAX(CP.THE_DATE) AS MAX_DATE
    FROM CURRENT_PRICES AS CP
    GROUP BY
        CP.product_id,
        CP.product_name,
        CP.CurrentPrice,
        CP.GrpNum
    ORDER BY
        CP.product_id,
        CP.product_name,
        MIN(CP.THE_DATE);

    DROP TABLE #product;
    DROP TABLE #product_promotions;

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I'd certainly use a calendar table here to help set ranges. A tally table in a CTE will help build this. I'd then have to put in CASEs that would get the regular price except when there are intervals with matching values.

    This is kind of an islands and gaps problem. We have an article on the site for that: http://www.sqlservercentral.com/search/?q=islands+and+gaps&t=a&sort=relevance

  • sgmunson - Monday, September 10, 2018 7:40 AM

    The method needed is to first generate a list of dates and prices, and then group those values by product and the fact that the price changed between a pair of dates.   The latter can be represented by subtracting a ROW_NUMBER() from another wider ROW_NUMBER() to develop a group value.   Not sure if I have it set up perfectly, but it does appear to work.  Would highly recommend this be thoroughly tested.
    CREATE TABLE #product (
        product_id int,
        product_name varchar(50),
        regular_price money,
        start_dt datetime
    );
    CREATE TABLE #product_promotions (
        product_id int,
        promo_price money,
        start_dt datetime,
        end_dt datetime
    );
    INSERT INTO #product (product_id, product_name, regular_price, start_dt)
        VALUES    ( 1, 'Calculator', $50, '1/1/2010'),
                ( 2, 'Cabinet', $100, '1/1/2011' ),
                ( 3, 'Chair', $200, '1/1/2012');

    INSERT INTO #product_promotions (product_id, promo_price, start_dt, end_dt)
        VALUES    (1, $40, '5/1/2010', '5/31/2010'),
                (1, $30, '10/1/2011', '10/31/2011'),
                (1, $20, '1/1/2018', '1/31/2018'),
                (2, $90, '5/1/2010', '5/31/2010'),
                (2, $80, '10/1/2011', '10/31/2011'),
                (2, $70, '1/1/2018', '1/31/2018');

    DECLARE @MIN_DATE AS date = (
        SELECT MIN(S.start_dt)
        FROM (
                SELECT start_dt
                FROM #product
                UNION ALL
                SELECT start_dt
                FROM #product_promotions
            ) AS S
        );

    DECLARE @MAX_DATE AS date = (
        SELECT MAX(SE.start_dt)
        FROM (
                SELECT start_dt
                FROM #product
                UNION ALL
                SELECT start_dt
                FROM #product_promotions
                UNION ALL
                SELECT end_dt AS start_dt
                FROM #product_promotions
                UNION ALL
                SELECT GETDATE() AS start_dt
            ) AS SE
        );

    WITH E1 AS (

        SELECT 1 AS N UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ),
        TALLY AS (

            SELECT TOP(DATEDIFF(day, @MIN_DATE, @MAX_DATE) + 1)
                ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1 AS NumDays
            FROM E1 AS A
            CROSS APPLY E1 AS B
            CROSS APPLY E1 AS C
            CROSS APPLY E1 AS D
    ),
        ALL_DATES AS (

            SELECT DATEADD(day, T.NumDays, @MIN_DATE) AS THE_DATE
            FROM TALLY AS T
    ),
        CURRENT_PRICES AS (

            SELECT
                AD.THE_DATE,
                P.product_id,
                P.product_name,
                CASE
                    WHEN PP.product_id IS NOT NULL THEN PP.promo_price
                    ELSE P.regular_price
                END AS CurrentPrice,
                ROW_NUMBER() OVER(PARTITION BY P.product_id, P.product_name ORDER BY AD.THE_DATE) -
                    ROW_NUMBER() OVER(
                        PARTITION BY P.product_id, P.product_name,
                            CASE
                                WHEN PP.product_id IS NULL THEN 1
                                ELSE 0
                            END
                        ORDER BY AD.THE_DATE) AS GrpNum
            FROM #product AS P
            INNER JOIN ALL_DATES AS AD
                ON P.start_dt <= AD.THE_DATE
            LEFT OUTER JOIN #product_promotions AS PP
                ON P.product_id = PP.product_id
                AND AD.THE_DATE BETWEEN PP.start_dt AND PP.end_dt
            --ORDER BY
            --    P.product_id,
            --    AD.THE_DATE;
    )
    SELECT
        CP.product_id,
        CP.product_name,
        CP.CurrentPrice,
        MIN(CP.THE_DATE) AS MIN_DATE,
        MAX(CP.THE_DATE) AS MAX_DATE
    FROM CURRENT_PRICES AS CP
    GROUP BY
        CP.product_id,
        CP.product_name,
        CP.CurrentPrice,
        CP.GrpNum
    ORDER BY
        CP.product_id,
        CP.product_name,
        MIN(CP.THE_DATE);

    DROP TABLE #product;
    DROP TABLE #product_promotions;

    Steve,
    This looks promising.. I also have a calendar table (with all the dates filled in for 30 years). I'm guessing in this case, I will simply use that calendar table instead of the "tally" table?

  • ganeshmuthuvelu - Monday, September 10, 2018 8:14 AM

    Steve,
    This looks promising.. I also have a calendar table (with all the dates filled in for 30 years). I'm guessing in this case, I will simply use that calendar table instead of the "tally" table?

    Exactly.   Most "gap and island" types of problems become solvable when you're able to subtract a narrower ROW_NUMBER() from a wider range ROW_NUMBER().   In this case, the key event is that within a date sequence, the price changes.   Thus I needed to generate both of those row numbers and do the subtraction.   That's easiest when you have a complete date range, such as a calendar table, or CTE Tally table to generate the list of dates.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Monday, September 10, 2018 9:47 AM

    Exactly.   Most "gap and island" types of problems become solvable when you're able to subtract a narrower ROW_NUMBER() from a wider range ROW_NUMBER().   In this case, the key event is that within a date sequence, the price changes.   Thus I needed to generate both of those row numbers and do the subtraction.   That's easiest when you have a complete date range, such as a calendar table, or CTE Tally table to generate the list of dates.

    Except that this is a packing intervals problem, not a gaps and islands problem.  The two are related, but different.  Here is a solution that approaches it as a packing intervals problem.  It performs better than the gaps and islands solution on this small set of data, mostly because it only requires one sort whereas the gaps and islands requires four.

    ;
    WITH product_dates AS
    (
       
        SELECT p.product_id, CAST(d.dt AS DATE) AS dt
        FROM #product p
        LEFT OUTER JOIN #product_promotions pp
            ON pp.product_id = p.product_id
                AND pp.end_dt > p.start_dt
        CROSS APPLY ( VALUES(p.start_dt), (pp.start_dt), (DATEADD(DAY, 1, pp.end_dt))) d(dt)
        WHERE d.dt IS NOT NULL
    )
    , product_intervals AS
    (
        SELECT pd.product_id, pd.dt AS start_dt, LEAD(pd.dt, 1, '9999-12-30') OVER(PARTITION BY pd.product_id ORDER BY pd.dt) AS end_dt
        FROM product_dates pd
    )
    SELECT p.product_id, p.product_name, cp.current_price, i.start_dt, DATEADD(DAY, -1, NULLIF(i.end_dt, '9999-12-30')) AS end_dt
    FROM #product p
    INNER JOIN product_intervals AS i
        ON p.product_id = i.product_id
    CROSS APPLY
    (
        SELECT MIN(current_price) AS current_price
        FROM
        (
            SELECT p.regular_price AS current_price

            UNION ALL
            SELECT pp.promo_price
            FROM #product_promotions pp
            WHERE pp.start_dt < i.end_dt
                AND pp.end_dt > i.start_dt
        ) p
    ) cp -- current price

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply