Identifying the 3rd qualifying record

  • I have a scenario where the client wants to identify the 3rd successful attempt at contacting a customer within 5 days of each other.  So if we had the following:
    10/11
    10/12
    10/14

    10/14 would be the successful record and I would need to grab that date.  However we could have
    10/1
    10/2
    10/10
    10/11
    10/13

    And now the 5th attempt (10/13) would now be the third qualifying attempt.  I am a little stumped at the best way to approach this.

    Editing to add data:

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#mytable','U') IS NOT NULL

    DROP TABLE #mytable

    --===== Create the test table with

    CREATE TABLE #mytable

    (

    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, --Is an IDENTITY column on real table

    custID INT,

    DateValue DATETIME

    )

    --===== All Inserts into the IDENTITY column

    SET IDENTITY_INSERT #mytable ON

    --===== Insert the test data into the test table

    INSERT INTO #mytable

    (ID, custID, DateValue)

    SELECT '1','1','09/01/2018 12:00 AM' UNION ALL

    SELECT '2','1','09/02/2018 12:00 AM' UNION ALL

    SELECT '3','1','09/05/2018 12:00 AM' UNION ALL

    SELECT '4','1','10/01/2018 12:00 AM' UNION ALL

    SELECT '5','2','09/08/2018 12:00 AM' UNION ALL

    SELECT '6','2','09/09/2018 12:00 AM' UNION ALL

    SELECT '7','3','01/01/2018 12:00 AM' UNION ALL

    SELECT '8','3','01/04/2018 12:00 AM' UNION ALL

    SELECT '9','3','01/08/2018 12:00 AM' UNION ALL

    SELECT '10','4','05/01/2018 12:00 AM' UNION ALL

    SELECT '11','4','05/08/2018 12:00 AM' UNION ALL

    SELECT '12','4','06/01/2018 12:00 AM' UNION ALL

    SELECT '13','4','06/15/2018 12:00 AM' UNION ALL

    SELECT '14','5','10/01/2018 12:00 AM' UNION ALL

    SELECT '15','6','07/01/2018 12:00 AM' UNION ALL

    SELECT '16','6','07/02/2018 12:00 AM' UNION ALL

    SELECT '17','6','07/04/2018 12:00 AM' UNION ALL

    SELECT '18','6','07/05/2018 12:00 AM' UNION ALL

    SELECT '19','7','10/01/2018 12:00 AM' UNION ALL

    SELECT '20','7','10/05/2018 12:00 AM' UNION ALL

    SELECT '21','8','04/10/2018 12:00 AM' UNION ALL

    SELECT '22','8','04/11/2018 12:00 AM' UNION ALL

    SELECT '23','8','04/13/2018 12:00 AM' UNION ALL

    SELECT '24','9','05/01/2018 12:00 AM' UNION ALL

    SELECT '25','9','05/05/2018 12:00 AM' UNION ALL

    SELECT '26','9','05/19/2018 12:00 AM' UNION ALL

    SELECT '27','10','01/01/2018 12:00 AM' UNION ALL

    SELECT '28','10','01/04/2018 12:00 AM' UNION ALL

    SELECT '29','10','01/05/2018 12:00 AM' UNION ALL

    SELECT '30','11','01/05/2018 12:00 AM' UNION ALL

    SELECT '31','12','03/17/2018 12:00 AM' UNION ALL

    SELECT '32','12','03/18/2018 12:00 AM' UNION ALL

    SELECT '33','12','03/20/2018 12:00 AM' UNION ALL

    SELECT '34','13','02/19/2018 12:00 AM' UNION ALL

    SELECT '35','13','02/20/2018 12:00 AM' UNION ALL

    SELECT '36','14','01/15/2018 12:00 AM' UNION ALL

    SELECT '37','14','06/01/2018 12:00 AM' UNION ALL

    SELECT '38','14','06/04/2018 12:00 AM' UNION ALL

    SELECT '39','14','06/05/2018 12:00 AM' UNION ALL

    SELECT '40','14','07/15/2018 12:00 AM'

    --===== Set the identity insert back to normal

    SET IDENTITY_INSERT #mytable OFF

  • Sounds like a classic gaps and islands problem.  I don't have a tailor-made solution, I'm afraid, but the page I linked to should provide a few pointers.

    John

  • timothy.m.phillips - Monday, October 22, 2018 8:43 AM

    I have a scenario where the client wants to identify the 3rd successful attempt at contacting a customer within 5 days of each other.  So if we had the following:
    10/11
    10/12
    10/14

    10/14 would be the successful record and I would need to grab that date.  However we could have
    10/1
    10/2
    10/10
    10/11
    10/13

    And now the 5th attempt (10/13) would now be the third qualifying attempt.  I am a little stumped at the best way to approach this.

    Would you be able to create some test data in a consumable (ie, ready to paste into SSMS) format, please? Please see the link in my signature if you are unsure how to do this.
    People here enjoy solving puzzles like this – you'll soon have a working solution if you take the time to post the test data.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin - Monday, October 22, 2018 8:53 AM

    timothy.m.phillips - Monday, October 22, 2018 8:43 AM

    I have a scenario where the client wants to identify the 3rd successful attempt at contacting a customer within 5 days of each other.  So if we had the following:
    10/11
    10/12
    10/14

    10/14 would be the successful record and I would need to grab that date.  However we could have
    10/1
    10/2
    10/10
    10/11
    10/13

    And now the 5th attempt (10/13) would now be the third qualifying attempt.  I am a little stumped at the best way to approach this.

    Would you be able to create some test data in a consumable (ie, ready to paste into SSMS) format, please? Please see the link in my signature if you are unsure how to do this.
    People here enjoy solving puzzles like this – you'll soon have a working solution if you take the time to post the test data.

    And make sure your data includes edge cases such as

    10/1
    10/4
    10/8

    Where each of the dates is within five of the previous date, but the third date is not within five days of the first date.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Phil Parkin - Monday, October 22, 2018 8:53 AM

    timothy.m.phillips - Monday, October 22, 2018 8:43 AM

    I have a scenario where the client wants to identify the 3rd successful attempt at contacting a customer within 5 days of each other.  So if we had the following:
    10/11
    10/12
    10/14

    10/14 would be the successful record and I would need to grab that date.  However we could have
    10/1
    10/2
    10/10
    10/11
    10/13

    And now the 5th attempt (10/13) would now be the third qualifying attempt.  I am a little stumped at the best way to approach this.

    Would you be able to create some test data in a consumable (ie, ready to paste into SSMS) format, please? Please see the link in my signature if you are unsure how to do this.
    People here enjoy solving puzzles like this – you'll soon have a working solution if you take the time to post the test data.

    Thanks.  I have added data to the original post following the format on the link in your signature.

  • timothy.m.phillips - Tuesday, October 23, 2018 2:52 PM

    Phil Parkin - Monday, October 22, 2018 8:53 AM

    timothy.m.phillips - Monday, October 22, 2018 8:43 AM

    I have a scenario where the client wants to identify the 3rd successful attempt at contacting a customer within 5 days of each other.  So if we had the following:
    10/11
    10/12
    10/14

    10/14 would be the successful record and I would need to grab that date.  However we could have
    10/1
    10/2
    10/10
    10/11
    10/13

    And now the 5th attempt (10/13) would now be the third qualifying attempt.  I am a little stumped at the best way to approach this.

    Would you be able to create some test data in a consumable (ie, ready to paste into SSMS) format, please? Please see the link in my signature if you are unsure how to do this.
    People here enjoy solving puzzles like this – you'll soon have a working solution if you take the time to post the test data.

    Thanks.  I have added data to the original post following the format on the link in your signature.

    Here you go:
    --===== If the test table already exists, drop it
    IF OBJECT_ID('tempdb..#mytable','U') IS NOT NULL
        BEGIN
        DROP TABLE #mytable;
        END;

    --===== Create the test table with
    CREATE TABLE #mytable (
        ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, --Is an IDENTITY column on real table
        custID INT,
        DateValue DATETIME
    );

    --===== All Inserts into the IDENTITY column
    SET IDENTITY_INSERT #mytable ON;

    --===== Insert the test data into the test table
    INSERT INTO #mytable (ID, custID, DateValue)
    SELECT '1','1','09/01/2018 12:00 AM' UNION ALL
    SELECT '2','1','09/02/2018 12:00 AM' UNION ALL
    SELECT '3','1','09/05/2018 12:00 AM' UNION ALL
    SELECT '4','1','10/01/2018 12:00 AM' UNION ALL
    SELECT '5','2','09/08/2018 12:00 AM' UNION ALL
    SELECT '6','2','09/09/2018 12:00 AM' UNION ALL
    SELECT '7','3','01/01/2018 12:00 AM' UNION ALL
    SELECT '8','3','01/04/2018 12:00 AM' UNION ALL
    SELECT '9','3','01/08/2018 12:00 AM' UNION ALL
    SELECT '10','4','05/01/2018 12:00 AM' UNION ALL
    SELECT '11','4','05/08/2018 12:00 AM' UNION ALL
    SELECT '12','4','06/01/2018 12:00 AM' UNION ALL
    SELECT '13','4','06/15/2018 12:00 AM' UNION ALL
    SELECT '14','5','10/01/2018 12:00 AM' UNION ALL
    SELECT '15','6','07/01/2018 12:00 AM' UNION ALL
    SELECT '16','6','07/02/2018 12:00 AM' UNION ALL
    SELECT '17','6','07/04/2018 12:00 AM' UNION ALL
    SELECT '18','6','07/05/2018 12:00 AM' UNION ALL
    SELECT '19','7','10/01/2018 12:00 AM' UNION ALL
    SELECT '20','7','10/05/2018 12:00 AM' UNION ALL
    SELECT '21','8','04/10/2018 12:00 AM' UNION ALL
    SELECT '22','8','04/11/2018 12:00 AM' UNION ALL
    SELECT '23','8','04/13/2018 12:00 AM' UNION ALL
    SELECT '24','9','05/01/2018 12:00 AM' UNION ALL
    SELECT '25','9','05/05/2018 12:00 AM' UNION ALL
    SELECT '26','9','05/19/2018 12:00 AM' UNION ALL
    SELECT '27','10','01/01/2018 12:00 AM' UNION ALL
    SELECT '28','10','01/04/2018 12:00 AM' UNION ALL
    SELECT '29','10','01/05/2018 12:00 AM' UNION ALL
    SELECT '30','11','01/05/2018 12:00 AM' UNION ALL
    SELECT '31','12','03/17/2018 12:00 AM' UNION ALL
    SELECT '32','12','03/18/2018 12:00 AM' UNION ALL
    SELECT '33','12','03/20/2018 12:00 AM' UNION ALL
    SELECT '34','13','02/19/2018 12:00 AM' UNION ALL
    SELECT '35','13','02/20/2018 12:00 AM' UNION ALL
    SELECT '36','14','01/15/2018 12:00 AM' UNION ALL
    SELECT '37','14','06/01/2018 12:00 AM' UNION ALL
    SELECT '38','14','06/04/2018 12:00 AM' UNION ALL
    SELECT '39','14','06/05/2018 12:00 AM' UNION ALL
    SELECT '40','14','07/15/2018 12:00 AM';

    --===== Set the identity insert back to normal
    SET IDENTITY_INSERT #mytable OFF;

    WITH ORDERED_DATA AS (

        SELECT *,
            CASE
                WHEN DATEADD(day, 5, LAG(MT.DateValue, 1, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
                    AND DATEADD(day, 5, LAG(MT.DateValue, 2, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
                        THEN 1
                ELSE 0
            END AS IS_THIRD
        FROM #mytable AS MT
    )
    SELECT
        OD.custID,
        MIN(OD.DateValue)
    FROM ORDERED_DATA AS OD
    WHERE OD.IS_THIRD = 1
    GROUP BY OD.custID
    ORDER BY OD.custID;

    IF OBJECT_ID('tempdb..#mytable','U') IS NOT NULL
        BEGIN
        DROP TABLE #mytable;
        END;

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Okay, let's revisit the performance on this:
    --===== If the test table already exists, drop it
    IF OBJECT_ID('tempdb..#mytable','U') IS NOT NULL
        BEGIN
        DROP TABLE #mytable;
        END;

    --===== Create the test table with
    CREATE TABLE #mytable (
        ID INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED, --Is an IDENTITY column on real table
        custID INT,
        DateValue DATETIME,
    );
    CREATE CLUSTERED INDEX CDX_TEMPDB_mytable_ ON #mytable
        (
        custID ASC,
        DateValue ASC
        );

    --===== All Inserts into the IDENTITY column
    SET IDENTITY_INSERT #mytable ON;

    --===== Insert the test data into the test table
    INSERT INTO #mytable (ID, custID, DateValue)
    SELECT 1, '1', '09/01/2018 12:00 AM' UNION ALL
    SELECT 2, '1', '09/02/2018 12:00 AM' UNION ALL
    SELECT 3, '1', '09/05/2018 12:00 AM' UNION ALL
    SELECT 4, '1', '10/01/2018 12:00 AM' UNION ALL
    SELECT 5, '2', '09/08/2018 12:00 AM' UNION ALL
    SELECT 6, '2', '09/09/2018 12:00 AM' UNION ALL
    SELECT 7, '3', '01/01/2018 12:00 AM' UNION ALL
    SELECT 8, '3', '01/04/2018 12:00 AM' UNION ALL
    SELECT 9, '3', '01/08/2018 12:00 AM' UNION ALL
    SELECT 10, '4', '05/01/2018 12:00 AM' UNION ALL
    SELECT 11, '4', '05/08/2018 12:00 AM' UNION ALL
    SELECT 12, '4', '06/01/2018 12:00 AM' UNION ALL
    SELECT 13, '4', '06/15/2018 12:00 AM' UNION ALL
    SELECT 14, '5', '10/01/2018 12:00 AM' UNION ALL
    SELECT 15, '6', '07/01/2018 12:00 AM' UNION ALL
    SELECT 16, '6', '07/02/2018 12:00 AM' UNION ALL
    SELECT 17, '6', '07/04/2018 12:00 AM' UNION ALL
    SELECT 18, '6', '07/05/2018 12:00 AM' UNION ALL
    SELECT 19, '7', '10/01/2018 12:00 AM' UNION ALL
    SELECT 20, '7', '10/05/2018 12:00 AM' UNION ALL
    SELECT 21, '8', '04/10/2018 12:00 AM' UNION ALL
    SELECT 22, '8', '04/11/2018 12:00 AM' UNION ALL
    SELECT 23, '8', '04/13/2018 12:00 AM' UNION ALL
    SELECT 24, '9', '05/01/2018 12:00 AM' UNION ALL
    SELECT 25, '9', '05/05/2018 12:00 AM' UNION ALL
    SELECT 26, '9', '05/19/2018 12:00 AM' UNION ALL
    SELECT 27, '10', '01/01/2018 12:00 AM' UNION ALL
    SELECT 28, '10', '01/04/2018 12:00 AM' UNION ALL
    SELECT 29, '10', '01/05/2018 12:00 AM' UNION ALL
    SELECT 30, '11', '01/05/2018 12:00 AM' UNION ALL
    SELECT 31, '12', '03/17/2018 12:00 AM' UNION ALL
    SELECT 32, '12', '03/18/2018 12:00 AM' UNION ALL
    SELECT 33, '12', '03/20/2018 12:00 AM' UNION ALL
    SELECT 34, '13', '02/19/2018 12:00 AM' UNION ALL
    SELECT 35, '13', '02/20/2018 12:00 AM' UNION ALL
    SELECT 36, '14', '01/15/2018 12:00 AM' UNION ALL
    SELECT 37, '14', '06/01/2018 12:00 AM' UNION ALL
    SELECT 38, '14', '06/04/2018 12:00 AM' UNION ALL
    SELECT 39, '14', '06/05/2018 12:00 AM' UNION ALL
    SELECT 40, '14', '07/15/2018 12:00 AM';

    --===== Set the identity insert back to normal
    SET IDENTITY_INSERT #mytable OFF;

    WITH ORDERED_DATA AS (

        SELECT *,
            CASE
                WHEN DATEADD(day, 5, LAG(MT.DateValue, 1, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
                    AND DATEADD(day, 5, LAG(MT.DateValue, 2, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
                    AND LAG(MT.DateValue, 3, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue) IS NULL
                        THEN 1
                ELSE 0
            END AS IS_THIRD
        FROM #mytable AS MT
    )
    SELECT
        OD.custID,
        OD.DateValue
    FROM ORDERED_DATA AS OD
    WHERE OD.IS_THIRD = 1
    ORDER BY OD.custID;

    IF OBJECT_ID('tempdb..#mytable','U') IS NOT NULL
        BEGIN
        DROP TABLE #mytable;
        END;

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thank you for your assistance.  This is really close.  Works perfect when the 3 contacts are 1-3 but does not catch an instance where the 2nd, 3rd & 4th are the qualifying ones such as the last example in the table:
    SELECT '36','14','01/15/2018 12:00 AM' UNION ALL
    SELECT '37','14','06/01/2018 12:00 AM' UNION ALL
    SELECT '38','14','06/04/2018 12:00 AM' UNION ALL
    SELECT '39','14','06/05/2018 12:00 AM' UNION ALL

    That is the tricky part that really has me stuck.

  • sgmunson - Wednesday, October 24, 2018 9:37 AM

    Okay, let's revisit the performance on this:
    --===== If the test table already exists, drop it
    IF OBJECT_ID('tempdb..#mytable','U') IS NOT NULL
        BEGIN
        DROP TABLE #mytable;
        END;

    --===== Create the test table with
    CREATE TABLE #mytable (
        ID INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED, --Is an IDENTITY column on real table
        custID INT,
        DateValue DATETIME,
    );
    CREATE CLUSTERED INDEX CDX_TEMPDB_mytable_ ON #mytable
        (
        custID ASC,
        DateValue ASC
        );

    --===== All Inserts into the IDENTITY column
    SET IDENTITY_INSERT #mytable ON;

    --===== Insert the test data into the test table
    INSERT INTO #mytable (ID, custID, DateValue)
    SELECT 1, '1', '09/01/2018 12:00 AM' UNION ALL
    SELECT 2, '1', '09/02/2018 12:00 AM' UNION ALL
    SELECT 3, '1', '09/05/2018 12:00 AM' UNION ALL
    SELECT 4, '1', '10/01/2018 12:00 AM' UNION ALL
    SELECT 5, '2', '09/08/2018 12:00 AM' UNION ALL
    SELECT 6, '2', '09/09/2018 12:00 AM' UNION ALL
    SELECT 7, '3', '01/01/2018 12:00 AM' UNION ALL
    SELECT 8, '3', '01/04/2018 12:00 AM' UNION ALL
    SELECT 9, '3', '01/08/2018 12:00 AM' UNION ALL
    SELECT 10, '4', '05/01/2018 12:00 AM' UNION ALL
    SELECT 11, '4', '05/08/2018 12:00 AM' UNION ALL
    SELECT 12, '4', '06/01/2018 12:00 AM' UNION ALL
    SELECT 13, '4', '06/15/2018 12:00 AM' UNION ALL
    SELECT 14, '5', '10/01/2018 12:00 AM' UNION ALL
    SELECT 15, '6', '07/01/2018 12:00 AM' UNION ALL
    SELECT 16, '6', '07/02/2018 12:00 AM' UNION ALL
    SELECT 17, '6', '07/04/2018 12:00 AM' UNION ALL
    SELECT 18, '6', '07/05/2018 12:00 AM' UNION ALL
    SELECT 19, '7', '10/01/2018 12:00 AM' UNION ALL
    SELECT 20, '7', '10/05/2018 12:00 AM' UNION ALL
    SELECT 21, '8', '04/10/2018 12:00 AM' UNION ALL
    SELECT 22, '8', '04/11/2018 12:00 AM' UNION ALL
    SELECT 23, '8', '04/13/2018 12:00 AM' UNION ALL
    SELECT 24, '9', '05/01/2018 12:00 AM' UNION ALL
    SELECT 25, '9', '05/05/2018 12:00 AM' UNION ALL
    SELECT 26, '9', '05/19/2018 12:00 AM' UNION ALL
    SELECT 27, '10', '01/01/2018 12:00 AM' UNION ALL
    SELECT 28, '10', '01/04/2018 12:00 AM' UNION ALL
    SELECT 29, '10', '01/05/2018 12:00 AM' UNION ALL
    SELECT 30, '11', '01/05/2018 12:00 AM' UNION ALL
    SELECT 31, '12', '03/17/2018 12:00 AM' UNION ALL
    SELECT 32, '12', '03/18/2018 12:00 AM' UNION ALL
    SELECT 33, '12', '03/20/2018 12:00 AM' UNION ALL
    SELECT 34, '13', '02/19/2018 12:00 AM' UNION ALL
    SELECT 35, '13', '02/20/2018 12:00 AM' UNION ALL
    SELECT 36, '14', '01/15/2018 12:00 AM' UNION ALL
    SELECT 37, '14', '06/01/2018 12:00 AM' UNION ALL
    SELECT 38, '14', '06/04/2018 12:00 AM' UNION ALL
    SELECT 39, '14', '06/05/2018 12:00 AM' UNION ALL
    SELECT 40, '14', '07/15/2018 12:00 AM';

    --===== Set the identity insert back to normal
    SET IDENTITY_INSERT #mytable OFF;

    WITH ORDERED_DATA AS (

        SELECT *,
            CASE
                WHEN DATEADD(day, 5, LAG(MT.DateValue, 1, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
                    AND DATEADD(day, 5, LAG(MT.DateValue, 2, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
                    AND LAG(MT.DateValue, 3, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue) IS NULL
                        THEN 1
                ELSE 0
            END AS IS_THIRD
        FROM #mytable AS MT
    )
    SELECT
        OD.custID,
        OD.DateValue
    FROM ORDERED_DATA AS OD
    WHERE OD.IS_THIRD = 1
    ORDER BY OD.custID;

    IF OBJECT_ID('tempdb..#mytable','U') IS NOT NULL
        BEGIN
        DROP TABLE #mytable;
        END;

    Thank you for your assistance. This is really close. Works perfect when the 3 contacts are 1-3 but does not catch an instance where the 2nd, 3rd & 4th are the qualifying ones such as the last example in the table:
    SELECT '36','14','01/15/2018 12:00 AM' UNION ALL
    SELECT '37','14','06/01/2018 12:00 AM' UNION ALL
    SELECT '38','14','06/04/2018 12:00 AM' UNION ALL
    SELECT '39','14','06/05/2018 12:00 AM' UNION ALL

    That is the tricky part that really has me stuck.

  • WITH ORDERED_DATA AS (
       SELECT *,
           CASE
               WHEN DATEADD(day, 5, LAG(MT.DateValue, 1, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
                   AND DATEADD(day, 5, LAG(MT.DateValue, 2, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
                   --AND LAG(MT.DateValue, 3, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue) IS NULL
                       THEN 1
               ELSE 0
           END AS IS_THIRD
        FROM #mytable AS MT
    )
    SELECT
       OD.custID,
       Min(OD.DateValue) As DateValue
    FROM ORDERED_DATA AS OD
    WHERE OD.IS_THIRD = 1
    GROUP By OD.custID
    ORDER BY OD.custID;

    Should do it, I think

  • timothy.m.phillips - Wednesday, October 24, 2018 5:45 PM

    sgmunson - Wednesday, October 24, 2018 9:37 AM

    Okay, let's revisit the performance on this:
    --===== If the test table already exists, drop it
    IF OBJECT_ID('tempdb..#mytable','U') IS NOT NULL
        BEGIN
        DROP TABLE #mytable;
        END;

    --===== Create the test table with
    CREATE TABLE #mytable (
        ID INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED, --Is an IDENTITY column on real table
        custID INT,
        DateValue DATETIME,
    );
    CREATE CLUSTERED INDEX CDX_TEMPDB_mytable_ ON #mytable
        (
        custID ASC,
        DateValue ASC
        );

    --===== All Inserts into the IDENTITY column
    SET IDENTITY_INSERT #mytable ON;

    --===== Insert the test data into the test table
    INSERT INTO #mytable (ID, custID, DateValue)
    SELECT 1, '1', '09/01/2018 12:00 AM' UNION ALL
    SELECT 2, '1', '09/02/2018 12:00 AM' UNION ALL
    SELECT 3, '1', '09/05/2018 12:00 AM' UNION ALL
    SELECT 4, '1', '10/01/2018 12:00 AM' UNION ALL
    SELECT 5, '2', '09/08/2018 12:00 AM' UNION ALL
    SELECT 6, '2', '09/09/2018 12:00 AM' UNION ALL
    SELECT 7, '3', '01/01/2018 12:00 AM' UNION ALL
    SELECT 8, '3', '01/04/2018 12:00 AM' UNION ALL
    SELECT 9, '3', '01/08/2018 12:00 AM' UNION ALL
    SELECT 10, '4', '05/01/2018 12:00 AM' UNION ALL
    SELECT 11, '4', '05/08/2018 12:00 AM' UNION ALL
    SELECT 12, '4', '06/01/2018 12:00 AM' UNION ALL
    SELECT 13, '4', '06/15/2018 12:00 AM' UNION ALL
    SELECT 14, '5', '10/01/2018 12:00 AM' UNION ALL
    SELECT 15, '6', '07/01/2018 12:00 AM' UNION ALL
    SELECT 16, '6', '07/02/2018 12:00 AM' UNION ALL
    SELECT 17, '6', '07/04/2018 12:00 AM' UNION ALL
    SELECT 18, '6', '07/05/2018 12:00 AM' UNION ALL
    SELECT 19, '7', '10/01/2018 12:00 AM' UNION ALL
    SELECT 20, '7', '10/05/2018 12:00 AM' UNION ALL
    SELECT 21, '8', '04/10/2018 12:00 AM' UNION ALL
    SELECT 22, '8', '04/11/2018 12:00 AM' UNION ALL
    SELECT 23, '8', '04/13/2018 12:00 AM' UNION ALL
    SELECT 24, '9', '05/01/2018 12:00 AM' UNION ALL
    SELECT 25, '9', '05/05/2018 12:00 AM' UNION ALL
    SELECT 26, '9', '05/19/2018 12:00 AM' UNION ALL
    SELECT 27, '10', '01/01/2018 12:00 AM' UNION ALL
    SELECT 28, '10', '01/04/2018 12:00 AM' UNION ALL
    SELECT 29, '10', '01/05/2018 12:00 AM' UNION ALL
    SELECT 30, '11', '01/05/2018 12:00 AM' UNION ALL
    SELECT 31, '12', '03/17/2018 12:00 AM' UNION ALL
    SELECT 32, '12', '03/18/2018 12:00 AM' UNION ALL
    SELECT 33, '12', '03/20/2018 12:00 AM' UNION ALL
    SELECT 34, '13', '02/19/2018 12:00 AM' UNION ALL
    SELECT 35, '13', '02/20/2018 12:00 AM' UNION ALL
    SELECT 36, '14', '01/15/2018 12:00 AM' UNION ALL
    SELECT 37, '14', '06/01/2018 12:00 AM' UNION ALL
    SELECT 38, '14', '06/04/2018 12:00 AM' UNION ALL
    SELECT 39, '14', '06/05/2018 12:00 AM' UNION ALL
    SELECT 40, '14', '07/15/2018 12:00 AM';

    --===== Set the identity insert back to normal
    SET IDENTITY_INSERT #mytable OFF;

    WITH ORDERED_DATA AS (

        SELECT *,
            CASE
                WHEN DATEADD(day, 5, LAG(MT.DateValue, 1, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
                    AND DATEADD(day, 5, LAG(MT.DateValue, 2, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
                    AND LAG(MT.DateValue, 3, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue) IS NULL
                        THEN 1
                ELSE 0
            END AS IS_THIRD
        FROM #mytable AS MT
    )
    SELECT
        OD.custID,
        OD.DateValue
    FROM ORDERED_DATA AS OD
    WHERE OD.IS_THIRD = 1
    ORDER BY OD.custID;

    IF OBJECT_ID('tempdb..#mytable','U') IS NOT NULL
        BEGIN
        DROP TABLE #mytable;
        END;

    Thank you for your assistance. This is really close. Works perfect when the 3 contacts are 1-3 but does not catch an instance where the 2nd, 3rd & 4th are the qualifying ones such as the last example in the table:
    SELECT '36','14','01/15/2018 12:00 AM' UNION ALL
    SELECT '37','14','06/01/2018 12:00 AM' UNION ALL
    SELECT '38','14','06/04/2018 12:00 AM' UNION ALL
    SELECT '39','14','06/05/2018 12:00 AM' UNION ALL

    That is the tricky part that really has me stuck.

    Just realized where the problem was....  Try this instead:
    --===== If the test table already exists, drop it
    IF OBJECT_ID('tempdb..#mytable','U') IS NOT NULL
      BEGIN
      DROP TABLE #mytable;
      END;

    --===== Create the test table with
    CREATE TABLE #mytable (
      ID INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED, --Is an IDENTITY column on real table
      custID INT,
      DateValue DATETIME,
    );
    CREATE CLUSTERED INDEX CDX_TEMPDB_mytable_ ON #mytable
      (
      custID ASC,
      DateValue ASC
      );

    --===== All Inserts into the IDENTITY column
    SET IDENTITY_INSERT #mytable ON;

    --===== Insert the test data into the test table
    INSERT INTO #mytable (ID, custID, DateValue)
    SELECT 1, '1', '09/01/2018 12:00 AM' UNION ALL
    SELECT 2, '1', '09/02/2018 12:00 AM' UNION ALL
    SELECT 3, '1', '09/05/2018 12:00 AM' UNION ALL
    SELECT 4, '1', '10/01/2018 12:00 AM' UNION ALL
    SELECT 5, '2', '09/08/2018 12:00 AM' UNION ALL
    SELECT 6, '2', '09/09/2018 12:00 AM' UNION ALL
    SELECT 7, '3', '01/01/2018 12:00 AM' UNION ALL
    SELECT 8, '3', '01/04/2018 12:00 AM' UNION ALL
    SELECT 9, '3', '01/08/2018 12:00 AM' UNION ALL
    SELECT 10, '4', '05/01/2018 12:00 AM' UNION ALL
    SELECT 11, '4', '05/08/2018 12:00 AM' UNION ALL
    SELECT 12, '4', '06/01/2018 12:00 AM' UNION ALL
    SELECT 13, '4', '06/15/2018 12:00 AM' UNION ALL
    SELECT 14, '5', '10/01/2018 12:00 AM' UNION ALL
    SELECT 15, '6', '07/01/2018 12:00 AM' UNION ALL
    SELECT 16, '6', '07/02/2018 12:00 AM' UNION ALL
    SELECT 17, '6', '07/04/2018 12:00 AM' UNION ALL
    SELECT 18, '6', '07/05/2018 12:00 AM' UNION ALL
    SELECT 19, '7', '10/01/2018 12:00 AM' UNION ALL
    SELECT 20, '7', '10/05/2018 12:00 AM' UNION ALL
    SELECT 21, '8', '04/10/2018 12:00 AM' UNION ALL
    SELECT 22, '8', '04/11/2018 12:00 AM' UNION ALL
    SELECT 23, '8', '04/13/2018 12:00 AM' UNION ALL
    SELECT 24, '9', '05/01/2018 12:00 AM' UNION ALL
    SELECT 25, '9', '05/05/2018 12:00 AM' UNION ALL
    SELECT 26, '9', '05/19/2018 12:00 AM' UNION ALL
    SELECT 27, '10', '01/01/2018 12:00 AM' UNION ALL
    SELECT 28, '10', '01/04/2018 12:00 AM' UNION ALL
    SELECT 29, '10', '01/05/2018 12:00 AM' UNION ALL
    SELECT 30, '11', '01/05/2018 12:00 AM' UNION ALL
    SELECT 31, '12', '03/17/2018 12:00 AM' UNION ALL
    SELECT 32, '12', '03/18/2018 12:00 AM' UNION ALL
    SELECT 33, '12', '03/20/2018 12:00 AM' UNION ALL
    SELECT 34, '13', '02/19/2018 12:00 AM' UNION ALL
    SELECT 35, '13', '02/20/2018 12:00 AM' UNION ALL
    SELECT 36, '14', '01/15/2018 12:00 AM' UNION ALL
    SELECT 37, '14', '06/01/2018 12:00 AM' UNION ALL
    SELECT 38, '14', '06/04/2018 12:00 AM' UNION ALL
    SELECT 39, '14', '06/05/2018 12:00 AM' UNION ALL
    SELECT 40, '14', '07/15/2018 12:00 AM';

    --===== Set the identity insert back to normal
    SET IDENTITY_INSERT #mytable OFF;

    WITH ORDERED_DATA AS (

      SELECT *,
       CASE
        WHEN DATEADD(day, 5, LAG(MT.DateValue, 1, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
          AND DATEADD(day, 5, LAG(MT.DateValue, 2, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) >= MT.DateValue
          AND (
                        LAG(MT.DateValue, 3, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue) IS NULL
                        OR
                        DATEADD(day, 5, LAG(MT.DateValue, 3, NULL) OVER(PARTITION BY MT.custID ORDER BY MT.DateValue)) < MT.DateValue
                        ) THEN 1
        ELSE 0
       END AS IS_THIRD
      FROM #mytable AS MT
    )
    SELECT
      OD.custID,
      OD.DateValue
    FROM ORDERED_DATA AS OD
    WHERE OD.IS_THIRD = 1
    ORDER BY OD.custID;

    IF OBJECT_ID('tempdb..#mytable','U') IS NOT NULL
      BEGIN
      DROP TABLE #mytable;
      END;

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • >> I have a scenario where the client wants to identify the 3rd successful attempt at contacting a customer within 5 days of each other. <<

    Why do you think that the non-relational table property could ever be a key in a valid relational model? Why do you think it identifier can ever be a numeric? What math were you going to do on it? Let's try and give you a decent table.

    CREATE TABLE Customer_Contacts
    (customer_id CHAR(10) NOT NULL
    REFERENCES Customers(customer_id)
    ON DELETE CASCADE
    ON UPDATE CASCADE,
    contact_date DATE DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY(customer_id, contact_date)
    );

    see how we use actual columns to construct a valid relational key. See the references clause the ties this occurrence of the customer identifier to the customer’s table. See how the default plugs a date automatically.

    Why are you still using the original proprietary Sybase “SELECT ..UNION ALL” syntax? Microsoft is supported the ANSI/ISO standard “INSERT INTO .. VALUES (<tuple list>)” table constructor for quite a few years now. You also don’t seem to know that we have a date data type now. Nor do you know that the only format allowed in the ANSI ISO standards is “yyyyyyyy-mm-dd” instead of the local dialect you used. Basically, your writing SQL code that you should ofof stopped writing over 20 years ago.

    Let’s try this approach; build a table of 50 or hundred years of time frames of five days. If your actual business rules allow for holidays versus business days, then you can adjust these frames. I’m not going to discuss any tricks for that this post. You can play with it. You might also want to download the PDF of the Rick Snodgrass book on temporal queries in SQL from the University of Arizona website.

    CREATE TABLE Timeframes
    (time_frame_start_date DATE NOT NULL PRIMARY KEY,
    time_frame_end_date DATE NOT NULL
    CHECK (DATEADD(5, DAY, time_frame_start_date)
       = time_frame_end_date)
    );

    you can now write a query that looks at each timeframe, finds contact dates that fall within the start and end of each frame, and counts them by customer_id. Any grouping of (customer_id, time_frame_start_date) that has less than three rows in it can be rejected. Any timeframe that ended before the current date, can be rejected. Within a timeframe, you can use a ROW_NUMBER OVER(PARTITION BY time_frame_start_date) function and look for the row number = 3. I’m not sure what the performance will be, but it should be pretty adjustable

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 - Friday, October 26, 2018 3:16 PM

    >> I have a scenario where the client wants to identify the 3rd successful attempt at contacting a customer within 5 days of each other. <<

    Why do you think that the non-relational table property could ever be a key in a valid relational model? Why do you think it identifier can ever be a numeric? What math were you going to do on it? Let's try and give you a decent table.

    CREATE TABLE Customer_Contacts
    (customer_id CHAR(10) NOT NULL
    REFERENCES Customers(customer_id)
    ON DELETE CASCADE
    ON UPDATE CASCADE,
    contact_date DATE DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY(customer_id, contact_date)
    );

    see how we use actual columns to construct a valid relational key. See the references clause the ties this occurrence of the customer identifier to the customer’s table. See how the default plugs a date automatically.

    Why are you still using the original proprietary Sybase “SELECT ..UNION ALL†syntax? Microsoft is supported the ANSI/ISO standard “INSERT INTO .. VALUES (<tuple list>)†table constructor for quite a few years now. You also don’t seem to know that we have a date data type now. Nor do you know that the only format allowed in the ANSI ISO standards is “yyyyyyyy-mm-dd†instead of the local dialect you used. Basically, your writing SQL code that you should ofof stopped writing over 20 years ago.

    Let’s try this approach; build a table of 50 or hundred years of time frames of five days. If your actual business rules allow for holidays versus business days, then you can adjust these frames. I’m not going to discuss any tricks for that this post. You can play with it. You might also want to download the PDF of the Rick Snodgrass book on temporal queries in SQL from the University of Arizona website.

    CREATE TABLE Timeframes
    (time_frame_start_date DATE NOT NULL PRIMARY KEY,
    time_frame_end_date DATE NOT NULL
    CHECK (DATEADD(5, DAY, time_frame_start_date)
       = time_frame_end_date)
    );

    you can now write a query that looks at each timeframe, finds contact dates that fall within the start and end of each frame, and counts them by customer_id. Any grouping of (customer_id, time_frame_start_date) that has less than three rows in it can be rejected. Any timeframe that ended before the current date, can be rejected. Within a timeframe, you can use a ROW_NUMBER OVER(PARTITION BY time_frame_start_date) function and look for the row number = 3. I’m not sure what the performance will be, but it should be pretty adjustable

    So you've converted the Customer_ID from 4 bytes to 10 bytes.  There's a part of the math you speak of.  If you have non-clustered indexes, you'll find more math involved.  Also, what will you actually use to uniquely identify the customer within those 10 bytes?

    Also, have you tested your 5 day time frame approach for performance?  If you were to actually write the code for your suggestion, I'd be happy to setup a test to compare it to the other methods suggested.

    And, since you cited it, do you have a link for the PDF that you suggested?  Mr. Snodgrass has written a whole bunch of material so it would be nice to know the specific material you speak of.  Is it the following?  If not, please provide the link.
    https://cs.ulb.ac.be/public/_media/teaching/infoh415/teradata_temporal_case_study.pdf

    While you're at it, you should try your "What math were you going to do on it?" rhetoric on Mr. Snodgrass and see how far you get because in the article that I provided the link for above, he creates the following table to demonstrate with in the upper right corner of page 3...


    CREATE MULTISET TABLE Prop_Owner (
    customer_number INTEGER, 
    property_number INTEGER,
    property_VT PERIOD(DATE) NOT NULL AS VALIDTIME,
    property_TT PERIOD (TIMESTAMP(6) WITH TIME ZONE)
    NOT NULL AS TRANSACTIONTIME)
    PRIMARY INDEX(property_number);

    Also, if the link I provided IS the article that you cite, notice that Mr. Snodgrass specifically states that the SQL Language typically has inadequate temporal functionality and goes on to state that he's using Teradata, which does have the necessary functionality.  The problem there is that SQL is not SQL and this is not a Teradata forum.  I suspect the same will hold true if you cite a different link from Mr. Snodgrass.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply