how to find duplcates in a cosecutive period of time

  • i have a table like this

    date     num

    1/1/05   0

    1/2/05   0

    1/3/05   0

    1/4/05   1

    5/1/05   0

    5/3/05   0

    5/4/05   0

    5/5/05   0

    5/6/05   0

    5/7/05   0

    5/8/05   2

    I want to select those nums which do not change within 3 consecutive days? expected result should be 1/1/05 through 1/3/05 with num=0 and 5/3/05 through 5/5/05 with num=0

     

  • What about 5/4, 5/5 & 5/6 and 5/5, 5/6 and 5/7

    This may work if they are to be included:

    declare @Table table (dt datetime, nbr int)

    insert @Table values ('1/1/05', 0)

    insert @Table values ('1/2/05', 0)

    insert @Table values ('1/3/05', 0)

    insert @Table values ('1/4/05', 1)

    insert @Table values ('5/1/05', 0)

    insert @Table values ('5/3/05', 0)

    insert @Table values ('5/4/05', 0)

    insert @Table values ('5/5/05', 0)

    insert @Table values ('5/6/05', 0)

    insert @Table values ('5/7/05', 0)

    insert @Table values ('5/8/05', 2)

    select *

    from @table t1

    inner join @table t2

    on t1.nbr = t2.nbr and t1.dt + 1 = t2.dt

    inner join @table t3

    on t2.nbr = t3.nbr and t2.dt + 1 = t3.dt

  • obviously, that would return 2 records by def 5/4-5/6 and 5/5-5/7

    if you want to omit the second record...

    select *

    from @table t1

    inner join @table t2

    on t1.nbr = t2.nbr and t1.dt + 1 = t2.dt

    inner join @table t3

    on t2.nbr = t3.nbr and t2.dt + 1 = t3.dt

    WHERE NOT EXISTS

    (

    SELECT *

    FROM @table t4

    LEFT JOIN @table t5

    ON t5.nbr = t4.nbr and t5.dt + 1 = t4.dt

    LEFT JOIN @table t6

    ON t6.nbr = t5.nbr and t6.dt + 1 = t5.dt

    WHERE t1.nbr = t4.nbr and t4.dt + 1 = t1.dt

    AND t6.nbr IS NULL

    )

    the above returns 3 day non-intersecting chunks

  • Thanks for the reply. (sorry I didn't count correctly. expected result should be 1/1/05-1/3/05 and 5/1/05-5/7/05, what I was doing!)

    I was able to use the same method, but I'm wondering what if we are requested to find 10 days or even 50 consecutive days? I'm thinking of using two while loop to do so or even create multiple temp tables, but I don't know if there is simple way to do this... 

  • --I had to edit for dupes...

    --I make no promisses on efficiency

    SELECT DISTINCT t1.nbr, t1.dt, t2.dt, DATEDIFF(d, t1.dt, t2.dt) + 1 timeSpan

    FROM @table t1

    INNER JOIN @table t2

    ON t1.nbr = t2.nbr

    AND DATEDIFF(d,t1.dt,t2.dt) + 1 >= 3 --or what ever consecutive #

    WHERE NOT EXISTS -- get the last in a consecutive list

    (

    SELECT *

    FROM @table t3

    WHERE t1.nbr = t3.nbr

    AND DATEDIFF(d,t2.dt,t3.dt) = 1 --no next date

    )

    AND NOT EXISTS -- get the first in a consecutive list

    (

    SELECT *

    FROM @table t0

    WHERE t1.nbr = t0.nbr

    AND DATEDIFF(d,t0.dt,t1.dt) = 1 --no prior date

    )

    AND EXISTS

    (

    SELECT COUNT(*)

    FROM @table t4

    WHERE t4.nbr = t1.nbr

    AND t4.dt BETWEEN t1.dt AND t2.dt

    HAVING COUNT(DISTINCT t4.dt) >= 3 --or what ever consecutive #

    AND COUNT(DISTINCT t4.dt) = DATEDIFF(d, t1.dt, t2.dt) + 1 --amount of records the same as the time-span

    )

  • --okay this mind teaser made an impression on me.

    --I just made an efficiency edit I learned from another thread

    SELECT DISTINCT t1.nbr, t1.dt, t2.dt, DATEDIFF(d, t1.dt, t2.dt) + 1 timeSpan

    FROM @table t1

    INNER JOIN @table t2

    ON t1.nbr = t2.nbr

    AND DATEDIFF(d,t1.dt,t2.dt) + 1 >= 3 --or what ever consecutive #

    WHERE NOT EXISTS -- get the last in a consecutive list

    (

    SELECT MIN(t3.dt)

    FROM @table t3

    WHERE t1.nbr = t3.nbr

    AND t3.dt > t2.dt

    HAVING DATEDIFF(d,t2.dt,MIN(t3.dt)) = 1 --no next date

    )

    AND NOT EXISTS -- get the first in a consecutive list

    (

    SELECT MIN(t0.dt)

    FROM @table t0

    WHERE t1.nbr = t0.nbr

    AND t0.dt = 3 --or what ever consecutive #

    AND COUNT(DISTINCT t4.dt) = DATEDIFF(d, t1.dt, t2.dt) + 1 --amount of records the same as the time-span

    )

  • Thanks for the reply, let me try to understand it, thanks!

  • Hi all,

    I love problems like this - please keep them coming!

    bug, can I just clarify something? You said consecutive days, but then you said you want 1/1/05-1/3/05 and 5/1/05-5/7/05 (in your correction). Should that have been 1/1/05-1/3/05 and 5/3/05-5/7/05 as kevin has given you an answer for?

    Thanks...

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Well, anyway, here's an alternative to Kevin's solution...

    It seems to run marginally faster, but nothing too significant, on my tests, so I guess you'll have to do your own tests if that's important to you.

    SELECT MIN(dtFrom) AS dtFrom, dtTo

    FROM (

        SELECT dtFrom, MAX(dtTo) AS dtTo

        FROM (

            SELECT      dtFrom, dtTo

            FROM        (

                         SELECT a.dt AS dtFrom, b.dt as dtTo

                         FROM @Table a

                             INNER JOIN @Table b ON a.dt < b.dt AND a.nbr = b.nbr AND DATEDIFF(d, a.dt, b.dt)+1 >= 3

                        ) c --all possible from-to date combinations

                        LEFT OUTER JOIN @Table d ON d.dt BETWEEN dtFrom AND dtTo

            GROUP BY    dtFrom, dtTo

            HAVING      DATEDIFF(d, dtFrom, dtTo) = COUNT(*) - 1

            ) d --consecutive from-to date combinations

        GROUP BY dtFrom

         ) e --consecutive from-to date combinations with latest to date

    GROUP BY dtTo --consecutive from-to date combinations with earliest from date

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • looks good, but you forgot to group by nbr.

  • Good point Kevin - I missed that was a requirement.

    Thanks

    I guess I can just add it on at the end (rather than grouping by it)...

    SELECT dtFrom, dtTo, nbr

    FROM @Table t

      INNER JOIN (

        SELECT MIN(dtFrom) AS dtFrom, dtTo

        FROM (

            SELECT dtFrom, MAX(dtTo) AS dtTo

            FROM (

                SELECT      dtFrom, dtTo

                FROM        (

                             SELECT a.dt AS dtFrom, b.dt as dtTo

                             FROM @Table a

                                 INNER JOIN @Table b ON a.dt < b.dt AND a.nbr = b.nbr AND DATEDIFF(d, a.dt, b.dt)+1 >= 3

                            ) c --all possible from-to date combinations

                            LEFT OUTER JOIN @Table d ON d.dt BETWEEN dtFrom AND dtTo

                GROUP BY    dtFrom, dtTo

                HAVING      DATEDIFF(d, dtFrom, dtTo) = COUNT(*) - 1

                ) d --consecutive from-to date combinations

            GROUP BY dtFrom

             ) e --consecutive from-to date combinations with latest to date

        GROUP BY dtTo --consecutive from-to date combinations with earliest from date

                ) f ON t.dt = f.dtFrom

     

    By the way, have you thought about what a good solution would be if the data we wanted was:

    A.

    1 Jan 05 - 3 Jan 05 - 0

    4 Jan 05 - 4 Jan 05 - 1

    1 May 05 - 7 May 05 - 0

    8 May 05 - 8 May 05 - 2

    (i.e. assume the value is unchanged if the value on both sides of a gap is the same)

    Or even:

    B.

    1 Jan 05 -  3 Jan 05 - 0

    4 Jan 05 - 30 Apr 05 - 1

    1 May 05 - 7 May 05 - 0

    8 May 05 - 8 May 05 - 2

    (i.e. assume the value is unchanged until we know different)

    Just for fun...

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply