How to extract overlapping date ranges from a table of date ranges

  • ChrisM@Work (2/18/2016)


    How about starting a new thread with some sample data thrown in?

    See:

    A new Thread with begin and end dates

    Jeff Moden (2/18/2016)


    Just to be sure, let's say we have a table with a million rows of subscriptions across, say, 5 years. What "period" would you be looking for out of that?

    Used your data generations. 10 000 000 rows, subscriptions are less than 15 days. In reallity there is far more spread, but the average is about equal. There is not limit on the lenght of the period.

    Also, the NULL end-date is always a problem for performance. It would be MUCH better to use '9999-01-01' for an indeterminate end-date than a NULL. MUCH better.

    This is not in the design and is not going to happen.

    But then even this would cause some problems. (Small datetime does not support 9999, what to do about the startdate should there also be a date in the past. When calculation periods these have to be filtered out.) As said better, but not going to happen.

    And, if that comes to pass, make sure you do NOT use '9999-12-31'. Because there is no valid date after that, it screws up some rather standard/convenient/accurate/fast code if you do.

    This is a very sensible remark, but why not use 9000-01-01. This contains more 'zero's' so to me it is more logical :-).

    Ben

  • ben.brugman (2/18/2016)


    This is a very sensible remark, but why not use 9000-01-01. This contains more 'zero's' so to me it is more logical :-).

    Only because SQL Server implicitly converts the string of '9999' to '9999-01-01' when compared to DATETIME.

    Sorry you're stuck with SMALLDATETIME. Now I understand why you used the replacement end-date that you did. I'll take a look at the other thread.

    As for start dates that occur in the past, it doesn't matter. The code from the article you read will still do the trick. As for start dates with a NULL, I can't imagine that those would be correct but what would you want to do with them and what will they have for end-dates?

    I'll also take a look at the other thread you posted the link for tonight.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/18/2016)


    ben.brugman (2/18/2016)


    This is a very sensible remark, but why not use 9000-01-01. This contains more 'zero's' so to me it is more logical :-).

    Only because SQL Server implicitly converts the string of '9999' to '9999-01-01' when compared to DATETIME.

    Sorry Jeff, my remark was more geared towards; It's difficult to come to a consensus about which date's should be used as an indication of a date which is not (yet) known. 99990101 is an excelent choice, but it's a choice.

    Even MS has some problems with dates, see the choices made for dates in Excel and SQL-server. They are not consistent.

    (Problem Excel knows the date 29 februar 1900 and SQLserver does not).

    Problems with unknown dates can be very variable. Not known, not known but in the future, not known but in the past, not known but after the 'start' date.

    Even the date of birth gives a problem, because sometimes the day is not know and sometimes even the month is not know.

    Different people make different solutions for these type's of problems.

    Using the NULL might hamper the performance and a lot of mistakes are made with the NULL. But at least it's clear that it's not an actual date.

    Ben

  • ben.brugman (2/19/2016)


    Jeff Moden (2/18/2016)


    ben.brugman (2/18/2016)


    This is a very sensible remark, but why not use 9000-01-01. This contains more 'zero's' so to me it is more logical :-).

    Only because SQL Server implicitly converts the string of '9999' to '9999-01-01' when compared to DATETIME.

    Sorry Jeff, my remark was more geared towards; It's difficult to come to a consensus about which date's should be used as an indication of a date which is not (yet) known. 99990101 is an excelent choice, but it's a choice.

    Even MS has some problems with dates, see the choices made for dates in Excel and SQL-server. They are not consistent.

    (Problem Excel knows the date 29 februar 1900 and SQLserver does not).

    Problems with unknown dates can be very variable. Not known, not known but in the future, not known but in the past, not known but after the 'start' date.

    Even the date of birth gives a problem, because sometimes the day is not know and sometimes even the month is not know.

    Different people make different solutions for these type's of problems.

    Using the NULL might hamper the performance and a lot of mistakes are made with the NULL. But at least it's clear that it's not an actual date.

    Ben

    You're definitely preaching to the choir here. πŸ™‚ That's why I nip all of that in the bud when we design something new or need to fix something old. It helps prevent "Paralysis by Analysis". It sometimes requires a bat or sledgehammer to get folks to listen, though. The sledgehammer usually works but I sometimes have to break out the new improved 4 banded pork chop launcher to really make my point. :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/19/2016)


    That's why I nip all of that in the bud when we design something new or need to fix something old.

    Started working with computers in 1974. He that is 42 years ago, I should celebrate that this year.

    First 10 years were studie and fun.

    Next 15-20 years I was building a DBMS and telling others how to use it (and sometimes doing some serious bud nipping).

    RDBMSses became actual feasable and got introduced in our surroundings, still some good bud nipping.

    But now the bud nipping days are over. For a number of reasons (outsourcing being one of them) we (and I) have less control over the design and usage of the databases. By the time somebody 'notices' a design mistake or a 'wrong' choice, the design has been set in stone and we just have to live with it. Sometimes that is frustrating. But it is still challenging to come up with solutions within the given constraints.

    Ben

  • Jeff Moden (2/11/2016)


    Itzik Ben-Gan wrote about this quite a while back but he took the post down and haven't seen it reposted. Might be around somewhere.

    Is this the article you're referencing? New Solution to the Packing Intervals Problem

    Also here is the older article referenced in the one above... Packing Intervals[/url]

  • High Jason...

    Yes. The old article is what I was talking about. Hadn't seen the new one. Unfortunately, I'm like a lot of humans in that if I have something that works well, I don't always look for improved versions. :blush: I like the old article better because of the graphics that accompany the explanation but, of course, I like the new one better for the code.

    Either way, thank you for scaring up the links. I appreciate it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/21/2016)


    High Jason...

    Yes. The old article is what I was talking about. Hadn't seen the new one. Unfortunately, I'm like a lot of humans in that if I have something that works well, I don't always look for improved versions. :blush: I like the old article better because of the graphics that accompany the explanation but, of course, I like the new one better for the code.

    Either way, thank you for scaring up the links. I appreciate it.

    The new version is incredibly simple Jeff, even more so when you replace the MAX() with LAG(). It's so simple that once folks have seen it and it's been explained, they can write it from scratch, anytime. You can't say that about old versions.

    Here's the crib notes I used for teaching the method to our team just last week:

    IF 0 = 1 BEGIN

    drop table #ME;create table #ME (memid int , EffectiveDate datetime , termdate datetime)

    Insert into #ME values ('123','3-Dec-16','10-Jan-17')

    Insert into #ME values ('123','11-Jan-17','8-Feb-17')

    Insert into #ME values ('123','7-Feb-17','5-Mar-17')

    Insert into #ME values ('123','8-Mar-17','15-Apr-17')

    Insert into #ME values ('123','16-Apr-17','24-May-17')

    Insert into #ME values ('124','3-Dec-16','10-Jan-17')

    Insert into #ME values ('124','11-Jan-17','8-Feb-17')

    Insert into #ME values ('124','7-Feb-17','5-Mar-17')

    Insert into #ME values ('124','8-Mar-17','15-Apr-17')

    Insert into #ME values ('124','16-Apr-17','24-May-17')

    ;

    CREATE UNIQUE CLUSTERED INDEX ucx_Stuff ON #ME (memid, EffectiveDate, termdate); END

    -- Standard query (uses a sort to facilitate a fast streaming aggregate)

    SELECT memid, grp,

    EffectiveDate = MIN(EffectiveDate),

    termdate = MAX(termdate)

    FROM ( -- e

    SELECT *, grp = SUM(x.Flag) OVER (PARTITION BY memid ORDER BY EffectiveDate, termdate ROWS UNBOUNDED PRECEDING)

    FROM ( -- d

    SELECT *, LastEndDate = DATEADD(DAY,1,LAG(Termdate,1) OVER(PARTITION BY memid ORDER BY EffectiveDate, termdate))

    FROM #ME

    ) d

    CROSS APPLY (SELECT Flag = CASE WHEN EffectiveDate <= LastEndDate THEN 0 ELSE 1 END) x

    ) e

    GROUP BY memid, grp;

    -- Eliminate the CROSS APPLY, the plan is exactly the same

    SELECT memid, grp,

    EffectiveDate = MIN(EffectiveDate),

    termdate = MAX(termdate)

    FROM (

    SELECT *, grp = SUM(CASE WHEN EffectiveDate <= LastEndDate THEN 0 ELSE 1 END) OVER (PARTITION BY memid ORDER BY EffectiveDate, termdate ROWS UNBOUNDED PRECEDING)

    FROM (

    SELECT *, LastEndDate = DATEADD(DAY,1,LAG(Termdate,1) OVER(PARTITION BY memid ORDER BY EffectiveDate, termdate))

    FROM #ME

    ) d

    ) e

    GROUP BY memid, grp;

    -- Get rid of the sort by hinting a hash aggregate

    SELECT memid, grp,

    EffectiveDate = MIN(EffectiveDate),

    termdate = MAX(termdate)

    FROM ( -- e

    SELECT *, grp = SUM(x.Flag) OVER (PARTITION BY memid ORDER BY EffectiveDate, termdate ROWS UNBOUNDED PRECEDING)

    FROM ( -- d

    SELECT *, LastEndDate = DATEADD(DAY,1,LAG(Termdate,1) OVER(PARTITION BY memid ORDER BY EffectiveDate, termdate))

    FROM #ME

    ) d

    CROSS APPLY (SELECT Flag = CASE WHEN EffectiveDate <= LastEndDate THEN 0 ELSE 1 END) x

    ) e

    GROUP BY memid, grp

    OPTION(HASH GROUP);

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff Moden (12/21/2016)


    High Jason...

    Yes. The old article is what I was talking about. Hadn't seen the new one. Unfortunately, I'm like a lot of humans in that if I have something that works well, I don't always look for improved versions. :blush: I like the old article better because of the graphics that accompany the explanation but, of course, I like the new one better for the code.

    Either way, thank you for scaring up the links. I appreciate it.

    Glad to be of service. πŸ™‚

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply