Group Islands of Contiguous Dates (SQL Spackle)

  • Well I just stumbled upon this article.I tried to do it using quirky update method and seems to be working but haven't tested it on a huge no of rows though.

    --=============================================================================

    -- Create the test data. This is NOT a part of the solution.

    -- This is virually instantaneous.

    --=============================================================================

    --===== Conditionally drop the test table to make reruns easier.

    IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL

    DROP TABLE #MyHead

    ;

    GO

    --===== Create the test table

    CREATE TABLE #MyHead

    (SomeDate DATETIME, id int DEFAULT(0))

    ;

    --===== Populate the test table with test data

    INSERT INTO #MyHead

    (SomeDate)

    SELECT '2010-01-01' UNION ALL --1st "Group" of dates (StartDate and EndDate)

    SELECT '2010-01-01' UNION ALL --Duplicate date

    SELECT '2010-01-03' UNION ALL --2nd "Group" of dates (StartDate and EndDate)

    SELECT '2010-01-05' UNION ALL --3rd "Group" of dates (StartDate)

    SELECT '2010-01-06' UNION ALL --3rd "Group" of dates (EndDate)

    SELECT '2010-01-10' UNION ALL --4th "Group" of dates (StartDate)

    SELECT '2010-01-10' UNION ALL --Duplicate date

    SELECT '2010-01-11' UNION ALL --4th "Group" of dates

    SELECT '2010-01-11' UNION ALL --Duplicate date

    SELECT '2010-01-11' UNION ALL --Duplicate date

    SELECT '2010-01-12' --4th "Group" of dates (EndDate)

    ;

    declare @ordse int=0

    declare @somedate datetime=''

    update #MyHead set @ordse=ID=case when somedate=@somedate+1 or @somedate=somedate then @ordse+1 else @ordse-1 end,@somedate=somedate

    select min(somedate)min,max(somedate)max,DATEDIFF(dd,min(SomeDate)-1,max(SomeDate))Diff from

    (

    select *,id-ROW_NUMBER()over(order by (select 1))id1 from #MyHead

    )t group by id1 order by min(SomeDate)

    drop table #MyHead

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • venoym (12/8/2010)


    Great post Jeff!

    I had a thought... this could be extended to account for Times in addition to dates... just move the calculation to the lowest significant field (from day to second). I had to code an algorithm in VB to find the "islands" and the "water" for a scheduling application, this may well drop the need for that...

    Again, Great Job!

    My apologies for the late reply. Thanks for the feedback, venoym!

    Before you think of replacing your VB function, remember that this code solves a very specific problem where the dates are grouped only if contiguous dates are available. It won't currently handle overlapping date "Ranges" where you have a StartDate and EndDate which is typically required for scheduling problems.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • rstarr-916208 (12/8/2010)


    I ran a test on 500 dates over the span of four years. It returned in 6 seconds. In the environment I come from that would be sufficient for any practical problems we might come up with.

    Oh, be careful when making such evaluations. First, 6 seconds for only 500 rows violates every SLA for performance that I've ever come across especially for GUI code where most of the SLA's I've been exposed to require no more than 1 second to return.

    The other danger is that someone in a hurry may borrow your code not knowing how long it takes. If they don't actually test for performance, the end user will be the one doing the performance testing for you and taking 6 seconds for only 500 rows is going to be a problem for them. If they run into only 1000 rows, they'll find that code takes exponentially longer times measured in minutes instead of seconds.

    Please be careful with justifying code performance based simply on number of rows... it always ends up biting you in the end.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sachin Nandanwar (1/16/2011)


    Well I just stumbled upon this article.I tried to do it using quirky update method and seems to be working but haven't tested it on a huge no of rows though.

    --=============================================================================

    -- Create the test data. This is NOT a part of the solution.

    -- This is virually instantaneous.

    --=============================================================================

    --===== Conditionally drop the test table to make reruns easier.

    IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL

    DROP TABLE #MyHead

    ;

    GO

    --===== Create the test table

    CREATE TABLE #MyHead

    (SomeDate DATETIME, id int DEFAULT(0))

    ;

    --===== Populate the test table with test data

    INSERT INTO #MyHead

    (SomeDate)

    SELECT '2010-01-01' UNION ALL --1st "Group" of dates (StartDate and EndDate)

    SELECT '2010-01-01' UNION ALL --Duplicate date

    SELECT '2010-01-03' UNION ALL --2nd "Group" of dates (StartDate and EndDate)

    SELECT '2010-01-05' UNION ALL --3rd "Group" of dates (StartDate)

    SELECT '2010-01-06' UNION ALL --3rd "Group" of dates (EndDate)

    SELECT '2010-01-10' UNION ALL --4th "Group" of dates (StartDate)

    SELECT '2010-01-10' UNION ALL --Duplicate date

    SELECT '2010-01-11' UNION ALL --4th "Group" of dates

    SELECT '2010-01-11' UNION ALL --Duplicate date

    SELECT '2010-01-11' UNION ALL --Duplicate date

    SELECT '2010-01-12' --4th "Group" of dates (EndDate)

    ;

    declare @ordse int=0

    declare @somedate datetime=''

    update #MyHead set @ordse=ID=case when somedate=@somedate+1 or @somedate=somedate then @ordse+1 else @ordse-1 end,@somedate=somedate

    select min(somedate)min,max(somedate)max,DATEDIFF(dd,min(SomeDate)-1,max(SomeDate))Diff from

    (

    select *,id-ROW_NUMBER()over(order by (select 1))id1 from #MyHead

    )t group by id1 order by min(SomeDate)

    drop table #MyHead

    I realize the intentions are good here and thank you for that but there are a couple of problems with the code there. For one, it breaks several of the rules for doing a Quirky Update. It's tough enough for me to defend the use of the Quirky Update as it is. If you're going to use it and post such solutions, please follow the rules for its use. Thanks.

    Second, although the Quirky Update does the job, isn't a panacea and there's simply no need no need for it here. It requires the use of an extra column and would necessarily require the copying of data from a permanent table to a Temp Table if the column couldn't be added to the permanent table.

    Last but not least, since you still do a SELECT with aggregates, I believe you'll find that the Quirky Update method is actually a bit slower than conventional methods, in this case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok, I consider myself duly chastised.

    By the way, what does SLA mean?

  • I realize the intentions are good here and thank you for that but there are a couple of problems with the code there. For one, it breaks several of the rules for doing a Quirky Update. It's tough enough for me to defend the use of the Quirky Update as it is. If you're going to use it and post such solutions, please follow the rules for its use. Thanks.

    Thank you for your feedback.I just thought of trying it in a different way.

    If I remember properly I think you had an article on performance implications of quirky update on sqlservercentral.com

    Can you please post that link ?

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • rstarr-916208 (1/16/2011)


    Ok, I consider myself duly chastised.

    By the way, what does SLA mean?

    SLA is Service Level Agreement. Basically, a performance and quality agreement between the vendor/programmer and company commissioning the work or service.

  • tskelley (1/17/2011)


    rstarr-916208 (1/16/2011)


    Ok, I consider myself duly chastised.

    By the way, what does SLA mean?

    SLA is Service Level Agreement. Basically, a performance and quality agreement between the vendor/programmer and company commissioning the work or service.

    Thank you for the cover on that. It's one of those abbreviations that I use so often everyday with people who know what it is, I sometimes forget there may be those who don't.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sachin Nandanwar (1/17/2011)


    I realize the intentions are good here and thank you for that but there are a couple of problems with the code there. For one, it breaks several of the rules for doing a Quirky Update. It's tough enough for me to defend the use of the Quirky Update as it is. If you're going to use it and post such solutions, please follow the rules for its use. Thanks.

    Thank you for your feedback.I just thought of trying it in a different way.

    If I remember properly I think you had an article on performance implications of quirky update on sqlservercentral.com

    Can you please post that link ?

    Sure... and sorry... I get a little touchy about people not following all the rules because of the huge amount of heat I've taken about it. Here's the link:

    http://www.sqlservercentral.com/articles/T-SQL/68467/

    Please be advised of a couple of things about that article... it's in the process of being updated for a March re-release because I screwed up on the execution plan explanation, an extremely unlikely "break" has been discovered, and just as quick as that was discovered, a method for guaranteeing the operation even past that "break" using a simple "inline" data-checker has also been discovered. So far, retesting hasn't required any changes in the rules but I'm also not done with the retesting on the new guaranteed method.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sure... and sorry... I get a little touchy about people not following all the rules because of the huge amount of heat I've taken about it. Here's the link:

    http://www.sqlservercentral.com/articles/T-SQL/68467/

    Sure.. no issues.I don't mind.:-)

    Thanks again for your feedback and the link.Greatly appreciate it.

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • Oh, that row_number trick is so beautiful it makes me want to cry!

  • quickdraw (6/27/2011)


    Oh, that row_number trick is so beautiful it makes me want to cry!

    If you'd like to see a similar "Row_Number Trick" on steroids 😉 to solve the problem when the dates and times aren't contiguous and are truly overlapping, check out Itzik's article on the subject. The man's use of simple mathematics is something to behold. Here's the link:

    http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx

    That site does require a membership to read the full article just as SQLServerCentral does. And, like SQLServerCentral, membership is free and safe and they only need your email address. They don't sell your email address nor give it to "interested parties" unless you allow them to by not unchecking some of the "agreement" boxes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • How will you expand the solution when you have StartDate EndDate fields in your table and you want continuous intervals, e.g.

    01/01/2010 - 01/15/2010

    01/16/2010 - 02/10/2010

    The above two intervals should come as

    01/01/2010 - 02/10/2010

    I've never seen a blog explaining this more complicated case, although I have seen and tried myself to solve this problem.

  • Naomi N (11/30/2011)


    How will you expand the solution when you have StartDate EndDate fields in your table and you want continuous intervals, e.g.

    01/01/2010 - 01/15/2010

    01/16/2010 - 02/10/2010

    The above two intervals should come as

    01/01/2010 - 02/10/2010

    I've never seen a blog explaining this more complicated case, although I have seen and tried myself to solve this problem.

    Naomi you'll want to start a separate thread to discuss this, but the trick is to use a Tally/Calendar table to fill/generate the dates between the two dates...it's a closely related idea to the Tally Splitting functionality.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Naomi N (11/30/2011)


    How will you expand the solution when you have StartDate EndDate fields in your table and you want continuous intervals, e.g.

    01/01/2010 - 01/15/2010

    01/16/2010 - 02/10/2010

    The above two intervals should come as

    01/01/2010 - 02/10/2010

    I've never seen a blog explaining this more complicated case, although I have seen and tried myself to solve this problem.

    There's a similar thread here

    http://www.sqlservercentral.com/Forums/Topic1125847-392-1.aspx

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 15 posts - 31 through 45 (of 62 total)

You must be logged in to reply to this topic. Login to reply