Group Islands of Contiguous Dates (SQL Spackle)

Question

Viewing 15 posts - 31 through 45 (of 62 total)

Sachin Nandanwar SSCertifiable Points: 7699 More actions · Answer 1

Well, I just stumbled upon this article. I tried to do it using the quirky update method and it seems to be working, but I haven't tested it on a huge number of rows.

--=============================================================================
-- Create the test data. This is NOT a part of the solution.
-- This is virtually instantaneous.
--=============================================================================
--===== Conditionally drop the test table to make reruns easier.
IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
DROP TABLE #MyHead
;
GO
--===== Create the test table
CREATE TABLE #MyHead
(SomeDate DATETIME, id int DEFAULT(0))
;
--===== Populate the test table with test data
INSERT INTO #MyHead
(SomeDate)
SELECT '2010-01-01' UNION ALL --1st "Group" of dates (StartDate and EndDate)
SELECT '2010-01-01' UNION ALL --Duplicate date
SELECT '2010-01-03' UNION ALL --2nd "Group" of dates (StartDate and EndDate)
SELECT '2010-01-05' UNION ALL --3rd "Group" of dates (StartDate)
SELECT '2010-01-06' UNION ALL --3rd "Group" of dates (EndDate)
SELECT '2010-01-10' UNION ALL --4th "Group" of dates (StartDate)
SELECT '2010-01-10' UNION ALL --Duplicate date
SELECT '2010-01-11' UNION ALL --4th "Group" of dates
SELECT '2010-01-11' UNION ALL --Duplicate date
SELECT '2010-01-12' --4th "Group" of dates (EndDate)
;
declare @ordse int=0
declare @somedate datetime=''
update #MyHead set @ordse=ID=case when somedate=@somedate+1 or @somedate=somedate then @ordse+1 else @ordse-1 end,@somedate=somedate
select min(somedate)min,max(somedate)max,DATEDIFF(dd,min(SomeDate)-1,max(SomeDate))Diff from
(
select *,id-ROW_NUMBER()over(order by (select 1))id1 from #MyHead
)t group by id1 order by min(SomeDate)
drop table #MyHead

--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server

Jeff Moden SSC Guru Points: 1004407 More actions · Answer 2

venoym (12/8/2010)
Great post Jeff!
I had a thought... this could be extended to account for Times in addition to dates... just move the calculation to the lowest significant field (from day to second). I had to code an algorithm in VB to find the "islands" and the "water" for a scheduling application, this may well drop the need for that...
Again, Great Job!

My apologies for the late reply. Thanks for the feedback, venoym!

Before you think of replacing your VB function, remember that this code solves a very specific problem where the dates are grouped only if contiguous dates are available. It won't currently handle overlapping date "Ranges" where you have a StartDate and EndDate which is typically required for scheduling problems.
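For readers curious about venoym's idea, here is a minimal sketch of the same kind of grouping done at one-second granularity. It assumes the values are whole seconds (no milliseconds) and uses a hypothetical temp table, #SomeTimes, that is not part of this thread. Like the date version, it only groups contiguous points in time and does nothing for overlapping StartDate/EndDate ranges.

```sql
--===== Hypothetical test table (not from the thread); values are whole seconds.
IF OBJECT_ID('tempdb..#SomeTimes','U') IS NOT NULL
   DROP TABLE #SomeTimes;

CREATE TABLE #SomeTimes (SomeTime DATETIME NOT NULL);

INSERT INTO #SomeTimes (SomeTime)
SELECT '20100101 10:00:00' UNION ALL --1st island starts
SELECT '20100101 10:00:01' UNION ALL
SELECT '20100101 10:00:01' UNION ALL --Duplicate value
SELECT '20100101 10:00:02' UNION ALL --1st island ends
SELECT '20100101 10:00:05' UNION ALL --2nd island starts
SELECT '20100101 10:00:06';          --2nd island ends

--===== Subtracting an ascending row number (in seconds) from each unique
     -- value yields the same "group" value for every contiguous run.
WITH cteGrouped AS
(
 SELECT UniqueTime = SomeTime,
        TimeGroup  = DATEADD(ss, -ROW_NUMBER() OVER (ORDER BY SomeTime), SomeTime)
   FROM #SomeTimes
  GROUP BY SomeTime --Removes duplicates before numbering
)
SELECT StartTime = MIN(UniqueTime),
       EndTime   = MAX(UniqueTime),
       Seconds   = DATEDIFF(ss, MIN(UniqueTime), MAX(UniqueTime)) + 1
  FROM cteGrouped
 GROUP BY TimeGroup
 ORDER BY StartTime;
```

With the sample rows above, this should return two islands: 10:00:00 through 10:00:02 and 10:00:05 through 10:00:06.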

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004407 More actions · Answer 3

rstarr-916208 (12/8/2010)
I ran a test on 500 dates over the span of four years. It returned in 6 seconds. In the environment I come from that would be sufficient for any practical problems we might come up with.

Oh, be careful when making such evaluations. First, 6 seconds for only 500 rows violates every SLA for performance that I've ever come across, especially for GUI code, where most of the SLAs I've been exposed to require no more than 1 second to return.

The other danger is that someone in a hurry may borrow your code not knowing how long it takes. If they don't actually test for performance, the end user will be the one doing the performance testing for you, and taking 6 seconds for only 500 rows is going to be a problem for them. If they run into only 1000 rows, they'll find that the code takes exponentially longer, with times measured in minutes instead of seconds.

Please be careful with justifying code performance based simply on number of rows... it always ends up biting you in the end.
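One way to check this kind of thing before a user does is to run the candidate code against a much larger table. The following is only a sketch under stated assumptions: the table name #PerfTest, the row count, and the date spread are all made up for illustration, and the query being timed is a generic ROW_NUMBER-based island query rather than any specific poster's code.

```sql
--===== Hypothetical test table (not from the thread): 1,000,000 random whole
     -- dates spread over 2,000,000 possible days, so many islands are formed.
IF OBJECT_ID('tempdb..#PerfTest','U') IS NOT NULL
   DROP TABLE #PerfTest;

SELECT TOP (1000000)
       SomeDate = DATEADD(dd, ABS(CHECKSUM(NEWID())) % 2000000, CAST('19000101' AS DATETIME))
  INTO #PerfTest
  FROM sys.all_columns ac1
 CROSS JOIN sys.all_columns ac2; --Used only as a convenient row source

--===== Time the code under test. Swap in whatever query is being evaluated.
SET STATISTICS TIME ON;

WITH cteGrouped AS
(
 SELECT UniqueDate = SomeDate,
        DateGroup  = DATEADD(dd, -ROW_NUMBER() OVER (ORDER BY SomeDate), SomeDate)
   FROM #PerfTest
  GROUP BY SomeDate
)
SELECT StartDate = MIN(UniqueDate),
       EndDate   = MAX(UniqueDate)
  FROM cteGrouped
 GROUP BY DateGroup
 ORDER BY StartDate;

SET STATISTICS TIME OFF;
```

The CPU and elapsed times appear on the Messages tab. Running the same harness at several row counts shows how the duration grows, which is more telling than a single measurement.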

--Jeff Moden


Jeff Moden SSC Guru Points: 1004407 More actions · Answer 4

Sachin Nandanwar (1/16/2011)
Well, I just stumbled upon this article. I tried to do it using the quirky update method and it seems to be working, but I haven't tested it on a huge number of rows.
--=============================================================================
-- Create the test data. This is NOT a part of the solution.
-- This is virtually instantaneous.
--=============================================================================
--===== Conditionally drop the test table to make reruns easier.
IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
DROP TABLE #MyHead
;
GO
--===== Create the test table
CREATE TABLE #MyHead
(SomeDate DATETIME, id int DEFAULT(0))
;
--===== Populate the test table with test data
INSERT INTO #MyHead
(SomeDate)
SELECT '2010-01-01' UNION ALL --1st "Group" of dates (StartDate and EndDate)
SELECT '2010-01-01' UNION ALL --Duplicate date
SELECT '2010-01-03' UNION ALL --2nd "Group" of dates (StartDate and EndDate)
SELECT '2010-01-05' UNION ALL --3rd "Group" of dates (StartDate)
SELECT '2010-01-06' UNION ALL --3rd "Group" of dates (EndDate)
SELECT '2010-01-10' UNION ALL --4th "Group" of dates (StartDate)
SELECT '2010-01-10' UNION ALL --Duplicate date
SELECT '2010-01-11' UNION ALL --4th "Group" of dates
SELECT '2010-01-11' UNION ALL --Duplicate date
SELECT '2010-01-11' UNION ALL --Duplicate date
SELECT '2010-01-12' --4th "Group" of dates (EndDate)
;
declare @ordse int=0
declare @somedate datetime=''
update #MyHead set @ordse=ID=case when somedate=@somedate+1 or @somedate=somedate then @ordse+1 else @ordse-1 end,@somedate=somedate
select min(somedate)min,max(somedate)max,DATEDIFF(dd,min(SomeDate)-1,max(SomeDate))Diff from
(
select *,id-ROW_NUMBER()over(order by (select 1))id1 from #MyHead
)t group by id1 order by min(SomeDate)
drop table #MyHead

I realize the intentions are good here and thank you for that but there are a couple of problems with the code there. For one, it breaks several of the rules for doing a Quirky Update. It's tough enough for me to defend the use of the Quirky Update as it is. If you're going to use it and post such solutions, please follow the rules for its use. Thanks.

Second, although the Quirky Update does the job, it isn't a panacea and there's simply no need for it here. It requires the use of an extra column and would necessarily require copying the data from a permanent table to a Temp Table if the column couldn't be added to the permanent table.

Last but not least, since you still do a SELECT with aggregates, I believe you'll find that the Quirky Update method is actually a bit slower than conventional methods, in this case.
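As an illustration of what a conventional approach can look like, here is a minimal sketch that uses ROW_NUMBER() and needs no extra column and no UPDATE. It is a generic version of the technique and is not claimed to be the article's exact code; the table name #MyHeadDemo is hypothetical, and the rows are the sample dates from the post above.

```sql
--===== Hypothetical copy of the sample data (table name is an assumption).
IF OBJECT_ID('tempdb..#MyHeadDemo','U') IS NOT NULL
   DROP TABLE #MyHeadDemo;

CREATE TABLE #MyHeadDemo (SomeDate DATETIME NOT NULL);

INSERT INTO #MyHeadDemo (SomeDate)
SELECT '20100101' UNION ALL --1st group
SELECT '20100101' UNION ALL --Duplicate date
SELECT '20100103' UNION ALL --2nd group
SELECT '20100105' UNION ALL --3rd group starts
SELECT '20100106' UNION ALL --3rd group ends
SELECT '20100110' UNION ALL --4th group starts
SELECT '20100110' UNION ALL --Duplicate date
SELECT '20100111' UNION ALL
SELECT '20100111' UNION ALL --Duplicate date
SELECT '20100112';          --4th group ends

--===== For contiguous dates, (date - row number) is constant, so it can
     -- serve as the grouping key. GROUP BY removes the duplicates first.
WITH cteGroupedDates AS
(
 SELECT UniqueDate = SomeDate,
        DateGroup  = DATEADD(dd, -ROW_NUMBER() OVER (ORDER BY SomeDate), SomeDate)
   FROM #MyHeadDemo
  GROUP BY SomeDate
)
SELECT StartDate = MIN(UniqueDate),
       EndDate   = MAX(UniqueDate),
       Days      = DATEDIFF(dd, MIN(UniqueDate), MAX(UniqueDate)) + 1
  FROM cteGroupedDates
 GROUP BY DateGroup
 ORDER BY StartDate;
```

With the sample rows, this should return four groups: 2010-01-01 (1 day), 2010-01-03 (1 day), 2010-01-05 through 2010-01-06 (2 days), and 2010-01-10 through 2010-01-12 (3 days).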

--Jeff Moden


rstarr-916208 SSC Enthusiast Points: 119 More actions · Answer 5

Ok, I consider myself duly chastised.

By the way, what does SLA mean?

Sachin Nandanwar SSCertifiable Points: 7699 More actions · Answer 6

I realize the intentions are good here and thank you for that but there are a couple of problems with the code there. For one, it breaks several of the rules for doing a Quirky Update. It's tough enough for me to defend the use of the Quirky Update as it is. If you're going to use it and post such solutions, please follow the rules for its use. Thanks.

Thank you for your feedback. I just thought of trying it in a different way.

If I remember correctly, you had an article on the performance implications of the quirky update on sqlservercentral.com.

Can you please post that link?


tskelley SSCommitted Points: 1870 More actions · Answer 7

rstarr-916208 (1/16/2011)
Ok, I consider myself duly chastised.
By the way, what does SLA mean?

SLA stands for Service Level Agreement. Basically, it's a performance and quality agreement between the vendor/programmer and the company commissioning the work or service.

Jeff Moden SSC Guru Points: 1004407 More actions · Answer 8

tskelley (1/17/2011)
rstarr-916208 (1/16/2011)
Ok, I consider myself duly chastised.
By the way, what does SLA mean?
SLA is Service Level Agreement. Basically, a performance and quality agreement between the vendor/programmer and company commissioning the work or service.

Thank you for the cover on that. It's one of those abbreviations that I use so often every day with people who know what it is that I sometimes forget there may be those who don't.

--Jeff Moden


Jeff Moden SSC Guru Points: 1004407 More actions · Answer 9

Sachin Nandanwar (1/17/2011)
I realize the intentions are good here and thank you for that but there are a couple of problems with the code there. For one, it breaks several of the rules for doing a Quirky Update. It's tough enough for me to defend the use of the Quirky Update as it is. If you're going to use it and post such solutions, please follow the rules for its use. Thanks.
Thank you for your feedback. I just thought of trying it in a different way.
If I remember correctly, you had an article on the performance implications of the quirky update on sqlservercentral.com.
Can you please post that link?

Sure... and sorry... I get a little touchy about people not following all the rules because of the huge amount of heat I've taken about it. Here's the link:

http://www.sqlservercentral.com/articles/T-SQL/68467/

Please be advised of a couple of things about that article... it's in the process of being updated for a March re-release because I screwed up on the execution plan explanation, an extremely unlikely "break" has been discovered, and just as quick as that was discovered, a method for guaranteeing the operation even past that "break" using a simple "inline" data-checker has also been discovered. So far, retesting hasn't required any changes in the rules but I'm also not done with the retesting on the new guaranteed method.

--Jeff Moden


Sachin Nandanwar SSCertifiable Points: 7699 More actions · Answer 10

Sure... and sorry... I get a little touchy about people not following all the rules because of the huge amount of heat I've taken about it. Here's the link:
http://www.sqlservercentral.com/articles/T-SQL/68467/

Sure, no issues. I don't mind. :-)

Thanks again for your feedback and the link. I greatly appreciate it.


quickdraw SSC Enthusiast Points: 151 More actions · Answer 11

Oh, that row_number trick is so beautiful it makes me want to cry!

Jeff Moden SSC Guru Points: 1004407 More actions · Answer 12

quickdraw (6/27/2011)
Oh, that row_number trick is so beautiful it makes me want to cry!

If you'd like to see a similar "Row_Number Trick" on steroids 😉 to solve the problem when the dates and times aren't contiguous and are truly overlapping, check out Itzik's article on the subject. The man's use of simple mathematics is something to behold. Here's the link:

http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx

That site does require a membership to read the full article just as SQLServerCentral does. And, like SQLServerCentral, membership is free and safe and they only need your email address. They don't sell your email address nor give it to "interested parties" unless you allow them to by not unchecking some of the "agreement" boxes.

--Jeff Moden


Naomi N Say Hey Kid Points: 695 More actions · Answer 13

How will you expand the solution when you have StartDate EndDate fields in your table and you want continuous intervals, e.g.

01/01/2010 - 01/15/2010

01/16/2010 - 02/10/2010

The above two intervals should come as

01/01/2010 - 02/10/2010

I've never seen a blog explaining this more complicated case, although I have seen and tried myself to solve this problem.

Lowell SSC Guru Points: 323518 More actions · Answer 14

Naomi N (11/30/2011)
How will you expand the solution when you have StartDate EndDate fields in your table and you want continuous intervals, e.g.
01/01/2010 - 01/15/2010
01/16/2010 - 02/10/2010
The above two intervals should come as
01/01/2010 - 02/10/2010
I've never seen a blog explaining this more complicated case, although I have seen and tried myself to solve this problem.

Naomi, you'll want to start a separate thread to discuss this, but the trick is to use a Tally/Calendar table to generate all of the dates between each StartDate and EndDate. It's closely related to the Tally splitting technique.
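A rough sketch of that idea follows, under some assumptions: the table #Ranges and its third row are made up for illustration, the inline tally is built from sys.all_columns simply as a convenient row source, and no single range is assumed to be longer than 1000 days. Each range is expanded to one row per day, and the resulting days are then grouped with the same ROW_NUMBER() subtraction used for single dates.

```sql
--===== Hypothetical test table (not from the thread).
IF OBJECT_ID('tempdb..#Ranges','U') IS NOT NULL
   DROP TABLE #Ranges;

CREATE TABLE #Ranges (StartDate DATETIME NOT NULL, EndDate DATETIME NOT NULL);

INSERT INTO #Ranges (StartDate, EndDate)
SELECT '20100101','20100115' UNION ALL --Adjacent to the next range
SELECT '20100116','20100210' UNION ALL
SELECT '20100301','20100305';          --Separate island (illustrative row)

WITH cteTally AS
( --=== Inline tally of 0 through 999 (assumes no range exceeds 1000 days)
 SELECT TOP (1000)
        N = CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS INT)
   FROM sys.all_columns
),
cteDays AS
( --=== One row per distinct day covered by any range
 SELECT DISTINCT
        TheDate = DATEADD(dd, t.N, r.StartDate)
   FROM #Ranges r
   JOIN cteTally t
     ON t.N <= DATEDIFF(dd, r.StartDate, r.EndDate)
),
cteGrouped AS
( --=== Contiguous days share the same (date - row number) value
 SELECT TheDate,
        DateGroup = DATEADD(dd, -ROW_NUMBER() OVER (ORDER BY TheDate), TheDate)
   FROM cteDays
)
SELECT StartDate = MIN(TheDate),
       EndDate   = MAX(TheDate)
  FROM cteGrouped
 GROUP BY DateGroup
 ORDER BY StartDate;
```

With the rows above, the first two ranges should come back as a single interval, 2010-01-01 through 2010-02-10, followed by 2010-03-01 through 2010-03-05. Because the days are made distinct before grouping, overlapping ranges collapse the same way, at the cost of generating one row per day.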

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Mark Cowne One Orange Chip Points: 26954 More actions · Answer 15

Naomi N (11/30/2011)
How will you expand the solution when you have StartDate EndDate fields in your table and you want continuous intervals, e.g.
01/01/2010 - 01/15/2010
01/16/2010 - 02/10/2010
The above two intervals should come as
01/01/2010 - 02/10/2010
I've never seen a blog explaining this more complicated case, although I have seen and tried myself to solve this problem.

There's a similar thread here

http://www.sqlservercentral.com/Forums/Topic1125847-392-1.aspx

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537