December 29, 2010 at 5:01 pm
Hello everybody,
I got a TSQL question here. My dilemma is that I need to collapse a set of overlapping date ranges (within one day of each other). Obviously one can use cursors and join methods, but they proved to be lacking in performance.
After scouring the SQLsphere, I came up with this sample algorithm/query using an update statement, which takes milliseconds to complete against 1M+ rows. The only concern I have is that in order for this query to work consistently, the update statement must traverse the table in the same order as the clustered index, because the algorithm uses data from the previous row to determine which date ranges to collapse in the current row, and vice-versa.
So my question is: will the update statement consistently pass over the table in the same order as the clustered index (since, as we know, the data is physically stored in the same order as the clustered index)?
I say consistently, because so far this is working against tables with millions of rows. But I'm concerned that because this method is not necessarily "sanctioned", what are the chances of it breaking in the future?
Thanks in advance!!!!
-- prepare sample data
if object_id('tempdb..#test') is not null
drop table #test
create table #test(
a varbinary(20),
d1 datetime,
d2 datetime,
seq int)
CREATE CLUSTERED INDEX [ic_test] ON #test
(
a ASC,
d1 asc,
d2 asc
) ON [PRIMARY]
GO
insert into #test
select 2, '03/20/2010', '03/31/2010', null union all
select 1, '03/9/2010', '03/13/2010', null union all
select 1, '03/15/2010', '03/30/2010', null union all
select 2, '03/20/2010', '03/31/2010', null union all
select 1, '03/8/2010', '03/12/2010', null union all
select 1, '03/12/2010', '03/13/2010', null union all
select 2, '04/01/2010', '04/02/2010', null
declare @DateFrom datetime, @DateTo datetime, @PrevDate datetime, @a varbinary(20), @seq int
select top 1
@seq = 0
, @DateFrom = d1
, @DateTo = d2
, @a = a
from #test
order
by a, d1
update #test
set @seq = case
when @a = a and (d1 between @DateFrom and dateadd(day, 1, @DateTo) or @PrevDate between d1 and dateadd(day, 1, d2)) then @seq
else @seq + 1
end
, @a = a
, @DateFrom = d1
, @DateTo = d2
, @PrevDate = d1
, seq = @seq
option (maxdop 1)
select * from #test option (maxdop 1)
select a, min(d1), max(d2), seq from #test
group by a, seq
order by seq
December 29, 2010 at 5:13 pm
You are basically using the "quirky update" which is explained in detail by Jeff Moden in this article, http://www.sqlservercentral.com/articles/T-SQL/68467/. I think he addresses your concerns in the article.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
December 29, 2010 at 5:56 pm
Excellent! Thank you for pointing me in the right direction. This article does address all my concerns. I did a search on this great site, but I used search terms relevant to my situation. I would have never thought to search for "Quirky Update" 🙂
December 29, 2010 at 6:37 pm
Langston Montgomery (12/29/2010)
Excellent! Thank you for pointing me in the right direction. This article does address all my concerns. I did a search on this great site, but I used search terms relevant to my situation. I would have never thought to search for "Quirky Update" 🙂
... which is pretty much the issue of all search engines (including BOL, the SQL Server help system): as long as you know what to look for, they are really useful. But until you kow the right term to search for, it's a slightly different story....
Regarding the quirky update: there has been another improvement to make it even more bullet proof: Some refer to it as SafetyCheck.
IIRC, there's another thread around where the safety check method is discussed in detail. But I can't find it at the moment.
All I can remember is: it will make the quirky update even more secure than the method described in the article Jack pointed you at.
December 29, 2010 at 7:02 pm
I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post.
It is definitely easier to find something if you know what you are looking for. I actually found it because I knew Jeff had written the article I was looking for so I found all the articles he has written using the Authors link on the left. Although a search for SQL Server Quirky Update returns a lot of rows.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
December 29, 2010 at 7:26 pm
After reading Jeff's full article on "Quirky Update", and by pure luck, I am following all the rules (including the Safety Check). Well, the MAXDOP 1 option just made sense to me naturally as I know parallelism can affect ordering. But the partitioning and TABLOCKX hint was not apparent to me. These gotchas and the others seem to mostly be solved by copying the data into a temp table, which is what I was already doing in the interest of modularity and code reuse. I guess coding from a "best-practices" standpoint has some advantages from time to time 🙂
Thanks again for your help and dedication.
December 30, 2010 at 2:15 am
Langston Montgomery (12/29/2010)
(since, as we know, the data is physically stored in the same order as the clustered index)?
Myth, not true.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 30, 2010 at 2:39 am
Jack Corbett (12/29/2010)
I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post....
Nope, not in the version I checked (will be published on 2011/03/04, so it's more than recent :-P).
December 30, 2010 at 9:31 am
This post (in the discussion of the article) references both the safety check and the modification to it, and includes the resultant code sample with both applied.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
December 30, 2010 at 9:39 am
LutzM (12/30/2010)
Jack Corbett (12/29/2010)
I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post....
Nope, not in the version I checked (will be published on 2011/03/04, so it's more than recent :-P).
Jack, I apologize. I completely overlooked your reference to the discussion of the article... :blush:
December 30, 2010 at 9:42 am
LutzM (12/30/2010)
LutzM (12/30/2010)
Jack Corbett (12/29/2010)
I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post....
Nope, not in the version I checked (will be published on 2011/03/04, so it's more than recent :-P).
Jack, I apologize. I completely overlooked your reference to the discussion of the article... :blush:
No worries. If you read and processed everything I wrote you'd find all my mistakes...:w00t:
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
December 30, 2010 at 12:10 pm
GilaMonster (12/30/2010)
Langston Montgomery (12/29/2010)
(since, as we know, the data is physically stored in the same order as the clustered index)?Myth, not true.
Care to elaborate?
December 30, 2010 at 12:22 pm
Langston Montgomery (12/30/2010)
GilaMonster (12/30/2010)
Langston Montgomery (12/29/2010)
(since, as we know, the data is physically stored in the same order as the clustered index)?Myth, not true.
Care to elaborate?
Sure.
The clustered index enforces the logical order of the data, the next and previous page pointers on each page provide a way to read the clustered index in the order of the clustering key (just as nonclustered indexes do for their key order). It does not in any way enforce the physical order. If the index is perfectly defragmented and the data file is not fragmented then the logical order will match the physical order of the data on disk, but that's a special case and far from normal.
Indexes - logical order of data. There's many other things that influence the physical order.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 30, 2010 at 1:58 pm
GilaMonster (12/30/2010)
Sure.
The clustered index enforces the logical order of the data, the next and previous page pointers on each page provide a way to read the clustered index in the order of the clustering key (just as nonclustered indexes do for their key order). It does not in any way enforce the physical order. If the index is perfectly defragmented and the data file is not fragmented then the logical order will match the physical order of the data on disk, but that's a special case and far from normal.
Indexes - logical order of data. There's many other things that influence the physical order.
Thanks for checking me on that Gila. I used the wrong language.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply