Update order of a table with a clustered index

Langston Montgomery

December 29, 2010 at 5:01 pm

Hello everybody,
I have a T-SQL question. I need to collapse a set of date ranges that overlap or fall within one day of each other. Cursor-based and join-based methods work, but they proved too slow.
After scouring the SQLsphere, I came up with this sample algorithm/query using an update statement, which takes milliseconds to complete against 1M+ rows. The only concern I have is that in order for this query to work consistently, the update statement must traverse the table in the same order as the clustered index, because the algorithm uses data from the previous row to determine which date ranges to collapse in the current row, and vice-versa.
So my question is: will the update statement consistently pass over the table in the same order as the clustered index (since, as we know, the data is physically stored in the same order as the clustered index)?
I say consistently because, so far, this is working against tables with millions of rows. But since this method is not necessarily "sanctioned", what are the chances of it breaking in the future?
Thanks in advance!!!!
-- prepare sample data
if object_id('tempdb..#test') is not null
    drop table #test

create table #test(
    a varbinary(20),
    d1 datetime,
    d2 datetime,
    seq int)

-- the update below relies on being processed in this key order
create clustered index [ic_test] on #test
(
    a asc,
    d1 asc,
    d2 asc
) on [PRIMARY]
GO

insert into #test
select 2, '03/20/2010', '03/31/2010', null union all
select 1, '03/9/2010', '03/13/2010', null union all
select 1, '03/15/2010', '03/30/2010', null union all
select 2, '03/20/2010', '03/31/2010', null union all
select 1, '03/8/2010', '03/12/2010', null union all
select 1, '03/12/2010', '03/13/2010', null union all
select 2, '04/01/2010', '04/02/2010', null

-- seed the variables from the first row in (a, d1) order
declare @DateFrom datetime, @DateTo datetime, @PrevDate datetime, @a varbinary(20), @seq int

select top 1
      @seq = 0
    , @DateFrom = d1
    , @DateTo = d2
    , @a = a
from #test
order by a, d1

-- carry state from row to row: keep the same seq while the current range
-- touches the previous row's range (within one day), otherwise start a new one
update #test
set @seq = case
               when @a = a
                    and (d1 between @DateFrom and dateadd(day, 1, @DateTo)
                         or @PrevDate between d1 and dateadd(day, 1, d2))
               then @seq
               else @seq + 1
           end
  , @a = a
  , @DateFrom = d1
  , @DateTo = d2
  , @PrevDate = d1
  , seq = @seq
option (maxdop 1)

-- inspect the assigned sequence numbers
select * from #test option (maxdop 1)

-- collapse each sequence into a single range
select a, min(d1), max(d2), seq
from #test
group by a, seq
order by seq
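
For comparison, here is a minimal set-based sketch that does not depend on the order in which an update visits rows. It assumes SQL Server 2012 or later (for window frames on aggregates), which postdates this thread. The table name #ranges, the int type for a, and the column aliases are made up for illustration. It merges on the running maximum end date per value of a, which is a slightly different rule from the previous-row comparison in the update above.

```sql
-- Hypothetical table with the same shape as the sample data above.
IF OBJECT_ID('tempdb..#ranges') IS NOT NULL
    DROP TABLE #ranges;

CREATE TABLE #ranges (a int NOT NULL, d1 datetime NOT NULL, d2 datetime NOT NULL);

INSERT INTO #ranges (a, d1, d2)
VALUES (2, '20100320', '20100331'),
       (1, '20100309', '20100313'),
       (1, '20100315', '20100330'),
       (2, '20100320', '20100331'),
       (1, '20100308', '20100312'),
       (1, '20100312', '20100313'),
       (2, '20100401', '20100402');

WITH marked AS (
    -- latest end date among all earlier rows for the same a
    SELECT a, d1, d2,
           MAX(d2) OVER (PARTITION BY a
                         ORDER BY d1, d2
                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end
    FROM #ranges
),
flagged AS (
    -- a row starts a new range when it begins more than one day after prev_end
    SELECT a, d1, d2,
           CASE WHEN prev_end IS NULL OR d1 > DATEADD(day, 1, prev_end)
                THEN 1 ELSE 0 END AS is_start
    FROM marked
),
grouped AS (
    -- running count of range starts; the default RANGE frame gives rows with
    -- identical (d1, d2) the same group number
    SELECT a, d1, d2,
           SUM(is_start) OVER (PARTITION BY a ORDER BY d1, d2) AS grp
    FROM flagged
)
SELECT a, MIN(d1) AS range_start, MAX(d2) AS range_end
FROM grouped
GROUP BY a, grp
ORDER BY a, range_start;
```

With the sample rows this should return three ranges: 2010-03-08 to 2010-03-13 and 2010-03-15 to 2010-03-30 for a = 1, and 2010-03-20 to 2010-04-02 for a = 2.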

Jack Corbett SSC Guru Points: 184394 · Answer 1

You are basically using the "quirky update" which is explained in detail by Jeff Moden in this article, http://www.sqlservercentral.com/articles/T-SQL/68467/. I think he addresses your concerns in the article.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Langston Montgomery SSC Eights! Points: 934 · Answer 2

Excellent! Thank you for pointing me in the right direction. This article does address all my concerns. I did a search on this great site, but I used search terms relevant to my situation. I would have never thought to search for "Quirky Update" 🙂

LutzM SSC Guru Points: 107049 · Answer 3

Langston Montgomery (12/29/2010)
Excellent! Thank you for pointing me in the right direction. This article does address all my concerns. I did a search on this great site, but I used search terms relevant to my situation. I would have never thought to search for "Quirky Update" 🙂

... which is pretty much the issue with all search engines (including BOL, the SQL Server help system): as long as you know what to look for, they are really useful. But until you know the right term to search for, it's a slightly different story....

Regarding the quirky update: there has been a further improvement that makes it even more bulletproof. Some refer to it as the SafetyCheck.

IIRC, there's another thread around where the safety check method is discussed in detail. But I can't find it at the moment.

All I can remember is: it will make the quirky update even more secure than the method described in the article Jack pointed you at.

Lutz
A pessimist is an optimist with experience.

Jack Corbett SSC Guru Points: 184394 · Answer 4

I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post.

It is definitely easier to find something if you know what you are looking for. I actually found it because I knew Jeff had written the article I was looking for, so I browsed all the articles he has written using the Authors link on the left. That said, a search for "SQL Server Quirky Update" does return a lot of results.

Langston Montgomery SSC Eights! Points: 934 · Answer 5

After reading Jeff's full article on the "Quirky Update", it turns out that, by pure luck, I am following all the rules (including the Safety Check). The MAXDOP 1 option just made sense to me naturally, as I know parallelism can affect ordering. But the partitioning rule and the TABLOCKX hint were not apparent to me. These gotchas and the others seem to be mostly solved by copying the data into a temp table, which is what I was already doing in the interest of modularity and code reuse. I guess coding from a "best-practices" standpoint has some advantages from time to time 🙂
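
The rules named above (clustered index in the required order, TABLOCKX, MAXDOP 1, plus a safety check) can be sketched roughly as follows. This is only an illustration of the general idea, not the code from Jeff's article or its discussion: the table #qu, the id column, and the variable names are hypothetical, and the merge rule is simplified to a running maximum end date. The safety check compares a precomputed ROW_NUMBER() against a counter that the update increments, and forces a divide-by-zero error if rows are ever visited out of order. The technique still relies on undocumented behavior.

```sql
-- Hypothetical working table; id makes the clustered key unique so that
-- ROW_NUMBER() below has a deterministic order.
IF OBJECT_ID('tempdb..#qu') IS NOT NULL
    DROP TABLE #qu;

CREATE TABLE #qu (
    a   int      NOT NULL,
    d1  datetime NOT NULL,
    d2  datetime NOT NULL,
    id  int IDENTITY(1,1) NOT NULL,
    seq int      NULL
);

CREATE UNIQUE CLUSTERED INDEX ic_qu ON #qu (a, d1, d2, id);

INSERT INTO #qu (a, d1, d2)
VALUES (2, '20100320', '20100331'),
       (1, '20100309', '20100313'),
       (1, '20100315', '20100330'),
       (2, '20100320', '20100331'),
       (1, '20100308', '20100312'),
       (1, '20100312', '20100313'),
       (2, '20100401', '20100402');

DECLARE @counter int      = 1,     -- row number we expect to see next
        @seq     int      = 0,     -- current collapsed-range number
        @a       int      = NULL,  -- value of a on the previous row
        @prevEnd datetime = NULL;  -- latest end date seen so far for @a

WITH ordered AS (
    SELECT a, d1, d2, seq,
           ROW_NUMBER() OVER (ORDER BY a, d1, d2, id) AS rn
    FROM #qu WITH (TABLOCKX)
)
UPDATE ordered
SET @seq = seq = CASE
                     -- safety check: abort if this is not the expected row
                     WHEN rn <> @counter THEN 1 / 0
                     -- same a and within one day of the running end date
                     WHEN a = @a AND d1 <= DATEADD(day, 1, @prevEnd) THEN @seq
                     ELSE @seq + 1
                 END,
    -- keep the running maximum end date while a is unchanged
    @prevEnd = CASE WHEN a = @a AND d2 < @prevEnd THEN @prevEnd ELSE d2 END,
    @a       = a,
    @counter = @counter + 1
OPTION (MAXDOP 1);

SELECT a, MIN(d1) AS range_start, MAX(d2) AS range_end, seq
FROM #qu
GROUP BY a, seq
ORDER BY seq;
```

If the update ever processes rows in a different order than the ROW_NUMBER() sequence, the statement fails instead of silently writing wrong sequence numbers, which is the point of the check.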

Thanks again for your help and dedication.

Gail Shaw SSC Guru Points: 1004504 · Answer 6

Langston Montgomery (12/29/2010)
(since, as we know, the data is physically stored in the same order as the clustered index)?

Myth, not true.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

LutzM SSC Guru Points: 107049 · Answer 7

Jack Corbett (12/29/2010)
I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post.
...

Nope, not in the version I checked (will be published on 2011/03/04, so it's more than recent :-P).

WayneS SSC Guru Points: 95461 · Answer 8

This post (in the discussion of the article) references both the safety check and the modification to it, and includes the resultant code sample with both applied.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

LutzM SSC Guru Points: 107049 · Answer 9

LutzM (12/30/2010)
Jack Corbett (12/29/2010)
I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post.
...
Nope, not in the version I checked (will be published on 2011/03/04, so it's more than recent :-P).

Jack, I apologize. I completely overlooked your reference to the discussion of the article... :blush:

Jack Corbett SSC Guru Points: 184394 · Answer 10

LutzM (12/30/2010)
LutzM (12/30/2010)
Jack Corbett (12/29/2010)
I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post.
...
Nope, not in the version I checked (will be published on 2011/03/04, so it's more than recent :-P).
Jack, I apologize. I completely overlooked your reference to the discussion of the article... :blush:

No worries. If you read and processed everything I wrote you'd find all my mistakes...:w00t:

Langston Montgomery SSC Eights! Points: 934 · Answer 11

GilaMonster (12/30/2010)
Langston Montgomery (12/29/2010)
(since, as we know, the data is physically stored in the same order as the clustered index)?
Myth, not true.

Care to elaborate?

Gail Shaw SSC Guru Points: 1004504 · Answer 12

Langston Montgomery (12/30/2010)
GilaMonster (12/30/2010)
Langston Montgomery (12/29/2010)
(since, as we know, the data is physically stored in the same order as the clustered index)?
Myth, not true.
Care to elaborate?

Sure.

The clustered index enforces the logical order of the data. The next and previous page pointers on each page provide a way to read the clustered index in the order of the clustering key (just as nonclustered indexes do for their key order). It does not in any way enforce the physical order. If the index is perfectly defragmented and the data file is not fragmented, then the logical order will match the physical order of the data on disk, but that's a special case and far from normal.

Indexes define the logical order of data. There are many other things that influence the physical order.
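
One way to see the gap between logical and physical order is to look at logical fragmentation, which sys.dm_db_index_physical_stats reports as avg_fragmentation_in_percent. The sketch below is illustrative only: the table #frag_demo, its columns, and the row counts are made up, and the batches of random keys are there simply to provoke page splits. The actual numbers will vary from run to run.

```sql
-- Hypothetical demo table: random keys inserted in separate batches tend to
-- land between existing keys and split pages.
IF OBJECT_ID('tempdb..#frag_demo') IS NOT NULL
    DROP TABLE #frag_demo;

CREATE TABLE #frag_demo (
    k      uniqueidentifier NOT NULL,
    filler char(400)        NOT NULL DEFAULT 'x'
);

CREATE CLUSTERED INDEX ic_frag_demo ON #frag_demo (k);

DECLARE @i int = 0;
WHILE @i < 20
BEGIN
    INSERT INTO #frag_demo (k)
    SELECT TOP (500) NEWID()
    FROM sys.all_objects;

    SET @i += 1;
END;

-- Leaf-level pages whose next page in key order is not the next page
-- physically count toward avg_fragmentation_in_percent.
SELECT ips.index_type_desc,
       ips.page_count,
       ips.fragment_count,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID('tempdb'),
         OBJECT_ID('tempdb..#frag_demo'),
         NULL, NULL, 'LIMITED') AS ips;
```

A nonzero fragmentation percentage here means the page chain still returns rows in key order while the pages themselves are not laid out in that order in the file.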

Langston Montgomery SSC Eights! Points: 934 · Answer 13

GilaMonster (12/30/2010)
Sure.
The clustered index enforces the logical order of the data. The next and previous page pointers on each page provide a way to read the clustered index in the order of the clustering key (just as nonclustered indexes do for their key order). It does not in any way enforce the physical order. If the index is perfectly defragmented and the data file is not fragmented, then the logical order will match the physical order of the data on disk, but that's a special case and far from normal.
Indexes define the logical order of data. There are many other things that influence the physical order.

Thanks for checking me on that, Gila. I used the wrong language.