February 4, 2010 at 1:18 am
Hi Jeff,
As indicated in my previous message, I somehow lost sight of this discussion. Sorry for my sudden disappearance from this topic.
First, thank you for optimizing my alternative running totals code even further. I hope you don't mind me using the optimized version of this code (with attribution, of course) for future presentations, blog posts, articles, etc.
When reading over all the new posts, I didn't find any reaction to one of the points I raised: the claim that nobody has yet managed to break the quirky update. I posted at least two bits of code in this thread that did break it. You dismissed the first as using a JOIN; I objected to that, as an IN with a constant list rather than a subquery is not equivalent to a JOIN but to a series of OR'ed conditions, and then posted another code snippet that breaks it without using IN or OR. If you ever responded to my objection or to that last snippet, I overlooked the reaction - can you please tell me the page number to re-read? And if you never responded, then I hope you will do so this time.
For your convenience, here is the last code I posted again:
DECLARE @Counter INT;
SET @Counter = 0;
UPDATE dbo.TransactionDetail
SET @Counter = AccountRunningCount = @Counter + 1
FROM dbo.TransactionDetail WITH (TABLOCKX)
WHERE TransactionDetailID BETWEEN 120000 AND 120010
OPTION (MAXDOP 1);
GO
--===== Select all the rows in order by the clustered index
SELECT *
FROM dbo.TransactionDetail
WHERE TransactionDetailID BETWEEN 120000 AND 120010
ORDER BY AccountID, Date, TransactionDetailID;
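And for completeness, the earlier snippet you dismissed as "using a JOIN" filtered on a constant IN list - which the optimizer expands to a series of OR'ed predicates, not a join. Reconstructed from memory (the ID list here is illustrative; Paul quotes one like it later in the thread):
DECLARE @Counter INT;
SET @Counter = 0;
UPDATE dbo.TransactionDetail
SET @Counter = AccountRunningCount = @Counter + 1
FROM dbo.TransactionDetail WITH (TABLOCKX)
WHERE TransactionDetailID IN (12, 123, 1234, 23, 234, 2345, 34, 345, 3456) -- constant list, not a subquery
OPTION (MAXDOP 1);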
February 4, 2010 at 2:10 am
A cursor is IMHO the best approach for this, because you can put an ORDER BY on it - you get repeatable results, with syntax whose behaviour is well defined in the product manual (Books Online).
That particular use of UPDATE in this context is only asking for trouble.
Yes, it might work like that now; but where in the documentation does it define that it will always process the rows in order?
My two cents.
Tony.
February 4, 2010 at 5:53 am
Hi Hugo,
You overlooked my post which addressed the 'problem'. It was quite long, so I'm quite shocked that you missed it 😉
Extract: Paul White to Jeff Moden
How much do you regret retiring the INDEX(0) specification present in your original article? That simple hint does away with Hugo's attempt to break it using a WHERE clause or an IN statement. In addition to that, this method has always been about persisting a running total for all records in the set. If you want to do a small subset (enough to invoke an unordered update) you'd probably use another method. That said, adding the INDEX(0) back in solves all those worries....
DECLARE @Counter INTEGER;
SET @Counter = 0;
BEGIN TRANSACTION;
UPDATE TD
SET @Counter = AccountRunningCount = @Counter + 1
FROM dbo.TransactionDetail TD WITH (TABLOCKX, INDEX(0))
WHERE TransactionDetailID BETWEEN 120000 AND 120010 -- IN (12,123,1234,23,234,2345,34,345,3456)
OPTION (MAXDOP 1);
SELECT *
FROM dbo.TransactionDetail
WHERE TransactionDetailID BETWEEN 120000 AND 120010
ORDER BY AccountID, Date, TransactionDetailID;
ROLLBACK TRANSACTION;
BTW, your RANK() solution is much more clever than I originally gave it credit for. My sincere apologies for that; I hadn't fully appreciated the subtleties until relatively recently.
My only reservation about it is that its performance depends somewhat on the data. If the data is split into many groups - AccountIDs in Jeff's sample data - it absolutely flies. At the other end of the scale, where there is only one group, it behaves like an extremely slow cursor, and is in fact roundly beaten by one.
For running total calculations where the data is not written back to an existing column in the source, your method beats the quirky update by a small margin in my tests (test rig available). I am currently unable to produce a SQLCLR routine which beats either of these methods on their home turf, although I can get very, very close 🙂
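To illustrate the group-count sensitivity I mentioned: here is a rough sketch of the general set-based iteration shape - not Hugo's exact code, and the #Work table and column names are mine. Each pass updates one row position across every account at once, so many small groups mean few passes, while a single giant group degenerates into a row-by-row loop:
-- Sketch only: number the rows within each account, then iterate
-- over the row position, one set-based UPDATE per pass.
SELECT AccountID,
       rn = ROW_NUMBER() OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),
       Amount,
       RunningTotal = Amount
INTO   #Work
FROM   dbo.TransactionDetail;
CREATE UNIQUE CLUSTERED INDEX cl_Work ON #Work (AccountID, rn);
DECLARE @rn INT, @maxrn INT;
SELECT @maxrn = MAX(rn) FROM #Work;
SET @rn = 2;
WHILE @rn <= @maxrn
BEGIN
    -- add the (already final) previous row's total to row position @rn in every account
    UPDATE W
    SET    RunningTotal = P.RunningTotal + W.Amount
    FROM   #Work W
    INNER JOIN #Work P ON P.AccountID = W.AccountID AND P.rn = W.rn - 1
    WHERE  W.rn = @rn;
    SET @rn = @rn + 1;
END;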
Paul
February 4, 2010 at 5:55 am
tony rogerson (2/4/2010)
A cursor is IMHO the best approach for this, because you can put an ORDER BY on it - you get repeatable results, with syntax whose behaviour is well defined in the product manual (Books Online). That particular use of UPDATE in this context is only asking for trouble.
Yes, it might work like that now; but where in the documentation does it define that it will always process the rows in order?
My two cents.
Tony.
Thanks for the visit, Tony, and I appreciate the words of caution, which are similar to the cautions I gave in the article. And, yes, I agree it's an undocumented feature, as I also said in the article. If it ever breaks then I too will use a cursor or maybe even a "CLR" as a workaround.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 4, 2010 at 5:57 am
Relying on query hints to force a side-effect feature is just madness - you are relying on the behaviour of the engine in specific product versions (and builds).
Seriously - don't use the side effect feature of the UPDATE like this.
It was originally put in the product so we could capture a value back into a local variable while updating, without having to use an UPDATE and a separate SELECT.
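For example, a minimal sketch of that documented use (the table and column names here are made up):
DECLARE @NewBalance DECIMAL(28,2);
UPDATE dbo.Accounts                          -- hypothetical table
SET @NewBalance = balance = balance - 50.00  -- assign the new value to the variable as it is written
WHERE id = 42;
SELECT @NewBalance;                          -- no second read of the row needed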
Tony.
February 4, 2010 at 6:00 am
tony rogerson (2/4/2010)
A cursor is IMHO the best approach for this, because you can put an ORDER BY on it - you get repeatable results, with syntax whose behaviour is well defined in the product manual (Books Online).
*This is not a knee-jerk anti-cursor reaction*
Tony, are you serious? Hugo's method beats the cursor method in 99.9% of cases. SQLCLR methods beat the cursor in 99.9% of cases. Quirky Update beats the cursor in 100% of cases but requires care. I'm sure there are other methods too that have far more merit than even the most keenly optimized LOCAL READ_ONLY FORWARD_ONLY STATIC cursor!
Cursors do suck, and they discourage set-based thinking. As John McEnroe famously said: "You cannot be serious!"
😛
Paul
February 4, 2010 at 6:03 am
Jeff Moden (2/4/2010)
...I too will use [...] a "CLR"
At last! :laugh: 😛 😀 :w00t:
And I have it in writing. Even if it did require extreme editing skills.
Paul
February 4, 2010 at 6:06 am
Jeff Moden (2/4/2010)
tony rogerson (2/4/2010)
A cursor is IMHO the best approach for this, because you can put an ORDER BY on it - you get repeatable results, with syntax whose behaviour is well defined in the product manual (Books Online). That particular use of UPDATE in this context is only asking for trouble.
Yes, it might work like that now; but where in the documentation does it define that it will always process the rows in order?
My two cents.
Tony.
Thanks for the visit, Tony, and I appreciate the words of caution, which are similar to the cautions I gave in the article. And, yes, I agree it's an undocumented feature, as I also said in the article. If it ever breaks then I too will use a cursor or maybe even a "CLR" as a workaround.
Hi Jeff - I do respect the amount of work you have put into this - I know the effort involved, I've done it so many times myself.
But to publicly recommend this method is, to be frank, just irresponsible.
Unless you are going to go through every single version, service pack, and cumulative update of the product to make sure they have not changed something that breaks the undocumented feature you are relying on, the fact that it works on one version, one build, is moot.
I know I'll get flamed for posting this - but that is the level of problem this will cause! Not everybody is at your level; a lot of people will just use it.
Also - the warning reads as though other people have merely been bitching that this feature is undocumented, rather than as an up-front warning: "never, ever use this in a production environment, as you are relying on undocumented behaviour that may change right under your feet with the next service pack or cumulative update you apply".
If you do that and put it in big red bold letters right at the start of your article then I'd be happy!
Tony.
February 4, 2010 at 6:07 am
Not in my testing they don't!
I've presented a number of times at UK user group meetings on running totals; I have an example that proves the opposite.
The cursor was faster - it's all about how you write your code.
Tony.
February 4, 2010 at 6:17 am
tony rogerson (2/4/2010)
I've presented a number of times at UK user group meetings on running totals; I have an example that proves the opposite. The cursor was faster - it's all about how you write your code.
Please share 😉
February 4, 2010 at 6:18 am
Hugo Kornelis (2/4/2010)
Hi Jeff, As indicated in my previous message, I somehow lost sight of this discussion. Sorry for my sudden disappearance from this topic.
First, thank you for optimizing my alternative running totals code even further. I hope you don't mind me using the optimized version of this code (with attribution, of course) for future presentations, blog posts, articles, etc.
Absolutely not a problem. Thanks for asking. Glad I could make a contribution. Thanks, Hugo.
When reading over all the new posts, I didn't find any reaction to one of the points I raised: the claim that nobody has yet managed to break the quirky update. I posted at least two bits of code in this thread that did break it. You dismissed the first as using a JOIN; I objected to that, as an IN with a constant list rather than a subquery is not equivalent to a JOIN but to a series of OR'ed conditions, and then posted another code snippet that breaks it without using IN or OR. If you ever responded to my objection or to that last snippet, I overlooked the reaction - can you please tell me the page number to re-read? And if you never responded, then I hope you will do so this time.
For your convenience, here is the last code I posted again:
DECLARE @Counter INT;
SET @Counter = 0;
UPDATE dbo.TransactionDetail
SET @Counter = AccountRunningCount = @Counter + 1
FROM dbo.TransactionDetail WITH (TABLOCKX)
WHERE TransactionDetailID BETWEEN 120000 AND 120010
OPTION (MAXDOP 1);
GO
--===== Select all the rows in order by the clustered index
SELECT *
FROM dbo.TransactionDetail
WHERE TransactionDetailID BETWEEN 120000 AND 120010
ORDER BY AccountID, Date, TransactionDetailID;
My apologies. You're correct. I didn't address that. I'll go back and look at your original post to make sure I get the context right.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 4, 2010 at 6:22 am
It's on my list to blog about, but I'm concentrating on my MSc at the moment. Here you are....
If you hunt around, the presentation may be online - I did the talk at SQLBits in 2008, hence the USE below 🙂
USE SQLBits20080301
go
/***
A look at RUNNING totals
Run these with the Actual Query Plan
SET STATISTICS IO ON
SET STATISTICS TIME ON
****/
create table #amts (
id int not null primary key clustered,
tran_date smalldatetime not null,
amount decimal(28,2) not null
)
create index ncidx_tran_date on #amts( tran_date )
truncate table #amts
insert #amts ( id, tran_date, amount ) values( 1, '20080208', 10 )
insert #amts ( id, tran_date, amount ) values( 2, '20080212', 10 )
insert #amts ( id, tran_date, amount ) values( 3, '20080208', 10 )
insert #amts ( id, tran_date, amount ) values( 4, '20080211', 10 )
insert #amts ( id, tran_date, amount ) values( 5, '20080212', 10 )
insert #amts ( id, tran_date, amount ) values( 6, '20080212', 10 )
-- Wrong....
select *,
rolling = coalesce(
( select sum( amount )
from #amts a2
where a2.tran_date <= a1.tran_date
and a2.id <> a1.id ), 0 )
+ amount
from #amts a1
order by tran_date, id
-- Wrong....
select *,
rolling = coalesce(
( select sum( amount )
from #amts a2
where a2.tran_date <= a1.tran_date
and a2.id < a1.id ), 0 )
+ amount
from #amts a1
order by tran_date, id
-- Wrong....
select *,
rolling = coalesce(
( select sum( amount )
from #amts a2
where a2.tran_date <= a1.tran_date
), 0 )
+ amount
from #amts a1
order by tran_date, id
-- Correct....
select *,
rolling = coalesce(
( select sum( amount )
from #amts a2
where a2.tran_date < a1.tran_date
or ( a2.tran_date = a1.tran_date
and a2.id < a1.id )
), 0 )
+ amount
from #amts a1
order by tran_date, id
go
-- Without the sub-query...
select a.*, dt.running_total + a.amount
from (
select a1.id, running_total = coalesce( sum( a2.amount ), 0 )
from #amts a1
left outer join #amts a2 on a2.tran_date < a1.tran_date
or ( a2.tran_date = a1.tran_date
and a2.id < a1.id )
group by a1.id
) as dt
inner join #amts a on a.id = dt.id
order by tran_date, id
/***
A look at using a CTE to help
***/
-- First we need a sequence
-- Notice the Sequence Generator Step
with Trans ( row_no, id, tran_date, amount )
as (
select row_no = row_number() over( order by tran_date, id ), id, tran_date, amount
from #amts
)
select *
from Trans
go
-- Now the real query
-- Notice 3!!! Sequence Generator Steps - notice where the Filter step is
with Trans ( row_no, id, tran_date, amount )
as (
select row_no = row_number() over( order by tran_date, id ), id, tran_date, amount
from #amts
)
select t1.*,
running_amount = coalesce(
( select sum( amount )
from Trans t2
where t2.row_no < t1.row_no ), 0 )
+ amount
from Trans t1
order by row_no
go
/************************
On Real Data Volumes
*********************/
-- Rows in the limited data set... - check IO
select count(*) from transactions where account_id between 1000 and 1020
select * from transactions where account_id between 1000 and 1020
-- 588
-- CTE way - row_number() approach
-- Problematic - row_number() is calculated on entire table rather than just the ones we want!
-- Bug / Optimiser shortfall?
-- Sub Query with CTE - ROW_NUMBER() just corrects the out of order id (surrogate key)
with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
as (
select row_no = row_number() over( order by account_id, tran_date, id ), id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
from Transactions
where account_id between 1000 and 1020
)
select t1.*,
running_amount = coalesce(
( select sum( amount )
from Trans t2
where t2.row_no < t1.row_no
and t2.account_id = t1.account_id ), 0 )
+ amount
from Trans t1
order by account_id, row_no
go
-- Using a temporary table instead...
select row_no = row_number() over( order by account_id, tran_date, id ), id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
into #t
from Transactions
where account_id between 1000 and 1020;
create unique clustered index clidx on #t ( account_id, row_no )
; with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
as (
select *
from #t
)
select t1.*,
running_amount = coalesce(
( select sum( amount )
from Trans t2
where t2.row_no < t1.row_no
and t2.account_id = t1.account_id ), 0 )
+ amount
from Trans t1
order by account_id, row_no
go
drop table #t
/***
The other methods on real data...
****/
-- Just using a sub-query and using a surrogate key (id column)
select *,
rolling = coalesce(
( select sum( amount )
from Transactions a2
where a2.account_id = a1.account_id
and ( a2.tran_date < a1.tran_date
or ( a2.tran_date = a1.tran_date
and a2.id < a1.id ) )
), 0 )
+ amount
from Transactions a1
where account_id between 1000 and 1020
order by account_id, tran_date, id
go
-- Self join approach
select a.*, dt.running_total + a.amount
from (
select a1.id, running_total = coalesce( sum( a2.amount ), 0 )
from Transactions a1
left outer join Transactions a2 on a2.account_id = a1.account_id
and ( a2.tran_date < a1.tran_date
or ( a2.tran_date = a1.tran_date
and a2.id < a1.id ) )
where a1.account_id between 1000 and 1020
group by a1.id
) as dt
inner join Transactions a on a.id = dt.id
order by account_id, tran_date, id
go
-- Cursor - using UPDATE method
set nocount on
create table #Trans(
row_no int not null,
account_id int not null,
id int not null,
amount decimal(28,2) not null,
rolling_total decimal(28,2) not null,
salesperson_id int not null,
tran_date smalldatetime not null,
clear_date smalldatetime not null,
transaction_types_id int not null,
primary key clustered( account_id, row_no )
)
insert #Trans (
row_no,
account_id,
id,
amount,
rolling_total,
salesperson_id,
tran_date,
clear_date,
transaction_types_id )
select row_no = row_number() over( partition by account_id order by tran_date, id ),
account_id,
id,
amount,
0,
salesperson_id,
tran_date,
clear_date,
transaction_types_id
from Transactions
where account_id between 1000 and 5000
declare trans_cur cursor forward_only for
select account_id, row_no, amount
from #Trans
order by account_id, row_no
for update of rolling_total
declare @account_id int,
@row_no int,
@amount decimal(28,2),
@rolling_total decimal(28,2)
open trans_cur
fetch next from trans_cur into @account_id, @row_no, @amount
begin tran
while @@fetch_status = 0
begin
if @row_no = 1 -- initialise on each partition
set @rolling_total = @amount
else
set @rolling_total = @rolling_total + @amount
update #Trans
set rolling_total = @rolling_total
where current of trans_cur
fetch next from trans_cur into @account_id, @row_no, @amount
end
deallocate trans_cur
commit tran
select id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, amount, rolling_total
from #Trans
order by account_id, tran_date, id
drop table #Trans
go
-- Cursor - INSERT method
set nocount on
create table #Trans(
account_id int not null,
id int not null,
amount decimal(28,2) not null,
rolling_total decimal(28,2) not null,
salesperson_id int not null,
tran_date smalldatetime not null,
clear_date smalldatetime not null,
transaction_types_id int not null
)
declare trans_cur cursor fast_forward for
select account_id,
row_no = row_number() over( partition by account_id order by tran_date, id ),
id,
amount,
salesperson_id,
tran_date,
clear_date,
transaction_types_id
from Transactions
where account_id between 1000 and 5000
order by account_id, row_no
declare @account_id int,
@row_no int,
@id int,
@amount decimal(28,2),
@salesperson_id int,
@tran_date smalldatetime,
@clear_date smalldatetime,
@transaction_types_id int,
@rolling_total decimal(28,2)
open trans_cur
fetch next from trans_cur
into @account_id, @row_no, @id, @amount,
@salesperson_id, @tran_date, @clear_date, @transaction_types_id
begin tran
while @@fetch_status = 0
begin
if @row_no = 1 -- initialise on each partition
set @rolling_total = @amount
else
set @rolling_total = @rolling_total + @amount
insert #Trans ( account_id, id, amount, rolling_total, salesperson_id, tran_date, clear_date, transaction_types_id )
values( @account_id, @id, @amount, @rolling_total, @salesperson_id, @tran_date, @clear_date, @transaction_types_id )
fetch next from trans_cur
into @account_id, @row_no, @id, @amount,
@salesperson_id, @tran_date, @clear_date, @transaction_types_id
end
deallocate trans_cur
commit tran
select id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, amount, rolling_total
from #Trans
order by account_id, tran_date, id
drop table #Trans
go
with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
as (
select row_no = row_number() over( order by account_id, tran_date, id ), id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
from Transactions
where account_id between 1000 and 5000
)
select t1.*,
running_amount = coalesce(
( select sum( amount )
from Trans t2
where t2.row_no < t1.row_no
and t2.account_id = t1.account_id ), 0 )
+ amount
from Trans t1
order by account_id, row_no
go
The set-up script....
drop table accounts_active
drop table accounts_inactive
drop table accounts
drop table transactions
drop table account_statuses
drop table transaction_types
create table account_statuses (
id int not null identity constraint pk_account_statuses primary key clustered,
status_code char(50) not null constraint uk_accounts_active_status_code unique,
status_desc varchar(50) not null constraint uk_accounts_active_status_desc unique
)
insert account_statuses ( status_code, status_desc ) values( 'ONHLD', 'On Hold' )
insert account_statuses ( status_code, status_desc ) values( 'ACTIV', 'Active' )
insert account_statuses ( status_code, status_desc ) values( 'LAP3M', 'Lapsed 3 months' )
insert account_statuses ( status_code, status_desc ) values( 'LAP6M', 'Lapsed 6 months' )
insert account_statuses ( status_code, status_desc ) values( 'BADCR', 'Bad Credit' )
insert account_statuses ( status_code, status_desc ) values( 'LAPSD', 'Lapsed >12 months' )
insert account_statuses ( status_code, status_desc ) values( 'FRAUD', 'Fraud' )
go
create table transaction_types (
id int not null identity constraint pk_transaction_types primary key clustered,
transaction_type_code char(50) not null constraint uk_transaction_type_status_code unique,
transaction_type_desc varchar(50) not null constraint uk_transaction_type_status_desc unique,
sign_amount smallint not null check ( sign_amount in ( -1, 1 ) )
)
insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'PAYCC', 'Pay by Credit Card', 1 )
insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'PAYCH', 'Pay by Cheque', 1 )
insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'PAYMY', 'Pay by Cash', 1 )
insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'REFUD', 'Refund', -1 )
insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'ADJST', 'Adjustment', -1 )
go
create table accounts (
id int not null identity constraint pk_accounts primary key clustered,
first_name varchar(50) not null,
last_name varchar(50) not null,
dob smalldatetime not null
constraint ck_accounts_dob check( dob = cast( convert( char(8), dob, 112 ) as smalldatetime ) ),
account_no varchar(50) not null constraint uk_accounts unique
)
go
insert accounts( first_name, last_name, dob, account_no )
select FirstName, LastName, dateadd( month, floor( rand(cast(newid() as varbinary)) * 100 ), '19701004' ) , right( '0000000000' + cast( contactid as varchar(50) ), 10 )
from AdventureWorks.Person.Contact
create table accounts_active (
account_id int not null references accounts( id ),
account_statuses_id int not null references account_statuses( id ),
constraint pk_accounts_active primary key clustered( account_id )
)
go
create table accounts_inactive (
account_id int not null references accounts( id ),
account_statuses_id int not null references account_statuses ( id ),
constraint pk_accounts_inactive primary key clustered( account_id )
)
go
select top 10000 id, 1
from accounts
order by newid()
union all
select top 1000 id, 2
from account_statuses
where status_code in ( 'ONHLD', 'ACTIV' )
order by newid()
insert accounts_active ( account_id, account_statuses_id )
select *
from (
select top 10000 id, s = 1
from accounts
order by newid() ) as d
insert accounts_active ( account_id, account_statuses_id )
select id, s
from (
select top 1000 id, s = 2
from accounts a
where id not in ( select account_id from accounts_active )
order by newid() ) as d
insert accounts_inactive ( account_id, account_statuses_id )
select id, s
from (
select top 1000 id, s = 6
from accounts a
where id not in ( select account_id from accounts_active )
and id not in ( select account_id from accounts_inactive )
order by newid() ) as d
create table individual(
id int not null identity constraint pk_individual primary key clustered,
first_name varchar(50) not null,
last_name varchar(50) not null
)
insert individual( first_name, last_name )
select FirstName, LastName
from AdventureWorks.Person.Contact
drop table sales_commission
drop function dbo.fn_check_sales_commission_window
go
create function dbo.fn_check_sales_commission_window (
@sales_commission_id int,
@individual_id int,
@comm_start_date smalldatetime,
@comm_end_date smalldatetime )
returns varchar(3)
as
begin
declare @status varchar(3)
if exists (
select *
from sales_commission
where individual_id = @individual_id
and id <> @sales_commission_id
and ( @comm_start_date between comm_start_date and coalesce( comm_end_date, @comm_start_date )
or @comm_end_date between comm_start_date and coalesce( comm_end_date, @comm_end_date ) )
)
set @status = 'BAD'
else
set @status = 'OK'
return @status
end
go
create table sales_commission (
id int not null identity constraint pk_sales_commission primary key clustered,
individual_id int not null references individual( id ),
comm_rate decimal( 5, 2 ) not null,
comm_start_date smalldatetime not null check( comm_start_date = cast( convert( char(8), comm_start_date, 112 ) as smalldatetime ) ),
constraint ck_sales_commission_window check( dbo.fn_check_sales_commission_window( id, individual_id, comm_start_date, comm_end_date ) = 'OK' ),
comm_end_date smalldatetime null check( comm_end_date is null or comm_end_date = cast( convert( char(8), comm_end_date, 112 ) as smalldatetime ) ),
constraint uk_sales_commission unique ( individual_id, comm_start_date ), -- really do RI in a trigger to prevent window overlaps.
)
go
insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date )
select top 50 id, 20, '20070410', '20080110'
from individual
insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date )
select top 50 id, 20, '20080111', '20080210'
from individual
insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date )
select top 50 id, 10, '20080211', null
from individual
select * from sales_commission order by comm_start_date, comm_end_date
create table transactions (
id int not null identity constraint pk_transactions primary key clustered,
salesperson_id int not null references individual( id ),
tran_date smalldatetime not null
constraint ck_transactions_tran_date check( tran_date = cast( convert( char(8), tran_date, 112 ) as smalldatetime ) ), -- Force date only
clear_date smalldatetime not null
constraint ck_transactions_clear_date check( clear_date = cast( convert( char(8), clear_date, 112 ) as smalldatetime ) ), -- Force date only
amount decimal(28,2) not null check( amount > 0 ), -- adjustments done via tran_type
transaction_types_id int not null references transaction_types( id ),
account_id int not null references accounts ( id )
)
go
print 105 % 100
set nocount on
declare @i int
set @i = 1
declare @dt smalldatetime
begin tran
while @i <= 4000000
begin
set @dt = cast( convert( char(8), dateadd( day, -floor( rand() * 100 ), current_timestamp ), 112 ) as smalldatetime )
if @i % 100 = 0
begin
commit tran
begin tran
end
insert transactions (
salesperson_id,
tran_date,
clear_date,
amount,
transaction_types_id,
account_id )
select ( select top 1 individual_id from sales_commission order by newid() ),
@dt,
dateadd( day, 14, @dt ),
( select top 1 id from sales_commission order by newid() ), -- note: an id pressed into service as a pseudo-random amount
( select top 1 id from transaction_types order by newid() ),
( select top 1 id from accounts order by newid() )
set @i = @i + 1
end
commit tran
February 4, 2010 at 6:43 am
Tony, I know you're busy - hey, we all are - but I have to say (from a quick scan of your script) that it doesn't seem to feature any of:
1. Hugo's set-based iteration method
2. SQLCLR (procedure or TVF)
3. Quirky Update
My contention was (and is) that those three methods are all superior to a cursor. Yes, there are even slower ways (triangular joins, recursive CTEs) but that doesn't make the cursor good or cool 😛
I know you have issues with ranking functions at the moment (which I also happen to disagree with you about, by the way), but leaving that aside for a moment: if your main problems with the Quirky Update are (a) the lack of a guarantee on update order, and (b) the fact that the behaviour might change at any moment, let me just say:
1. This undocumented method has worked for at least ten years in all versions, service packs, cumulative updates, and what-have-you.
2. Yes, you would need to test that the latest release hasn't broken it. Don't we usually do that anyway?
3. Documented features have a history of breaking too.
4. The increase in performance may be worth the small extra new-version testing that is required, for some folks.
5. There is no law requiring people to use the method.
Anyway, onto the thing about the ranking function. If you would feel happier with a slight tweak to the method that throws an error if a row tries to be quirky-updated out of sequence, consider this:
DECLARE @PrevAccountID INT;
DECLARE @AccountRunningTotal MONEY;
DECLARE @AccountRunningCount INT;
DECLARE @Sequence INT;
SET @Sequence = 0;
BEGIN TRAN;
WITH Source
AS (
SELECT Sequence = ROW_NUMBER() OVER (ORDER BY TD.AccountID ASC, TD.Date ASC, TD.TransactionDetailID ASC),
TD.AccountID,
TD.AccountRunningCount,
TD.AccountRunningTotal,
TD.Amount,
TD.Date,
TD.NCID,
TD.TransactionDetailID
FROM dbo.TransactionDetail TD WITH (INDEX(0))
)
UPDATE Source WITH (TABLOCKX)
SET @AccountRunningTotal = AccountRunningTotal =
CASE
WHEN AccountID = @PrevAccountID
THEN @AccountRunningTotal+Amount
ELSE Amount
END,
@AccountRunningCount = AccountRunningCount =
CASE
WHEN AccountID = @PrevAccountID
THEN @AccountRunningCount + 1
ELSE 1
END,
@Sequence =
CASE
WHEN Sequence = @Sequence + 1 THEN Sequence -- row is in the expected sequence
ELSE 1/0 -- produce an error
END,
@PrevAccountID = AccountID
OPTION (MAXDOP 1);
SELECT *
FROM dbo.TransactionDetail
ORDER BY
AccountID, Date, TransactionDetailID
ROLLBACK;
If you run that (on the test rig in the article) you will see that all that is added is a Segment and Sequence Project for the ROW_NUMBER() - no sort is required of course. The majority of the execution time is the rollback 🙂
The truly paranoid reader could extend this basic method using the OUTPUT clause of the UPDATE statement to write the PK, sequence, and running totals to another table. Only if no error was produced would the reader go on to change the data. Something along those lines anyway - I lack the time currently for a full implementation, and lack sufficient paranoia...
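For the merely moderately paranoid, a simpler cross-check in the same spirit - a brute-force set-based recount rather than the OUTPUT route, using the thread's TransactionDetail columns - might look like the sketch below. It is a triangular join, so far too slow for regular use, but fine as a one-off verification before committing:
IF EXISTS
(
    SELECT *
    FROM dbo.TransactionDetail TD
    WHERE TD.AccountRunningTotal <>
          (
            SELECT SUM(TD2.Amount)
            FROM dbo.TransactionDetail TD2
            WHERE TD2.AccountID = TD.AccountID
            AND ( TD2.Date < TD.Date
                  OR ( TD2.Date = TD.Date
                       AND TD2.TransactionDetailID <= TD.TransactionDetailID ) )
          )
)
BEGIN
    -- at least one persisted running total disagrees with the recount
    RAISERROR('Running total verification failed - rolling back.', 16, 1);
    ROLLBACK TRANSACTION;
END;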
Someone is also bound to notice that the above method also works on heaps - in fact on any table that either (a) has a clustered index in the Quirky Update order; or (b) is a heap. Non-clustered indexes make no never mind.
Paul
February 4, 2010 at 6:52 am
Paul White (2/4/2010)
1. This undocumented method has worked for at least ten years in all versions, service packs, cumulative updates, and what-have-you.
2. Yes, you would need to test that the latest release hasn't broken it. Don't we usually do that anyway?
3. Documented features have a history of breaking too.
4. The increase in performance may be worth the small extra new-version testing that is required, for some folks.
5. There is no law requiring people to use the method.
Point 1 - I'm sure all the people who got caught out with TOP 100 PERCENT in a VIEW would disagree (see the sketch after Point 5). And yes, I know the feature has been there for a long time - I can't remember when I first came across it; probably 6.5 (not sure it was in 4.21, possibly 6.0), and TOP 100 PERCENT in a VIEW was introduced in 7.0? It's something like that.
Point 2 - No, the majority of people do not do a full system test before upgrading to the next service pack or cu. That is a reality out in the field; the majority of people are not huge enterprise companies that can justify that resource and expense.
Point 3 - Name a couple that weren't subsequently fixed.
Point 4 - doh
Point 5 - Yes, there is a law - it's the law of practicalities; there are many, many folk out there who are not full-time database developers or administrators, and who will just see the "feature" and implement it rather than doing it properly.
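To illustrate Point 1: the pattern that burned people looked something like this (the view name is illustrative; the table is the one from my script). Under SQL Server 2000 the rows happened to come back in tran_date order; from 2005 the optimizer is free to ignore the ORDER BY entirely:
CREATE VIEW dbo.OrderedTransactions
AS
SELECT TOP 100 PERCENT *
FROM dbo.transactions
ORDER BY tran_date; -- never a documented guarantee for "SELECT * FROM dbo.OrderedTransactions"
GO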
Anyway - I think we are at a point of complete disagreement on this.
I'm on the side of stability, supportability and maintainability - with this approach, are you honestly arguing that this undocumented feature is the correct method to use?
Seriously now - not just to pick a side that disagrees with my point.
Tony.
February 4, 2010 at 6:54 am
Paul White (2/4/2010)
Hi Hugo, You overlooked my post which addressed the 'problem'. It was quite long, so I'm quite shocked that you missed it 😉
Guilty as charged 😉 I didn't have the time to read all the new posts in this thread, so I mainly focused on finding Jeff's post, and then quick-scanning the others to see if they addressed my point.
I did not repeat the test runs, but I have no doubt that the INDEX(0) hint will solve the issue, just as you suggested. However, I maintain that this actually reinforces my objections rather than weakening them.
The first time Jeff wrote about this issue, the list of rules for using the "quirky update" was rather short - I believe it had three or four points at that time (and I believe that the INDEX(0) hint was still included then). That resulted in a discussion of proportions even bigger than this one, with several people posting code examples that broke his code, all without breaking any of the rules that were valid then.
Now, Jeff has rewritten his article, and the list of rules has grown to 10 points. Okay, 9 if you don't count the "TEST" rule (which is valid for ALL code, not just the quirky update). Each of the new rules was apparently designed to remove one or more of the counter-examples that were posted to his original article. And yet, the method is still not fully reliable, as I have demonstrated by posting my code.
Your reaction is to impose yet another rule, to counter my code. But I fail to see how this would increase anyone's faith in this method - a method that can only keep up its claim of reliability by adding more and more rules as counterexamples are found. I could try to find another example to break it, even with the INDEX(0) hint. I might find one. Maybe now, maybe only when the next service pack is released; who knows? And if I did, I have no doubt that you, or Jeff, or any other proponent of this method, would add yet another rule to the list. A list that, in my opinion, could just as easily be paraphrased and shortened to a single rule: "Unless it fails (which might happen under various circumstances, not all of which have been identified yet)". And indeed - a claim that this method works reliably "unless it fails" is one that I cannot contradict.
So, if you want to build your production systems on a method that "works unless it doesn't", go ahead. But when you - or even Jeff, a SQL author who deservedly has a lot of respect in the SQL Server community - endorse it, I think you are leading people with a lesser understanding into very dangerous territory. And I agree with Tony's comments that there should be a warning, in big red flashy letters, at the start of this article: "This method might fail under unexpected circumstances".