October 28, 2009 at 6:35 pm
I was surprised recently when a query that updates a record from a recordset with multiple rows succeeded without giving an error and seemed to arbitrarily select the row used in the update.
If you attempt to update a record from a subquery expression that returns multiple rows, you get an error, as you would expect. When the multiple rows are from a joined subquery however (using the FROM clause of the update statement), the update succeeds and one of the rows is used for the update.
My expectation was that both updates would fail. I'm pretty certain that some other database engines would fail in both situations (e.g. Ingres?). Does anyone have an explanation for why the update succeeds and which row gets used for the update?
Here is some code to demonstrate:
-- Create tables.
if exists (select name from [dbo].sysobjects
where name = 'temp_account' and type = 'U')
drop table temp_account
if exists (select name from [dbo].sysobjects
where name = 'temp_transaction' and type = 'U')
drop table temp_transaction
create table temp_account
(account_key varchar(10) not null,
account_name varchar(40) null,
balance01 money null,
balance02 money null)
create table temp_transaction
(tid integer identity (1,1) not null,
account_key varchar(10) not null,
trans_type varchar(1) not null,
trans_amount money not null)
-- Test data.
insert into temp_account
(account_key,
account_name,
balance01,
balance02)
values ('TEST001','Test Account 1',0,0)
insert into temp_account
(account_key,
account_name,
balance01,
balance02)
values ('TEST002','Test Account 2',0,0)
insert into temp_transaction
(account_key,
trans_type,
trans_amount
)
values ('TEST001','P',120)
insert into temp_transaction
(account_key,
trans_type,
trans_amount
)
values ('TEST001','P',222)
insert into temp_transaction
(account_key,
trans_type,
trans_amount
)
values ('TEST001','R',-40)
-- Update single record from multiple rows - succeeds and arbitrarily selects row to use in update.
-- Balance is set to 342 which is sum for P transaction type. R transactions are ignored.
-- Or is balance first updated by R transactions, then P (row-by-row)?
update temp_account
set balance01 = ta.balance01 + rs1.trans_amount
from temp_account as ta inner join
(select tt.account_key,
tt.trans_type,
sum(tt.trans_amount) as trans_amount
from temp_transaction as tt
where tt.trans_type in ('P','R')
group by tt.account_key, tt.trans_type
) as rs1
on ta.account_key = rs1.account_key
select *
from temp_account
-- Update from multiple rows - fails with error message, as expected.
-- Msg 512, Level 16, State 1, Line 1
-- Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
update temp_account
set balance02 = (select sum(tt.trans_amount) as trans_amount
from temp_transaction as tt
where tt.trans_type in ('P','R')
and tt.account_key = ta.account_key
group by tt.trans_type)
from temp_account as ta
select *
from temp_account
October 28, 2009 at 9:05 pm
The problem is in your code. Please note the changes I have made in both versions.
update temp_account set
balance01 = ta.balance01 + rs1.trans_amount
from
temp_account as ta inner join
(select
tt.account_key,
-- tt.trans_type,
sum(tt.trans_amount) as trans_amount
from
temp_transaction as tt
where
tt.trans_type in ('P','R')
group by
tt.account_key
) as rs1
on ta.account_key = rs1.account_key
select *
from temp_account
update temp_account set
balance02 = isnull(
( select
sum(tt.trans_amount) as trans_amount
from
temp_transaction as tt
where
tt.trans_type in ('P','R')
and tt.account_key = ta.account_key
group by
tt.account_key), 0)
from
temp_account as ta
select *
from temp_account
October 28, 2009 at 9:11 pm
If you attempt to update a record from a subquery expression that returns multiple rows, you get an error, as you would expect.
The situation you are describing here is one in which the subquery is being used as an expression. That is to say it is being used in a context where only a single value is expected to be returned, as if from another column, a function, a formula, a CASE expression, etc.
UPDATE OrderLog
SET OrderTotal = (select sum(amt) from OrderDetail where OrderID = 1075)
-- not -- SET OrderTotal = (select sum(amt), count(*) from OrderDetail where OrderID = 1075)
-- not -- SET OrderTotal = (select amt from OrderDetail where OrderID = 1075)
-- (in situations where any orders would have multiple amts)
WHERE OrderID = 1075
In this context, receiving multiple values from an expression is unacceptable, and the error results. An error would result if it were multiple columns being returned as well as multiple rows.
However, in the second case there are multiple rows in the input set. But the target column is only being set to a single value. Looking at the example below, it appears that SQL is arbitrarily deciding which row to get that single value from. I can't cite authority for this, but I would bet money that it is simply using the first valid source value it finds. This makes sense if you think about it, because waiting for the rest of the valid rows in the set would be just as arbitrary and would waste resources.
declare @Target table (rowID int identity(1,1), data varchar(100))
insert into @target
select 'Apples'
declare @Source table (data varchar(100))
insert into @Source
select 'Oranges' union all
select 'Bananas' union all
select 'Watermelons'
select 'Before' as [@source],* from @source
select 'Before' as [@target],* from @target
update @target
set data = src.data
output deleted.data as [from], inserted.data as [to]
from @target t
cross join @source src
where t.rowID =1
select 'After' as [@target],* from @target
It is the developer's responsibility to ensure that the correct value is used to update, and I can sympathize with your desire for an error message to keep unexpected multiple rows out. But this is the way it works.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 28, 2009 at 9:22 pm
Lynn Pettis (10/28/2009)
The problem is in your code. Please note the changes I have made in both versions.
Hi Lynn, and thanks. Perhaps I wasn't clear about my purpose. I know the code is wrong and what the problems are. I was just surprised that any of the update statements in my example code work. I thought they would both cause an error and perhaps SQL Server is mis-behaving. The results of the update that works seem arbitrary - something that is unusual, in my experience, with SQL programming and set theory.
October 28, 2009 at 9:32 pm
It is the developer's responsibility to ensure that the correct value is used to update, and I can sympathize with your desire for an error message to keep unexpected multiple rows out. But this is the way it works.
Yes, I think you are right, it is taking the first value it comes across, which is surprising to me. Thanks for your response.
October 28, 2009 at 9:52 pm
You're welcome. Best of luck to you, Glen.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 28, 2009 at 10:07 pm
Glen Harland (10/28/2009)
It is the developer's responsibility to ensure that the correct value is used to update, and I can sympathize with your desire for an error message to keep unexpected multiple rows out. But this is the way it works.
Yes, I think you are right, it is taking the first value it comes across, which is surprising to me. Thanks for your response.
As with many other such subtle points, it IS unfortunately documented, but rather poorly. If you read through BOL's description of the UPDATE statement, some where buried in the middle is a statement to the effect that "if I row is going to be updated as a result of a one to many join, the update occurs based on the first row it encounters, and that this row is then locked and cannot be updated any further" by that same update statement.
I know that doesn't help your current conundrum.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 28, 2009 at 10:23 pm
Thanks for everyone's comments.
Just to illustrate that the SQL Server behaviour is not necessarily obvious ...
With the test data I used above, the following SQL Server query results in 120 (first row found):
UPDATE temp_account
SET temp_account.balance01 = temp_account.balance01+temp_transaction.trans_amount
from temp_account INNER JOIN temp_transaction
ON temp_account.account_key=temp_transaction.account_key
The same query in Sybase ASA results in 302 - the sum of the three rows.
In Access 2003, the following equivalent query also results in 302 (sum of the three rows):
UPDATE temp_account INNER JOIN temp_transaction
ON temp_account.account_key=temp_transaction.account_key
SET temp_account.balance01 = temp_account.balance01+temp_transaction.trans_amount;
So SQL Server takes the first row, Sybase ASA and Access 2003 add the rows up, and from memory, Ingres reports an error (Ingres circa 1998 - I can't verify this since I don't have access to the software).
Is SQL Server the only database that arbitrarily takes the first row? Is there a standard? Does it matter? 😉
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply