Semantics of simple-seeming UPDATE statement with FROM clause

Question

Semantics of simple-seeming UPDATE statement with FROM clause

Mark Dalley

Hall of Fame

Points: 3010
More actions
January 26, 2017 at 7:42 am

#317845

Dear Experts
I am very unhappy with my current understanding of the UPDATE T-SQL statemet in SQL Server. Specifically, there are things about the FROM clause that I don't get.
OK, so UPDATE with FROM is used when I want to update one table (e.g Table1) using the values found in some table expression generated by the FROM clause. This expresion may be a table itself, or it may be some query that returns rows. If I have a single column (strL say) whose values I wish to update, I put it in the SET clause, for instance SET strL=Table2.strF, where Table2.strF is a column in the FROM clause. Hopefully all obvious so far.
My question is this: how does SQL Server decide which row returned by the FROM clause is used to
update the row in the destination table?
The answer to the question often *seems* obvious because the table expression in the FROM clause involves
the table whose name appears after the UPDATE, and the table expression involves an inner join on a primary key.
However, I haven't yet found an explicit statement (in BOL or anywhere else) stating what the determining factor or
factors are.
Partly to assure myself that I wasn't missing something obvious, I set up the following:IF OBJECT_ID ('dbo.Table1', 'U') IS NOT NULL DROP TABLE dbo.Table1; GO IF OBJECT_ID ('dbo.Table2', 'U') IS NOT NULL DROP TABLE dbo.Table2; GO CREATE TABLE dbo.Table1 (ColSeq int NOT NULL, strL varchar(1) NOT NULL); GO CREATE TABLE dbo.Table2 (Forename varchar(20) PRIMARY KEY NOT NULL, strF varchar(1) NOT NULL); GO INSERT INTO dbo.Table1 VALUES(1, 'A'), (2, 'B'), (3,'C'), (4, 'D'), (5, 'E'), (6,'F'); INSERT INTO dbo.Table2 VALUES('Will','W'), ('Sam','S'), ('Ned','N'), ('Tom','T'), ('John','J'), ('Gary','G'); GO select * from dbo.Table1; go select * from dbo.Table2 go
The contents of the new tables were exactly as expected:
ColSeq    strL
1    A
2    B
3    C
4    D
5    E
6    F
Forename    strF
Gary    G
John    J
Ned    N
Sam    S
Tom    T
Will    W
The point here is that there is no particular relationship between the two tables, though they do have the same number of rows.
I then executed the statement:
UPDATE dbo.Table1 SET strL = dbo.Table2.strF FROM dbo.Table2; GO SELECT colSeq, strL FROM Table1 GO
The result was:
ColSeq    strL
1    G
2    G
3    G
4    G
5    G
6    G
I have no good explanation for this - unless it was the SQL Server was doing the minimum amount of work.
So - what are the factors that determine how the table and the table expression are linked? I have to have it
spelled out - I don't want to be performing inadvertant garbage updates because of a misunderstanding.
Yours, seeking enlightenment
MarkD

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Ed Wagner SSC Guru Points: 287026 More actions · Answer 1

What SQL did was to take one row from the source table and updated the target with the value. Take a look at your query plan and look at your Nested Loops operator. The little red circle indicates that there's a warning there. In this case, it's warning you that there's no join predicate. Without a relationship between the two, SQL can't match the rows from the two tables.

What is it that you wanted to see happen? I'm just guessing here, but if you wanted each row in dbo.Table1 to be updated with a different value from dbo.Table2, you have to define some kind of relationship between them. Here's an example of using a row number (based on no particular sort order) to join them.

WITH cteTarget AS ( SELECT strL, RN = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM dbo.Table1 ), cteSource AS ( SELECT strF, RN = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM dbo.Table2 ) UPDATE cteTarget SET strL = s.strF FROM cteSource s WHERE cteTarget.RN = s.RN;

Like I said, this is only a guess, but my real hope is that it helps illustrate the point.

BTW, my sincerest compliments on explaining the problem and providing a great script to reproduce the it.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Thom A SSC Guru Points: 99143 More actions · Answer 2

Your result is to be expected, as you're not defining any JOINs. SQL needs to now what records are related to what, otherwise it has no idea.

It this case, you're attempting to set a single value (the value strL) to a range of value from strF. In this case SQL will take the first value that it evaluates, which happens to be G in your example. As it is doing this process to every row, every row gets the value G.

This is why SQL is call an RDMS (Relational database management system). The key is that the data is RELATIONAL. As your data above has no relation, then it defeats the object.

If you were to turn your data into something relational, it might look something like this:

IF OBJECT_ID ('dbo.Table1', 'U') IS NOT NULL DROP TABLE dbo.Table1; GO IF OBJECT_ID ('dbo.Table2', 'U') IS NOT NULL DROP TABLE dbo.Table2; GO CREATE TABLE dbo.Table1 (T1ID INT IDENTITY(1,1), ColSeq int NOT NULL, strL varchar(1) NOT NULL); GO CREATE TABLE dbo.Table2 (T2ID INT IDENTITY(1,1), T1ID INT, Forename varchar(20) PRIMARY KEY NOT NULL, strF varchar(1) NOT NULL); GO INSERT INTO dbo.Table1 (ColSeq, strL) VALUES(1, 'A'), (2, 'B'), (3,'C'), (4, 'D'), (5, 'E'), (6,'F'); GO

INSERT INTO dbo.Table2 (T1ID, Forename, strF) VALUES(6, 'Will','W'), (5, 'Sam','S'), (4, 'Ned','N'), (3, 'Tom','T'), (2, 'John','J'), (1, 'Gary','G'); GO

select * from dbo.Table1; go select * from dbo.Table2; go

UPDATE dbo.Table1 SET Table1.strL = T2.strF FROM dbo.Table2 T2 JOIN dbo.Table1 T1 ON T2.T1ID = T1.T1ID; GO

SELECT colSeq, strL FROM Table1; GO

Notice that I've now joined Table1 and Table2, using ID values. This is what a relationship means, as the data in one table is now related to data in another table.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

LinksUp SSCertifiable Points: 6527 More actions · Answer 3

Mark Dalley - Thursday, January 26, 2017 7:42 AM
My question is this: how does SQL Server decide which row returned by the FROM clause is used to
update the row in the destination table?
The point here is that there is no particular relationship between the two tables, though they do have the same number of rows.
I then executed the statement:
UPDATE dbo.Table1 SET strL = dbo.Table2.strF FROM dbo.Table2; GO SELECT colSeq, strL FROM Table1 GO
The result was:
ColSeq    strL
1    G
2    G
3    G
4    G
5    G
6    G
I have no good explanation for this - unless it was the SQL Server was doing the minimum amount of work.
So - what are the factors that determine how the table and the table expression are linked? I have to have it
spelled out - I don't want to be performing inadvertant garbage updates because of a misunderstanding.
MarkD

As you stated there is no particular relationship between the 2 tables. If there is no relationship, there is no way to to update any particular row. So in your test case scenario you told SQL to update all rows in Table1 with row(s) from Table2. Since there is no relationship, SQL takes just the first row from Table2 and assigns it to Table1. Try removing the primary key from Table2 and re-run you scripts. What do you get? A random row would have been just as correct as taking the first row.

As was stated, you have to define a relationship between the 2 tables, even if it is an artificial one, in order for the update to occur properly.

__________________________________________________________________________________________________________
How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/

Mark Dalley Hall of Fame Points: 3010 More actions · Answer 4

Thanks guys for the really rapid responses.

I am well aware of relational concepts. My example contained two non-related tables on purpose,
as a step on the way to teasing out understanding of a concept. I admit that it wouldn't really make sense
to do this without defining some sort of relationship between them.

From what you are saying Thom, the result of the UPDATE above isn't really defined. It picked the first record by
primary key value, but could equally well have picked any other. Is that right?

If so, I find it a bit unsettling that no warning or error message was produced. UPDATE can do drastic stuff, and if
it is going to do something daft, I'd like a bit of notice.

MarkD

Neil Burton SSC-Insane Points: 23176 More actions · Answer 5

This is why SQL is call an RDMS (Relational database management system). The key is that the data is RELATIONAL. As your data above has no relation, then it defeats the object.

I don't mean to split hairs Thom but that's not technically why a Relational database is called Relational. A database relation is a set of tuples representing the same type of entity. The relation in this sense is a table of rows.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
—Charles Babbage, Passages from the Life of a Philosopher

How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

Mark Dalley Hall of Fame Points: 3010 More actions · Answer 6

BTW, I meant to post this in the SQL Server 2012 T-SQL forum - my apologies.

MarkD

Thom A SSC Guru Points: 99143 More actions · Answer 7

Mark Dalley - Thursday, January 26, 2017 8:26 AM
From what you are saying Thom, the result of the UPDATE above isn't really defined. It picked the first record by
primary key value, but could equally well have picked any other. Is that right?
If so, I find it a bit unsettling that no warning or error message was produced. UPDATE can do drastic stuff, and if
it is going to do something daft, I'd like a bit of notice.
MarkD

Effectively yes, it could be random. Your key just meant that SQL had a default ORDER is could choose from.

SQL isn't designed to hold your hand when you tell it to do things. If you tell it to update 1 Million rows with the same value, it'll do it, as you asked it to. In a production system, where updating millions of rows at a time is a day to day occurrence, it would be awful if SQL asked you everytime "Are you sure you want to do this?".

This is why DEV and TEST environments are so important. SQL doesn't question your actions, as it's "assumed" that the person has enough knowledge to know what they are doing, and have tested their code. If they make a mistake, it isn't SQL's fault it did what was asked of it.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

Thom A SSC Guru Points: 99143 More actions · Answer 8

BWFC - Thursday, January 26, 2017 8:30 AM
This is part of the reason why SQL is call an RDMS (Relational database management system). The key is that the data is RELATIONAL. As your data above has no relation, then it defeats the object.
I don't mean to split hairs Thom but that's not technically why a Relational database is called Relational. A database relation is a set of tuples representing the same type of entity. The relation in this sense is a table of rows.

Would you prefer the above amendment? ^_^

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

Neil Burton SSC-Insane Points: 23176 More actions · Answer 9

Thom A - Thursday, January 26, 2017 8:36 AM
BWFC - Thursday, January 26, 2017 8:30 AM
This is part of the reason why SQL is call an RDMS (Relational database management system). The key is that the data is RELATIONAL. As your data above has no relation, then it defeats the object.
I don't mean to split hairs Thom but that's not technically why a Relational database is called Relational. A database relation is a set of tuples representing the same type of entity. The relation in this sense is a table of rows.
Would you prefer to above amendment? ^_^

It's better, but if CELKO catches you, don't say you weren't warned 😀

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
—Charles Babbage, Passages from the Life of a Philosopher

How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

Thom A SSC Guru Points: 99143 More actions · Answer 10

BWFC - Thursday, January 26, 2017 8:40 AM
It's better, but if CELKO catches you, don't say you weren't warned 😀

Eh, I do sometimes enjoy his one man crusade posts 🙂

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

Ed Wagner SSC Guru Points: 287026 More actions · Answer 11

Mark Dalley - Thursday, January 26, 2017 8:26 AM
Thanks guys for the really rapid responses.
I am well aware of relational concepts. My example contained two non-related tables on purpose,
as a step on the way to teasing out understanding of a concept. I admit that it wouldn't really make sense
to do this without defining some sort of relationship between them.
From what you are saying Thom, the result of the UPDATE above isn't really defined. It picked the first record by
primary key value, but could equally well have picked any other. Is that right?
If so, I find it a bit unsettling that no warning or error message was produced. UPDATE can do drastic stuff, and if
it is going to do something daft, I'd like a bit of notice.
MarkD

Your UPDATE statement was perfectly valid; it just didn't do what you thought it would do. This is completely normal - that's why we have development environments. 😉
It did generate a warning in the query plan, but I hope that SQL never throws an error based on what It thnks is right or wrong versus what it interprets you're trying to do. The optimizer isn't perfect.

BTW, Thom is exactly right - the order could be random., which could be a disaster.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Mark Dalley Hall of Fame Points: 3010 More actions · Answer 12

Thom A - Thursday, January 26, 2017 8:34 AM
Mark Dalley - Thursday, January 26, 2017 8:26 AM
From what you are saying Thom, the result of the UPDATE above isn't really defined. It picked the first record by
primary key value, but could equally well have picked any other. Is that right?
If so, I find it a bit unsettling that no warning or error message was produced. UPDATE can do drastic stuff, and if
it is going to do something daft, I'd like a bit of notice.
MarkD
Effectively yes, it could be random. Your key just meant that SQL had a default ORDER is could choose from.
SQL isn't designed to hold your hand when you tell it to do things. If you tell it to update 1 Million rows with the same value, it'll do it, as you asked it to. In a production system, where updating millions of rows at a time is a day to day occurrence, it would be awful if SQL asked you everytime "Are you sure you want to do this?".
This is why DEV and TEST environments are so important. SQL doesn't question your actions, as it's "assumed" that the person has enough knowledge to know what they are doing, and have tested their code. If they make a mistake, it isn't SQL's fault it did what was asked of it.

I realise that SQL Server doesn't hold my hand, ouch, yes. My point is, if I ask it to do something meaningless, it might at least come back and and say "What???" rather than just going ahead and doing *something* involving a potentially massive update of rows.

MarkD

Mark Dalley Hall of Fame Points: 3010 More actions · Answer 13

To sum up then, what people seem to be saying is that...

In order for the UPDATE to make sense, the table expression in the FROM must necessarily
involve the table whose name appears after the UPDATE keyword, and must be of such a form
that each record that is being updated relates to exactly one record in the table from which
the new value is being taken?

I know that BOL has an example where the result of the UPDATE is not actually defined, although - once again -
no error or warning is emitted.

Is that a fair description?

MarkD

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 14

Mark Dalley - Thursday, January 26, 2017 9:23 AM
To sum up then, what people seem to be saying is that...
In order for the UPDATE to make sense, the table expression in the FROM must necessarily
involve the table whose name appears after the UPDATE keyword, and must be of such a form
that each record that is being updated relates to exactly one record in the table from which
the new value is being taken?
I know that BOL has an example where the result of the UPDATE is not actually defined, although - once again -
no error or warning is emitted.
Is that a fair description?
MarkD

Mark

Yes. The UPDATE...FROM syntax can be dangerous if you unknowingly use it where there's a one-to-many relationship. As you have observed, there's no error message - it just chooses one of the "many" values to update the "one" value to. The two examples on this page are two ways of doing the same thing; the first will warn if there's more than one value for the update, while the second won't.

John