April 7, 2008 at 2:39 am
Hi, I've been doing SQL development for years, but never really became an 'expert' in it, if you know what I mean.
Recently, I've come across a small problem and I was hoping to get some feedback on possible ways to deal with it.
I have a pre-existing database that includes some triggers that update certain fields if other fields change. The application I'm working on needs to copy data from an XML file into several tables. The problem is that if I set a field to the exact same value, the trigger still thinks the data changed (yes, it was poorly written and I don't have control over it).
The only solution I can think of is to read each record, compare the fields to see which ones changed, then build a custom SQL string that updates only the changed fields. This, however, a) seems like a lot of work, b) defeats many of SQL Server's optimization features, and c) creates the possibility of more errors.
Can anyone think of a better solution?
Example:
Table1
int id,
datetime recordtime,
char datetimechanged,
int foo,
char foochanged,
int bar,
char barchanged
Now, when the datetime changes (or foo, or bar), the trigger sets the changed flag. So when I update foo or bar (or both) I only want to update the fields that actually changed. However, I can't know that without first reading the record and comparing the values, then building a custom SQL string based on what has and hasn't changed.
The actual table is significantly larger, so simply hard-coding variations is impractical.
Is there any way to build a SQL UPDATE that can test the fields during the update and skip the ones that are unchanged, but still update the fields that have changed? Is there a common technique for this?
I'm thinking something along the lines of:
UPDATE Table1 SET recordtime=@rtime, foo=@foo, bar=@bar WHERE DATEDIFF(second, recordtime, @rtime) > 0 OR foo <> @Foo OR bar <> @bar
However, that statement will update all 3 if any of them are different.
Any ideas?
Thanks.
April 7, 2008 at 2:59 am
The cost of updating a column to the same value is not as bad as it sounds. You could check the old and new values yourself, but SQL Server already does: when you modify one or more columns to the same values (and none of those columns are part of the clustered key), SQL Server detects this and will not even write a log record to the transaction log. The exception is when a clustered key column appears in the update statement, in which case the whole row is deleted and a new one is inserted, even if all the values are the same, i.e. the update has not actually changed anything.
Regards,
Andras
April 7, 2008 at 3:35 am
Thanks for your response, but perhaps I wasn't clear enough. My problem is not the cost of updating a column to the same value, but rather that the trigger doesn't check whether the value is the same, and thus sets the "changed" flag in the record. I have no control over this and cannot change the trigger.
In other words, the trigger sets the changed field even if I update the field with the same value. I do not want this.
April 7, 2008 at 11:51 am
The way to do this is something along these lines:
create trigger Updates on dbo.Table1
after update
as
if update(Col1) or update(Col2) or update(Col3)
    update t
    set Changed = 1
    from dbo.Table1 t
    inner join inserted
        on t.id = inserted.id
    inner join deleted
        on inserted.id = deleted.id
    where inserted.Col1 != deleted.Col1
        or inserted.Col2 != deleted.Col2
        or inserted.Col3 != deleted.Col3
Plug in your trigger name, table name, and column names, and you're good to go.
(Personally, I don't like using this kind of logging, but if you need it, this should do the job.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 7, 2008 at 12:13 pm
Thanks. I am aware of how to write the trigger; that's not my problem. As I said, I cannot change the existing trigger, so I'm looking for a way to get around the problem in my update code.
April 7, 2008 at 12:17 pm
Since you can't change the trigger, the only thing I can think of is breaking the update code up into per-column batches:
if (select Col1 from dbo.Table1 where id = @id) != @Col1_variable
    update dbo.Table1 set Col1 = @Col1_variable where id = @id
if (select Col2 from dbo.Table1 where id = @id) != @Col2_variable
    update dbo.Table1 set Col2 = @Col2_variable where id = @id
It'll be slow, and probably isn't worth the end result, but I can't think of any other way to do this.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 7, 2008 at 12:30 pm
GSquared (4/7/2008)
Since you can't change the trigger, the only thing I can think of is breaking the update code up into per-column batches:
if (select Col1 from dbo.Table1 where id = @id) != @Col1_variable
    update dbo.Table1 set Col1 = @Col1_variable where id = @id
if (select Col2 from dbo.Table1 where id = @id) != @Col2_variable
    update dbo.Table1 set Col2 = @Col2_variable where id = @id
It'll be slow, and probably isn't worth the end result, but I can't think of any other way to do this.
update dbo.Table1 set
    Col1 = @Col1_Variable
where
    Col1 <> @Col1_Variable
    and id = @id

update dbo.Table1 set
    Col2 = @Col2_Variable
where
    Col2 <> @Col2_Variable
    and id = @id
That is how I would code these updates instead of using an if then construct.
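One thing to watch with either version: if any of these columns allow NULLs, Col1 <> @Col1_Variable will never be true when one side is NULL, so a change to or from NULL gets skipped. If that matters, the first update can be made NULL-safe along these lines (same placeholder names as above):
update dbo.Table1 set
    Col1 = @Col1_Variable
where
    id = @id
    and (Col1 <> @Col1_Variable
        or (Col1 is null and @Col1_Variable is not null)
        or (Col1 is not null and @Col1_Variable is null))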
😎
April 7, 2008 at 12:33 pm
Yeah, Lynn's solution is better. One less table lookup per column-variable pair.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 8, 2008 at 2:47 am
Hi,
I need to import 900k rows with 210 columns each from a text file. Since the file contains all of the data, not just the changed data, I need to work out which rows need updating. This is how I have done it in production for more than two years:
1. Load the text file into an IO (staging) table which has the same schema as my destination table.
2. The IO table and the destination table both have a persisted computed column containing the HASHBYTES() value of all the columns I need to check for updates; let's say its name is [check].
3. To update the destination table I just use an UPDATE statement with an inner join and WHERE io.[check] <> destination.[check] (sketched below).
All updates are done in one transaction, and only rows whose data has really changed get updated. In earlier versions I had all the columns in my WHERE clause, which caused very poor performance.
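A minimal sketch of that pattern (the table and column names here are just placeholders, not my real 210-column schema):
create table dbo.io_table (
    id int not null primary key,
    foo int null,
    bar int null,
    [check] as hashbytes('md5',
        isnull(convert(nvarchar(20), foo), N'') +
        isnull(convert(nvarchar(20), bar), N'')) persisted
)
-- dbo.destination_table has the same columns, including the same persisted [check] column

update dest
set dest.foo = io.foo,
    dest.bar = io.bar
from dbo.destination_table dest
inner join dbo.io_table io
    on io.id = dest.id
where io.[check] <> dest.[check]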
Just my two cents.
Regards,
Jan
April 8, 2008 at 10:27 am
That's an interesting solution. May I ask how you calculate your hash? That is, what algorithm do you use, and how do you do it?
April 9, 2008 at 3:14 am
Hi,
I'm using the HASHBYTES function of SQL Server 2005; here is an example:
hashbytes('md5', isnull([column_1], '') + isnull([column_2], '') + ...)
There are a few things you need to know to get this to work:
If you are using datetime columns, convert them to nvarchar using a style which does not contain spaces, e.g.: isnull(CONVERT([nvarchar],[MATURITY],(126)),'')
If you don't, you cannot create the column as persisted; it took me two days to find that out.
Convert every column which is not nvarchar into nvarchar.
Always replace NULL values with an empty string, as anything + NULL is always NULL.
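Putting those rules together, a computed column definition looks roughly like this (the table and column names are placeholders):
alter table dbo.some_table add [check] as
    hashbytes('md5',
        isnull(convert(nvarchar, [MATURITY], 126), N'') +       -- datetime, ISO 8601 style 126 (no spaces)
        isnull(convert(nvarchar(20), [some_int_column]), N'') + -- non-nvarchar columns converted first
        isnull([some_nvarchar_column], N'')                     -- NULLs replaced with an empty string
    ) persisted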
Best Regards,
Jan
April 9, 2008 at 3:41 am
D'OH! Jan already posted while I composed my reply - ignore this....
Check out either CHECKSUM() or HashBytes() in BOL. CHECKSUM's been around for ages and computes a CRC (CRC32 I think, or maybe a true checksum) over the whole row or a set of columns/expressions. HashBytes() is new in SQL2005 and computes a hash using your choice of algorithm (MD2, MD4, MD5, SHA1 but not SHA256 unfortunately) using a varchar or varbinary input. To compute it over multiple columns you'd need to do something like concatenate their values together as a varchar/varbinary and feed it to HashBytes().
I'm sure Jan can tell you the exact approach used...
Regards,
Jacob
April 9, 2008 at 3:46 am
I tried using CHECKSUM at first, but this doesn't work. In some cases the checksum for two rows is the same even though they differ in one column. This happened on my side and caused a lot of confusion (the checksum was built over 200 columns). I think MS also describes this in BOL or on MSDN.
Best Regards,
Jan
April 9, 2008 at 7:44 pm
The same is true (though to a lesser extent) with any of the algorithms supported by HashBytes(). There is no guarantee that two of your rows that have the same HashBytes() value are the same - it's just that the chances of it happening are very small, especially with the newer algorithms with longer hash lengths like the SHA family. You're not relying on the hash to guarantee that the row hasn't changed though, are you? E.g. you're only using it to minimise the number of rows that need to be compared column by column... so a hash collision is not dangerous to your validation process; it simply causes a more thorough check to be done, right?
Regards,
Jacob
April 10, 2008 at 3:25 am
Is there any way to build a SQL UPDATE that can test the fields during the update and skip the ones that are unchanged, but still update the fields that have changed? Is there a common technique for this?
Of course, but you have to build the update statement based on changed values:
for each field in record do
    if oldvalue <> newvalue begin
        add 'field = newvalue' to the SET clause
        add 'field = oldvalue' to the WHERE clause
    end;
You have to add the primary key to the WHERE clause too.
How to do it exactly depends on the front-end tool, language and components used.
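If the front end happens to be T-SQL itself (say, a stored procedure that receives the new values), one hypothetical way to do it with the Table1 columns from the original post is:
-- @id, @rtime, @foo and @bar stand in for the incoming values (e.g. procedure parameters);
-- nullable columns would additionally need ISNULL handling in the comparisons
declare @id int, @rtime datetime, @foo int, @bar int
declare @set nvarchar(max), @sql nvarchar(max)
select @id = 1, @rtime = getdate(), @foo = 42, @bar = 7   -- sample values
set @set = N''

select @set = @set
    + case when recordtime <> @rtime then N', recordtime = @rtime' else N'' end
    + case when foo <> @foo then N', foo = @foo' else N'' end
    + case when bar <> @bar then N', bar = @bar' else N'' end
from dbo.Table1
where id = @id

if @set <> N''
begin
    set @sql = N'update dbo.Table1 set ' + stuff(@set, 1, 2, N'') + N' where id = @id'
    exec sp_executesql @sql,
        N'@id int, @rtime datetime, @foo int, @bar int',
        @id = @id, @rtime = @rtime, @foo = @foo, @bar = @bar
end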