April 7, 2008 at 2:39 am
Hi, I've been doing SQL development for years, but never really became an 'expert' in it, if you know what I mean.
Recently, I've come across a small problem and I was hoping to get some feedback on possible ways to deal with it.
I have a pre-existing database that includes some triggers that update certain fields if other fields change. The application I'm working on needs to copy data from an XML file into several tables. The problem is that if I set a field to the exact same value, the trigger still thinks the data changed (yes, it was poorly written and I don't have control over it).
The only solution I can think of is to read each record, compare the fields to see which ones changed, then build a custom SQL string that updates only the changed fields. This, however, a) seems like a lot of work, b) defeats many of SQL Server's optimization features, and c) creates the possibility of more errors.
Can anyone think of a better solution?
Example:
Table1
int id,
datetime recordtime,
char datetimechanged,
int foo,
char foochanged,
int bar,
char barchanged
Now, when the datetime changes (or foo, or bar), the trigger sets the changed flag. So when I update foo or bar (or both) I only want to update the fields that actually changed. However, I can't know that without first reading the record and comparing the values, then building a custom SQL string based on what has and hasn't changed.
The actual table is significantly larger, so simply hard-coding variations is impractical.
Is there any way to build a SQL UPDATE that can test the fields during the update and skip the ones that are unchanged, but still update the fields that have changed? Is there a common technique for this?
I'm thinking something along the lines of:
UPDATE Table1 SET recordtime=@rtime, foo=@foo, bar=@bar WHERE DATEDIFF(second, recordtime, @rtime) > 0 OR foo <> @Foo OR bar <> @bar
However, that statement will update all 3 if any of them are different.
Any ideas?
Thanks.
April 7, 2008 at 2:59 am
The cost of updating a column to the same value is not as bad as it sounds. You could check the old and new values yourself, but SQL Server already does: when you modify one or more columns to the same values (and none of those columns are part of the clustered key), SQL Server detects this and will not even write a log record to the transaction log. The exception is when a clustered key column appears in the update statement, in which case the whole row is deleted and a new one is inserted, even if all the values are the same, i.e. the update has not actually changed anything.
Regards,
Andras
April 7, 2008 at 3:35 am
Thanks for your response, but perhaps I wasn't clear enough. My problem is not the cost of updating a column to the same value, but rather that the trigger doesn't check whether the value is the same, and thus sets the "changed" flag in the record. I have no control over this and cannot change the trigger.
In other words, the trigger sets the changed field even if I update the field with the same value. I do not want this.
April 7, 2008 at 11:51 am
The way to do this is something along these lines:
create trigger Updates on dbo.Table1
after update
as
if update(Col1) or update(Col2) or update(Col3)
    update t
    set Changed = 1
    from dbo.Table1 t
    inner join inserted
        on t.id = inserted.id
    inner join deleted
        on inserted.id = deleted.id
    where inserted.Col1 != deleted.Col1
        or inserted.Col2 != deleted.Col2
        or inserted.Col3 != deleted.Col3
Plug in your trigger name, table name, and column names, and you're good to go.
(Personally, I don't like using this kind of logging, but if you need it, this should do the job.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 7, 2008 at 12:13 pm
Thanks. I am aware of how to write the trigger; that's not my problem. As I said, I cannot change the existing trigger, so I'm looking for a way to get around the problem in my update code.
April 7, 2008 at 12:17 pm
Since you can't change the trigger, the only thing I can think of is breaking the update code up into per-column batches:
if (select Col1 from dbo.Table1 where id = @id) != @Col1_variable
    update dbo.Table1 set Col1 = @Col1_variable where id = @id
if (select Col2 from dbo.Table1 where id = @id) != @Col2_variable
    update dbo.Table1 set Col2 = @Col2_variable where id = @id
It'll be slow, and probably isn't worth the end result, but I can't think of any other way to do this.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 7, 2008 at 12:30 pm
GSquared (4/7/2008)
Since you can't change the trigger, the only thing I can think of is breaking the update code up into per-column batches:
if (select Col1 from dbo.Table1 where id = @id) != @Col1_variable
    update dbo.Table1 set Col1 = @Col1_variable where id = @id
if (select Col2 from dbo.Table1 where id = @id) != @Col2_variable
    update dbo.Table1 set Col2 = @Col2_variable where id = @id
It'll be slow, and probably isn't worth the end result, but I can't think of any other way to do this.
update dbo.Table1 set
    Col1 = @Col1_Variable
where
    Col1 <> @Col1_Variable
    and id = @id

update dbo.Table1 set
    Col2 = @Col2_Variable
where
    Col2 <> @Col2_Variable
    and id = @id
That is how I would code these updates instead of using an if then construct.
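One thing to watch with either version: if any of these columns allow NULLs, Col1 <> @Col1_Variable will never be true when one side is NULL, so a change to or from NULL gets skipped. If that matters, the first update can be made NULL-safe along these lines (same placeholder names as above):
update dbo.Table1 set
    Col1 = @Col1_Variable
where
    id = @id
    and (Col1 <> @Col1_Variable
        or (Col1 is null and @Col1_Variable is not null)
        or (Col1 is not null and @Col1_Variable is null))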
😎
April 7, 2008 at 12:33 pm
Yeah, Lynn's solution is better. One less table lookup per column-variable pair.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 8, 2008 at 2:47 am
Hi,
I need to import 900k rows with 210 columns each from a text file. Since the file contains all of the data, not just the changed data, I need to work out which rows need updating. This is how I have done it in production for more than two years:
1. Load the text file into an IO (staging) table which has the same schema as my destination table.
2. The IO table and the destination table both have a persisted computed column containing the HASHBYTES() value of all the columns I need to check for updates; let's say its name is [check].
3. To update the destination table I just use an UPDATE statement with an inner join and WHERE io.[check] <> destination.[check] (sketched below).
All updates are done in one transaction, and only rows whose data has really changed get updated. In earlier versions I had all the columns in my WHERE clause, which caused very poor performance.
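A minimal sketch of that pattern (the table and column names here are just placeholders, not my real 210-column schema):
create table dbo.io_table (
    id int not null primary key,
    foo int null,
    bar int null,
    [check] as hashbytes('md5',
        isnull(convert(nvarchar(20), foo), N'') +
        isnull(convert(nvarchar(20), bar), N'')) persisted
)
-- dbo.destination_table has the same columns, including the same persisted [check] column

update dest
set dest.foo = io.foo,
    dest.bar = io.bar
from dbo.destination_table dest
inner join dbo.io_table io
    on io.id = dest.id
where io.[check] <> dest.[check]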
Just my two cents.
Regards,
Jan
April 8, 2008 at 10:27 am
That's an interesting solution. May I ask how you calculate your hash? That is, what algorithm do you use, and how do you do it?
April 9, 2008 at 3:14 am
Hi,
I'm using the HASHBYTES function of SQL Server 2005; here is an example:
hashbytes('md5', isnull([column_1], '') + isnull([column_2], '') + ...)
There are a few things you need to know to get this to work:
If you are using datetime columns, convert them to nvarchar using a style which does not contain spaces, e.g.: isnull(CONVERT([nvarchar],[MATURITY],(126)),'')
If you don't, you cannot create the column as persisted; it took me two days to find that out.
Convert every column which is not nvarchar into nvarchar.
Always replace NULL values with an empty string, as anything + NULL is always NULL.
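Putting those rules together, a computed column definition looks roughly like this (the table and column names are placeholders):
alter table dbo.some_table add [check] as
    hashbytes('md5',
        isnull(convert(nvarchar, [MATURITY], 126), N'') +       -- datetime, ISO 8601 style 126 (no spaces)
        isnull(convert(nvarchar(20), [some_int_column]), N'') + -- non-nvarchar columns converted first
        isnull([some_nvarchar_column], N'')                     -- NULLs replaced with an empty string
    ) persisted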
Best Regards,
Jan
April 9, 2008 at 3:41 am
D'OH! Jan already posted while I composed my reply - ignore this....
Check out either CHECKSUM() or HashBytes() in BOL. CHECKSUM's been around for ages and computes a CRC (CRC32 I think, or maybe a true checksum) over the whole row or a set of columns/expressions. HashBytes() is new in SQL2005 and computes a hash using your choice of algorithm (MD2, MD4, MD5, SHA1 but not SHA256 unfortunately) using a varchar or varbinary input. To compute it over multiple columns you'd need to do something like concatenate their values together as a varchar/varbinary and feed it to HashBytes().
I'm sure Jan can tell you the exact approach used...
Regards,
Jacob
April 9, 2008 at 3:46 am
I tried using CHECKSUM at first, but this doesn't work. In some cases the checksum for two rows is the same even though they differ in one column. This happened on my side and caused a lot of confusion (the checksum was built over 200 columns). I think MS also describes this in BOL or on MSDN.
Best Regards,
Jan
April 9, 2008 at 7:44 pm
The same is true (though to a lesser extent) with any of the algorithms supported by HashBytes(). There is no guarantee that two of your rows that have the same HashBytes() value are the same - it's just that the chances of it happening are very small, especially with the newer algorithms with longer hash lengths like the SHA family. You're not relying on the hash to guarantee that the row hasn't changed though, are you? E.g. you're only using it to minimise the number of rows that need to be compared column by column... so a hash collision is not dangerous to your validation process; it simply causes a more thorough check to be done, right?
Regards,
Jacob
April 10, 2008 at 3:25 am
Is there any way to build a SQL UPDATE that can test the fields during the update and skip the ones that are unchanged, but still update the fields that have changed? Is there a common technique for this?
Of course, but you have to build the update statement based on changed values:
for each field in record do
    if oldvalue <> newvalue begin
        add 'field = newvalue' to the SET clause
        add 'field = oldvalue' to the WHERE clause
    end;
You have to add the primary key to the WHERE clause too.
How to do it exactly depends on the front-end tool, language and components used.
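If the front end happens to be T-SQL itself (say, a stored procedure that receives the new values), one hypothetical way to do it with the Table1 columns from the original post is:
-- @id, @rtime, @foo and @bar stand in for the incoming values (e.g. procedure parameters);
-- nullable columns would additionally need ISNULL handling in the comparisons
declare @id int, @rtime datetime, @foo int, @bar int
declare @set nvarchar(max), @sql nvarchar(max)
select @id = 1, @rtime = getdate(), @foo = 42, @bar = 7   -- sample values
set @set = N''

select @set = @set
    + case when recordtime <> @rtime then N', recordtime = @rtime' else N'' end
    + case when foo <> @foo then N', foo = @foo' else N'' end
    + case when bar <> @bar then N', bar = @bar' else N'' end
from dbo.Table1
where id = @id

if @set <> N''
begin
    set @sql = N'update dbo.Table1 set ' + stuff(@set, 1, 2, N'') + N' where id = @id'
    exec sp_executesql @sql,
        N'@id int, @rtime datetime, @foo int, @bar int',
        @id = @id, @rtime = @rtime, @foo = @foo, @bar = @bar
end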