April 15, 2021 at 3:12 pm
I’ve inherited a query that on first glance should be re-written. The main part of which is performing an update on a table (200m+ rows) in a while and using nested selects in a not exists. I realise this is quite vague so can provide more detail.
The query has a tendency to fill my transaction log. Db is in simple recovery mode, so stupid question time. Other than re-write the script to update in batches, is there anything I could do to minimise the logging to the transaction log, or a way of estimating the correct size of the log for this particular query.
thanks!
April 15, 2021 at 3:19 pm
I(1) Other than re-write the script to update in batches, is there anything I could do to minimise the logging to the transaction log, or (2) a way of estimating the correct size of the log for this particular query.
thanks!
(1) (A) Consider data compression, incl. page compression, if not already in use. (B) Make sure you minimize the number of page splits caused by UPDATEs as best you can.
(2) Not any easy way that I know of. Row compression will usually reduce the data logged (since log data is also then row compressed), but that depends on the specific column types being updated.
Edit: Yes, more details from you would be needed to be any more specific.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 15, 2021 at 8:41 pm
3) Make sure you update only values which have actually changed.
_____________
Code for TallyGenerator
April 15, 2021 at 8:43 pm
(3) SQL does that itself, so that's only a very minor consideration.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 16, 2021 at 2:08 pm
(3) SQL does that itself, so that's only a very minor consideration.
I might be misinterpreting what you're saying above but, at face value, I have to ask...
How so? Since when does T-SQL decide that a value in a row containing, for example, the letter "A" is supposed to be updated to an "A". What do you think there is in T-SQL that says "Oh... that has the same value as what I'm trying to update so I'm not going to update that particular value"?
Or do you mean something else?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 16, 2021 at 3:13 pm
n/a.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 16, 2021 at 3:48 pm
ScottPletcher wrote:(3) SQL does that itself, so that's only a very minor consideration.
I might be misinterpreting what you're saying above but, at face value, I have to ask...
How so? Since when does T-SQL decide that a value in a row containing, for example, the letter "A" is supposed to be updated to an "A". What do you think there is in T-SQL that says "Oh... that has the same value as what I'm trying to update so I'm not going to update that particular value"?
Or do you mean something else?
No, I meant exactly that SQL generally does not log updates of a column to the same value. SQL does not log a modification to the data page, does not modify the page, and thus does not mark page(s) as dirty.
There are a some exceptions:
if snapshot is enabled, then full logging and page updates are always done: yet another reason I urge you to active snapshot only if you really, really need it;
updating a LOB column > 1 page, unless the update is exactly "lob_column = lob_column";
updating of a clustering key column(s) to the same value.
I believe SQL still does update the totals in sys.dm_db_index_operational_stats to reflect the "UPDATE", even though no data pages were actually modified / rewritten.
As to timing, since at least SQL 2005 (perhaps even before, although I think additional bypasses were added for 2005).
Why would SQL waste time and resources to "update" a column from 'A' to 'A' when a simple comparison could prevent that?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 16, 2021 at 4:58 pm
Scott, could you please cite some authority to that effect? I couldn't find it with some quick googling, so I ran an experiment. The code below is showing the the exact same cpu times and statistics whether or not the target value is being changed. Not calling you a liar, just trying to understand what is actually happening. I always thought it didn't check the columns before update because it would cost CPU cycles to do all the compares and usually *something* is going to change.
if object_id(N'tempdb..#UpdateTest') is not null drop table #UpdateTest
if object_id(N'tempdb..#Source') is not null drop table #Source
-- create and populate table to be updated
create table #UpdateTest(col1 int, col2 int, col3 int)
insert into #UpdateTest
select top 20000 N, N+15, N*2
from dbo.tally -- or vtally
create unique clustered index pk#UpdateTest on #UpdateTest(col1)
-- create an populate a source table with identical data
select *
into #Source
from #updateTest
create unique clustered index pk#Source on #Source(col1)
-- update using identical values
set statistics time, io on;
update u
set col1 = s.col1, col2 = s.col2, col3 = s.col3
from #source s
join #UpdateTest u on u.col1 = s.col1
set statistics time, io off;
-- update where col3 changes
set statistics time, io on;
update u
set col1 = s.col1, col2 = s.col2, col3 = 100
from #source s
join #UpdateTest u on u.col1 = s.col1
set statistics time, io off;
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
April 16, 2021 at 5:11 pm
The code below is showing the the exact same cpu times and statistics whether or not the target value is being changed. Not calling you a liar, just trying to understand what is actually happening. I always thought it didn't check the columns before update because it would cost CPU cycles to do all the compares and usually *something* is going to change.
if object_id(N'tempdb..#UpdateTest') is not null drop table #UpdateTest
if object_id(N'tempdb..#Source') is not null drop table #Source
-- create and populate table to be updated
create table #UpdateTest(col1 int, col2 int, col3 int)
insert into #UpdateTest
select top 20000 N, N+15, N*2
from dbo.tally -- or vtally
create unique clustered index pk#UpdateTest on #UpdateTest(col1)
-- create an populate a source table with identical data
select *
into #Source
from #updateTest
create unique clustered index pk#Source on #Source(col1)
-- update using identical values
set statistics time, io on;
update u
set col1 = s.col1, col2 = s.col2, col3 = s.col3
from #source s
join #UpdateTest u on u.col1 = s.col1
set statistics time, io off;
-- update where col3 changes
set statistics time, io on;
update u
set col1 = s.col1, col2 = s.col2, col3 = 100
from #source s
join #UpdateTest u on u.col1 = s.col1
set statistics time, io off;
That matches one of the exceptions I gave earlier:
Updating any clustering key column(s) does cause I/O.
Remove the clus key column from your UPDATEs and try again.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 16, 2021 at 5:12 pm
I'll have to check my internal notes for outer references. My notes just have the relevant details for me, not where I got them from.
But I'm rather busy now, so it will be a bit before I can provide any link(s).
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 16, 2021 at 6:42 pm
Same result. Identical timings and statistics. Potential update to a unique clustered index column does cause additional I/O, but it's because the query plan has to include a SPLIT/SORT/COLLAPSE sequence to deal with the possibility that duplicate keys could be generated. The sort is half the work of the index-update query plan.
Don't worry about researching the question. I can do some more digging over the weekend.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
April 16, 2021 at 6:53 pm
The data still has to be read -- how else would SQL know that the proposed new value was the same as the value in the table itself?
It's the writes/dirty pages that get avoided.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 16, 2021 at 6:54 pm
OK, found one of the links:
https://www.sql.kiwi/2010/08/the-impact-of-non-updating-updates.html
I assume Paul White is an authoritative enough source for you :wink:.
I mentioned another source in my ref notes but can't find it. My internal (tech) notes are in pretty good shape, since I need to use them constantly. My ref notes, i.e. the original sources/ideas for the internal notes, are disorganized (that's the polite way of putting it).
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 16, 2021 at 7:09 pm
Jeff Moden wrote:ScottPletcher wrote:(3) SQL does that itself, so that's only a very minor consideration.
I might be misinterpreting what you're saying above but, at face value, I have to ask...
How so? Since when does T-SQL decide that a value in a row containing, for example, the letter "A" is supposed to be updated to an "A". What do you think there is in T-SQL that says "Oh... that has the same value as what I'm trying to update so I'm not going to update that particular value"?
Or do you mean something else?
No, I meant exactly that SQL generally does not log updates of a column to the same value. SQL does not log a modification to the data page, does not modify the page, and thus does not mark page(s) as dirty.
There are a some exceptions: if snapshot is enabled, then full logging and page updates are always done: yet another reason I urge you to active snapshot only if you really, really need it; updating a LOB column > 1 page, unless the update is exactly "lob_column = lob_column"; updating of a clustering key column(s) to the same value.
I believe SQL still does update the totals in sys.dm_db_index_operational_stats to reflect the "UPDATE", even though no data pages were actually modified / rewritten.
As to timing, since at least SQL 2005 (perhaps even before, although I think additional bypasses were added for 2005).
Why would SQL waste time and resources to "update" a column from 'A' to 'A' when a simple comparison could prevent that?
Thanks, Scott. I'll check it out.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 16, 2021 at 7:10 pm
Paul is certainly a worthwhile authority. I will study his article later. I did notice this quote, which was emphasized:
A clustered table will always produce full logging and dirty buffer pages if (any part of) the clustering key is updated.
But I also notice that his article didn't cover the use case I created above. He only tested setting a column to a constant, or setting a column to itself. It probably holds true in the case of setting a column to the value of a column from a source table, but I will use his methodology to check.
Thanks. 🙂
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
Viewing 15 posts - 1 through 15 (of 29 total)
You must be logged in to reply to this topic. Login to reply