UPDATE when the values are the same

  • Sergiy (8/15/2012)


    1. UPDATE actually writes to the pages, regardless if the new values are the same or different from the old values.

    But it writes to the pages in memory.

    I disagree. If it did, the page would be marked dirty after the update. It is not as I showed with the DMV (I can prove the same thing with DBCC Page, it's just a lot harder)

    2. CHECKPOINT compares binary contents of the pages in memory and writes to disk only those ones which have become different after completed transactions.

    Checkpoint just writes dirty pages to disk. It doesn't check the pages and it's quite happy to write out pages dirtied by transactions that have not committed.

    If checkpoint had to take the page in memory, read the matching page from disk, compare the two and only write out changed pages it would be massively less efficient than it is and would require a lot of extra memory.

    3. DMV shows the outcome of CHECKPOINT, not UPDATE, therefore the pages updated in memory don't appear there.

    The DMV is buffer descriptors (ie buffer pool), it's a dmv that shows pages in memory, it does not have anything to do with checkpoint, it does not look at the pages on disk.

    Per BoL:

    Returns information about all the data pages that are currently in the SQL Server buffer pool. The output of this view can be used to determine the distribution of database pages in the buffer pool according to database, object, or type.

    and

    is_modified bit 1 = Page has been modified after it was read from the disk.

    As I said earlier, the update runs, it marks all rows as updated in the rows affected (even in cases it didn't change the values), those rows will appear in the trigger pseudo-tables. Locks will have to be taken, execution plan will have to run the update, SQL doesn't know the rows are the same until the update actually runs. There is simply an optimisation that does not log updates where the value has not changed and does not mark the pages as dirty until a value on it is actually changed.

    Your odd examples, no rows qualify for the update at all so there's no transaction to begin or commit and no rows to take locks on. Hence SQL can optimise away the update even earlier and not even begin a transaction.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • This theme was covered in details here

    I encourage you to read it.Paul White: Page Free Space - The Impact of Non-Updating Updates


    I am really sorry for my poor gramma. And I hope that value of my answers will outweigh the harm for your eyes.
    Blog: http://somewheresomehow.ru[/url]
    Twitter: @SomewereSomehow

  • GilaMonster (8/16/2012)


    until the update actually runs.

    What do you mean by that?

    What's exactly happening when update is running?

    does not log updates where the value has not changed and does not mark the pages as dirty until a value on it is actually changed.

    Value changed by what?

    What's happening during update that it's necessary to check if "a value on it is actually changed"?

    _____________
    Code for TallyGenerator

  • Update runs = query execution engine processes the update plan that was generated by the query optimiser. Runs the various operators in the query plan

    Value changed by the query execution engine processing the update. It's just an optimisation that SQL has not to do unnecessary work when an update does not change the value of the column

    Edit: See Paul's blog post, he comes to the same conclusions I did.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You miss the point.

    Once again:

    UPDATE test

    SET name = 'Tom'

    WHERE NAME = 'Tom'

    Does this one writes into the page?

    You say it does when the value in Name= 'TOM' and it does not when the value in Name = 'Tom'

    But according to the rules for the collation 'TOM' = 'Tom' !

    And WHERE clause in the query perfectly confirms that the value in the column is no changed.

    Query plan evaluates both versions of 'Tom' as identical and acts accordingly.

    Both times it's 1 row which goes to the final "UPDATE" operation, and both times (1 row affected) is reported.

    So, what is changed when we change from 'TOM' to 'Tom' and is not changed when we change from 'Tom' to 'Tom'?

    _____________
    Code for TallyGenerator

  • Sergiy (8/16/2012)


    You miss the point.

    Once again:

    UPDATE test

    SET name = 'Tom'

    WHERE NAME = 'Tom'

    Does this one writes into the page?

    ...

    Basically does it cause physical I/O operation on the disk?

    My answer to that is: YES, it may at least on SQL2005.

    The answer comes from experience with large ETL processes.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Sergiy (8/16/2012)


    You miss the point.

    Once again:

    UPDATE test

    SET name = 'Tom'

    WHERE NAME = 'Tom'

    Does this one writes into the page?

    Yes it does. The optimisation is going to be looking at the binary value of the column before after to determine whether it can apply the optimisation of not having to log the change and not having to dirty the page.

    While in case insensitive collation the comparison TOM = Tom returns true, they are not the same value and they cannot be considered the same on the data page, hence this is an update that does change the column and hence must be logged and must modify the data page in memory. (because doing things like taking the ASCII value of the 3rd character will return different answers for the two values)

    So, what is changed when we change from 'TOM' to 'Tom' and is not changed when we change from 'Tom' to 'Tom'?

    TOM to Tom is a update that modified the value. Tom to Tom is an update that does not change the value and hence allows SQL to optimise away actually doing the update to the page and to the log.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (8/16/2012)


    Yes it does.

    Every time?

    The optimisation is going to be looking at the binary value of the column before after

    Before and after WHAT?

    While in case insensitive collation the comparison TOM = Tom returns true, they are not the same value and they cannot be considered the same on the data page

    Optimiser surely does not know that.

    The query clearly says

    SET name = 'Tom' WHERE name = 'Tom'

    So, from the logic of the query new value is equal to the old one.

    TOM to Tom is a update that modified the value.

    Says who?

    WHERE name ('TOM') = 'Tom' - if the values would not be the same for SQL Server it would not be updating ther record.

    So, SQL Server first considers the values identical and later it changes its mind and starts thinking they are different.

    When and why does it happen?

    _____________
    Code for TallyGenerator

  • Sergiy (8/16/2012)


    GilaMonster (8/16/2012)


    Yes it does.

    Every time?

    I'm not going to be stupid enough to claim each and every case.

    The optimisation is going to be looking at the binary value of the column before after

    Before and after WHAT?

    The update. Before being the value that goes into the deleted pseudo-table and after being the value that goes into the inserted pseudo-table in a trigger or output clause.

    While in case insensitive collation the comparison TOM = Tom returns true, they are not the same value and they cannot be considered the same on the data page

    Optimiser surely does not know that.

    The query clearly says

    SET name = 'Tom' WHERE name = 'Tom'

    So, from the logic of the query new value is equal to the old one.

    I never claimed the optimiser knew that. I said earlier, the optimiser still generates a plan for the update, the update still occurs, there is simply an optimisation built into either the query processor or the storage engine (probably the storage engine) that allows for the log record not to be written and the page not to be marked dirty when the old value on the page exactly matches the new value that update is setting (exactly meaning binary values are identical). That is all.

    I am not claiming, nor have I ever claimed that the update does not run. I am not claiming nor have I ever claimed that locks are not taken. I have said that before.

    TOM to Tom is a update that modified the value.

    Says who?

    WHERE name ('TOM') = 'Tom' - if the values would not be the same for SQL Server it would not be updating ther record.

    So, SQL Server first considers the values identical and later it changes its mind and starts thinking they are different.

    When and why does it happen?

    I'm trying to figure out if you are just not understanding or if you're trying to find me wrong somewhere. Which is it?

    As for equality, in case insensitive collation, the comparison TOM = Tom returns true. That is a logical rule that the query engine follows. They however are different binary values when stored in the data page (not hard to test and prove, you can do that yourself).

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I'd like to nominate this as a candidate for the 'esoteric thread of 2012' award ๐Ÿ™‚

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (8/16/2012)


    I'd like to nominate this as a candidate for the 'esoteric thread of 2012' award ๐Ÿ™‚

    +1

    The proofs are educational - but sooner or later someone's conclusions will be shown to be incorrect.

    Sergiy: the data is written

    Gail: the data is not written

    Paul W: it depends

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Paul showed that for a heap (the only case in discussion here), the page is not dirtied and the update log record not written when an update occurs that does not change the row in any way (as in the exact binary representation of the row as written on the page, not the logical equivalence that the query processor uses).

    The cases he showed where it that was not the case was for an clustered index key because that update is split into delete/insert and when one or both of the row versioning isolation levels are in effect and sometimes with LOB columns.

    I'm not disagreeing with Paul's conclusions, the case in question here is a small part of the tests he did and his blog post agrees with my conclusions in the only case we are talking about here.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (8/16/2012)


    Paul showed that for a heap (the only case in discussion here), the page is not dirtied and the update log record not written when an update occurs that does not change the row in any way (as in the exact binary representation of the row as written on the page, not the logical equivalence that the query processor uses).

    The cases he showed where it that was not the case was for an clustered index key because that update is split into delete/insert and when one or both of the row versioning isolation levels are in effect and sometimes with LOB columns.

    I'm not disagreeing with Paul's conclusions, the case in question here is a small part of the tests he did and his blog post agrees with my conclusions in the only case we are talking about here.

    +1

    It's interesting that in only the cheapest case has update to self been optimized out. Perhaps checking for change in a LOB column is more expensive than optimizing out - but for a CI, there is so much activity associated with a value change that I'm surprised there's no check.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • GilaMonster (8/16/2012)


    I'm not going to be stupid enough to claim each and every case.

    Yeah, but you're trying to play stupid by not understanding that we have only 2 cases in this particular example.

    :rolleyes:

    The update. Before being the value that goes into the deleted pseudo-table and after being the value that goes into the inserted pseudo-table in a trigger or output clause.

    What is "update"?

    pseudo-tables appear as a result of completed update (AFTER trigger, right?). Comparing of values in those pseudo-tables is irrelevant to this discussion, because we are trying to find out what happens DURING update, before the pseudo-tables can be generated.

    I never claimed the optimiser knew that. I said earlier, the optimiser still generates a plan for the update, the update still occurs, there is simply an optimisation built into either the query processor or the storage engine (probably the storage engine) that allows for the log record not to be written and the page not to be marked dirty when the old value on the page exactly matches the new value that update is setting (exactly meaning binary values are identical). That is all.

    Here we go!

    "update still occurs" - thank you!

    So you agree that the page is written to.

    Like I said from the beginning, it's the storage engine (I name it CHECKPOINT, probably not the best choice of word) which checks the content of the UPDATED PAGE and finds out that the content is still the same as in the version of the page stored on the disk, so there is no point in flushing it to disk and add a record to the log file.

    But as you just agreed - UPDATE with the same value actually writes to the page containing the updated value.

    _____________
    Code for TallyGenerator

  • Sergiy (8/16/2012)


    Here we go!

    "update still occurs" - thank you!

    So you agree that the page is written to.

    Nope. Two different things.

    But as you just agreed - UPDATE with the same value actually writes to the page containing the updated value.

    No, I did not agree.

    The update occurred. The query processor ran the update plan, the plan that was generated by the query optimiser. That is not a matter up for discussion, we can look at extended events or profiler and see the event occurring.

    The page was not marked as dirty, the update log record was not written when the update does not change the value of the row (as in the exact binary representation of the row on the page). I've proven it several times, so has Paul.

    I'm done.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 16 through 30 (of 159 total)

You must be logged in to reply to this topic. Login to reply