UPDATE when the values are the same

  • Does an UPDATE query overwrite existing values if the already existing value is the same as the one your updating with?

  • I think what you mean is if you have a data set like this:

    CREATE TABLE USERS ( TYPEID INT,USERNAME VARCHAR(30),)

    INSERT VALUES (1,'JSMITH')

    INSERT VALUES (2,'JDOE')

    INSERT VALUES (1,'ASMITH')

    and you do something like this:

    UPDATE USERS

    SET TYPEID = 3

    WHERE TYPEID = 1

    will the update statement update both JSMITH and ASMITH? The simple answer is yes, it will.

  • CELKO (8/12/2012)


    Does an UPDATE query overwrite existing values if the already existing value is the same as the one your updating with?

    Who cares?

    Triggers, for example. transaction logs. Users sitting in frount of the frozen UI waiting while the database finishes the useless activity.

    Update dbo.Event

    SET Status = 1

    WHERE Eventdate < GETDATE()

    and Status = 0

    Without checking for the status value update will take forever on a big history table and add a big chunk to the Transaction Log file.

    If there is a FOR UPDATE trigger it will be executed for all records having Eventdate in the past.

    With Status = 0 added to the query it will actually update only recently added/processed records which have not been updated before. And trigger(s) will be run only for that smaller subset.

    When I fixed similar query on one of production databases it removed 8GB of data added daily to the TRN file. Not to mention easing the pressure on the server, improving performance not only this but other systems having databases hosted on the same SQL Server.

    _____________
    Code for TallyGenerator

  • Ask a simple yes/no question and get a few tangent replies, damn.

    Its quite simple. You have a table as follows.

    CREATE TABLE Test (

    ID INT IDENTITY(1,1)

    Name VARCHAR(100)

    )

    INSERT INTO Test (Name) VALUES ('Tom')

    UPDATE Test SET Name = 'Tom' WHERE ID = 1

    Does the update statement overwrite Tom even though Tom is already the value.

  • Why are you asking, and at what level of the engine are you interested in? (because the behaviour is interesting and complex when you get down to the deep internals)

    At a high level, yes it will. The row qualifies for the update, so the row gets updated, it will count towards the rows affected and the row will appear in the inserted and deleted pseudo-tables in triggers (and in the OUTPUT clause)

    When it comes to logging and to the actual modifications of the pages, it's more complex. It's not a simple yes/no question.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Okay so getting a little more specific. Does the SQL Server edit the MDF file and overwrite the current value of that record or does it carry out a check to see if their the same to save writing to the HDD something that's already there.

  • Well... To start all changes are made in memory and not to disk, and the changes to the files on disk are done later (by a background process) at a page level, not a row level...

    As far as I can tell, updating a row and not actually changing the values does not mark the page as dirty, meaning it won't be written back to disk when the checkpoint process runs.

    I'm not saying it isn't ever marked dirty, but in my simple tests with a single row in a table, updating the name to itself did not appear to mark the page as dirty.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Yeah sorry I appreciate its all done in memory but I was referring to the write back stage.

    That's pretty much answered my question, cheers.

  • chris 76827 (8/13/2012)


    Yeah sorry I appreciate its all done in memory but I was referring to the write back stage.

    That's pretty much answered my question, cheers.

    That does not mean that you should not do the check though:

    update x

    set y = 3

    where y <> 3 or y is null

    will run faster, in my experience, than

    update x

    set y = 3

    especially if many of the rows are already 3.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • GilaMonster (8/13/2012)


    As far as I can tell, updating a row and not actually changing the values does not mark the page as dirty, meaning it won't be written back to disk when the checkpoint process runs.

    Actually, it does.

    I can offer you a simple test.

    Create a new database with FULL recovery mode. Create a single table in it.

    Insert several rows of data.

    Then run a script having a loop with a single statement:

    UPDATE MyTable

    Set ColValue = ColValue

    And watch the Log file growing.

    What means that pages are certainly written to disk.

    _____________
    Code for TallyGenerator

  • Sergiy (8/13/2012)


    GilaMonster (8/13/2012)


    As far as I can tell, updating a row and not actually changing the values does not mark the page as dirty, meaning it won't be written back to disk when the checkpoint process runs.

    Actually, it does.

    I can offer you a simple test.

    Create a new database with FULL recovery mode. Create a single table in it.

    Insert several rows of data.

    Then run a script having a loop with a single statement:

    UPDATE MyTable

    Set ColValue = ColValue

    And watch the Log file growing.

    What means that pages are certainly written to disk.

    Nope. There will be operations, like checkpoints, transactions, background processes, but the update itself is neither logged (it's not actually a data modification) nor are the pages dirtied in the process.

    Trivial to prove.

    CREATE TABLE [dbo].[Test](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [Name] [varchar](100) NULL

    ) ON [PRIMARY]

    insert into Test (Name) Values ('Tom')

    Checkpoint -- all dirty pages to disk, truncate log

    Now, identify the page no of that table's data page (exercise left for the reader)

    -- update the row to itself

    CHECKPOINT -- just to be sure and to allow for multiple tests

    BEGIN TRANSACTION

    UPDATE test SET name = name

    SELECT [Current LSN] ,

    Operation ,

    Context ,

    [Transaction ID] ,

    AllocUnitId ,

    AllocUnitName ,

    [Checkpoint Begin] ,

    [Num Transactions] ,

    [Checkpoint End] ,

    [Dirty Pages] ,

    [Log Record] FROM fn_dblog(NULL, null)

    SELECT * FROM sys.dm_os_buffer_descriptors AS bd WHERE page_id = '3783' -- the sole data page for this table

    COMMIT

    Results:

    The log has nothing more than the checkpoint and the begin transaction, there's no LOP_MODIFY_ROW which would be there if the update was logged. (if the query had been after the commit, there's be a LOP_COMMIT_XACT as well). It's the checkpoint, begin and commit which would have caused the log to grow in your test.

    The DMV showing the state of the pages in memory shows that the page that the table's sole row it on is unmodified after the update.

    Now there will likely be cases where this is not true, but for this trivial case, updating the row to itself neither dirties the page nor logs the update.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • CHECKPOINT -- just to be sure and to allow for multiple tests

    DECLARE @I int

    SET @I = 10

    WHILE @I > 0

    BEGIN

    BEGIN TRANSACTION

    UPDATE test SET name = name

    SELECT [Current LSN] ,

    Operation ,

    Context ,

    [Transaction ID] ,

    AllocUnitId ,

    AllocUnitName ,

    [Checkpoint Begin] ,

    [Num Transactions] ,

    [Checkpoint End] ,

    [Dirty Pages] ,

    [Log Record] FROM fn_dblog(NULL, null)

    COMMIT

    SET @I = @I - 1

    END

    Now - run it and watch the number of records in fn_dblog growing by 2 after each cycle.

    _____________
    Code for TallyGenerator

  • Yes as I showed in the screenshot in the post right above there's the begin xact and commit xact log records (which would be your two per cycle).

    This is what's is in the log after that test of 10 updates:

    The update itself is not logged. If it was, there would be a LOP_MODIFY_ROW for each update (10 of them), which there is not.

    (btw, you'd get the begin and commit even if there was no explicit begin and commit, as all modifications are in implicit transactions)

    If I change the query and update name to "Bob", the log records look like this

    That Lop_Modify_Row is the log record for the actual update, and it is missing in the case where the row is updated to itself, hence we can conclude that the update is not logged when updating the value to itself.

    The question was not, does the update cause any log records to be written to the log. The question was, does the update dirty the page if the update makes no changes. It does not, as I showed with the DMV previously.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The log file increaseing in size merely implies the mdf file is being written to, it's not really conclusive evidence that it is being writtent to. Just because one grows it doesn't mean the other must surely.

  • GilaMonster (8/15/2012)


    The question was not, does the update cause any log records to be written to the log. The question was, does the update dirty the page if the update makes no changes. It does not, as I showed with the DMV previously.

    I would not be so sure.

    I changed the script to bring out more info:

    USE [TEST]

    GO

    CREATE TABLE [dbo].[Test](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [Name] [varchar](100) NULL

    ) ON [PRIMARY]

    insert into Test (Name) Values ('TOM')

    insert into Test (Name) Values ('Andy')

    insert into Test (Name) Values ('Bob')

    insert into Test (Name) Values ('Tim')

    Checkpoint -- all dirty pages to disk, truncate log

    SELECT OBJECT_ID('[dbo].[Test]') ObjID, * FROM TEST

    Checkpoint -- all dirty pages to disk, truncate log

    UPDATE test

    SET name = 'Tom'

    WHERE NAME = 'Tom'

    SELECT [Current LSN] ,

    Operation ,

    Context ,

    [Transaction ID] ,

    AllocUnitId ,

    AllocUnitName ,

    [Checkpoint Begin] ,

    [Num Transactions] ,

    [Checkpoint End] ,

    [Dirty Pages] ,

    [Log Record] FROM fn_dblog(NULL, null)

    Checkpoint -- all dirty pages to disk, truncate log

    SELECT * FROM TEST

    DECLARE @I int

    SET @I = 4

    WHILE @I > 0

    BEGIN

    BEGIN TRANSACTION

    UPDATE test

    SET name = NAME

    WHERE @I%2 = 0 OR name <> NAME

    EXEC sp_lock

    COMMIT

    SELECT [Current LSN] ,

    Operation ,

    Context ,

    [Transaction ID] ,

    AllocUnitId ,

    AllocUnitName ,

    [Checkpoint Begin] ,

    [Num Transactions] ,

    [Checkpoint End] ,

    [Dirty Pages] ,

    [Log Record] FROM fn_dblog(NULL, null)

    Checkpoint -- all dirty pages to disk, truncate log

    SET @I = @I - 1

    END

    GO

    Please run the code with Execution Plan recorded.

    1. Define "the same".

    the code

    UPDATE test

    SET name = 'Tom'

    WHERE NAME = 'Tom'

    updates the value with the same, according to the column definition.

    But for some reason SQL Server ignores the collation settings and actually writes the page.

    So, it's not "the same" by data definition, it's "the same" by binary contents of the page(s).

    2. Execution plan shows that "even" updates have 4 records committed to the UPDATE part, and "odd" ones have 0 records committed.

    3. Messages show "(4 row(s) affected)" for "even" updates and "(0 row(s) affected)" for "odd" updates.

    4. SP_LOCK indicates exclusive locks applied on each of 4 records in "even" updates, when condition name <> NAME is not applied.

    To me it means that SQL Server treats those records as perfectly "dirty".

    5. "Even" updates add LOP_BEGIN_XACT and LOP_COMMIT_XACT to the log, when "Odd" ones do not do this.

    But you're right - those updates don't end up in the log file.

    Considering all of the above I can see only one explanation:

    1. UPDATE actually writes to the pages, regardless if the new values are the same or different from the old values.

    But it writes to the pages in memory.

    2. CHECKPOINT compares binary contents of the pages in memory and writes to disk only those ones which have become different after completed transactions.

    When same values are updated there is a record about a committed transaction, so CHECKPOINT has to verify the dirty pages and write to the disk any ones that have been changed. When the query filters out the same values there is no record about a committed transaction, so CHECKPOINT will not even start analysing contents of the pages.

    3. DMV shows the outcome of CHECKPOINT, not UPDATE, therefore the pages updated in memory don't appear there.

    _____________
    Code for TallyGenerator

Viewing 15 posts - 1 through 15 (of 159 total)

You must be logged in to reply to this topic. Login to reply