Insert slower than update

  • Hi!

    My first post! I've been using this site for some time but never posted anything yet.

    I came across a weird behavior when benchmarking some critical parts in a application.

    We have a table "Person" (PersonId int PK for relations) and about 20 additional columns describing the person (Firstname, lastname, email) etc. (The fields can vary depending on the installation). Each of these columns have an index and one of them also has a unique index because that's the column that uniquely identifies a person in the customers system and is also used to identify existing persons.

    Our users upload a text file with person data in the gui to insert in the database. Nonexistent records will be inserted and existing will be updated based on the column with the unique index.

    The whole process of inserting/updating in done in the following steps

    1. Bulks the txt file into a temporary table source (SourceId,firstname,lastname,email) etc

    2. Rows containing invalid and duplicate data are deleted from table source

    3. Rows containing existing (based on the unique column) rows are selected into another temp table existing(PersonId,SourceId)

    4. Update on table Person is performed using the records in existing and source tables

    5. Insert on table Person is performed using the records in table source which does not exist in table existing

    Inserted PersonId is output to a temp table newpersons (PersionId) holding the new person ids

    6. PersonId from table existing and table newpersons are inserted into a table PersonList (ListId int, PersonId)

    In our tests with 50 000 rows and Person table is made up of 20 nvarchar(200) columns

    Empty database

    Step1 bulk(50k): 3,5 secs

    Step2 clean: 4,5 secs

    Step3 prepare existing(0 rows): 0,1 secs

    Step4 update existing(0 rows)=: 0,02 secs

    Step5&6 inserts(50k+50k): 16secs

    Total time: ~24secs

    Person table contains 50 000 records

    Step1 bulk(50k): 3,5 secs

    Step2 clean: 4,5 secs

    Step3 prepare existing(50k): 1 sec

    Step4 update existing(50k): 7,1 secs

    Step5&6 inserts(0+50k): 1,5secs

    Total time: ~17,6 secs

    Updating 50 000 existing records is about 20% faster than inserting 50 000 new records!?!?!

    Is that an "expected" behavior? I thought updates would be much more expensive than inserts.

    I've tried to output the inserted personids into a @table variable and #temptable with no diff in performance.

    I know you all want working examples but this is a bit big to create a working example of so i just hope someone could point me in the right direction.

    The database data & log were set to 2Gb initial size so that should be no issue.

    Edit: Seems to be the indexs that causes the performance impact. Disabling them increases performance on insert statement. The total time with disabling, inserting- enabling is pretty much the same. Why is the update statement not affected in the same way as the insert? Which is the best approach disabling - inserting - enabling or just letting server handle it using eager table spools?

  • Is all these steps running in a single transaction? Statistics is not updated within a transaction, so all your statistics is basically saying there is no records in this table, so perform a full table scan (or clustered index scan), as this is the most efficient.

    There are two undocumented options to update statistics which can be used to fool the query optimizer in this case, but as they are undocumented they are unsupported and used entirely at your own risk.



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL[/url]

  • Hi

    The steps are not running in a single transaction. Only step 5&6 run in the same transaction

    Edit: update statistics Person with fullscan had no effect on query plan.

  • Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/3/2012)


    Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.

    Jeff,

    Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?

    Regards

    Chandan

  • We have a table "Person" (PersonId int PK for relations) and about 20 additional columns describing the person (Firstname, lastname, email) etc. (The fields can vary depending on the installation). Each of these columns have an index and one of them also has a unique index because that's the column that uniquely identifies a person in the customers system and is also used to identify existing persons.

    You should concentrate on your table's indexing. IMO none of the tables has columns that (all) require indexing upon them. You might be querying on few of the key columns, index only those columns. E.g. Gender column might not need an index until & unless there are few reports that pull Person’s data based on gender only. Same for Email_ID.

  • chandan_jha18 (1/3/2012)


    Jeff Moden (1/3/2012)


    Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.

    Jeff,

    Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?

    Regards

    Chandan

    It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.

  • I've tried to output the inserted personids into a @table variable and #temptable with no diff in performance.

    It might not be true. How did you measure this performance? Execution Plans?

  • Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Jeff Moden (1/3/2012)


    Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.

    Jeff,

    Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?

    Regards

    Chandan

    It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.

    thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.

  • chandan_jha18 (1/3/2012)


    Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Jeff Moden (1/3/2012)


    Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.

    Jeff,

    Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?

    Regards

    Chandan

    It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.

    thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.

    Yes but only for FName. Rest of them will be untouched (for updates). It's not true for inserts statements.

  • Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Jeff Moden (1/3/2012)


    Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.

    Jeff,

    Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?

    Regards

    Chandan

    It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.

    thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.

    Yes but only for FName. Rest of them will be untouched (for updates). It's not true for inserts statements.

    That pretty much covers it. INSERTs affect ALL indexes. UPDATEs only affect the indexes that have the columns that were modified by the UPDATE.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/3/2012)


    Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Jeff Moden (1/3/2012)


    Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.

    Jeff,

    Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?

    Regards

    Chandan

    It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.

    thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.

    Yes but only for FName. Rest of them will be untouched (for updates). It's not true for inserts statements.

    That pretty much covers it. INSERTs affect ALL indexes. UPDATEs only affect the indexes that have the columns that were modified by the UPDATE.

    Yes. To summarize it from eyes of kid-DBA, inserts will modify all the indexes and updates will modify indexes where that particular column is defined in the index structure.

    But both inserts and updates will touch indexes. (inserts always and updates sometimes)

  • Yes. To summarize it from eyes of kid-DBA, inserts will modify all the indexes and updates will modify indexes where that particular column is defined in the index structure.

    But both inserts and updates will touch indexes. (inserts always and updates sometimes)

    Yup.

  • chandan_jha18 (1/3/2012)


    Jeff Moden (1/3/2012)


    Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Jeff Moden (1/3/2012)


    Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.

    Jeff,

    Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?

    Regards

    Chandan

    It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.

    thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.

    Yes but only for FName. Rest of them will be untouched (for updates). It's not true for inserts statements.

    That pretty much covers it. INSERTs affect ALL indexes. UPDATEs only affect the indexes that have the columns that were modified by the UPDATE.

    Yes. To summarize it from eyes of kid-DBA, inserts will modify all the indexes and updates will modify indexes where that particular column is defined in the index structure.

    But both inserts and updates will touch indexes. (inserts always and updates sometimes)

    And add one more wrinkle to that. An insert will also have to validate all referential integrity against any tables that this one is depedendent on, where as an update will only do that if the columns affected in the update are part of the foriegn key constraint. There are just lots of activities that occur on an insert that don't on an update.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Dev (1/3/2012)


    I've tried to output the inserted personids into a @table variable and #temptable with no diff in performance.

    It might not be true. How did you measure this performance? Execution Plans?

    I didnt compare query plan becuase the difference in execution time was close to 0.

    I've read some posts on the difference and hoped i would make some difference.

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply