Insert slower than update

  • Grant Fritchey (1/3/2012)


    chandan_jha18 (1/3/2012)


    Jeff Moden (1/3/2012)


    Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Jeff Moden (1/3/2012)


    Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.

    Jeff,

    Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?

    Regards

    Chandan

    It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.

    thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.

    Yes but only for FName. Rest of them will be untouched (for updates). It's not true for inserts statements.

    That pretty much covers it. INSERTs affect ALL indexes. UPDATEs only affect the indexes that have the columns that were modified by the UPDATE.

    Yes. To summarize it from eyes of kid-DBA, inserts will modify all the indexes and updates will modify indexes where that particular column is defined in the index structure.

    But both inserts and updates will touch indexes. (inserts always and updates sometimes)

    And add one more wrinkle to that. An insert will also have to validate all referential integrity against any tables that this one is depedendent on, where as an update will only do that if the columns affected in the update are part of the foriegn key constraint. There are just lots of activities that occur on an insert that don't on an update.

    Thanks for all the info.

    The update statement contains the exact same data as the first insert. Does sql server skip updating the indexes if the values are identical?

  • Dev (1/3/2012)


    We have a table "Person" (PersonId int PK for relations) and about 20 additional columns describing the person (Firstname, lastname, email) etc. (The fields can vary depending on the installation). Each of these columns have an index and one of them also has a unique index because that's the column that uniquely identifies a person in the customers system and is also used to identify existing persons.

    You should concentrate on your table's indexing. IMO none of the tables has columns that (all) require indexing upon them. You might be querying on few of the key columns, index only those columns. E.g. Gender column might not need an index until & unless there are few reports that pull Person’s data based on gender only. Same for Email_ID.

    All columns needs to be searchable from the GUI and the table Person can contain 2M+ rows. Sometimes search is performed on multiple columns where index over multiple columns also is needed? or? =)

  • niclas-1082356 (1/3/2012)


    Dev (1/3/2012)


    I've tried to output the inserted personids into a @table variable and #temptable with no diff in performance.

    It might not be true. How did you measure this performance? Execution Plans?

    I didnt compare query plan becuase the difference in execution time was close to 0.

    I've read some posts on the difference and hoped i would make some difference.

    Execution plans are not a good measure for performance anyway. They show you what happened, not how fast it was. Still worth looking at.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • niclas-1082356 (1/3/2012)


    Grant Fritchey (1/3/2012)


    chandan_jha18 (1/3/2012)


    Jeff Moden (1/3/2012)


    Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Dev (1/3/2012)


    chandan_jha18 (1/3/2012)


    Jeff Moden (1/3/2012)


    Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.

    Jeff,

    Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?

    Regards

    Chandan

    It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.

    thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.

    Yes but only for FName. Rest of them will be untouched (for updates). It's not true for inserts statements.

    That pretty much covers it. INSERTs affect ALL indexes. UPDATEs only affect the indexes that have the columns that were modified by the UPDATE.

    Yes. To summarize it from eyes of kid-DBA, inserts will modify all the indexes and updates will modify indexes where that particular column is defined in the index structure.

    But both inserts and updates will touch indexes. (inserts always and updates sometimes)

    And add one more wrinkle to that. An insert will also have to validate all referential integrity against any tables that this one is depedendent on, where as an update will only do that if the columns affected in the update are part of the foriegn key constraint. There are just lots of activities that occur on an insert that don't on an update.

    Thanks for all the info.

    The update statement contains the exact same data as the first insert. Does sql server skip updating the indexes if the values are identical?

    That is not sofmething I've specifically tested, so I'm unsure, but even if it does update the index, it's going straight to the key & modifying it, it won't require a page split or rearranging, so it's likely to be the same.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • niclas-1082356 (1/3/2012)


    Dev (1/3/2012)


    We have a table "Person" (PersonId int PK for relations) and about 20 additional columns describing the person (Firstname, lastname, email) etc. (The fields can vary depending on the installation). Each of these columns have an index and one of them also has a unique index because that's the column that uniquely identifies a person in the customers system and is also used to identify existing persons.

    You should concentrate on your table's indexing. IMO none of the tables has columns that (all) require indexing upon them. You might be querying on few of the key columns, index only those columns. E.g. Gender column might not need an index until & unless there are few reports that pull Person’s data based on gender only. Same for Email_ID.

    All columns needs to be searchable from the GUI and the table Person can contain 2M+ rows. Sometimes search is performed on multiple columns where index over multiple columns also is needed? or? =)

    That’s interesting. In such scenarios I usually sacrifice performance for few searches which I guess would be less frequent. I usually involve BA’s also for such scenarios to let them prioritize the search criteria. Or if somebody has any suggestions, please share it.

    One thing for sure, I table with 20 columns doesn’t need 20 indexes. It may have, but shouldn’t have unless it’s data warehouse table (less DMLs).

    Also, your searches would be better if you create index on set of columns. If user selects multiple options, from the search screen only one of the index (Optimizer decides which one) will be used. Based on the current setup it would be based on one column only.

  • Grant Fritchey (1/3/2012)


    niclas-1082356 (1/3/2012)


    ...

    The update statement contains the exact same data as the first insert. Does sql server skip updating the indexes if the values are identical?

    That is not sofmething I've specifically tested, so I'm unsure, but even if it does update the index, it's going straight to the key & modifying it, it won't require a page split or rearranging, so it's likely to be the same.

    IMO databases (engines) are not that intelligent that they can identify identical rows & leave updates. Even if they are, we can’t skip the search operation to identify the matching rows.

    However it requires testing but my hypothesis says it will try to update those indexes (columns) if they are in UPDATE statements.

  • Paul, I can’t access these URLS (limited internet access 🙁 ). Would you mind sharing bullet points here?

  • Dev (1/3/2012)


    Paul, I can’t access these URLS (limited internet access 🙁 ). Would you mind sharing bullet points here?

    The main point is that updates that don't change values in non-clustered indexes are generally skipped in SQL Server 2005 and later. Please read both links when you have full internet access because there are too many subtleties and side-effects to summarize here.

  • SQL Kiwi (1/3/2012)


    Dev (1/3/2012)


    Paul, I can’t access these URLS (limited internet access 🙁 ). Would you mind sharing bullet points here?

    The main point is that updates that don't change values in non-clustered indexes are generally skipped in SQL Server 2005 and later. Please read both links when you have full internet access because there are too many subtleties and side-effects to summarize here.

    Surely I will do that. Your explanation discards one of my assumptions of updates. But it will do a search for identical rows, if I am not wrong here as well.

  • niclas-1082356 (1/3/2012)


    The whole process of inserting/updating in done in the following steps

    1. Bulks the txt file into a temporary table source (SourceId,firstname,lastname,email) etc

    2. Rows containing invalid and duplicate data are deleted from table source

    3. Rows containing existing (based on the unique column) rows are selected into another temp table existing(PersonId,SourceId)

    4. Update on table Person is performed using the records in existing and source tables

    5. Insert on table Person is performed using the records in table source which does not exist in table existing

    Inserted PersonId is output to a temp table newpersons (PersionId) holding the new person ids

    6. PersonId from table existing and table newpersons are inserted into a table PersonList (ListId int, PersonId)

    This process seems to have more steps than are strictly necessary. I would look to simplify the process so that fewer intermediate steps are required. Ideally, rows would be read once from the text file source, inserted or updated to the destination table, and the PersonList table maintained, all in one statement (or perhaps two at most). This is usually achievable using the MERGE statement, composable DML (INSERT on a MERGE source), and the OPENROWSET BULK provider. I described using OPENROWSET to transform rows as they are read from the source text file here: http://www.sqlservercentral.com/Forums/FindPost916633.aspx

    I am also surprised that the destination table really requires all those indexes, but anyway: it is likely that a good part of the costs involved relate to spooling and sorting the data stream into key order for each index. There's not too much to be done about that, but you might see some benefit (in logging terms at least) for the INSERT (or insertion part of the MERGE operation) by enabling trace flag 610 as described here: http://msdn.microsoft.com/en-us/library/dd425070.aspx. This optimization does not apply to UPDATEs.

    The trade-off between disabling and rebuilding non-clustered indexes versus performing the inserts and updates with them in place really depends on how much data is being changed, whether minimal logging can be used for the index rebuild, and how many indexes there are. It seems in your case that the costs are pretty evenly balanced.

    Execution plans would be interesting if you are able to provide them because there are certain details that are only revealed there. It seems likely that the eager spool you see is for halloween protection (to avoid the reading side of the plan seeing rows that have been inserted or modified by the writing side of the plan). This is another reason to pursue the OPENROWSET idea: rows read from a text file do not require halloween protection in general.

  • Dev (1/3/2012)


    But it will do a search for identical rows, if I am not wrong here as well.

    Well, the rows will be in the stream anyway (having been qualified by the reading side of the plan), but yes there will generally be a Filter operator or predicate on the update operator to filter rows that do not require updating. The bet is that the time spent to see if an update is required or not will generally be repaid many times over by skipping at least a few updates. The optimizer does make an effort to assess the cost of comparison versus expected savings, and if the expected savings are smaller, it may choose to skip the comparisons and just update the indexes anyway. As a rule, filtering this way is beneficial and often appears in real query plans.

  • SQL Kiwi (1/4/2012)


    Dev (1/3/2012)


    But it will do a search for identical rows, if I am not wrong here as well.

    Well, the rows will be in the stream anyway (having been qualified by the reading side of the plan), but yes there will generally be a Filter operator or predicate on the update operator to filter rows that do not require updating. The bet is that the time spent to see if an update is required or not will generally be repaid many times over by skipping at least a few updates. The optimizer does make an effort to assess the cost of comparison versus expected savings, and if the expected savings are smaller, it may choose to skip the comparisons and just update the indexes anyway. As a rule, filtering this way is beneficial and often appears in real query plans.

    Thank you so much for the explanation. 🙂

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply