January 3, 2012 at 1:13 am
Hi!
My first post! I've been using this site for some time but never posted anything yet.
I came across a weird behavior when benchmarking some critical parts in a application.
We have a table "Person" (PersonId int PK for relations) and about 20 additional columns describing the person (Firstname, lastname, email) etc. (The fields can vary depending on the installation). Each of these columns have an index and one of them also has a unique index because that's the column that uniquely identifies a person in the customers system and is also used to identify existing persons.
Our users upload a text file with person data in the gui to insert in the database. Nonexistent records will be inserted and existing will be updated based on the column with the unique index.
The whole process of inserting/updating in done in the following steps
1. Bulks the txt file into a temporary table source (SourceId,firstname,lastname,email) etc
2. Rows containing invalid and duplicate data are deleted from table source
3. Rows containing existing (based on the unique column) rows are selected into another temp table existing(PersonId,SourceId)
4. Update on table Person is performed using the records in existing and source tables
5. Insert on table Person is performed using the records in table source which does not exist in table existing
Inserted PersonId is output to a temp table newpersons (PersionId) holding the new person ids
6. PersonId from table existing and table newpersons are inserted into a table PersonList (ListId int, PersonId)
In our tests with 50 000 rows and Person table is made up of 20 nvarchar(200) columns
Empty database
Step1 bulk(50k): 3,5 secs
Step2 clean: 4,5 secs
Step3 prepare existing(0 rows): 0,1 secs
Step4 update existing(0 rows)=: 0,02 secs
Step5&6 inserts(50k+50k): 16secs
Total time: ~24secs
Person table contains 50 000 records
Step1 bulk(50k): 3,5 secs
Step2 clean: 4,5 secs
Step3 prepare existing(50k): 1 sec
Step4 update existing(50k): 7,1 secs
Step5&6 inserts(0+50k): 1,5secs
Total time: ~17,6 secs
Updating 50 000 existing records is about 20% faster than inserting 50 000 new records!?!?!
Is that an "expected" behavior? I thought updates would be much more expensive than inserts.
I've tried to output the inserted personids into a @table variable and #temptable with no diff in performance.
I know you all want working examples but this is a bit big to create a working example of so i just hope someone could point me in the right direction.
The database data & log were set to 2Gb initial size so that should be no issue.
Edit: Seems to be the indexs that causes the performance impact. Disabling them increases performance on insert statement. The total time with disabling, inserting- enabling is pretty much the same. Why is the update statement not affected in the same way as the insert? Which is the best approach disabling - inserting - enabling or just letting server handle it using eager table spools?
January 3, 2012 at 3:19 am
Is all these steps running in a single transaction? Statistics is not updated within a transaction, so all your statistics is basically saying there is no records in this table, so perform a full table scan (or clustered index scan), as this is the most efficient.
There are two undocumented options to update statistics which can be used to fool the query optimizer in this case, but as they are undocumented they are unsupported and used entirely at your own risk.
January 3, 2012 at 5:09 am
Hi
The steps are not running in a single transaction. Only step 5&6 run in the same transaction
Edit: update statistics Person with fullscan had no effect on query plan.
January 3, 2012 at 5:29 am
Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 3, 2012 at 5:37 am
Jeff Moden (1/3/2012)
Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.
Jeff,
Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?
Regards
Chandan
January 3, 2012 at 5:41 am
We have a table "Person" (PersonId int PK for relations) and about 20 additional columns describing the person (Firstname, lastname, email) etc. (The fields can vary depending on the installation). Each of these columns have an index and one of them also has a unique index because that's the column that uniquely identifies a person in the customers system and is also used to identify existing persons.
You should concentrate on your table's indexing. IMO none of the tables has columns that (all) require indexing upon them. You might be querying on few of the key columns, index only those columns. E.g. Gender column might not need an index until & unless there are few reports that pull Person’s data based on gender only. Same for Email_ID.
January 3, 2012 at 5:44 am
chandan_jha18 (1/3/2012)
Jeff Moden (1/3/2012)
Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.Jeff,
Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?
Regards
Chandan
It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.
January 3, 2012 at 5:46 am
I've tried to output the inserted personids into a @table variable and #temptable with no diff in performance.
It might not be true. How did you measure this performance? Execution Plans?
January 3, 2012 at 5:49 am
Dev (1/3/2012)
chandan_jha18 (1/3/2012)
Jeff Moden (1/3/2012)
Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.Jeff,
Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?
Regards
Chandan
It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.
thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.
January 3, 2012 at 5:56 am
chandan_jha18 (1/3/2012)
Dev (1/3/2012)
chandan_jha18 (1/3/2012)
Jeff Moden (1/3/2012)
Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.Jeff,
Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?
Regards
Chandan
It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.
thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.
Yes but only for FName. Rest of them will be untouched (for updates). It's not true for inserts statements.
January 3, 2012 at 6:10 am
Dev (1/3/2012)
chandan_jha18 (1/3/2012)
Dev (1/3/2012)
chandan_jha18 (1/3/2012)
Jeff Moden (1/3/2012)
Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.Jeff,
Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?
Regards
Chandan
It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.
thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.
Yes but only for FName. Rest of them will be untouched (for updates). It's not true for inserts statements.
That pretty much covers it. INSERTs affect ALL indexes. UPDATEs only affect the indexes that have the columns that were modified by the UPDATE.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 3, 2012 at 7:06 am
Jeff Moden (1/3/2012)
Dev (1/3/2012)
chandan_jha18 (1/3/2012)
Dev (1/3/2012)
chandan_jha18 (1/3/2012)
Jeff Moden (1/3/2012)
Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.Jeff,
Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?
Regards
Chandan
It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.
thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.
Yes but only for FName. Rest of them will be untouched (for updates). It's not true for inserts statements.
That pretty much covers it. INSERTs affect ALL indexes. UPDATEs only affect the indexes that have the columns that were modified by the UPDATE.
Yes. To summarize it from eyes of kid-DBA, inserts will modify all the indexes and updates will modify indexes where that particular column is defined in the index structure.
But both inserts and updates will touch indexes. (inserts always and updates sometimes)
January 3, 2012 at 7:10 am
Yes. To summarize it from eyes of kid-DBA, inserts will modify all the indexes and updates will modify indexes where that particular column is defined in the index structure.
But both inserts and updates will touch indexes. (inserts always and updates sometimes)
Yup.
January 3, 2012 at 7:34 am
chandan_jha18 (1/3/2012)
Jeff Moden (1/3/2012)
Dev (1/3/2012)
chandan_jha18 (1/3/2012)
Dev (1/3/2012)
chandan_jha18 (1/3/2012)
Jeff Moden (1/3/2012)
Actually, it's "expected" behaviour. An INSERT will "touch" on all indexes. An UPDATE might not even "touch" any of the indexes.Jeff,
Can you please write few more lines to explain what you wrote above:-) Did you mean that updates never cause any index writes?
Regards
Chandan
It's simple. When you insert a row you insert FName, LName (& many more columns) that have indexes. SQL Server inserts corresponding entry in Indexes as well. BUT when you update a particular field, SQL Server updates only that column (say FName) only.
thanks. going by your reply, lets say that we issued an update to modify a column 'Fname'. wont the index entry too will get modified? I thought that non-clustered index itself is a piece of data(subset) of columns defined in its structure and thus index will get modified too. Sorry for my confusion.
Yes but only for FName. Rest of them will be untouched (for updates). It's not true for inserts statements.
That pretty much covers it. INSERTs affect ALL indexes. UPDATEs only affect the indexes that have the columns that were modified by the UPDATE.
Yes. To summarize it from eyes of kid-DBA, inserts will modify all the indexes and updates will modify indexes where that particular column is defined in the index structure.
But both inserts and updates will touch indexes. (inserts always and updates sometimes)
And add one more wrinkle to that. An insert will also have to validate all referential integrity against any tables that this one is depedendent on, where as an update will only do that if the columns affected in the update are part of the foriegn key constraint. There are just lots of activities that occur on an insert that don't on an update.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 3, 2012 at 8:49 am
Dev (1/3/2012)
I've tried to output the inserted personids into a @table variable and #temptable with no diff in performance.
It might not be true. How did you measure this performance? Execution Plans?
I didnt compare query plan becuase the difference in execution time was close to 0.
I've read some posts on the difference and hoped i would make some difference.
Viewing 15 posts - 1 through 15 (of 27 total)
You must be logged in to reply to this topic. Login to reply