INSERT INTO and INDEXES

  • I am running a query which inserts millions of rows into a table. The query looks like:

    INSERT INTO table1
    SELECT
    FROM table2

    The SELECT above is a complex query with multiple tables joined, and the number of records inserted may vary from 1 million to more than 10 million.

    Table1 has 4 indexes (1 clustered and 3 non-clustered) on it. The row count for table1 will be nearly 100 million. Sorry, I can't post the table structure and query due to contractual reasons.

    My understanding is this: with every insert statement, SQL Server updates all the indexes itself, and I don't need to reorganise or rebuild them. I have checked index fragmentation and other index-related things after the insert statement is executed, and everything seems to be fine.

    The problem I am getting is that whenever I run the insert statement, I have to rebuild the indexes; otherwise, all queries on table1 take ages to execute and sometimes time out. Rebuilding the indexes takes nearly an hour. Once the indexes are rebuilt, the same query that was timing out before takes approximately a minute.

    Is there any way to overcome this problem?

  • As a best practice, you should script out your indexes: drop them before the load, perform your load, and then recreate them. When you are talking about millions of rows, keeping the indexes alive during the load is a killer.
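
    A minimal sketch of that approach, assuming three nonclustered indexes on table1 (the index names and key columns here are placeholders, not the real ones):

    ```sql
    -- Script out and drop the nonclustered indexes before the load
    -- (keep the clustered index: dropping it would rebuild the table as a heap).
    DROP INDEX IX_table1_col1 ON table1;
    DROP INDEX IX_table1_col2 ON table1;
    DROP INDEX IX_table1_col3 ON table1;

    -- Perform the load.
    INSERT INTO table1
    SELECT ...   -- the complex multi-join SELECT from the original post
    FROM table2;

    -- Recreate the indexes from the saved scripts.
    CREATE INDEX IX_table1_col1 ON table1 (col1);
    CREATE INDEX IX_table1_col2 ON table1 (col2);
    CREATE INDEX IX_table1_col3 ON table1 (col3);
    ```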

    Navnish

  • I would disable the indexes instead of scripting/dropping them. This has pretty much the same effect as scripting/dropping but, I think, is a lot simpler to manage.

    Have a look at "ALTER INDEX" in Books Online.
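
    For example (the index name is a placeholder; disabling a nonclustered index drops its data but keeps its definition, and rebuilding it re-enables and repopulates it in one step):

    ```sql
    -- Disable each nonclustered index before the load
    -- (don't disable the clustered index: that makes the table inaccessible).
    ALTER INDEX IX_table1_col1 ON table1 DISABLE;

    -- ... run the big INSERT ... SELECT here ...

    -- REBUILD re-enables the index and repopulates it.
    ALTER INDEX IX_table1_col1 ON table1 REBUILD;
    ```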

  • Navnish Sharma (1/26/2010)


    As a best practice, you should script out your indexes: drop them before the load, perform your load, and then recreate them. When you are talking about millions of rows, keeping the indexes alive during the load is a killer.

    Navnish

    Navnish, I know keeping indexes alive during a data load is a killer, but don't forget that the table I am inserting into also contains a massive amount of data. There can be situations where the table already contains 95 million rows and only 2 million new rows are added to it. Dropping and recreating the indexes may end up taking more time than keeping them active during the load.

    I came across another forum entry on this website which suggests checking multiple things: http://www.sqlservercentral.com/Forums/Topic825829-360-1.aspx

    Again, my question still remains unanswered: do we still have to rebuild the indexes after every big data load, even if we leave them active while the data is being loaded?

  • I could be wrong on this, but the issue might not be the indexes but the statistics. If your statistics start to get out of date, it is possible you are pulling bad execution plans, which would lead to poor query performance. Generally speaking, more than a 10% change in the data can cause this.

    And given that, by default, statistics are updated during a rebuild (STATISTICS_NORECOMPUTE = OFF), as stated in BOL, this might be where you are seeing the issue. It is simple to figure out: either a) just update the statistics on the table and then run the query, or b) run an actual execution plan and look at the difference between your actual and estimated row counts to see if there are any big discrepancies.
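
    Both checks might look like this (WITH FULLSCAN is my assumption; the default sampled update is faster and often enough):

    ```sql
    -- a) Refresh every statistic on the table, then re-run the slow query.
    UPDATE STATISTICS table1 WITH FULLSCAN;

    -- b) Return per-operator estimated vs. actual row counts for the slow query.
    SET STATISTICS PROFILE ON;
    -- ... run the slow query here ...
    SET STATISTICS PROFILE OFF;
    ```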

    HTH

    Fraggle

  • Fraggle-805517 (1/26/2010)


    I could be wrong on this, but the issue might not be the indexes but the statistics. If your statistics start to get out of date, it is possible you are pulling bad execution plans, which would lead to poor query performance. Generally speaking, more than a 10% change in the data can cause this.

    And given that, by default, statistics are updated during a rebuild (STATISTICS_NORECOMPUTE = OFF), as stated in BOL, this might be where you are seeing the issue. It is simple to figure out: either a) just update the statistics on the table and then run the query, or b) run an actual execution plan and look at the difference between your actual and estimated row counts to see if there are any big discrepancies.

    HTH

    Fraggle

    Fraggle, yes, statistics do make a big impact on query execution, but I already mentioned in my initial post that after rebuilding the index the query executes in approximately a minute, which is an acceptable time.

    So another question enters the pipeline: will updating only the statistics, rather than rebuilding the whole index, resolve the problem?

  • Navnish Sharma (1/26/2010)


    As a best practice, you should script out your indexes: drop them before the load, perform your load, and then recreate them. When you are talking about millions of rows, keeping the indexes alive during the load is a killer.

    Navnish

    I wouldn't say it's a "best practice". It's a good suggestion but only if you don't need to keep users happy while you're doing it. Especially if you're using the "standard" edition which does not have the ability to rebuild indexes in an "online" manner.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Anam Verma (1/26/2010)


    Fraggle-805517 (1/26/2010)


    I could be wrong on this, but the issue might not be the indexes but the statistics. If your statistics start to get out of date, it is possible you are pulling bad execution plans, which would lead to poor query performance. Generally speaking, more than a 10% change in the data can cause this.

    And given that, by default, statistics are updated during a rebuild (STATISTICS_NORECOMPUTE = OFF), as stated in BOL, this might be where you are seeing the issue. It is simple to figure out: either a) just update the statistics on the table and then run the query, or b) run an actual execution plan and look at the difference between your actual and estimated row counts to see if there are any big discrepancies.

    HTH

    Fraggle

    Fraggle, yes, statistics do make a big impact on query execution, but I already mentioned in my initial post that after rebuilding the index the query executes in approximately a minute, which is an acceptable time.

    So another question enters the pipeline: will updating only the statistics, rather than rebuilding the whole index, resolve the problem?

    You said before that your indexes don't appear to be badly fragmented after the inserts. I think that might not be right. Could you post the before and after results of a DBCC SHOWCONTIG with table results for all indexes so we can take a peek?
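
    Something along these lines (table name from the thread):

    ```sql
    -- Fragmentation for all indexes on table1, in tabular form.
    DBCC SHOWCONTIG ('table1') WITH ALL_INDEXES, TABLERESULTS;

    -- On SQL Server 2005 and later, the DMV that supersedes DBCC SHOWCONTIG:
    SELECT index_id, avg_fragmentation_in_percent, page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('table1'), NULL, NULL, 'LIMITED');
    ```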

    --Jeff Moden



  • Anam Verma (1/26/2010)


    Fraggle-805517 (1/26/2010)


    I could be wrong on this, but the issue might not be the indexes but the statistics. If your statistics start to get out of date, it is possible you are pulling bad execution plans, which would lead to poor query performance. Generally speaking, more than a 10% change in the data can cause this.

    And given that, by default, statistics are updated during a rebuild (STATISTICS_NORECOMPUTE = OFF), as stated in BOL, this might be where you are seeing the issue. It is simple to figure out: either a) just update the statistics on the table and then run the query, or b) run an actual execution plan and look at the difference between your actual and estimated row counts to see if there are any big discrepancies.

    HTH

    Fraggle

    Fraggle, yes, statistics do make a big impact on query execution, but I already mentioned in my initial post that after rebuilding the index the query executes in approximately a minute, which is an acceptable time.

    So another question enters the pipeline: will updating only the statistics, rather than rebuilding the whole index, resolve the problem?

    Anam, unless you are running your rebuild with the option not to recompute statistics, the system does this by default. So if, as you say, there are no fragmentation issues before or after the rebuild, this is where my mind would take me.

    Now as Jeff said above, if you would like to post your before and after fragmentation levels, we can provide a more accurate picture.

    Thanks,

    Fraggle

  • As per my understanding, partitioning would be useful in such a case, right?

    I have not implemented it on my database, but I was going through some documents.
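
    A rough sketch of how a partition-switch load works (all names are placeholders; it assumes table1 is partitioned and a staging table with an identical structure exists on the same filegroup):

    ```sql
    -- Bulk-load the new rows into an empty staging table first
    -- (its indexes can be built after the load, while table1 stays untouched).
    INSERT INTO table1_staging
    SELECT ... FROM table2;

    -- Switch the staging table in as a partition of table1: a metadata-only
    -- operation, so the existing 95 million rows are never rewritten.
    ALTER TABLE table1_staging SWITCH TO table1 PARTITION 2;
    ```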

    Anam,

    Out of curiosity, how much time does reindexing take on this huge table you are referring to?

    M&M

  • Anam Verma (1/26/2010)


    Sorry, I can't post the table structure and query due to contractual reasons.

    It may not matter for this thread, but couldn't you change critical names to something generic in order to include more useful information ?

  • mohammed moinudheen (1/27/2010)

    Out of curiosity, how much time does reindexing take on this huge table you are referring to?

    It takes approximately 50 minutes to rebuild the index.

  • Jeff Moden (1/26/2010)

    You said before that your indexes don't appear to be badly fragmented after the inserts. I think that might not be right. Could you post the before and after results of a DBCC SHOWCONTIG with table results for all indexes so we can take a peek?

    I am setting up the whole environment for the next run, where I'll be able to get the index fragmentation details. There is another department involved in this process, and I am struggling to get everyone on board to rerun the test. I'll post all the information the moment I have it, as I also want to learn and know where I can improve. 🙂

  • Autoupdate stats will NOT help you here if you put only 1 to 10 million rows into a table with 100 million rows; it takes a 20%+ change to trigger the stats update. Unless you do a LOT of index SCANs (not SEEKs), I would simply force statistics updates on table1 (all stats, not just index stats) and be done with it. I would do my normal index maintenance as it occurs (say, weekly). I am at least 95% certain this will fix your performance problem.
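
    A sketch of forcing that update after each load (the sampled default is shown; add WITH FULLSCAN if the load window allows):

    ```sql
    -- Refresh ALL statistics on table1 (index stats and auto-created column stats).
    UPDATE STATISTICS table1;

    -- Verify when each statistic was last updated.
    SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
    FROM sys.stats
    WHERE object_id = OBJECT_ID('table1');
    ```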

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
