Slow process

  • I have a table of 82,439,877 records. I want to create a clustered index and modify a few existing indexes on this table. When I try to do so it runs for around 15-18 hours and then I had to stop it. Any suggestions for making this faster?

  • When you create a new clustered index, you are rebuilding the table.  The entire table is copied to build the clustered index, and on 82M records this will take some time.

    You can use the SORT_IN_TEMPDB index option to speed it up a bit, but it will still take hours.
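
    For example, a minimal sketch (the table, column, and index names are hypothetical, not from this thread):

    -- SORT_IN_TEMPDB = ON performs the index build's sort runs in tempdb,
    -- spreading the I/O if tempdb sits on separate drives.
    CREATE CLUSTERED INDEX CIX_BigTable_ID
        ON dbo.BigTable (ID)
        WITH (SORT_IN_TEMPDB = ON);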

    You can try exporting your data, sorting it, then loading it into a new table.  This will still take many hours, but can be done alongside normal use of your existing table.  However, you will have to apply any insert/update/delete (IUD) activity from your live table to your new table before swapping the table names.

    Original author: SQL FineBuild (https://github.com/SQL-FineBuild/Common/wiki/) - 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • One option, with a backup in place, is to use a reorg scenario.  Unload the data to a flat file, sort it in descending key sequence, build just the clustered index, and then reload the data without logging.  After the data is loaded, recreate the other indexes.  A rough sketch of that sequence follows.
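
    This is only a sketch under assumed names (dbo.BigTableNew, [Key], the file path), not the poster's exact procedure:

    -- Assumes the data was already unloaded to a flat file presorted on the key,
    -- e.g. with: bcp "SELECT * FROM dbo.BigTable ORDER BY [Key] DESC" queryout ...

    -- 1. Build just the clustered index while the target table is still empty.
    CREATE CLUSTERED INDEX CIX_BigTableNew_Key ON dbo.BigTableNew ([Key] DESC);

    -- 2. Reload with minimal logging: an empty indexed target plus TABLOCK
    --    qualifies, and the ORDER option skips re-sorting presorted data.
    BULK INSERT dbo.BigTableNew FROM 'C:\unload\BigTable.dat'
        WITH (DATAFILETYPE = 'native', TABLOCK, ORDER ([Key] DESC));

    -- 3. Recreate the remaining indexes only after the load completes.
    CREATE NONCLUSTERED INDEX IX_BigTableNew_OtherCol ON dbo.BigTableNew (OtherCol);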

  • Here are all the things that need to be true to do minimally logged bulk inserts, from BOL (a sketch follows the list)...

    • The recovery model is simple or bulk-logged.
    • The target table is not being replicated.
    • The target table does not have any triggers.
    • The target table has either 0 rows or no indexes.
    • The TABLOCK hint is specified. For more information, see Controlling the Locking Behavior.
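
    As a hedged illustration of meeting those conditions (the database and table names are placeholders):

    -- Switch to bulk-logged recovery for the duration of the load.
    ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;

    -- TABLOCK on the target is the hint the last condition refers to.
    INSERT INTO dbo.BigTableNew WITH (TABLOCK)
    SELECT * FROM dbo.BigTable;

    -- Switch back afterwards and take a log backup to restart the log chain.
    ALTER DATABASE MyDatabase SET RECOVERY FULL;
    BACKUP LOG MyDatabase TO DISK = 'C:\backup\MyDatabase.trn';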

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Depending on your HD capacity, you might create a new table with the same column structure and with the proper indexes you want, and execute

    Insert into [newtable] select * from [oldtable]

    What's more, if you have a unique identifier column, you may execute this statement in groups:

    Insert into [newtable] select * from [oldtable] where [UI] between 1 and 1000000

    So you have control over it.  The problem is the HD capacity (you might need 3 times the space the table occupies on the HD), but it should take about as long as a plain [select * from] statement would.  A loop over those ranges might look like the sketch below.
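
    This is just an illustrative sketch; the batch size and the assumption that [UI] is a dense integer key are mine:

    DECLARE @lo BIGINT, @max BIGINT, @batch BIGINT
    SELECT @lo = 1, @batch = 1000000, @max = MAX([UI]) FROM [oldtable]

    WHILE @lo <= @max
    BEGIN
        INSERT INTO [newtable]
        SELECT * FROM [oldtable]
        WHERE [UI] BETWEEN @lo AND @lo + @batch - 1

        SET @lo = @lo + @batch   -- next range; keeps each transaction small
    END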

    Later you execute

    drop table [oldtable]

    and finally execute

    sp_rename 'newtable', 'oldtable'

    (Note: the new name passed to sp_rename must be unqualified and unquoted; writing '[oldtable]' would make the brackets part of the new name.)

    So, what do you think?

    best regards

    Jorge Escobar

  • I would use Integration Services...


    Cheers,

    Ben Sullins
    bensullins.com
    Beer is my primary key...

  • Very valuable suggestion.

    The only problem is that the statement

    Insert into [newtable] select * from [oldtable]

    will lock [oldtable] for the entire insertion period.

    If you apply (NOLOCK) or do it in groups, you need to take care of rows being updated during that long-running insert.

    I would create a timestamp column on [oldtable] and copy its values into a corresponding varbinary(8) column in [newtable].

    Then I'd perform the (NOLOCK) copy.

    After that I'd start a loop for:

    - UPDATE rows where oldtable.timestamp <> newtable.timestampcopy (updated since copying started),

    - DELETE from the new table rows that have been deleted from [oldtable], and

    - INSERT rows that have been inserted since.

    After each statement I'd collect the number of affected rows using @@ROWCOUNT.

    If the total number is > 0, repeat the loop.

    Else: begin a transaction, drop [oldtable], rename [newtable], drop the timestampcopy column, commit.  A sketch of the loop is below.
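
    A hedged sketch of that loop, assuming [UI] is the unique key, [ts] is the timestamp column on [oldtable], [tscopy] is its varbinary(8) copy on [newtable], and [SomeCol] stands in for the data columns:

    DECLARE @changes INT
    SET @changes = 1
    WHILE @changes > 0
    BEGIN
        SET @changes = 0

        -- 1. Re-copy rows modified since the initial copy.
        UPDATE n
        SET    n.[SomeCol] = o.[SomeCol],   -- repeat for every data column
               n.[tscopy]  = o.[ts]
        FROM   [newtable] n
        JOIN   [oldtable] o ON o.[UI] = n.[UI]
        WHERE  o.[ts] <> n.[tscopy]
        SET @changes = @changes + @@ROWCOUNT

        -- 2. Remove rows deleted from the source.
        DELETE n
        FROM   [newtable] n
        WHERE  NOT EXISTS (SELECT 1 FROM [oldtable] o WHERE o.[UI] = n.[UI])
        SET @changes = @changes + @@ROWCOUNT

        -- 3. Pick up rows inserted since the copy.
        INSERT INTO [newtable] ([UI], [SomeCol], [tscopy])
        SELECT o.[UI], o.[SomeCol], o.[ts]
        FROM   [oldtable] o
        WHERE  NOT EXISTS (SELECT 1 FROM [newtable] n WHERE n.[UI] = o.[UI])
        SET @changes = @changes + @@ROWCOUNT
    END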

    P.S. It should not take 18 hours or more. I'm more than sure that when you do the reindexing you are getting into a deadlock situation. That's why it cannot complete the task.

    _____________
    Code for TallyGenerator

  • Apart from other suggestions, make sure you have not got AutoShrink on - I have seen this get locked up when doing massive operations.

    Remember that data is stored in the order of the clustered index, so adding one means moving all the actual data to the right position in the index B-tree.

    One of the fastest ways I have come across to do this is to BCP out the data in the order of the proposed clustered index and load it back in with the BCP flag indicating the data is in order (can't recall the BCP flags offhand).
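
    The flag being described is presumably bcp's -h hint with an ORDER clause (my assumption; the server, database, file, and key names below are placeholders):

    REM Reload a natively unloaded file, telling bcp it is presorted on the
    REM clustered key; adding TABLOCK also allows a minimally logged load.
    bcp MyDB.dbo.BigTableNew in BigTable.dat -n -T -S MyServer -h "ORDER(ID ASC), TABLOCK"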
