Best way to determine clustered index keys

  • Hello,

    I have a project to determine the best columns for a clustered index. I have a very large table with over 1 billion rows; it is a heap now, but we are planning to create a clustered index on it, and maybe nonclustered indexes too. I asked the development team to send me the queries and stored procedures that access this table.

    Is there a guide or standard for determining a clustered index? If so, please shed some light on this. I have googled and found some articles, but I am still not confident about how to move forward.

    Also, once we determine the index, how do I go about adding it to this very large table? There are a couple of approaches to this; here are my thoughts.

    1. Create the index online and use sort in tempdb - ours is Enterprise Edition (see the sketch below)
    2. Create a new table, create the index on it, and copy the data over, maybe using SSIS or BCP?
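
    For option 1, I'm thinking of something like this (the table, column, and index names are just placeholders, not our real schema):

        -- Hypothetical online clustered index build that sorts in tempdb.
        -- ONLINE = ON keeps the heap readable and writable during the build;
        -- SORT_IN_TEMPDB = ON moves the sort work out of the user database.
        CREATE CLUSTERED INDEX CIX_BigTable_SomeKey
            ON dbo.BigTable (SomeKeyColumn)
            WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);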

    Any suggestions ? Thanks in advance .

  • Here's a good article describing the generally accepted best practices for a clustered index:
    https://www.simple-talk.com/sql/learn-sql-server/effective-clustered-indexes/
    Of course, the best clustered index for your table will depend on a number of other factors as well, such as how data is added to the table and how data is queried from it.

  • Chris Harshman - Thursday, February 2, 2017 3:02 PM

    Here's a good article describing the generally accepted best practices for a clustered index:
    https://www.simple-talk.com/sql/learn-sql-server/effective-clustered-indexes/
    Of course, the best clustered index for your table will depend on a number of other factors as well, such as how data is added to the table and how data is queried from it.

    Thanks Chris. I have already looked at this article and it has a lot of information. Once we find the index, how do we deploy it in production on a table with 1 billion rows and 400GB in size?

  • Robin35 - Friday, February 3, 2017 7:20 AM

    Once we find the index, how do we deploy it in production with 1 biliion rows table and size of 400GB ?

    I'd recommend creating a new table and migrating the data over, in chunks if possible (see the sketch below). Otherwise you're looking at over 400GB of log space being needed for the index creation, and probably at least 400GB of free space needed in the user database.
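
    A minimal sketch of the chunked approach, assuming the new table already has its clustered index on a key you can walk in order (all object names are placeholders):

        -- Hypothetical batched copy from the heap into a new table that
        -- already has its clustered index. Assumes KeyCol is unique and
        -- usable to walk the source in order; all names are placeholders.
        DECLARE @BatchSize int    = 1000000,
                @LastKey   bigint = 0,
                @Rows      int    = 1;

        WHILE @Rows > 0
        BEGIN
            INSERT INTO dbo.BigTable_New (KeyCol, Col1, Col2)
            SELECT TOP (@BatchSize) KeyCol, Col1, Col2
            FROM dbo.BigTable_Heap
            WHERE KeyCol > @LastKey
            ORDER BY KeyCol;

            SET @Rows = @@ROWCOUNT;

            -- Cheap on the new table: MAX on the clustered key is a seek.
            SELECT @LastKey = ISNULL(MAX(KeyCol), 0) FROM dbo.BigTable_New;
        END;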

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Robin35 - Friday, February 3, 2017 7:20 AM

    Thanks Chris. I have already looked at this article and it has a lot of information. Once we find the index, how do we deploy it in production on a table with 1 billion rows and 400GB in size?

    You don't really have a 400GB table out there that is missing a clustered index, do you? You're talking about changing an existing clustered index instead, right? An index that big is going to take a long time to create, and how much of the data is going to change while the index is being created? You might be able to create the table under a different name and sp_rename it once it's completely built and populated (see the sketch below), but it depends on a lot of factors.
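
    Something like this, once the new table is completely built and populated (the table names are placeholders):

        -- Hypothetical rename swap once dbo.BigTable_New is complete.
        -- sp_rename can be rolled back, so both renames succeed or fail together.
        BEGIN TRANSACTION;
            EXEC sp_rename 'dbo.BigTable', 'BigTable_Old';
            EXEC sp_rename 'dbo.BigTable_New', 'BigTable';
        COMMIT TRANSACTION;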

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • One of the most informative videos I've ever seen on the subject is by Kimberly Tripp and can be found at the following URL...
    https://www.youtube.com/watch?v=QjCEkI8Qm5c

    It should be required training for anyone who writes code against a SQL Server database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GilaMonster - Friday, February 3, 2017 8:07 AM

    I'd recommend creating a new table and migrating the data over, in chunks if possible. Otherwise you're looking at over 400GB of log space being needed for the index creation, and probably at least 400GB of free space needed in the user database.

    Thanks Gail. That's my initial plan: create a new table, create the clustered index, and load the data from the original table. Does that make sense?

  • Lowell - Friday, February 3, 2017 8:08 AM

    You don't really have a 400GB table out there that is missing a clustered index, do you? You're talking about changing an existing clustered index instead, right? An index that big is going to take a long time to create, and how much of the data is going to change while the index is being created? You might be able to create the table under a different name and sp_rename it once it's completely built and populated, but it depends on a lot of factors.

    Unfortunately, yes: it is a heap, and we are planning to create a clustered index. Thanks for your suggestions.

  • Robin35 - Friday, February 3, 2017 11:04 AM

    Thanks Gail. That's my initial plan: create a new table, create the clustered index, and load the data from the original table. Does that make sense?

    Perhaps a different way...

    Gail is correct that the HEAP will be preserved until the CI is created.  You can avoid the 400GB of log file growth by setting the database to the BULK LOGGED recovery model, but you can't avoid the 400GB of growth your MDF will suffer... unless you don't mind spending a little extra time.  A T-SQL sketch of these steps follows the list.

    0. ALWAYS TAKE A BACKUP BEFORE SUCH INTENSIVE OPERATIONS!!!!
    1. Create a new filegroup for the database.
    2. Create one file for that filegroup.
    3. Pre-grow the file to the size of the table/CI.
    4. Put the database in the BULK LOGGED recovery model so that we can take advantage of minimally logged index builds.
    5. Create the index, but not on the PRIMARY filegroup.  Instead, build it on the new filegroup.  Because of the recovery model, the operation will be minimally logged and nearly twice as fast as conventional methods.  This also eliminates the need to do things in "chunks".
    6. Once the index has been built and committed on the new filegroup, the heap will drop from the PRIMARY filegroup, leaving lots of space.
    7. Rebuild the CI again, but point it back to the PRIMARY filegroup.
    8. Drop the now-empty new file and filegroup.
    9. Put the database back in the FULL recovery model.
    10. Take a log backup to narrow the gap for PIT (point-in-time) restores.
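
    Roughly, in T-SQL (assuming a database named MyDb; every path, size, and object name below is a placeholder to adapt):

        -- Steps 1-3: new filegroup with one file pre-grown to roughly
        -- the size of the table plus its clustered index.
        ALTER DATABASE MyDb ADD FILEGROUP IndexBuild;
        ALTER DATABASE MyDb ADD FILE
            (NAME = 'IndexBuild1',
             FILENAME = 'D:\Data\IndexBuild1.ndf',
             SIZE = 450GB)
            TO FILEGROUP IndexBuild;

        -- Step 4: minimal logging for index builds needs BULK_LOGGED.
        ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;

        -- Step 5: build the clustered index on the new filegroup.
        CREATE CLUSTERED INDEX CIX_BigTable_SomeKey
            ON dbo.BigTable (SomeKeyColumn)
            ON IndexBuild;

        -- Step 7: rebuild the index back onto PRIMARY; DROP_EXISTING
        -- replaces and relocates the index in a single operation.
        CREATE CLUSTERED INDEX CIX_BigTable_SomeKey
            ON dbo.BigTable (SomeKeyColumn)
            WITH (DROP_EXISTING = ON)
            ON [PRIMARY];

        -- Steps 8-10: clean up, go back to FULL, and restart log backups.
        ALTER DATABASE MyDb REMOVE FILE IndexBuild1;
        ALTER DATABASE MyDb REMOVE FILEGROUP IndexBuild;
        ALTER DATABASE MyDb SET RECOVERY FULL;
        BACKUP LOG MyDb TO DISK = 'D:\Backups\MyDb_log.trn';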

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, February 3, 2017 4:15 PM

    Perhaps a different way...

    Gail is correct that the HEAP will be preserved until the CI is created.  You can avoid the 400GB of log file growth by setting the database to the BULK LOGGED recovery model, but you can't avoid the 400GB of growth your MDF will suffer... unless you don't mind spending a little extra time.

    This looks amazing, Jeff. Thanks for your help. Our database is part of an Always On availability group, so this requires downtime because we have to take it out of the AG... but can bulk-logged recovery work with Always On?

  • Robin35 - Friday, February 3, 2017 5:11 PM

    This looks amazing, Jeff. Thanks for your help. Our database is part of an Always On availability group, so this requires downtime because we have to take it out of the AG... but can bulk-logged recovery work with Always On?

    I suspect not, especially since minimal logging takes place with the method I proposed, but I don't know for sure.  We have a different way of doing "always on", so I've never used the SQL Server Always On features.  Gail will likely be able to tell you off the top of her head.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I would do the following (assuming one node can handle the workload, because in my environment it does not):

    0) Point the application to the primary node instead of the listener [if you want the application to keep working while the index build is running].
    1) Remove the secondary replica from the AG (or maybe keep it there if you want a way to roll back the changes; this only applies if no one is going to write to the database during the change).
    2) TAKE A BACKUP of the database on the primary (FULL and/or diff/log, depending on what's going on in the environment).
    3) Change the recovery model to SIMPLE (an Always On database must use the FULL recovery model, so there's no point in using the bulk-logged model here).
    4) Create the index(es) -- use Jeff's advice.
    5) Validate the results/performance.
    6) Change the recovery model back to FULL.
    7) Take full and transaction log backups (you can use the same backups to recreate your secondary replica in the AG; I love the HADR wizard, but not for big databases).
    8) Delete the database from the secondary node.
    9) Restore the backups taken in step 7 on the secondary node (see the sketch after this list).
    10) Join the database on the secondary node to the AG.
    11) Point the application back to the listener.
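
    A rough sketch of steps 9 and 10 (assuming a database named MyDb and an availability group named MyAG; the backup paths are placeholders):

        -- Step 9: restore the full and log backups on the secondary node,
        -- leaving the database in the restoring state.
        RESTORE DATABASE MyDb
            FROM DISK = 'D:\Backups\MyDb_full.bak'
            WITH NORECOVERY;
        RESTORE LOG MyDb
            FROM DISK = 'D:\Backups\MyDb_log.trn'
            WITH NORECOVERY;

        -- Step 10: join the restored database back into the availability group.
        ALTER DATABASE MyDb SET HADR AVAILABILITY GROUP = MyAG;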

    BTW, going back to your original question (best way to determine the keys): if you already have a representative load on the server, maybe the Database Engine Tuning Advisor can help, or look for missing-index scripts such as https://blog.sqlauthority.com/2011/01/03/sql-server-2008-missing-index-script-download/ (see the sketch below).
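
    In the same spirit as that script, here is a minimal sketch querying the missing-index DMVs directly; keep in mind these are raw hints for nonclustered indexes, not clustered key recommendations:

        -- Hypothetical ranking of missing-index suggestions collected
        -- since the last instance restart, roughly weighted by use.
        SELECT TOP (25)
               d.statement AS table_name,
               d.equality_columns,
               d.inequality_columns,
               d.included_columns,
               s.user_seeks,
               s.avg_user_impact
        FROM sys.dm_db_missing_index_details AS d
        JOIN sys.dm_db_missing_index_groups AS g
            ON g.index_handle = d.index_handle
        JOIN sys.dm_db_missing_index_group_stats AS s
            ON s.group_handle = g.index_group_handle
        ORDER BY s.user_seeks * s.avg_user_impact DESC;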

  • goher2000 - Sunday, February 5, 2017 11:50 PM

    7) Take full and transaction log backups (you can use the same backups to recreate your secondary replica in the AG; I love the HADR wizard, but not for big databases).

    Is the FULL backup required to reestablish the AG, or would a DIFF do?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • In this case (or whenever you change the recovery model), you must take a full backup before you can take a differential or transaction log backup.

  • Hi Robin,

    No. The bulk-logged recovery model will not work with Always On. The database must be in the FULL recovery model before it can participate in an availability group.

    --Prerequisites, Restrictions, and Recommendations for Always On Availability Groups (SQL Server)
    https://msdn.microsoft.com/en-us/library/ff878487.aspx
