What is the best index strategy for a table that gets truncated?

Question

What is the best index strategy for a table that gets truncated?

water490

SSC Eights!

Points: 972
More actions
November 23, 2024 at 6:24 am

#4488077

Hi everyone
My SSIS package does a bulk insert of csv files into a table called tableA. The nature of the data is such that I have to truncate the table before new data is imported with bulk insert. Currently there are no indexes on tableA. TableA has fields symbol, name, trade_date, open_price, high_price, low_price, close_price, and volume (ie typical stock price data). There are about 15 million records in the table. The queries that use tableA use all the columns in tableA in various calculations. The joins are only done using symbol and trade_date. My questions are
1. What is the best way to index tableA given that it gets truncated each time new data is imported with bulk insert?
2. Does the index need to be re-built explicitly each time the package is run? Or do I just create the index once and then it is smart enough to update itself without being explicitly re-built?
Thank you

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004411 More actions · Answer 1

If the symbol and trade_date form a unique pair in the entire table, make that combination the Clustered Primary Key. Leave this clustered index on the table at all times... even after a TRUNCATE.

After you've done a TRUNCATE, make sure your BULK INSERT uses the TABLOCK option and hopefully has the same order in the data lines as the Clustered Primary Key to achieve "Minimal Logging" in the BULK_LOGGED Recovery Model for performance and, of course, taking it easy on the Transaction Log File.

Doing this with the Clustered Index in place will keep you from doubling the space needed by the table, not to mention saving load time and more..

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

water490 SSC Eights! Points: 972 More actions · Answer 2

Jeff Moden wrote:

If the symbol and trade_date form a unique pair in the entire table, make that combination the Clustered Primary Key. Leave this clustered index on the table at all times... even after a TRUNCATE.
After you've done a TRUNCATE, make sure your BULK INSERT uses the TABLOCK option and hopefully has the same order in the data lines as the Clustered Primary Key to achieve "Minimal Logging" in the BULK_LOGGED Recovery Model for performance and, of course, taking it easy on the Transaction Log File.
Doing this with the Clustered Index in place will keep you from doubling the space needed by the table, not to mention saving load time and more..

Thank you for this. You say "Leave this clustered index on the table at all times... even after a TRUNCATE". How would I ensure this happens? Are there specific steps I have to take?

Jeff Moden SSC Guru Points: 1004411 More actions · Answer 3

For Minimal Logging on a Table with a Clustered Index. Note that at least one of these is missing from the documentation but absolutely required to make Minimal Logging work.

Take Log file backup before you do anything else to limit the size of the log file that will be affected by Minimal Logging.
You MUST be in the BULK_LOGGED or SIMPLE RECOVERY MODEL for Minimal Logging to work. You still get a bit of a speed advantage in the FULL Recovery Model but it will be FULLY LOGGED. Don't forget that mirroring and log shipping will prevent you from from using any Recovery Model other than FULL. Disclaimer... I have NOT tried Minimal Logging with AG or the like, which reportedly work fine with Bulk_Logged but I have NOT tested it to see if that's true.
The table must be empty. It must either be a brand new table or must have been TRUNCATED. Despite what others say, I've never seen it work after deletes.
The Clustered Index must be the only index on the table. Despite what a previous bunch of hoopla claimed, especially with Trace Flag 610, I've never seen minimal logging occur if the table with the Clustered Index has non-clustered indexes on it, as well.
The INSERT INTO must have a WITH (TABLOCK).
It's generally required to have the source data in the same order as the Clustered Index Key. I ALWAYS include an ORDER BY. If SQL doesn't need it, no sort will be done but it's critical to be there if it does need it.
For INSERT/SELECT, if any variables are present, you must use OPTION(RECOMPILE) on the INSERT/SELECT. This is NOT documented in the Microsoft SQL Server documentation. My recommendation is to always use OPTION(RECOMPILE) on such things. It doesn't hurt much and will kill you if you forget it when it's needed.
The minimal logging ONLY WORKS ON THE FIRST INSERT/SELECT. If you do your inserts in batches, only the first insert will be Minimally Logged.
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/prerequisites-for-minimal-logging-in-bulk-import
Add your non-clustered indexes (if any) after the INSERT has completed and before you return to the FULL Recovery Model (if you changed it it). They well also be Minimally Logged in the BULK_LOGGED and SIMPLE Recovery Models AFTER the table is populated. If you did change from FULL, remember to take at least a DIF backup to limit the size of the "minimally logged thing happened" flag to limit the size of the log file that you can't restore to the middle of (use it all or stop before you get to it).
As a bit of a side bar, the FILL FACTOR (if you assign one) on the Clustered Index WILL be followed for this first INSERT. That's also not documented in the MS documentation anywhere. It IS documented in the "Jeff Moden Book of Boy Howdy" and can be found at the following URL...
https://www.sqlservercentral.com/articles/some-t-sql-inserts-do-follow-the-fill-factor-sql-oolie
p.s. Using the GENERATE_SERIES() function to build large test tables breaks Minimal Logging. Use my fnTally function or Itzik Ben-Gan's GETNUMS() or something like them (DO NOT USE RECURSIVE CTEs FOR THIS!!!)

https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally

https://www.sqlservercentral.com/articles/hidden-rbar-counting-with-recursive-ctes

If you have any other question, fire away.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

water490 SSC Eights! Points: 972 More actions · Answer 4

I added a clustered index on the two fields. The run time for my SSIS package went UP by about 30 minutes vs having no index on the table. That is really odd to me. Do you have any idea on what happened?

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 5

Couple of things. It's now maintaining an index in addition to simply dumping data into a heap. Also, depending on how your data is ordered on retrieval, you may be seeing lots of page splits as the inserts occur across the clustered index. That will slow things down. I'm still with Jeff, better to have the index on there, but you may need to adjust how you're doing the data load.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

ScottPletcher SSC Guru Points: 101123 More actions · Answer 6

I'd suggest the clustered index on ( trade_date, symbol ) and not the other way around. You want the clus key to be unique, so, if it's not, you'll want to add another column(s) to make it unique. IF the table is only read after loading, then specify a FILLFACTOR of 100 (which is 100%) on the index.

TRUNCATE is the most efficient method to empty the table. And once the index is defined, SQL Server will automatically maintain it, even after the table is truncated.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".