November 23, 2024 at 6:24 am
Hi everyone
My SSIS package does a bulk insert of csv files into a table called tableA. The nature of the data is such that I have to truncate the table before new data is imported with bulk insert. Currently there are no indexes on tableA. TableA has fields symbol, name, trade_date, open_price, high_price, low_price, close_price, and volume (ie typical stock price data). There are about 15 million records in the table. The queries that use tableA use all the columns in tableA in various calculations. The joins are only done using symbol and trade_date. My questions are
1. What is the best way to index tableA given that it gets truncated each time new data is imported with bulk insert?
2. Does the index need to be re-built explicitly each time the package is run? Or do I just create the index once and then it is smart enough to update itself without being explicitly re-built?
Thank you
November 23, 2024 at 2:54 pm
If the symbol and trade_date form a unique pair in the entire table, make that combination the Clustered Primary Key. Leave this clustered index on the table at all times... even after a TRUNCATE.
After you've done a TRUNCATE, make sure your BULK INSERT uses the TABLOCK option and hopefully has the same order in the data lines as the Clustered Primary Key to achieve "Minimal Logging" in the BULK_LOGGED Recovery Model for performance and, of course, taking it easy on the Transaction Log File.
Doing this with the Clustered Index in place will keep you from doubling the space needed by the table, not to mention saving load time and more..
--Jeff Moden
Change is inevitable... Change for the better is not.
November 23, 2024 at 4:56 pm
If the symbol and trade_date form a unique pair in the entire table, make that combination the Clustered Primary Key. Leave this clustered index on the table at all times... even after a TRUNCATE.
After you've done a TRUNCATE, make sure your BULK INSERT uses the TABLOCK option and hopefully has the same order in the data lines as the Clustered Primary Key to achieve "Minimal Logging" in the BULK_LOGGED Recovery Model for performance and, of course, taking it easy on the Transaction Log File.
Doing this with the Clustered Index in place will keep you from doubling the space needed by the table, not to mention saving load time and more..
Thank you for this. You say "Leave this clustered index on the table at all times... even after a TRUNCATE". How would I ensure this happens? Are there specific steps I have to take?
November 24, 2024 at 8:25 pm
For Minimal Logging on a Table with a Clustered Index. Note that at least one of these is missing from the documentation but absolutely required to make Minimal Logging work.
https://www.sqlservercentral.com/articles/some-t-sql-inserts-do-follow-the-fill-factor-sql-oolie
https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
https://www.sqlservercentral.com/articles/hidden-rbar-counting-with-recursive-ctes
If you have any other question, fire away.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 25, 2024 at 5:59 am
I added a clustered index on the two fields. The run time for my SSIS package went UP by about 30 minutes vs having no index on the table. That is really odd to me. Do you have any idea on what happened?
November 25, 2024 at 2:21 pm
Couple of things. It's now maintaining an index in addition to simply dumping data into a heap. Also, depending on how your data is ordered on retrieval, you may be seeing lots of page splits as the inserts occur across the clustered index. That will slow things down. I'm still with Jeff, better to have the index on there, but you may need to adjust how you're doing the data load.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 25, 2024 at 2:39 pm
I'd suggest the clustered index on ( trade_date, symbol ) and not the other way around. You want the clus key to be unique, so, if it's not, you'll want to add another column(s) to make it unique. IF the table is only read after loading, then specify a FILLFACTOR of 100 (which is 100%) on the index.
TRUNCATE is the most efficient method to empty the table. And once the index is defined, SQL Server will automatically maintain it, even after the table is truncated.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply