July 28, 2018 at 3:45 pm
Hi,
I have a typical DWH scenario of Source --> Destination ETL.
During this ETL, there are some intermediate tables constructed and dropped. Source --> Stage1 ---> Stage 2.....Stage N ---> Destination.
Each of these Stage tables are dedicated to this particular ETL and are not used by another process/user. Since any INDEX has its own housekeeping overhead, having one need not necessarily guarantee enhanced performance.
1) Thus, Record-count-wise, Depth-wise or Width-wise is there a thumb rule when we should think of Indexing a table....in my scenario, the Staging tables?
2) if not from Volume standpoint as said in (1) above, from the corresponding Execution Plan, is there anything to be watched for, that ALARMs the fact that, a particular area can be tuned? (such as RID Lookups as candidates for Covering Index or Hash Join/Merge/Nested joins to be converted into a more efficient mechanism etc)
thank you
July 28, 2018 at 8:17 pm
If indexing will enhance the performance of your intermediary staging tables then it's a good idea. Without a lot more information no one can give you meaningful advise. Try testing the load process with indexes vs without.
July 30, 2018 at 8:08 am
etl2016 - Saturday, July 28, 2018 3:44 PMHi,I have a typical DWH scenario of Source --> Destination ETL.
During this ETL, there are some intermediate tables constructed and dropped. Source --> Stage1 ---> Stage 2.....Stage N ---> Destination.
Each of these Stage tables are dedicated to this particular ETL and are not used by another process/user. Since any INDEX has its own housekeeping overhead, having one need not necessarily guarantee enhanced performance.
1) Thus, Record-count-wise, Depth-wise or Width-wise is there a thumb rule when we should think of Indexing a table....in my scenario, the Staging tables?
2) if not from Volume standpoint as said in (1) above, from the corresponding Execution Plan, is there anything to be watched for, that ALARMs the fact that, a particular area can be tuned? (such as RID Lookups as candidates for Covering Index or Hash Join/Merge/Nested joins to be converted into a more efficient mechanism etc)thank you
No arguments with Joe. You have to make the determination yourself through testing what, where, and how to index any given table within your own system. Staging or not, the right indexes in the right place help performance.
As to looking at the execution plans, there isn't a single set of "do this/don't do this" that will work in all situations. Yes, an RID lookup tells you that you don't have a clustered index, you do have nonclustered indexes, and that you are having to go to the heap to retrieve the data that is not a part of that nonclustered index. Is that a problem? Well, is it for one row or one million rows? Will adding a clustered index slow down the insert or update that you're doing as part of the data load? Can you do the data load and then add the index in support of queries against the loaded data? After adding the cluster do you still need the nonclustered index? Will making the nonclustered index covering work as well or better than creating a clustered index? Can we make the index unique? Is the query doing an aggregation? What if we added a columnstore index? Clustered columnstore or nonclustered?
See what I mean? It's all far too dependent on the situation to give you a 10 point check list that will, guaranteed, cover 90% of issues.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 30, 2018 at 8:19 am
Non-clustered index(es) are almost always a waste of resources on a staging table.
A clustered index is much more likely to be useful, although it's obviously not guaranteed to be. If you join the staging table a lot, and the join column(s) are the same as the clus key in the final table the staging table will be loaded into, then definitely clus the staging table that way.
Otherwise, as everyone has noted, it's on a case by case basis, but with special scrutiny (review) of any non-clus index(es), as there has to be a big, known advantage to bother with a non-clus index on a staging table.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 31, 2018 at 5:38 am
If the staging tables are a decent size (over a million rows), consider creating them with a clustered columnstore index.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy