June 13, 2012 at 9:43 am
I've just been handed a database to look at which will be going live in a few weeks and it's horrible.
Due to business requirements, the fact i've said it's not really fit for purpose will be ignored with vague promises of further development in the future.....
Anyway, onto my question... Due to the nature of the data there's not a fat lot of selectivity in the columns that occur in the WHERE clauses that run against it. In 10,000,000 rows of data in some columns you're looking at 8 distinct values.
In order to have some sort of parent/child relationship the developer has slapped a Primary Key with Identity on every column which has by default also become the clustered index (there's no other data unique or useful enough to use with a C I ).
These Primary keys then appear as Foriegn Keys in many other tables.
So for example, I insert a new row of data into the main table it will also cascade the identity into about 14 other tables.
So various stored procedures are then selecting data and doing JOINs between loads of these identity values to pull the data out of the assorted tables to get the data required for the app.
My question is.... Is there any benefit in slapping an index on these Foriegn keys which appear in the JOIN and WHERE clauses as the Primary Key they refer back to is actually a clustered index?
Can't see it myself but just curious.
Just to make performance even more interesting, the application inserts about 100,000 rows of data per day and the developer didn't see any benefit in using staging tables. So 100k rows get dumped in the main tables which is then copied and transformed and then the original rows are deleted.
That's great for indexes. Really great.
Oh and as the Developer appears to have used the results from DTA - verbatim - we have lots of really wide and useless indexes (which I will be later this week replacing with sensible indexes), so the indexes are actually using about 4 times more disk space than the actual data.
I'm half tempted to remove all the indexes just to see what difference it makes. π
June 13, 2012 at 10:11 am
An index on a commonly-accessed foreign key column is a pretty common way to speed up Join queries. Is that what you're asking about?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 13, 2012 at 10:17 am
Indexing foreign keys is a good first step in an indexing strategy for a DB.
http://sqlinthewild.co.za/index.php/2011/11/11/sql-university-advanced-indexing-indexing-strategies/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 13, 2012 at 11:00 am
Stueyd (6/13/2012)
I've just been handed a database to look at and it's horrible <<snip>>
You don't know what happens if you drop an index covering a pk? Try it π
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
June 14, 2012 at 6:50 am
Don't pull the indexes. That would be bad.
I'm with the others, putting an index on the FKs is a good idea. In fact, you might want to try changing the cluster from the primary key to the foreign keys of those tables. I've seen that have a huge positive impact on queries that are primarily accessing the data through the foreign keys. But these are just guesses & suggestions. To know for sure you need to measure performance, look at the execution plans to understand how the existing structure is working, and then adjust things carefully, testing as you go. Then start the whole process all over again.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 15, 2012 at 12:35 pm
Grant Fritchey (6/14/2012)
Don't pull the indexes. That would be bad.
Unless, of course, you can spend the time to do performance tuning and only add back what's actually required.
If that much data is changing, and it's a batch job, and you can/must afford the maintenance time and added transaction log (backup) size, consider either the drop index/change data/recreate index, or (online if on Enterprise) reindex the indexes after each load.
June 18, 2012 at 2:36 am
As mentioned earlier, it turns out the Dev just used all the recommendations from the DTA so the indexes are a joke and I will get rid of pretty much all of them.
As this is not yet in Production I have a week to test and re-do the indexes. Shouldn't really be my job (I expect to tweak, not do everything from scratch) but at least I have a time code and can bill for it now π
There is a service that runs 24 x 7 that collects and inserts data pretty much continually, once it goes into prod we will ensure we have a window for maintenance but it's just a poor design to start with.
But i'm still unclear.... If I have a column with an identity primary key which has a clustered index (with is continually having data inserted amd deleted) and this appears as a foriegn key elsewhere, is there a benefit in this case of indexing this foriegn key. Yes I know normally it's good practice to do this, but in this particular case do you think it could be beneficial?
Anyway i'll give it a go (along with all the other changes) and report back.....
June 18, 2012 at 2:43 am
Stueyd (6/18/2012)
Yes I know normally it's good practice to do this, but in this particular case do you think it could be beneficial?
You answered your own question there. It's normally good practice to do it, so unless you have a good reason why not to do it (based on careful testing), index it.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 18, 2012 at 2:56 am
Have a look at Glenn Berry's diagnostic information queries[/url]. Two of these queries should be particularly relevant to you; missing indexes, and unused indexes. The comments attached to both are important. The missing index dmv's tend to be heavy handed with include columns, and some indexes may be used intermittently, say monthly or quarterly - so be careful which ones you drop.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 18, 2012 at 2:58 am
Stueyd (6/13/2012)
....So for example, I insert a new row of data into the main table it will also cascade the identity into about 14 other tables....
Interesting. Care to elaborate?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 18, 2012 at 9:38 am
Stueyd (6/18/2012)
But i'm still unclear.... If I have a column with an identity primary key which has a clustered index (with is continually having data inserted amd deleted) and this appears as a foriegn key elsewhere, is there a benefit in this case of indexing this foriegn key. Yes I know normally it's good practice to do this, but in this particular case do you think it could be beneficial?
Well, are the inserted new keys going to be continually increasing? If not, consider your fillfactor carefully to reduce fragmentation growth and page splits. Are the identity primary key values ever going to be updated to new values? Again, fillfactor considerations.
June 18, 2012 at 11:39 am
Stueyd (6/18/2012)
But i'm still unclear.... If I have a column with an identity primary key which has a clustered index (with is continually having data inserted amd deleted) and this appears as a foriegn key elsewhere, is there a benefit in this case of indexing this foriegn key. Yes I know normally it's good practice to do this, but in this particular case do you think it could be beneficial?
No way to know for sure without seeing your queries, structures, & execution plans. As a general concept, I'm with Gail, you've answered your own question. Yes, it's a good practice, which means it's the way I would go, by default, unless I found a good, tested, documented, reason not to.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply