February 24, 2009 at 9:14 pm
Hi,
I am reporting on a general ledger balance table with 110 million rows. Reporting on this table has been very slow to date, partly due to my limited knowledge of indexing: having half-read some recommendations, I used the identity PK as the clustered index. After more extensive reading I reviewed my queries to see what was common across my reports, and decided to recreate the clustered index. What I noticed after analyzing the reports was that a few columns were always in the WHERE clause: client ID, financial year, general ledger type code, and whether the row is a current instance or not. I have decided to create a clustered index on these four columns and would like to know whether this is the correct approach.
Another thing I would like to mention: whilst the columns above are always in the WHERE clause, the different reports also apply other filters on top, e.g.
where clientid = @client
and fiscalyear = @fiscalyear
and typecode in ('a','b','c')
and current_instance = 'Yes'
and accountnumber > 400000 -- this is the extra filter which is not inside the clustered index.
What I would like to know is: should I create a non-clustered index on the accountnumber column?
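To make the question concrete, here is a sketch of the two indexes in question, using the column names from the example filters (table name `dbo.gl` is an assumption, not my real schema):

```sql
-- The compound clustered index on the four always-filtered columns
CREATE CLUSTERED INDEX CIX_gl_reporting
    ON dbo.gl (clientid, fiscalyear, typecode, current_instance);

-- The possible extra nonclustered index for the range filter
CREATE NONCLUSTERED INDEX IX_gl_accountnumber
    ON dbo.gl (accountnumber);
```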
Two final things I would like to ask
1. The general ledger balance table potentially joins to 3 other tables using surrogate keys. Should I create one index with all 3 surrogate keys together, or 3 indexes each with one surrogate key? And can this index be separate from the account number index?
2. Should the grouping and ordering have their own indexes, separate from the filtering index?
select a, b, c, sum(d)
from gl
inner join client
on gl.client_key = client.client_key (should I have 1 non clustered index here or 3)
inner join company
on gl.company_key = company.company_key
where gl.clientid = @client
and gl.fiscalyear = @fiscalyear
and gl.ledger_code in ('A','B','C')
and gl.currentinstance = 'Yes'
and gl.accountNumber > 400000 (should I have a non clustered index here)
group by a, b, c (should I have a non clustered index here)
order by a, b
Any help would be greatly appreciated.
February 24, 2009 at 9:25 pm
Generally, I would not recommend using a multicolumn primary key/clustered index. You may be best finding a single column for your clustered index and then creating multicolumn indexes for your queries.
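A minimal sketch of that pattern, assuming the identity column and filter columns from the original post (names are illustrative):

```sql
-- Narrow, unique, ever-increasing clustered index on the surrogate key...
CREATE UNIQUE CLUSTERED INDEX CIX_gl_id
    ON dbo.gl (gl_id);

-- ...plus a multicolumn nonclustered index tailored to the report filters.
CREATE NONCLUSTERED INDEX IX_gl_report_filters
    ON dbo.gl (clientid, fiscalyear, typecode, current_instance);
```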
---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
February 24, 2009 at 9:32 pm
Hi,
Thanks for the quick reply. Initially I did use a single-column clustered index based on my identity PK field (which acts as a surrogate key), and then used multi-column non-clustered indexes. What I noticed when I ran the reports was that they were still slow, and after turning on STATISTICS IO there were a lot of logical and physical reads. What column would you recommend I use?
Thanks
Chris
February 25, 2009 at 12:22 am
chrisau168 (2/24/2009)
What I noticed when I ran the reports was that they were still slow, and after turning on STATISTICS IO there were a lot of logical and physical reads. What column would you recommend I use?
Hard to say. Can you post the exec plan (saved as a .sqlplan file, zipped and attached) and the IO stats?
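For reference, the IO stats being asked for can be captured like this (a standard approach; the SELECT is a placeholder for the actual report query):

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the report query here, then copy the Messages-tab output,
-- which lists logical and physical reads per table.
SELECT COUNT(*) FROM dbo.gl WHERE clientid = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```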
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 25, 2009 at 5:53 am
chrisau168 (2/24/2009)
Hi, I am reporting on a general ledger balance table with 110 million rows. Reporting on this table has been very slow to date, partly due to my limited knowledge of indexing: having half-read some recommendations, I used the identity PK as the clustered index. After more extensive reading I reviewed my queries to see what was common across my reports, and decided to recreate the clustered index. What I noticed after analyzing the reports was that a few columns were always in the WHERE clause: client ID, financial year, general ledger type code, and whether the row is a current instance or not. I have decided to create a clustered index on these four columns and would like to know whether this is the correct approach.
Another thing I would like to mention: whilst the columns above are always in the WHERE clause, the different reports also apply other filters on top, e.g.
where clientid = @client
and fiscalyear = @fiscalyear
and typecode in ('a','b','c')
and current_instance = 'Yes'
and accountnumber > 400000 -- this is the extra filter which is not inside the clustered index.
What I would like to know is: should I create a non-clustered index on the accountnumber column?
It really depends on how SQL Server resolves the query. It's possible that the compound clustered key will eliminate enough rows that you'll still get a nice clean index seek, even with the added filter. You might be able to put an index on accountnumber, but, depending on the columns returned, that could lead to a key lookup, which adds about three reads per row, so it's not usually helpful.
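One common way to avoid that key lookup is to make the nonclustered index covering, so every column the query touches lives in the index itself. A sketch, assuming the column names from the example query and that a, b, c, d all live on gl:

```sql
-- Covering index: the filter columns form the key, and the columns the
-- query returns ride along in INCLUDE, so no key lookup is needed.
CREATE NONCLUSTERED INDEX IX_gl_report_covering
    ON dbo.gl (clientid, fiscalyear, ledger_code, currentinstance, accountNumber)
    INCLUDE (a, b, c, d);
```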
Two final things I would like to ask
1. The general ledger balance table potentially joins to 3 other tables using surrogate keys. Should I create one index with all 3 surrogate keys together, or 3 indexes each with one surrogate key? And can this index be separate from the account number index?
Yes... It depends. Probably the best bet for joins is three independent indexes, but again, you have to look at the execution plans to see how the queries are behaving before and after you apply an index to be sure.
2. Should the grouping and ordering have their own indexes, separate from the filtering index?
select a, b, c, sum(d)
from gl
inner join client
on gl.client_key = client.client_key (should I have 1 non clustered index here or 3)
inner join company
on gl.company_key = company.company_key
where gl.clientid = @client
and gl.fiscalyear = @fiscalyear
and gl.ledger_code in ('A','B','C')
and gl.currentinstance = 'Yes'
and gl.accountNumber > 400000 (should I have a non clustered index here)
group by a, b, c (should I have a non clustered index here)
order by a, b
Any help would be greatly appreciated.
Same answer: it depends on how things behave. Indexing is difficult. You need to apply enough indexes, but not too many, and you need to put them on the right columns, in the right combinations. It really requires a good working knowledge of the data in the system and the queries run against it. Based on what you've shown, I suspect that clustered index will work well for you.
Just remember, though, that the clustered key is also the link between a nonclustered index and the clustered index, so the wider the clustered key, the wider every nonclustered index will be. It's a bit of a balancing act. The rule is to make the clustered key as narrow as you can; however, that doesn't mean as narrow as possible. A lot of people think the cluster should only ever be an integer column, but testing will show that, depending on the system, a wider key structure can work just fine. I've got systems with between four and six integer and date columns in the clustered index that are screaming fast and well maintained. But most of the time those tables don't have any other indexes to worry about. So again, it's a balancing act.
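The width trade-off described above is easy to measure for yourself: check index sizes before and after widening the clustered key, and the nonclustered indexes will grow too, since each of their rows carries the clustered key as the row locator. A generic check (table name assumed):

```sql
-- Pages used by each index on gl; re-run after rebuilding the
-- clustered index wider and compare the nonclustered index sizes.
SELECT i.name AS index_name, ps.used_page_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
WHERE ps.object_id = OBJECT_ID('dbo.gl');
```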
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 25, 2009 at 6:06 pm
Hi,
I tried to run my code with the multi-column index and the results have not been all that great. I then took a subset of the data and created the following index based on the recommendation of the Database Engine Tuning Advisor.
This is just a non-clustered index, without a clustered index.
Chris
March 3, 2009 at 12:23 pm
How many records will a query like this typically return?
It looks to me like you need a clustered index on client_Id in your gl table. An alternative approach to try may be to put the clustered index on accountnumber instead, and have a non-clustered index on client_Id.
All the tables you are joining would naturally need an index on the column you are joining on. Whether this should be a clustered index or not depends on whether there are other queries run separately against these tables that would benefit from a clustered index on something else.
If your main concern is just getting these selects (from your example) to run as fast as possible, I guess it would make sense to have the keys in your joined tables as clustered indexes as well.
Indexing typecode/ledger_code makes no sense unless there are a lot of different codes in those columns.
current_instance appears to be a bit field (in practice), and generally speaking it makes no sense to index it.
Some general indexing guidelines:
*Put the clustered index on the key that you use in selects that return ranges of data (multiple rows). Cluster on a column that has a high rate of uniqueness and, preferably, that sorts in some useful way (if your ID is a GUID it's not very effective).
*There's no real use indexing columns with a very low number of unique values. Columns like yours containing maybe 5-10 different letters ('A', 'B', etc.), or a column containing only 'yes' and 'no', don't make much sense to index, as they don't give you an effective filter. If you have 110 million rows and half of them are 'yes', you will still have 55 million rows to scan through after using the index. On the other hand, if you have 50,000 rows with 'yes' and the rest are 'no', the index may actually be useful, provided you have no other column that would effectively filter down your search (which would probably indicate bad design 🙂 )
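A quick way to check whether a candidate column is selective enough to be worth indexing is to look at its value distribution (a generic check, with table/column names from the thread assumed):

```sql
-- If one value covers most of the rows, an index on this column
-- won't filter effectively for that value.
SELECT current_instance, COUNT_BIG(*) AS row_count
FROM dbo.gl
GROUP BY current_instance
ORDER BY row_count DESC;
```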
On a final note, I'm struggling a bit to see the sense in your selects, as it seems like you are either getting a serious number of rows returned (rarely very useful) or looking for aggregated data like sums. If the latter is mainly the case, I'd seriously consider making one or more tables that store periodically aggregated data, which would drastically reduce the number of times you'd have to run selects like this, as the people who need these numbers could just retrieve the stored aggregates instead. Since we're talking about a huge table (110 million rows), this makes even more sense.
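A minimal sketch of that pre-aggregation idea (the summary table, its columns, and the period/amount columns on gl are all assumptions for illustration, not from the thread):

```sql
-- Summary table populated once per period by a scheduled job, so reports
-- read a few thousand pre-aggregated rows instead of scanning 110M.
CREATE TABLE dbo.gl_period_summary (
    clientid     int            NOT NULL,
    fiscalyear   int            NOT NULL,
    fiscalperiod int            NOT NULL,
    ledger_code  char(1)        NOT NULL,
    total_amount decimal(19, 4) NOT NULL,
    CONSTRAINT PK_gl_period_summary
        PRIMARY KEY CLUSTERED (clientid, fiscalyear, fiscalperiod, ledger_code)
);

INSERT INTO dbo.gl_period_summary
    (clientid, fiscalyear, fiscalperiod, ledger_code, total_amount)
SELECT clientid, fiscalyear, fiscalperiod, ledger_code, SUM(amount)
FROM dbo.gl
GROUP BY clientid, fiscalyear, fiscalperiod, ledger_code;
```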
Hope this helps 🙂
Thorbjorn Kvam,
Project manager and DBA (design) at Payex (http://www.payex.com)