July 11, 2011 at 7:01 am
Hi all,
Pretty broad question, but does anyone have any tips on how I can improve the speed of the creation of indexes?
I have a table with ~60 million records, and 12 columns. One column is an IDENTITY PK, 8 columns are INT FK's, and then there is a DATETIME and two DECIMAL(12, 2) columns.
The table needs to have an index on each of the 8 FK columns, as well as the DATETIME column and one of the DECIMAL(12, 2) columns. Each of those indexes needs to have, as INCLUDED fields, all the other columns (except the PK of course).
Every day, the table is TRUNCATEd and re-created. The indexes are dropped before this occurs, and then re-created after.
The index creation time currently takes ~1 hour.
July 11, 2011 at 8:23 am
kramaswamy (7/11/2011)
Pretty broad question, but does anyone have any tips on how I can improve the speed of the creation of indexes?
Faster disks, more memory. Can't think of much else.
The table needs to have an index on each of the 8 FK columns, as well as the DATETIME column and one of the DECIMAL(12, 2) columns. Each of those indexes needs to have, as INCLUDED fields, all the other columns (except the PK of course).
Does it really, absolutely need all of those insanely wide indexes?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 11, 2011 at 8:36 am
Well, the individual indexes on the 10 columns in question are all required, as any query against this table can be made with any one of the columns as the search parameter. If any of those indexes are removed, then the query which is made against that column will have to do a table scan, and with 60 million rows, that will take much longer than what is currently acceptable.
The included columns is maybe not completely necessary, but I expect that without them the queries will be considerably slower, as they will have to do RID lookups to get the other fields.
One question I have though is this. Let's say I have the following query:
SELECT
A.ID,
A.Field,
B.Field,
C.Field
FROM TableA A
LEFT JOIN TableB B ONb.ID = A.bID
AND b.Field = SomeValue
LEFT JOIN TableC C ON c.ID = A.cID
Lets say that I have an index on A.bID and an index on A.cID, and that b.ID and c.ID are both primary keys of those tables in question, and that there is an FK on TableA for A.bID and A.cID.
I've operated under the assumption that there is a performance increase involved in having the index on A.bID contain, as an included column, A.Field. This should be true, as without having the included column, there would need to be an RID lookup operation done to get the value for A.Field.
But does the same thind hold true for A.cID? Does having A.cID be an included column in the A.bID index, increase the performance of the LEFT JOIN against TableC.ID, or does the fact that A.cID is already its own index make including the column in A.bID not necessary?
July 11, 2011 at 8:53 am
This may help answer that:
http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 11, 2011 at 8:59 am
Unfortunately it doesn't really help, GilaMonster.
I understand that in a situation in which I had a query that did a WHERE clause against all 10 columns, having a single index that had all 10 columns would be a much higher performance output than having 10 indexes.
The problem is that I'm in a situation in which any one of those 10 columns could be used as a WHERE clause on its own. If I had a single covering index, the only way it would be used is if the first column in the index was the one being queried for - the other columns would just do an INDEX SCAN.
Also doesn't really explain my followup question as to whether the other columns being included as included columns increases performance when doing JOIN operations with columns that have their own indexes already defined.
July 11, 2011 at 9:06 am
kramaswamy (7/11/2011)
Also doesn't really explain my followup question as to whether the other columns being included as included columns increases performance when doing JOIN operations with columns that have their own indexes already defined.
Include, maybe not. Key column, well see that blog post about performance using 2 indexes vs 1. (was why I posted it)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 11, 2011 at 9:24 am
Oh! Of course. Hadn't really thought about it at all like that, but it makes much more sense. It hadn't really clicked because in this situation we're looking at JOIN clauses instead of WHERE clauses, but you're saying that they are basically the same? If I say
LEFT JOIN TableB b ON b.ID = A.bID
LEFT JOIN TableC c ON c.ID = A.cID
And I have an index which contains both A.bID and a.cID, the query optimiser is able to make use of this single index for both the JOIN clauses, similar to how it would if I were to write a query such as
WHERE A.bID = [Number] AND A.cID = [Number]
July 11, 2011 at 9:47 am
Can be. Not as simple as that, but it's about giving the optimiser the option and seeing what it thinks.
Back to the original question though, short of better hardware I can't really see any good options here.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 11, 2011 at 9:51 am
GilaMonster (7/11/2011)
Can be. Not as simple as that, but it's about giving the optimiser the option and seeing what it thinks.Back to the original question though, short of better hardware I can't really see any good options here.
Is kicking everyone out an option? The servers always work awesome in those circumstances!
July 11, 2011 at 9:54 am
I'd like to dig your head a bit more on this topic if possible GilaMonster,
The basic query which I'm using against this table structure is always set up as follows:
SELECT
t1.ID,
t1.Field,
t2.Field,
t3.Field,
t4.Field
FROM #Table1 t1
LEFT JOIN #Table2 t2 ON t1.t2ID = t2.ID
LEFT JOIN #Table3 t3 ON t1.t3ID = t3.ID
LEFT JOIN #Table4 t4 ON t1.t4ID = t4.ID
Essentially I'm building a pseudo-view, from a set of normalized tables containing data, and a single table containing transactions.
From this pseudo-view, I'm doing queries against specific pieces of information. All of those queries though, are using data contained in the normalized tables. For example, I might modify that above query to become
SELECT
t1.ID,
t1.Field,
t2.Field,
t3.Field,
t4.Field
FROM #Table1 t1
LEFT JOIN #Table2 t2 ON t1.t2ID = t2.ID AND t2.Field = 123
LEFT JOIN #Table3 t3 ON t1.t3ID = t3.ID
LEFT JOIN #Table4 t4 ON t1.t4ID = t4.ID
t2.Field, in this example, would have an index already created on it, in #Table2.
The indexes which I've created on #Table1, are there only for the purposes of making those LEFT JOINS perform correctly.
If I dropped most of the indexes, and created only a single long index, which contained all the foreign keys as indexed fields, would the query optimiser decide to use this index for all the LEFT JOIN operations against the normalized tables? Would it perform better or worse, or the same, as having individual indexes against each of the fields?
In the example in your article, you were using WHERE clauses, where I can see quite clearly that it works effectively. I just don't know if this will work against my table with LEFT JOINs.
July 11, 2011 at 9:55 am
Ninja's_RGR'us (7/11/2011)
GilaMonster (7/11/2011)
Can be. Not as simple as that, but it's about giving the optimiser the option and seeing what it thinks.Back to the original question though, short of better hardware I can't really see any good options here.
Is kicking everyone out an option? The servers always work awesome in those circumstances!
Sure its an option. The job is running early morning so nobody should even be using it, but are you suggesting that users who have open idle connections to the server would still slow the speed down?
July 11, 2011 at 9:58 am
No I was simply joking.
Short of adding ressources, limiting ressources usage is the only other option here.
July 11, 2011 at 11:02 am
Damn. Well, tried creating a single index with all foreign keys in it, and that didn't work. Although in retrospect, I suppose that makes sense - if I do a join against a table and then supply an AND clause for the join, the optimiser will probably do a query against the joined table, then do a hash match or something, against the main table. Wouldn't be able to use the index if the index didn't have that column as the first column in the list.
July 11, 2011 at 12:44 pm
kramaswamy (7/11/2011)
If I dropped most of the indexes, and created only a single long index, which contained all the foreign keys as indexed fields, would the query optimiser decide to use this index for all the LEFT JOIN operations against the normalized tables? Would it perform better or worse, or the same, as having individual indexes against each of the fields?
It depends. Really it depends on data volumes, chosen join types and more things than I can think of right now.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply