May 5, 2008 at 12:16 pm
I have a 50 million row table. This table has 5 indexes. Where I tried to select using a where clause it's does not take the index (the column used in the where clause is also use as an index) . Instead a full table scan is done.
When I use a hint to force the index, the result us much faster. How come the Optimizer does not pick the index? What can I do to avoid using a hint?
Thanks
May 5, 2008 at 12:35 pm
Can you post table schema, index definition, query and aprox rowcount affected by the query please?
It's probably because the index is not covering, and query affects enough rows that the optimiser thinks a scan would be more efficient.
Are your statistics up to date? (update statistics < table name >)
Is the index fragmented?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 5, 2008 at 12:49 pm
Hello GilaMonster,
Here is the information:
rowcount whole table:+/-36000000
expected rowcount return: 244918
CREATE TABLE [dbo].[REP_table] (
[ID] [int] NOT NULL ,
[Qty] [int] NOT NULL ,
[Price] [money] NOT NULL ,
[FPrice] [money] NOT NULL ,
[Fline] [money] NOT NULL ,
[GNet] [money] NOT NULL ,
[Prod] [int] NOT NULL ,
[Dist] [varchar] (8) NOT NULL ,
[CE] [money] NOT NULL ,
[Date] [datetime] NOT NULL ,
[Acct] [varchar] (23) NOT NULL ,
[COT] [varchar] (3) NOT NULL ,
[Package] [int] NOT NULL ,
[Discount] [money] NOT NULL ,
[Route] [varchar] (12) NOT NULL
)
CREATE INDEX [IX_Dist] ON [dbo].[REP_table] ([Dist])
I have drop/recreate index. I also updated the stats for all the indexes and even for that one.
select * frep rep_table where dist='ppst' ***** very very long
select * frep rep_table with (name(IX_DIST) where dist='ppst' **** much faster
May 5, 2008 at 12:57 pm
Ok.
My guess. As far as the optimiser's concerned, the query returns too many rows for a seek/bookmark lookup to be optimal, and the index is not covering. Hence it uses a scan instead.
Do you have a clustered index on that table? If so, what's it on and how fragmented is it?
Do you need all the columns returned (select *) or just some of them?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 5, 2008 at 1:27 pm
Hello,
We currently don't have clustered index in this table. The table (36 million rows)is truncate on a weekly basis and then repopulated. All indexes are rebuild afterward. I am not sure about the fragmentation.
The select * is to move data between database when needed by our end-users.
I know this might be not the best process, but we need to live with that for now. I will be reviewing that in a few months... Meanwhile, I need to figure a way to speed-up the query.
This mean that the optimizer is not that bright :w00t:? I am coming from a Sybase & Oracle background, and this king of manipulation was taking always the indexes (without adding a hint).
Thanks
May 5, 2008 at 2:20 pm
The optimiser's very good usually. This is just an edge case where its estimate is wrong. Possibly because the table's a heap, not a cluster.
If you don't have a clustered index, and this particular query runs often, perhaps make this index clustered and see how that works.
What other indexes are there? What other queries run?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 5, 2008 at 2:21 pm
Did you try to update your statistics, which is used in estmated execution plans?
May 5, 2008 at 2:38 pm
Isn't there a primary key / unique key missing ?
I mean, every row should have a unique pointer if possible at all.
My guess is your [id] column may be just that.
Also keep in mind you can only have one clustering index on your table, because the leaf level of your clix contains the actual row data !
(Read bol on indexes)
If the results on making your current index the clustering one aren't that good, maybe you could use an extra unique index on your [id] column and
make that one the clustering one.
I advise every table should have a (narrow) clustering index, unless prooven bad for that case.
You may have to experiment a bit to feel how it behaves.
Anyway, keep in mind to load your data in the same physical order as yoru clustering index definition. (avoid plits at load time)
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
May 5, 2008 at 2:47 pm
I am new at this place. I will need to dig out how the queries are done on that table. I don't want to jeopardize the all other queries !!!! I just found it a bit strange !!!!
The stats are up to date. The table is truncated/repopulated on a weekly basis and index are recreated. After that, no insert/update/delete are done, just select.
Any help will be appreciated !
Thanks
May 6, 2008 at 12:19 am
I would recommend a clustered index on that table. Without knowing what other queries run and how the data's distributed, I can't recommend what column to put it on.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 6, 2008 at 11:21 am
Instead of moving the index from non-clustered to clustered, if I add a primary key on field "ID" to the table, this work?
Thanks
May 6, 2008 at 11:43 am
Rem (5/6/2008)
Instead of moving the index from non-clustered to clustered, if I add a primary key on field "ID" to the table, this work?Thanks
I'm not quite sure what you mean by "this works?", but "regular" indexing by ID would do exactly nothing to help your current scenario. Since the clustered index determines how the data is actually stored, it can be used effectively to speed up "range" searches, so - if the majority of your searches are based on criteria against the dist column - then making that column the leading edge your your clustered index (possibly dist + ID as the clustered key) might speed things up quite a bit.
In general, tables as big as what you're describing tend to need a clustered index to help with defragmentating the table, but it would be nice to actually put the clustered index to good use as well.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 6, 2008 at 12:58 pm
What I meant was if I add a Primary Key Constraint, the data will be sort according my key and will not be a heap table anymore.
In a previous reply to my post it was mentioned that a heap table can have some issues. After talking to some folks here and knowing more the apps hitting this table the ID seems to be the best candidate to be use as a primary key.
Thanks for your help.
May 6, 2008 at 1:09 pm
Rem (5/6/2008)
What I meant was if I add a Primary Key Constraint, the data will be sort according my key and will not be a heap table anymore.In a previous reply to my post it was mentioned that a heap table can have some issues. After talking to some folks here and knowing more the apps hitting this table the ID seems to be the best candidate to be use as a primary key.
Thanks for your help.
Setting a primary key is nice - but the clustered index is what will make it not be a heap anymore.
Of course - You can designate the ID to be BOTH the primary key and the clustered key.
That being said - making ID the clustered index or the primary key isn't going to "help" with why this index isn't used. On the other hand, setting up a compound clustered index, with dist as the left column (possibly followed by ID) , would pretty much guarantee that that will get used in this query - whether it makes sense with regards to the other activity on this table would be something you have to evaluate.
The bookmark lookups will make it very unlikely that that index will get picked on its own (it sounds like it's not incredibly selective, so like Gail mentioned - the compiler figures that it would be cheaper to just table-scan for the rows).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 6, 2008 at 1:26 pm
I said that the fact that the table's a heap may have something to do with how the optimiser costs the seek vs the scan. Your query pulls 0.06% of the table, so it is within the selectivity range where the optimiser might pick a seek. I'm not sure about this. I don't know as much about the innards of the optimiser as I would like.
I doubt it will hurt the query, but I can't guarantee that it will improve it.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 15 posts - 1 through 15 (of 29 total)
You must be logged in to reply to this topic. Login to reply