Optimizer not using our indexes

Question

Post reply

Optimizer not using our indexes

lychev

SSC Rookie

Points: 38

December 18, 2006 at 11:53 am

#171641

Description of problem.

We have in the Database table ConsumerListing with a number of

searchable

columns. We can not predict the combination of search conditions.

Because of

that we chose to create a number of one-column non-clustered indexes.

Two types of searches are being performed on the table.

Type 1: we need to access all records which satisfy certain conditions.

In

our example people who Dogs and interested in Gardening in the household.

Type 2 we need to access records which satisfy certain conditions and

have

certain phone numbers. To achieve that we decided in addition to

one-column

non-clustered indexes to create indexes with keys AreaCode, Phone and

add

all the other search conditions on the leaf level of non clustered index

(included columns new for SQL Server 2005).

We decided to test query

select count(*) from ConsumerListing

where

Dogs='Y' AND Gardner='Y'

It took long. Query optimizer issued following execution plan:

StmtText

------------------------------------------------------------------------

----

------------------------------------------------------------------------

----

-----------------------------------------------

|--Compute

Scalar(DEFINE[Expr1004]=CONVERT_IMPLICIT(int,[globalagg1008],0)))

|--Stream

Aggregate(DEFINE[globalagg1008]=SUM([partialagg1007])))

|--Parallelism(Gather Streams)

|--Stream Aggregate(DEFINE[partialagg1007]=Count(*)))

|--Index

Scan(OBJECT[DD_Data].[dbo].[ConsumerListing].[Ind_Phone]),

WHERE[DD_Data].[dbo].[ConsumerListing].[Dogs]='Y' AND

[DD_Data].[dbo].[ConsumerListing].[Gardner]='Y'))

Meaning instead of seeking indexes on Dogs and Cats and merging results

(Behavior we expected) optimaizer choose to scan

Ind_Phone.

We decided to overwrite this with index hints

select count(*) from ConsumerListing

with (index (IND_Dogs, Ind_Gardner))

where

Dogs='Y' AND Gardner='Y'

We get expected Behavior

StmtText

------------------------------------------------------------------------

----

------------------------------------------------------------------------

----

------------------------------------------------------------------------

----

------------------------------------------------------------------------

----

------------------------------------------------------------------------

----

------------------------------------------------------------------------

----

-----------------------------------------------

|--Compute

Scalar(DEFINE[Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))

|--Stream Aggregate(DEFINE[Expr1007]=Count(*)))

|--Merge Join(Inner Join,

MERGE[DD_Data].[dbo].[ConsumerListing].[AddressID],

[DD_Data].[dbo].[ConsumerListing].[IndividualId],

[Uniq1002])=([DD_Data].[dbo].[ConsumerListing].[AddressID],

[DD_Data].[dbo].[ConsumerListing].[IndividualId], [Uniq1002]),

RESIDUAL[DD_Data].[dbo].[ConsumerListing].[AddressID] =

[DD_Data].[dbo].[ConsumerListing].[AddressID] AND

[DD_Data].[dbo].[ConsumerListing].[IndividualId] =

[DD_Data].[dbo].[ConsumerListing].[IndividualId] AND [Uniq1002] =

[Uniq1002]))

|--Index

Seek(OBJECT[DD_Data].[dbo].[ConsumerListing].[Ind_Dogs]),

SEEK[DD_Data].[dbo].[ConsumerListing].[Dogs]='Y') ORDERED FORWARD)

|--Index

Seek(OBJECT[DD_Data].[dbo].[ConsumerListing].[Ind_Gardner]),

SEEK[DD_Data].[dbo].[ConsumerListing].[Gardner]='Y') ORDERED FORWARD)

We decided to test both situations with

DBCC DROPCLEANBUFFERS

DBCC FREEPROCCACHE

SET STATISTICS IO ON

SET STATISTICS TIME ON

Resulsts for SET STATISTICS TIME ON SQL Server Execution Times:

CPU time = 36266 ms, elapsed time = 396475 ms.--default

CPU time = 19032 ms, elapsed time = 19245 ms.--with index hints

In my oppinion wiht index hints times match better

Results from STATISTICS TIME ON

Table 'ConsumerListing'. Scan count 5, logical reads 3948831, physical

reads

7, read-ahead reads 3939681, lob logical reads 0, lob physical reads 0,

lob

read-ahead reads 0.--default

Table 'ConsumerListing'. Scan count 2, logical reads 88589, physical

reads

6, read-ahead reads 88581, lob logical reads 0, lob physical reads 0,

lob

read-ahead reads 0.----with index hints

Again all results are better when we're using two one column indexes.

All our one column indexes has low selectivity

Partial formatted results from DBCC SHOW_STATISTICS bellow:

Dogs

RANGE_HI_KEY

RANGE_ROWS

EQ_ROWS

DISTINCT_RANGE_ROWS

AVG_RANGE_ROWS

U

0

169,693,200

0

1

Y

0

34,562,520

0

1

Gardener

RANGE_HI_KEY

RANGE_ROWS

EQ_ROWS

DISTINCT_RANGE_ROWS

AVG_RANGE_ROWS

U

0

159,358,900

0

1

Y

0

44,896,820

0

1

Table has 204,255,720

Please note that regarding of low selectivity if you will run

select count(*) from ConsumerListing

where

Dogs='Y'

(Or anything with only one condition) optimizer will use ind_dogs (seek)(or any other appropriate index.

The question is why optimizer is using such an ineffective execution

plan?

What can we do to avoid index hints( will be difficult to program)?

Help will be greatly appreciated.

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

2 options come to mind at the moment :

1 - sp_updatestats

2 - Create a table variable with the key column(s) of the table.

Insert into @TV (KeyCol) Select KeyCol FROM dbo.YourTable WHERE SomeCol = @SomeVar

DELETE @TV TV WHERE NOT Exists (Select * FROM dbo.YourTable YT WHERE SecondCol = @SecondParam AND TV.KeyCol = YT.KeyCol)

DELETE...

SELECT Needed, Columns FROM dbo.YourTable YT INNER JOIN @TV TV ON YT.KeyCol = TV.KeyCol

I have never tried this server side, but I got good results client side on a search engine with a monstruous amount of search arguments variations (200+). You might also add a condition after each delete to end the execution of the code (may be faster than deleting 0 rows with exists but I have never tested for that).

This technic also gives you the chance to put the most selective search options at the top of the list so that each successive search becomes much less expansive.

lychev SSC Rookie Points: 38 More actions · Answer 2

1.Statistics is up to date I checked it. This is static table

2. Our table is 200 Million rows I do not think this is going to work for us

Thank you for your replay

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 3

1 - Can more than one parameter be search at the same time, if yes what is the limit?

2 - How many parameters are passed to this procedure?

3 - Which are optional?

4 - Do you need paging of the results?

5 - How do you plan to avoid an index scan with that number of parameters?

Can we see the table DDL (with indexes and keys)?

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 4

This article may be of particular interest to you :

http://www.sommarskog.se/dyn-search.html

Gabor Nyul SSCrazy Eights Points: 9459 More actions · Answer 5

The situation is quite easy.

where Dogs='Y' AND Gardner='Y'

That kind of where clause will never use any index as the selectivity is very bad (there can be only Y and N values in. It is much faster NOT to use any index access but go directly to the table and go trough the whole table. An index hint could be even worse in term of response time

If you want to use any index chose a more selective condition like phone number or date or a covering index (in this case the select * is not suitable) and make it clustered.

But you have to carefull by making an index clustered. You have to know how the users are accessing your table (ie what is the user behavior in terms of insert, udpate, select...)

Bye
Gabor