December 18, 2006 at 11:53 am
Description of problem.
We have in the Database table ConsumerListing with a number of
searchable
columns. We can not predict the combination of search conditions.
Because of
that we chose to create a number of one-column non-clustered indexes.
Two types of searches are being performed on the table.
Type 1: we need to access all records which satisfy certain conditions.
In
our example people who Dogs and interested in Gardening in the household.
Type 2 we need to access records which satisfy certain conditions and
have
certain phone numbers. To achieve that we decided in addition to
one-column
non-clustered indexes to create indexes with keys AreaCode, Phone and
add
all the other search conditions on the leaf level of non clustered index
(included columns new for SQL Server 2005).
We decided to test query
select count(*) from ConsumerListing
where
Dogs='Y' AND Gardner='Y'
It took long. Query optimizer issued following execution plan:
StmtText
------------------------------------------------------------------------
----
------------------------------------------------------------------------
----
-----------------------------------------------
|--Compute
Scalar(DEFINE[Expr1004]=CONVERT_IMPLICIT(int,[globalagg1008],0)))
|--Stream
Aggregate(DEFINE[globalagg1008]=SUM([partialagg1007])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE[partialagg1007]=Count(*)))
|--Index
Scan(OBJECT[DD_Data].[dbo].[ConsumerListing].[Ind_Phone]),
WHERE[DD_Data].[dbo].[ConsumerListing].[Dogs]='Y' AND
[DD_Data].[dbo].[ConsumerListing].[Gardner]='Y'))
Meaning instead of seeking indexes on Dogs and Cats and merging results
(Behavior we expected) optimaizer choose to scan
Ind_Phone.
We decided to overwrite this with index hints
select count(*) from ConsumerListing
with (index (IND_Dogs, Ind_Gardner))
where
Dogs='Y' AND Gardner='Y'
We get expected Behavior
StmtText
------------------------------------------------------------------------
----
------------------------------------------------------------------------
----
------------------------------------------------------------------------
----
------------------------------------------------------------------------
----
------------------------------------------------------------------------
----
------------------------------------------------------------------------
----
-----------------------------------------------
|--Compute
Scalar(DEFINE[Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))
|--Stream Aggregate(DEFINE[Expr1007]=Count(*)))
|--Merge Join(Inner Join,
MERGE[DD_Data].[dbo].[ConsumerListing].[AddressID],
[DD_Data].[dbo].[ConsumerListing].[IndividualId],
[Uniq1002])=([DD_Data].[dbo].[ConsumerListing].[AddressID],
[DD_Data].[dbo].[ConsumerListing].[IndividualId], [Uniq1002]),
RESIDUAL[DD_Data].[dbo].[ConsumerListing].[AddressID] =
[DD_Data].[dbo].[ConsumerListing].[AddressID] AND
[DD_Data].[dbo].[ConsumerListing].[IndividualId] =
[DD_Data].[dbo].[ConsumerListing].[IndividualId] AND [Uniq1002] =
[Uniq1002]))
|--Index
Seek(OBJECT[DD_Data].[dbo].[ConsumerListing].[Ind_Dogs]),
SEEK[DD_Data].[dbo].[ConsumerListing].[Dogs]='Y') ORDERED FORWARD)
|--Index
Seek(OBJECT[DD_Data].[dbo].[ConsumerListing].[Ind_Gardner]),
SEEK[DD_Data].[dbo].[ConsumerListing].[Gardner]='Y') ORDERED FORWARD)
We decided to test both situations with
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SET STATISTICS IO ON
SET STATISTICS TIME ON
Resulsts for SET STATISTICS TIME ON SQL Server Execution Times:
CPU time = 36266 ms, elapsed time = 396475 ms.--default
CPU time = 19032 ms, elapsed time = 19245 ms.--with index hints
In my oppinion wiht index hints times match better
Results from STATISTICS TIME ON
Table 'ConsumerListing'. Scan count 5, logical reads 3948831, physical
reads
7, read-ahead reads 3939681, lob logical reads 0, lob physical reads 0,
lob
read-ahead reads 0.--default
Table 'ConsumerListing'. Scan count 2, logical reads 88589, physical
reads
6, read-ahead reads 88581, lob logical reads 0, lob physical reads 0,
lob
read-ahead reads 0.----with index hints
Again all results are better when we're using two one column indexes.
All our one column indexes has low selectivity
Partial formatted results from DBCC SHOW_STATISTICS bellow:
Dogs | | | | |
RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS |
U | 0 | 169,693,200 | 0 | 1 |
Y | 0 | 34,562,520 | 0 | 1 |
Gardener | | | | |
RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS |
U | 0 | 159,358,900 | 0 | 1 |
Y | 0 | 44,896,820 | 0 | 1 |
Table has 204,255,720
Please note that regarding of low selectivity if you will run
select count(*) from ConsumerListing
where
Dogs='Y'
(Or anything with only one condition) optimizer will use ind_dogs (seek)(or any other appropriate index.
The question is why optimizer is using such an ineffective execution
plan?
What can we do to avoid index hints( will be difficult to program)?
Help will be greatly appreciated.
December 18, 2006 at 12:07 pm
2 options come to mind at the moment :
1 - sp_updatestats
2 - Create a table variable with the key column(s) of the table.
Insert into @TV (KeyCol) Select KeyCol FROM dbo.YourTable WHERE SomeCol = @SomeVar
DELETE @TV TV WHERE NOT Exists (Select * FROM dbo.YourTable YT WHERE SecondCol = @SecondParam AND TV.KeyCol = YT.KeyCol)
DELETE...
DELETE...
DELETE...
SELECT Needed, Columns FROM dbo.YourTable YT INNER JOIN @TV TV ON YT.KeyCol = TV.KeyCol
I have never tried this server side, but I got good results client side on a search engine with a monstruous amount of search arguments variations (200+). You might also add a condition after each delete to end the execution of the code (may be faster than deleting 0 rows with exists but I have never tested for that).
This technic also gives you the chance to put the most selective search options at the top of the list so that each successive search becomes much less expansive.
December 18, 2006 at 1:31 pm
1.Statistics is up to date I checked it. This is static table
2. Our table is 200 Million rows I do not think this is going to work for us
Thank you for your replay
December 18, 2006 at 1:36 pm
1 - Can more than one parameter be search at the same time, if yes what is the limit?
2 - How many parameters are passed to this procedure?
3 - Which are optional?
4 - Do you need paging of the results?
5 - How do you plan to avoid an index scan with that number of parameters?
Can we see the table DDL (with indexes and keys)?
December 18, 2006 at 1:37 pm
This article may be of particular interest to you :
December 19, 2006 at 3:47 am
The situation is quite easy.
where Dogs='Y' AND Gardner='Y'
That kind of where clause will never use any index as the selectivity is very bad (there can be only Y and N values in. It is much faster NOT to use any index access but go directly to the table and go trough the whole table. An index hint could be even worse in term of response time
If you want to use any index chose a more selective condition like phone number or date or a covering index (in this case the select * is not suitable) and make it clustered.
But you have to carefull by making an index clustered. You have to know how the users are accessing your table (ie what is the user behavior in terms of insert, udpate, select...)
Bye
Gabor
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply