June 19, 2009 at 6:03 am
Hi,
Say I have a table (policy) that has policyID, dateFrom and dateTo, what would be the best index for the following query?
select policyID
from policy
where ndate between dateFrom and dateTo
or
select ndate, policyID
from policy p
inner join datesequence ds
on ds.ndate between p.datefrom and p.dateTo
Any suggestions? I have tried various combinations, but was wondering if anybody else has experience of this?
June 19, 2009 at 7:24 am
Have you tried creating a clustered index on ndate? Clustered indexes are very good at range queries. A minimal sketch is below; the table and index names are my assumption, since the original post doesn't show the table definition.
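-- Sketch only: assumes dbo.datesequence is currently a heap;
-- the index name and schema are invented, adjust to your own.
CREATE CLUSTERED INDEX CIX_datesequence_ndate
ON dbo.datesequence (ndate);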
Regards
Piotr
...and your only reply is slàinte mhath
June 19, 2009 at 7:28 am
The clustered index is already there on ndate, as it is a date sequence table.
The issue is with the high-volume policy table. However, I'm just experimenting with using subqueries instead; it seems the optimiser likes them better.
June 19, 2009 at 8:00 am
An index that has all three columns, which makes it covering. The query can also be covered if one or more (or all three) of the columns is in the clustered index. If one, say the ID column, is the cluster key, then the nonclustered index would only need the two date columns. As a sketch, using the column names from the original post (the index name is my invention):
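-- Covering index for the date-range query; the name is invented.
CREATE NONCLUSTERED INDEX IX_policy_dateFrom_dateTo
ON dbo.policy (dateFrom, dateTo)
INCLUDE (policyID);
-- If policyID is the clustered index key, it is carried in every
-- nonclustered index anyway, so the INCLUDE clause can be omitted.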
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 19, 2009 at 8:19 am
Right,
Sorry for my previous brief posts with bad spelling and grammar. Here is what I have found:
The optimiser deals better with a subquery lookup than with a BETWEEN over two values.
So my query as I wrote earlier:
Select ds.ndate, p.policyID
From policy p
Cross join dateSequence ds
where ds.ndate >= p.dateFrom and ds.ndate <= p.dateTo
Actually performs far better when rewritten like this:
Select ds.ndate, p.policyID
From policy p
Cross join dateSequence ds
where p.dateFrom = (select max(dateFrom) from policy where ds.ndate >= dateFrom and policyID = p.policyID)
This seems slightly weird, because I've always had it in my head that dateFrom/dateTo ranges are far better to work with than effective dates.
Does anybody have any experiences/alternatives?
Dan
June 19, 2009 at 8:47 am
dan (6/19/2009)
The optimiser deals better with a subquery lookup than with a BETWEEN over two values.
[...]
Select ds.ndate, p.policyID
From policy p
Cross join dateSequence ds
where p.dateFrom = (select max(dateFrom) from policy where ds.ndate >= dateFrom and policyID = p.policyID)
You might find that using a TOP 1 with an ORDER BY works better than the aggregation using MAX. It's not always the case, but it frequently is.
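Applied to your query, the rewrite would look something like this (a sketch only; I have not tested it against your data):
Select ds.ndate, p.policyID
From policy p
Cross join dateSequence ds
where p.dateFrom = (select top (1) dateFrom
                    from policy
                    where ds.ndate >= dateFrom
                    and policyID = p.policyID
                    order by dateFrom desc)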
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 19, 2009 at 9:03 am
Kimberly Tripp just had a blog post titled "Why aren't those nonclustered indexes being used?" that answers the question "What percentage of data IS selective enough to use a nonclustered index which doesn't cover the query?" http://www.sqlskills.com/BLOGS/KIMBERLY/post/Tipping-Point-Queries-More-Questions-To-Really-Test-You!.aspx
The answer is that if the select statement accesses more than about 2% of the rows, the nonclustered index will not be used. I found that the percentage can be much lower depending on the width of the rows: the larger the rows, the more likely a nonclustered index will be used; the smaller the rows, the more likely it will not be. A worked example of the arithmetic, with made-up numbers, is below.
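-- Tipping-point arithmetic with made-up, illustrative numbers: a table of
-- 80,000 data pages is 10,000 extents, so a full scan costs roughly
-- 10,000 reads. In the model used by the script below, each key lookup
-- costs (index levels + 1) = 4 reads, so lookups only win while the
-- select touches fewer than 10,000 / 4 = 2,500 rows.
declare @ExtentCnt integer, @CixLevelCnt integer
set @ExtentCnt   = 10000
set @CixLevelCnt = 3
select @ExtentCnt / (@CixLevelCnt + 1) as TippingPointRows -- 2500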
A more exact calculation can be done with the SQL below. You must provide two values: the number of rows accessed by the select statement and the name of the table.
declare @SelectRowCnt integer
      , @ExtentCnt    integer
      , @TotalRowCnt  integer
      , @CixLevelCnt  integer
      , @SelectPct    numeric(12,2)
      , @ObjectId     integer
-- Replace the ? with the appropriate values
set @ObjectId     = object_id('?')
set @SelectRowCnt = ?
select @ExtentCnt   = page_count / 8
     , @TotalRowCnt = record_count
from   sys.dm_db_index_physical_stats
       ( db_id()
       , @ObjectId
       , 1, default, 'DETAILED' )
WHERE  index_level = 0
select @CixLevelCnt = count(*)
from   sys.dm_db_index_physical_stats
       ( db_id()
       , @ObjectId
       , 1, default, 'DETAILED' )
SET @SelectPct = CAST(@SelectRowCnt as numeric(12,2)) * 100 / @TotalRowCnt
PRINT 'Rows total count:      ' + cast(@TotalRowCnt as varchar(8))
PRINT 'Rows selected count:   ' + cast(@SelectRowCnt as varchar(8))
PRINT 'Rows selected %:       ' + cast(@SelectPct as varchar(53))
PRINT 'Clustered Index level: ' + cast(@CixLevelCnt as varchar(8))
PRINT 'Scan IO count:         ' + cast(@ExtentCnt as varchar(8))
PRINT 'Index IO count:        ' + cast(@SelectRowCnt * (@CixLevelCnt + 1) as varchar(8))
IF (@SelectRowCnt * (@CixLevelCnt + 1)) < @ExtentCnt
    PRINT 'Index will be useful'
ELSE
    PRINT 'Index will NOT be used'
go
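For example (hypothetical values only), if the query in question is expected to touch about 1,000 rows of a table named dbo.Policy, the two placeholder lines become:
declare @ObjectId integer, @SelectRowCnt integer -- re-declared here only so the fragment stands alone
set @ObjectId     = object_id('dbo.Policy')      -- hypothetical table name
set @SelectRowCnt = 1000                         -- hypothetical row count for the select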
SQL = Scarcely Qualifies as a Language
June 19, 2009 at 11:44 pm
dan (6/19/2009)
Actually performs far better when rewritten like this:
Select ds.ndate, p.policyID
From policy p
Cross join
dateSequence ds
where p.dateFrom = (select max(dateFrom) from policy where ds.ndate>=dateFrom and policyID=p.policyID)
Hey Dan,
I would just say that for questions like this, you will generally get better responses if you post a script to set up the tables and indexes with some test data. This allows us to easily reproduce the issue and suggest alternatives.
I have created a script which may or may not resemble your requirement. In this case, the test query at the end performs far better than the example I quote above.
If you can take a look and change the script as necessary to demonstrate the issue you are seeing, that would probably help a great deal.
Paul
-- Move to a safe place
USE tempdb;
GO
-- Conditional table drops
IF OBJECT_ID(N'dbo.Policy', N'U') IS NOT NULL DROP TABLE dbo.Policy;
IF OBJECT_ID(N'dbo.DateSequence', N'U') IS NOT NULL DROP TABLE dbo.DateSequence;
GO
-- Create the test tables
CREATE TABLE dbo.Policy
(
    policy_id   INT NOT NULL PRIMARY KEY CLUSTERED,
    date_from   DATETIME NOT NULL,
    date_to     DATETIME NOT NULL DEFAULT CONVERT(DATETIME, '9999-12-31', 120)
);
CREATE TABLE dbo.DateSequence
(
    the_date    DATETIME NOT NULL PRIMARY KEY CLUSTERED
);
GO
-- Add 100K rows to the date sequence table - one row per date
WITH Numbers (N) AS
(
    SELECT TOP (100000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM   master.sys.columns C1
    CROSS JOIN master.sys.columns C2
)
INSERT dbo.DateSequence (the_date)
SELECT CONVERT(DATETIME, N)
FROM   Numbers
ORDER BY N;
GO
-- Add 5K rows to the policy table, with random date ranges
WITH Numbers (N, Random1, Random2) AS
(
    SELECT TOP (5000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
           CONVERT(INT, RAND(CHECKSUM(NEWID())) * 15000) + 36500, -- date_from
           CONVERT(INT, RAND(CHECKSUM(NEWID())) * 365) + 28       -- days after date_from
    FROM   master.sys.columns C1
    CROSS JOIN master.sys.columns C2
)
INSERT dbo.Policy (policy_id, date_from, date_to)
SELECT N,
       CONVERT(DATETIME, Random1),
       CONVERT(DATETIME, Random1 + Random2)
FROM   Numbers
ORDER BY N;
GO
--
-- Test query on a range of policy IDs
--
-- Start from cold
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
--
-- Show performance statistics
--
SET STATISTICS IO, TIME ON;
--
SELECT P.policy_id, DS.the_date
FROM   dbo.Policy AS P
JOIN   dbo.DateSequence AS DS
       ON DS.the_date BETWEEN P.date_from AND P.date_to
WHERE  P.policy_id BETWEEN 1234 AND 1264;
/*
(6858 row(s) affected)
Table 'DateSequence'. Scan count 31, logical reads 129, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Policy'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CPU time = 0 ms, elapsed time = 333 ms.
*/
--
SET STATISTICS IO, TIME OFF;
GO
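--
-- A possible follow-up (not part of the original script): create the
-- covering index discussed earlier in the thread, then re-run the test
-- query above and compare the STATISTICS IO output.
--
CREATE NONCLUSTERED INDEX IX_Policy_date_range
ON dbo.Policy (date_from, date_to);
-- policy_id is the clustered key, so it travels in every nonclustered
-- index and the test query remains covered.
GO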
--
-- Tidy up
--
DROP TABLE dbo.Policy, dbo.DateSequence;
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi