Suggest indexes for non-equi join

Question

Post reply

Suggest indexes for non-equi join

dan-856459

SSC Enthusiast

Points: 138
More actions
June 19, 2009 at 6:03 am

#210982

Hi,
Say I have a table (policy) that has policyID, dateFrom and dateTo, what would be the best index for the following query?
select policyID
from policy
where ndate between dateFrom and dateTo
or
select ndate, policyID
from policy p
inner join datesequence ds
on ds.ndate between p.datefrom and p.dateTo
Any suggestions I have tried various different combinations but was wondering if anybody else had experiences or this?

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Piotr.Rodak SSCrazy Eights Points: 9376 More actions · Answer 1

Have you tried to create clustered index on ndate? They are very good at range queries.

Regards

Piotr

...and your only reply is slàinte mhath

dan-856459 SSC Enthusiast Points: 138 More actions · Answer 2

the clustered index is already there on ndate as it is a date sequence table.....

the issue is with the high volume policy table... however i'm am just experiment with using sub querys instead it seems the otimiser likes them better?

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 3

An index that has all three columns, which makes it covering. It can also be covering if one or more (or all three) of the indexes is in the clustered index. If one, say the ID column, is the cluster, then the second index would only need the two date columns.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

dan-856459 SSC Enthusiast Points: 138 More actions · Answer 4

Right,

Sorry for my previous brief posts with bad spelling and grammer. Here is what I have found:

The optimiser deals better with a sub query look up rather than a looking between two values.

So my query as I wrote ealier:

Select ds.ndate, p.policyID

From policy p

Cross join

dateSequence ds

where ds.ndate>=p.dateFrom and ds.ndate=dateFrom and policyID=p.policyID)

This seem slightly weird because I’ve always had it in my head that dateFrom/dateTo are far better to work with than date effectives.

Does anybody have any experiences/alternatives?

Dan

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 5

dan (6/19/2009)
Right,
Sorry for my previous brief posts with bad spelling and grammer. Here is what I have found:
The optimiser deals better with a sub query look up rather than a looking between two values.
So my query as I wrote ealier:
Select ds.ndate, p.policyID
From policy p
Cross join
dateSequence ds
where ds.ndate>=p.dateFrom and ds.ndate=dateFrom and policyID=p.policyID)
This seem slightly weird because I’ve always had it in my head that dateFrom/dateTo are far better to work with than date effectives.
Does anybody have any experiences/alternatives?
Dan

You might find that using a TOP 1 with an ORDER BY works better than the aggregation using MAX. It's not always the case, but it frequently is.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Carl Federl One Orange Chip Points: 25384 More actions · Answer 6

Kimberly Tripp just had a blog post titled "Why aren't those nonclustered indexes being used?" that answers the question "What percentage of data IS selective enough to use a nonclustered index which doesn't cover the query." http://www.sqlskills.com/BLOGS/KIMBERLY/post/Tipping-Point-Queries-More-Questions-To-Really-Test-You!.aspx

The answer is if the select statement accesses more than about 2% of the rows, then the nonclustered index will not be used. I found that the % can be much lower depending on the width of the rows. The larger the rows,the more likely a non-clustered index will be used and the smaller the row, the more like the non-clustered index will not be used.

A more exact calculation can be done with this SQL and you must provide two values: the number of rows access by the select statment and the name of the table.

declare@SelectRowCnt integer

,@ExtentCntinteger

,@TotalRowCntinteger

,@CixLevelCntinteger

,@SelectPctnumeric(12,2)

,@ObjectIdinteger

-- Replace the ? with the appropriate values

set@ObjectId= object_id('?')

set@SelectRowCnt= ?

select@ExtentCnt= page_count / 8

,@TotalRowCnt= record_count

fromsys.dm_db_index_physical_stats

( db_id()

, @ObjectId

, 1, default, 'DETAILED' )

WHEREindex_level = 0

select@CixLevelCnt = count(*)

fromsys.dm_db_index_physical_stats

( db_id()

, @ObjectId

, 1, default, 'DETAILED' )

SET@SelectPct = CAST(@SelectRowCnt as numeric(12,2) ) * 100 / @TotalRowCnt

PRINT'Rows total count: ' + cast(@TotalRowCnt as varchar(8) )

PRINT'Rows selected count: ' + cast(@SelectRowCnt as varchar(8) )

PRINT'Rows selected % ' + cast( @SelectPct as varchar(53) )

PRINT'Clustered Index level: ' + cast(@CixLevelCnt as varchar(8) )

PRINT'Scan IO count: ' + cast( @ExtentCnt as varchar(8) )http://www.sqlservercentral.com/Forums/Skins/Classic/Images/RichTextBoxTable/tbl_bottomleft.gif

PRINT'Index IO count: ' + cast(@SelectRowCnt * ( @CixLevelCnt + 1 ) as varchar(8) )

IF(@SelectRowCnt * ( @CixLevelCnt + 1 ) ) < @ExtentCnt

PRINT 'Index will be useful'

ELSE

Print 'Index will NOT be used'

go

SQL = Scarcely Qualifies as a Language

Paul White SSC Guru Points: 150467 More actions · Answer 7

dan (6/19/2009)
Actually performs far better when rewritten like this
Select ds.ndate, p.policyID
From policy p
Cross join
dateSequence ds
where p.dateFrom = (select max(dateFrom) from policy where ds.ndate>=dateFrom and policyID=p.policyID)

Hey Dan,

I would just say that for questions like this, you will generally get better responses if you post a script to set up the tables and indexes with some test data. This allows us to easily reproduce the issue and suggest alternatives.

I have created a script which may or may not resemble your requirement. In this case, the test query at the end performs far better than the example I quote above.

If you can take a look and change the script as necessary to demonstrate the issue you are seeing, that would probably help a great deal.

Paul

-- Move to a safe place

USE tempdb;

GO

-- Conditional table drops

IF OBJECT_ID(N'dbo.Policy', N'U') IS NOT NULL DROP TABLE dbo.Policy;

IF OBJECT_ID(N'dbo.DateSequence', N'U') IS NOT NULL DROP TABLE dbo.DateSequence;

GO

-- Create the test tables

CREATE TABLE dbo.Policy

(

policy_idINT NOT NULL PRIMARY KEY CLUSTERED,

date_fromDATETIME NOT NULL,

date_toDATETIME NOT NULL DEFAULT CONVERT(DATETIME, '9999-12-31', 120),

);

CREATE TABLE dbo.DateSequence

(

the_dateDATETIME NOT NULL PRIMARY KEY CLUSTERED

);

GO

-- Add 100K rows to the date sequence table - one row per date

WITHNumbers (N) AS

(

SELECTTOP (100000)

ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROMmaster.sys.columns C1

CROSS

JOINmaster.sys.columns C2

)

INSERTdbo.DateSequence (the_date)

SELECTCONVERT(DATETIME, N)

FROMNumbers

ORDERBY

N;

GO

-- Add 5K rows to the policy table, with random date ranges

WITHNumbers (N, Random1, Random2) AS

(

SELECTTOP (5000)

ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

CONVERT(INT, RAND(CHECKSUM(NEWID())) * 15000) + 36500,-- date_from

CONVERT(INT, RAND(CHECKSUM(NEWID())) * 365) + 28-- days after date_from

FROMmaster.sys.columns C1

CROSS

JOINmaster.sys.columns C2

)

INSERTdbo.Policy (policy_id, date_from, date_to)

SELECTN,

CONVERT(DATETIME, Random1),

CONVERT(DATETIME, Random1 + Random2)

FROMNumbers

ORDERBY

N;

GO

--

-- Test query on a range of policy IDs

--

-- Start from cold

DBCC FREEPROCCACHE;

DBCC DROPCLEANBUFFERS;

--

-- Show performance statistics

--

SET STATISTICS IO, TIME ON;

--

SELECTP.policy_id, DS.the_date

FROMdbo.Policy AS P

JOINdbo.DateSequence AS DS

ONDS.the_date BETWEEN P.date_from AND P.date_to

WHEREP.policy_id BETWEEN 1234 AND 1264;

/*

(6858 row(s) affected)

Table 'DateSequence'. Scan count 31, logical reads 129, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Policy'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

CPU time = 0 ms, elapsed time = 333 ms.

*/

--

SET STATISTICS IO, TIME OFF;

GO

--

-- Tidy up

--

DROP TABLE dbo.Policy, dbo.DateSequence;

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi