Suggest indexes for non-equi join

  • Hi,

    Say I have a table (policy) that has policyID, dateFrom and dateTo, what would be the best index for the following query?

    select policyID

    from policy

    where ndate between dateFrom and dateTo

    or

    select ndate, policyID

    from policy p

    inner join datesequence ds

    on ds.ndate between p.datefrom and p.dateTo

    Any suggestions I have tried various different combinations but was wondering if anybody else had experiences or this?

  • Have you tried to create clustered index on ndate? They are very good at range queries.

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • the clustered index is already there on ndate as it is a date sequence table.....

    the issue is with the high volume policy table... however i'm am just experiment with using sub querys instead it seems the otimiser likes them better?

  • An index that has all three columns, which makes it covering. It can also be covering if one or more (or all three) of the indexes is in the clustered index. If one, say the ID column, is the cluster, then the second index would only need the two date columns.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Right,

    Sorry for my previous brief posts with bad spelling and grammer. Here is what I have found:

    The optimiser deals better with a sub query look up rather than a looking between two values.

    So my query as I wrote ealier:

    Select ds.ndate, p.policyID

    From policy p

    Cross join

    dateSequence ds

    where ds.ndate>=p.dateFrom and ds.ndate=dateFrom and policyID=p.policyID)

    This seem slightly weird because I’ve always had it in my head that dateFrom/dateTo are far better to work with than date effectives.

    Does anybody have any experiences/alternatives?

    Dan

  • dan (6/19/2009)


    Right,

    Sorry for my previous brief posts with bad spelling and grammer. Here is what I have found:

    The optimiser deals better with a sub query look up rather than a looking between two values.

    So my query as I wrote ealier:

    Select ds.ndate, p.policyID

    From policy p

    Cross join

    dateSequence ds

    where ds.ndate>=p.dateFrom and ds.ndate=dateFrom and policyID=p.policyID)

    This seem slightly weird because I’ve always had it in my head that dateFrom/dateTo are far better to work with than date effectives.

    Does anybody have any experiences/alternatives?

    Dan

    You might find that using a TOP 1 with an ORDER BY works better than the aggregation using MAX. It's not always the case, but it frequently is.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Kimberly Tripp just had a blog post titled "Why aren't those nonclustered indexes being used?" that answers the question "What percentage of data IS selective enough to use a nonclustered index which doesn't cover the query." http://www.sqlskills.com/BLOGS/KIMBERLY/post/Tipping-Point-Queries-More-Questions-To-Really-Test-You!.aspx

    The answer is if the select statement accesses more than about 2% of the rows, then the nonclustered index will not be used. I found that the % can be much lower depending on the width of the rows. The larger the rows,the more likely a non-clustered index will be used and the smaller the row, the more like the non-clustered index will not be used.

    A more exact calculation can be done with this SQL and you must provide two values: the number of rows access by the select statment and the name of the table.

    declare@SelectRowCnt integer

    ,@ExtentCntinteger

    ,@TotalRowCntinteger

    ,@CixLevelCntinteger

    ,@SelectPctnumeric(12,2)

    ,@ObjectIdinteger

    -- Replace the ? with the appropriate values

    set@ObjectId= object_id('?')

    set@SelectRowCnt= ?

    select@ExtentCnt= page_count / 8

    ,@TotalRowCnt= record_count

    fromsys.dm_db_index_physical_stats

    ( db_id()

    , @ObjectId

    , 1, default, 'DETAILED' )

    WHEREindex_level = 0

    select@CixLevelCnt = count(*)

    fromsys.dm_db_index_physical_stats

    ( db_id()

    , @ObjectId

    , 1, default, 'DETAILED' )

    SET@SelectPct = CAST(@SelectRowCnt as numeric(12,2) ) * 100 / @TotalRowCnt

    PRINT'Rows total count: ' + cast(@TotalRowCnt as varchar(8) )

    PRINT'Rows selected count: ' + cast(@SelectRowCnt as varchar(8) )

    PRINT'Rows selected % ' + cast( @SelectPct as varchar(53) )

    PRINT'Clustered Index level: ' + cast(@CixLevelCnt as varchar(8) )

    PRINT'Scan IO count: ' + cast( @ExtentCnt as varchar(8) )http://www.sqlservercentral.com/Forums/Skins/Classic/Images/RichTextBoxTable/tbl_bottomleft.gif

    PRINT'Index IO count: ' + cast(@SelectRowCnt * ( @CixLevelCnt + 1 ) as varchar(8) )

    IF(@SelectRowCnt * ( @CixLevelCnt + 1 ) ) < @ExtentCnt

    PRINT 'Index will be useful'

    ELSE

    Print 'Index will NOT be used'

    go

    SQL = Scarcely Qualifies as a Language

  • dan (6/19/2009)


    Actually performs far better when rewritten like this

    Select ds.ndate, p.policyID

    From policy p

    Cross join

    dateSequence ds

    where p.dateFrom = (select max(dateFrom) from policy where ds.ndate>=dateFrom and policyID=p.policyID)

    Hey Dan,

    I would just say that for questions like this, you will generally get better responses if you post a script to set up the tables and indexes with some test data. This allows us to easily reproduce the issue and suggest alternatives.

    I have created a script which may or may not resemble your requirement. In this case, the test query at the end performs far better than the example I quote above.

    If you can take a look and change the script as necessary to demonstrate the issue you are seeing, that would probably help a great deal.

    Paul

    -- Move to a safe place

    USE tempdb;

    GO

    -- Conditional table drops

    IF OBJECT_ID(N'dbo.Policy', N'U') IS NOT NULL DROP TABLE dbo.Policy;

    IF OBJECT_ID(N'dbo.DateSequence', N'U') IS NOT NULL DROP TABLE dbo.DateSequence;

    GO

    -- Create the test tables

    CREATE TABLE dbo.Policy

    (

    policy_idINT NOT NULL PRIMARY KEY CLUSTERED,

    date_fromDATETIME NOT NULL,

    date_toDATETIME NOT NULL DEFAULT CONVERT(DATETIME, '9999-12-31', 120),

    );

    CREATE TABLE dbo.DateSequence

    (

    the_dateDATETIME NOT NULL PRIMARY KEY CLUSTERED

    );

    GO

    -- Add 100K rows to the date sequence table - one row per date

    WITHNumbers (N) AS

    (

    SELECTTOP (100000)

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROMmaster.sys.columns C1

    CROSS

    JOINmaster.sys.columns C2

    )

    INSERTdbo.DateSequence (the_date)

    SELECTCONVERT(DATETIME, N)

    FROMNumbers

    ORDERBY

    N;

    GO

    -- Add 5K rows to the policy table, with random date ranges

    WITHNumbers (N, Random1, Random2) AS

    (

    SELECTTOP (5000)

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    CONVERT(INT, RAND(CHECKSUM(NEWID())) * 15000) + 36500,-- date_from

    CONVERT(INT, RAND(CHECKSUM(NEWID())) * 365) + 28-- days after date_from

    FROMmaster.sys.columns C1

    CROSS

    JOINmaster.sys.columns C2

    )

    INSERTdbo.Policy (policy_id, date_from, date_to)

    SELECTN,

    CONVERT(DATETIME, Random1),

    CONVERT(DATETIME, Random1 + Random2)

    FROMNumbers

    ORDERBY

    N;

    GO

    --

    -- Test query on a range of policy IDs

    --

    -- Start from cold

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    --

    -- Show performance statistics

    --

    SET STATISTICS IO, TIME ON;

    --

    SELECTP.policy_id, DS.the_date

    FROMdbo.Policy AS P

    JOINdbo.DateSequence AS DS

    ONDS.the_date BETWEEN P.date_from AND P.date_to

    WHEREP.policy_id BETWEEN 1234 AND 1264;

    /*

    (6858 row(s) affected)

    Table 'DateSequence'. Scan count 31, logical reads 129, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Policy'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    CPU time = 0 ms, elapsed time = 333 ms.

    */

    --

    SET STATISTICS IO, TIME OFF;

    GO

    --

    -- Tidy up

    --

    DROP TABLE dbo.Policy, dbo.DateSequence;

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply