Index Challenge

  • Hi Follks!

    I have one simple index related challage for all...

    CREATE TABLE TableX(

    X_ridint identity(1, 1) primary key,

    x_value1datetime,

    x_value2varchar(10),

    x_value3int,

    )

    CREATE TABLE TableY(

    Y_ridint identity(1, 1) primary key,

    X_ridint constraint fk_001 foreign key references TableX(X_rid),

    y_value1float,

    y_value2int,

    y_value3datetime,

    y_value4datetime

    )

    create nonclustered index idx_001 ON TableY (X_rid)

    include (y_value3,y_value4,Y_rid,y_value1,y_value2)

    select x.x_value2, y.Y_rid, y.y_value1, y.y_value2, y.y_value3, y.y_value4

    from TableX x

    join TableY y on x.X_rid = y.X_rid

    where y.X_rid = 1

    and y.y_value3 > GETDATE()

    and y.y_value4 < GETDATE()

    This is the sample structure and Query. The Query written above generates the attached execution plan.

    The challange is to convert the Index Seek to Clustered Index Seeks.

    Rules:

    1. You can not change the Table Design (:-P).

    2. You can change the query but cannot change the query output.

    3. You can add indexes whatever you want.

    Let's do some brainstorming..:hehe:

    Abhijit - http://abhijitmore.wordpress.com

  • Abhijit More (1/28/2011)


    The challange is to convert the Index Seek to Clustered Index Seeks.

    Why? The index seek should be more efficient.

    btw, any form of exec plan investigation is meaningless on 0 rows, as is the requirement to keep the same output (0 rows is easy). If it's to be a challenge, add some sample data.

    Here you go

    select x.x_value2, y.Y_rid, y.y_value1, y.y_value2, y.y_value3, y.y_value4

    from TableX x

    join TableY y WITH (INDEX = 1, FORCESEEK) on x.X_rid = y.X_rid

    where y.X_rid = 1

    and y.y_value3 > GETDATE()

    and y.y_value4 < GETDATE()

    AND y.Y_Rid > 0 -- totally meaningless because the identity will always be > 0, but it is a SARGable predicate on the clustered index key

    The additional predicate is meaningless, it's essentially a case of AND TRUE, and on any large table will kill performance because the nonclustered index seek has essentially turned into a table scan. It appears as a seek, but, if the table had any rows, would result in all of them being read.

    p.s. exactly the same result can be obtained by omitting the hints and dropping the nonclustered index

    DROP INDEX idx_001 ON TableY

    select x.x_value2, y.Y_rid, y.y_value1, y.y_value2, y.y_value3, y.y_value4

    from TableX x

    join TableY y on x.X_rid = y.X_rid

    where y.X_rid = 1

    and y.y_value3 > GETDATE()

    and y.y_value4 < GETDATE()

    AND y.Y_Rid > 0 -- totally meaningless because the identity will always be > 0, but it is a SARGable predicate on the clustered index key

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I added some data (a couple hundred thousand rows) and compared the original and the challenge requirement.

    INSERT INTO TableX (x_value1, x_value2, x_value3)

    SELECT TOP (250000) GETDATE(), 'abc', a.object_id + b.column_id

    FROM master.sys.columns a CROSS JOIN master.sys.columns b

    INSERT INTO TableY (X_Rid, y_value1, y_value2, y_value3, y_value4)

    SELECT TOP (75) PERCENT X_rid,

    (RAND(CHECKSUM(NEWID()))-0.5)*200, 0,

    DATEADD(dd,Floor((RAND(CHECKSUM(NEWID()))-0.5)*20),GETDATE()),

    DATEADD(dd,Floor((RAND(CHECKSUM(NEWID()))-0.5)*20),GETDATE())

    FROM TableX

    Original:

    Table 'TableY'. Scan count 1, logical reads 3, physical reads 0

    Table 'TableX'. Scan count 0, logical reads 3, physical reads 0

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Required query (clustered index seek on both tables)

    Table 'TableY'. Scan count 1, logical reads 1053, physical reads 0.

    Table 'TableX'. Scan count 0, logical reads 3, physical reads 0.

    SQL Server Execution Times:

    CPU time = 406 ms, elapsed time = 422 ms.

    I think I'll stick to the original if you don't mind...

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Indeed I agree with you Gail, there is no meaning is testing the query w/o data. Thats my bad.

    and Thanks for adding sample data, I'll appreciate it.

    Okay! now my question is by specifying the FORCESEEK we force the query optimizer to use the seek operation, does it will caz anytime a performance issues by specifying ther index as well.

    Looking at the statistics even I would prefer to stick with the original query. Can you brief more on this..

    Abhijit - http://abhijitmore.wordpress.com

  • Abhijit More (1/28/2011)


    Okay! now my question is by specifying the FORCESEEK we force the query optimizer to use the seek operation, does it will caz anytime a performance issues by specifying ther index as well.

    As I said at the end of the first post, the hints aren't required. Add the predicate and drop the nonclustered index.

    Looking at the statistics even I would prefer to stick with the original query. Can you brief more on this..

    Err, what more do you want me to say?

    Original query: 6 logical reads, 0 ms

    New query (with required 2 clustered index seeks): 1056 reads, 400ms

    Not hard to tell that the original was far, far, far more efficient.

    I'm curious as to the point of this, are you looking for ways to make queries run slower and less efficiently?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • nope I am finding out the ways we can beat the Operators.

    Abhijit - http://abhijitmore.wordpress.com

  • Bear in mind that silly tricks (like adding a predicate that qualifies all rows) will not make queries run faster. Tuning queries by trying to force specific operators is also a silly thing, unless you really, really, really know why the desired operator is absolutely the best and you know that it's the best in all cases.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Abhijit More (1/28/2011)


    nope I am finding out the ways we can beat the Operators.

    You don't want to try to "beat the Operators." You want to try to work with the optimizer to help it do it's job. Not make the, probably mistaken assumption, that you really know better how to tune queries than people like Dr. DeWitt who builds these tools.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply