Query optimization help

  • Lynn Pettis (4/17/2014)


    ScottPletcher (4/17/2014)


    You need to do further research before you change the existing index to a filtered index, because that index will no longer satisfy queries originally using it if you filter out the values it was using.

    Besides, I was hoping that the date range itself would eliminate enough rows to improve I/O significantly without impairing any existing uses of the index.

    What date range? The query wants EVERYTHING prior to a specific date.

    That's still a date range, from beginning date to the specified date. As I noted earlier, specific row counts would be necessary to do a full analysis, but coming in by date at least guarantees reducing the total number of rows that need searched.

    Isn't if a far greater concern that willy-nilly changing an existing index to a filtered index could destroy the performance of a large number of existing queries that currently perform quite well?

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (4/17/2014)


    Lynn Pettis (4/17/2014)


    ScottPletcher (4/17/2014)


    You need to do further research before you change the existing index to a filtered index, because that index will no longer satisfy queries originally using it if you filter out the values it was using.

    Besides, I was hoping that the date range itself would eliminate enough rows to improve I/O significantly without impairing any existing uses of the index.

    What date range? The query wants EVERYTHING prior to a specific date.

    That's still a date range, from beginning date to the specified date. As I noted earlier, specific row counts would be necessary to do a full analysis, but coming in by date at least guarantees reducing the total number of rows that need searched.

    Isn't if a far greater concern that willy-nilly changing an existing index to a filtered index could destroy the performance of a large number of existing queries that currently perform quite well?

    There is no specified start date. For all we know there could be 10+ years of data in this table and the query wants it all up to a specified point in time. As more data is added and the endpoint of this query progresses, more data will be pulled by the query.

    And actually, I would create a new filtered index, not modify an existing index to be a filtered index. Modifying an existing index to be a filtered index could break other queries dependent on that index.

  • Lynn Pettis (4/17/2014)


    ScottPletcher (4/17/2014)


    Lynn Pettis (4/17/2014)


    ScottPletcher (4/17/2014)


    You need to do further research before you change the existing index to a filtered index, because that index will no longer satisfy queries originally using it if you filter out the values it was using.

    Besides, I was hoping that the date range itself would eliminate enough rows to improve I/O significantly without impairing any existing uses of the index.

    What date range? The query wants EVERYTHING prior to a specific date.

    That's still a date range, from beginning date to the specified date. As I noted earlier, specific row counts would be necessary to do a full analysis, but coming in by date at least guarantees reducing the total number of rows that need searched.

    Isn't if a far greater concern that willy-nilly changing an existing index to a filtered index could destroy the performance of a large number of existing queries that currently perform quite well?

    There is no specified start date. For all we know there could be 10+ years of data in this table and the query wants it all up to a specified point in time. As more data is added and the endpoint of this query progresses, more data will be pulled by the query.

    And actually, I would create a new filtered index, not modify an existing index to be a filtered index. Modifying an existing index to be a filtered index could break other queries dependent on that index.

    It's still a date range, from the starting date to a specified ending date. Not sure why you'd object to that. For all we know, they may run queries that specify and ending date where they need only one years' worth of data. If they need to read 10+ years' worth of data, they'll have to do that anyway.

    And actually, you did suggest changing the existing index to filtered.

    I don't believe it's worth the big overhead of an additional index on this large a table when you already have an index that, with a very small addition, can cover this query.

    It was clear from earlier runs that other filters on the data didn't limit the data dramatically either. Given everything, with all known trade-offs considered, I'd rather have SQL read even a million more 15 bytes rows than create another index. Maybe if this query is to run frequently, and/or at peak/critical times, you might create a custom index. But otherwise, I don't see how such an index is worth its high cost.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (4/17/2014)


    Lynn Pettis (4/17/2014)


    ScottPletcher (4/17/2014)


    Lynn Pettis (4/17/2014)


    ScottPletcher (4/17/2014)


    You need to do further research before you change the existing index to a filtered index, because that index will no longer satisfy queries originally using it if you filter out the values it was using.

    Besides, I was hoping that the date range itself would eliminate enough rows to improve I/O significantly without impairing any existing uses of the index.

    What date range? The query wants EVERYTHING prior to a specific date.

    That's still a date range, from beginning date to the specified date. As I noted earlier, specific row counts would be necessary to do a full analysis, but coming in by date at least guarantees reducing the total number of rows that need searched.

    Isn't if a far greater concern that willy-nilly changing an existing index to a filtered index could destroy the performance of a large number of existing queries that currently perform quite well?

    There is no specified start date. For all we know there could be 10+ years of data in this table and the query wants it all up to a specified point in time. As more data is added and the endpoint of this query progresses, more data will be pulled by the query.

    And actually, I would create a new filtered index, not modify an existing index to be a filtered index. Modifying an existing index to be a filtered index could break other queries dependent on that index.

    It's still a date range, from the starting date to a specified ending date. Not sure why you'd object to that. For all we know, they may run queries that specify and ending date where they need only one years' worth of data. If they need to read 10+ years' worth of data, they'll have to do that anyway.

    And actually, you did suggest changing the existing index to filtered.

    I don't believe it's worth the big overhead of an additional index on this large a table when you already have an index that, with a very small addition, can cover this query.

    It was clear from earlier runs that other filters on the data didn't limit the data dramatically either. Given everything, with all known trade-offs considered, I'd rather have SQL read even a million more 15 bytes rows than create another index. Maybe if this query is to run frequently, and/or at peak/critical times, you might create a custom index. But otherwise, I don't see how such an index is worth its high cost.

    The only way you can interpret the following as modifying an existing index is by the name. No where in the post did I say modify an index to be a filtered index and the code provided would fail if the index name already existed:

    Lynn Pettis (4/16/2014)


    As it looks like you are using SQL Server 2008 perhaps a filtered index. Not sure if it will help, but it is worth testing.

    CREATE NONCLUSTERED INDEX [AI_OperatorCBGEstimate_14947]

    ON dbo.OperatorCBGEstimate (DateEndedStandard)

    INCLUDE (KeyInstn)

    WHERE MostRecentSequence = 1 and UpdOperation < 2

  • EasyBoy (4/16/2014)


    Thanks Scott and Lynn for your inputs.

    As suggested by Scott i have made changes in existing index with filtered index (suggested by Lynn). And i am able to bring the logical reads down to 30k from 70k.

    CREATE NONCLUSTERED INDEX [AI_OperatorCBGEstimate_14947]

    ON OperatorCBGEstimate ( DateEndedStandard, UpdOperation )

    INCLUDE ( KeyInstn, MostRecentSequence )

    WHERE MostRecentSequence = 1 and UpdOperation < 2 WITH ( DROP_EXISTING = ON )

    I may be confused here, but if I've understood correctly the optimiser might be able to do somewhat better with an index like this:-

    [cpde="sql"]CREATE NONCLUSTERED INDEX [AI_OperatorCBGEstimate_14947]

    ON OperatorCBGEstimate ( KeyInstn, DateEndedStandard )

    WHERE MostRecentSequence = 1 and UpdOperation < 2 WITH ( DROP_EXISTING = ON )[/code]

    It won't get the logical reads down any more than the one you tried, but it mght reduce other costs.

    Tom

  • TomThomson (4/17/2014)


    EasyBoy (4/16/2014)


    Thanks Scott and Lynn for your inputs.

    As suggested by Scott i have made changes in existing index with filtered index (suggested by Lynn). And i am able to bring the logical reads down to 30k from 70k.

    CREATE NONCLUSTERED INDEX [AI_OperatorCBGEstimate_14947]

    ON OperatorCBGEstimate ( DateEndedStandard, UpdOperation )

    INCLUDE ( KeyInstn, MostRecentSequence )

    WHERE MostRecentSequence = 1 and UpdOperation < 2 WITH ( DROP_EXISTING = ON )

    I may be confused here, but if I've understood correctly the optimiser might be able to do somewhat better with an index like this:-

    CREATE NONCLUSTERED INDEX [AI_OperatorCBGEstimate_14947]

    ON OperatorCBGEstimate ( KeyInstn, DateEndedStandard )

    WHERE MostRecentSequence = 1 and UpdOperation < 2 WITH ( DROP_EXISTING = ON )

    It won't get the logical reads down any more than the one you tried, but it mght reduce other costs.

    Sort of like the one I posted earlier except I only indexes on DateEndedStandard and made KeyInstn an included column.. Based on the query if I were to index on both columns I would swap them around from what you have listed.

  • Lynn Pettis (4/18/2014)


    TomThomson (4/17/2014)


    I may be confused here, but if I've understood correctly the optimiser might be able to do somewhat better with an index like this:-

    CREATE NONCLUSTERED INDEX [AI_OperatorCBGEstimate_14947]

    ON OperatorCBGEstimate ( KeyInstn, DateEndedStandard )

    WHERE MostRecentSequence = 1 and UpdOperation < 2 WITH ( DROP_EXISTING = ON )

    It won't get the logical reads down any more than the one you tried, but it mght reduce other costs.

    Sort of like the one I posted earlier except I only indexes on DateEndedStandard and made KeyInstn an included column.. Based on the query if I were to index on both columns I would swap them around from what you have listed.

    Yes, in fact I started from the one you posted. I think making the item being grouped on they first key in the index may reduce the hashing cost - with the index elements in that order the optimiser knows that it will get consecutively biggish bunches of rows with the same KeyInstn, and may be capable of deciding not to compute the hash in the case where it's going to be the same as the previous one. Although I don't know enough about sql server's optimiser to say whether it will or won't do that, I do know that some optimisers decades ago did so it seems possible that SQL Server does. I was cLutching at straws here, of course, it is definitely a try it and see thing not a firm recommendation.

    Tom

  • TomThomson (4/18/2014)


    Lynn Pettis (4/18/2014)


    TomThomson (4/17/2014)


    I may be confused here, but if I've understood correctly the optimiser might be able to do somewhat better with an index like this:-

    CREATE NONCLUSTERED INDEX [AI_OperatorCBGEstimate_14947]

    ON OperatorCBGEstimate ( KeyInstn, DateEndedStandard )

    WHERE MostRecentSequence = 1 and UpdOperation < 2 WITH ( DROP_EXISTING = ON )

    It won't get the logical reads down any more than the one you tried, but it mght reduce other costs.

    Sort of like the one I posted earlier except I only indexes on DateEndedStandard and made KeyInstn an included column.. Based on the query if I were to index on both columns I would swap them around from what you have listed.

    Yes, in fact I started from the one you posted. I think making the item being grouped on they first key in the index may reduce the hashing cost - with the index elements in that order the optimiser knows that it will get consecutively biggish bunches of rows with the same KeyInstn, and may be capable of deciding not to compute the hash in the case where it's going to be the same as the previous one. Although I don't know enough about sql server's optimiser to say whether it will or won't do that, I do know that some optimisers decades ago did so it seems possible that SQL Server does. I was cLutching at straws here, of course, it is definitely a try it and see thing not a firm recommendation.

    It's not clutching at straws at all, but there's a caveat, and a balancing act too. I've set up a model based on the information provided by the OP, and in the model this index does give rise to a streaming aggregate instead of a hash aggregate. The caveat is this - the optimiser will only choose it if there isn't an index available with DateEndedStandard as the leading edge. And the balancing act is this: even with a filtered index, if KeyInstn is the first key (to encourage the streaming aggregate) then the DateEndedStandard predicate is residual and hence relatively expensive. If DateEndedStandard is the leading edge then there's no filter, just a seek (range scan?) but at the expense of a hash aggregate.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • EasyBoy (4/16/2014)


    ChrisM@Work (4/16/2014)


    EasyBoy (4/15/2014)


    I did try almost all combination of altering columns for index, but somehow logical reads are same (67000).

    Did you look at the execution plans to see if the hash aggregate was replaced with a stream aggregate? If you're not sure, then create the index which Gail suggests and post the execution plan here.

    Please find the updated execution plan.

    Can anyone reading this thread account for the columns in the defined values list or output list of the index seek operator on this plan? Here they are:

    [InternalUseOnly_New].[dbo].[OperatorCBGEstimate].KeyInstn,

    [InternalUseOnly_New].[dbo].[OperatorCBGEstimate].DateEndedStandard,

    Uniq1002,

    [InternalUseOnly_New].[dbo].[OperatorCBGEstimate].BlockGroupFIPS,

    [InternalUseOnly_New].[dbo].[OperatorCBGEstimate].UpdOperation,

    [InternalUseOnly_New].[dbo].[OperatorCBGEstimate].DateEstimated

    Only the first two columns are required by the next operator (Stream Aggregate), and three of the remaining four aren't even referenced by the query. What little literature I have doesn't come close and I can't find anything useful with Boogle either.

    Thanks.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • A little more info - this is the same question on MSDN:

    [Quote]I have one simple query to optimize but somehow after creating couple of index this query is still having more logical reads and time.

    DECLARE @CURRENT smalldatetime

    set @current = '6/30/2013'

    select KeyInstn, MAX(dateendedstandard) DateEndedStandard

    from InternalEstimate with (index(AI_OperatorCBGEstimate_HAP))

    where dateendedstandard < @Current and mostrecentsequence = 1 and UpdOperation<2

    group by KeyInstn

    I have created index as

    CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]

    ON [dbo].[OperateEstimate] ([MostRecentSequence],[DateEndedStandard],[UpdOperation])

    INCLUDE ([KeyInstn])[/quote]

    Is that two different tables or is one of them a view?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (4/22/2014)


    A little more info - this is the same question on MSDN:

    [Quote]I have one simple query to optimize but somehow after creating couple of index this query is still having more logical reads and time.

    DECLARE @CURRENT smalldatetime

    set @current = '6/30/2013'

    select KeyInstn, MAX(dateendedstandard) DateEndedStandard

    from InternalEstimate with (index(AI_OperatorCBGEstimate_HAP))

    where dateendedstandard < @Current and mostrecentsequence = 1 and UpdOperation<2

    group by KeyInstn

    I have created index as

    CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]

    ON [dbo].[OperateEstimate] ([MostRecentSequence],[DateEndedStandard],[UpdOperation])

    INCLUDE ([KeyInstn])

    Is that two different tables or is one of them a view?[/quote]

    Please read InternalEstimate as a InternalOperCBGEstimate (This is view)

    and it's based on table OperateEstimate (read as a OperatorCBGEstimate)

  • I have also noted one more thing.

    I can't to modify existing index due to business rule. So, i havent used AI_OperatorCBGEstimate_14947. But i have created new index(AI_OperatorCBGEstimate_HAP) with the same definition as AI_OperatorCBGEstimate_14947. If i use index with HAP logical reads are about 58k and if i use AI_OperatorCBGEstimate_14947 logical reads are only 29k.

    I am attaching execution plan for both index.

  • EasyBoy (4/23/2014)


    I have also noted one more thing.

    I can't to modify existing index due to business rule. So, i havent used AI_OperatorCBGEstimate_14947. But i have created new index(AI_OperatorCBGEstimate_HAP) with the same definition as AI_OperatorCBGEstimate_14947. If i use index with HAP logical reads are about 58k and if i use AI_OperatorCBGEstimate_14947 logical reads are only 29k.

    I am attaching execution plan for both index.

    Can you post the definitions of both of these filtered indexes please?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • EasyBoy (4/23/2014)


    I have also noted one more thing.

    I can't to modify existing index due to business rule. So, i havent used AI_OperatorCBGEstimate_14947. But i have created new index(AI_OperatorCBGEstimate_HAP) with the same definition as AI_OperatorCBGEstimate_14947. If i use index with HAP logical reads are about 58k and if i use AI_OperatorCBGEstimate_14947 logical reads are only 29k.

    I am attaching execution plan for both index.

    The queries for these two plans are the same but they use different indexes. Did you edit the plans, by any chance?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Here are the defination of both index

    CREATE NONCLUSTERED INDEX [AI_OperatorCBGEstimate_14947]

    ON OperatorCBGEstimate ( DateEndedStandard,UpdOperation)

    INCLUDE ( KeyInstn )

    WHERE MostRecentSequence = 1 and UpdOperation < 2 WITH ( DROP_EXISTING = ON )

    CREATE NONCLUSTERED INDEX [AI_OperatorCBGEstimate_HAP]

    ON OperatorCBGEstimate ( DateEndedStandard,UpdOperation)

    INCLUDE ( KeyInstn )

    WHERE MostRecentSequence = 1 and UpdOperation < 2 WITH ( DROP_EXISTING = ON )

Viewing 15 posts - 31 through 45 (of 49 total)

You must be logged in to reply to this topic. Login to reply