TSQL using parameters does not use the index

  • GilaMonster (3/18/2010)


    stefan.gustafsson 60897 (3/18/2010)


    I find it is more helpful to suggest solutions rather than saying what not to do.

    I'll be very happy to give a solution once the actual execution plan (which I've requested from the OP) is posted.

    Just curious: What do you expect to find in the actual execution plan that might affect the solution? In this case we already know that he gets a table scan.

  • Paul White (3/18/2010)


    stefan.gustafsson 60897 (3/18/2010)


    Test again with 50000 unique GUIDs and a single GUID with 50000 copies.

    Now you are just being silly.

    Absolutely not.

    I believe that this is the exact scenario the OP has. One single massively duplicated value will cause exactly this problem.

    If the distribution of the values is even there is no problem - the optimizer will make the correct choice every time.

    The only thing that will cause this problem is a very uneven distribution of values.

    (And it does not matter if you use uniqueidentifier or varchar(50))
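
    A minimal sketch of the test Stefan suggests - 50,000 unique GUIDs plus one value duplicated 50,000 times, queried through a variable. All object and column names here are made up for illustration:

    ```sql
    -- Hypothetical repro: 50,000 unique GUIDs plus one GUID duplicated 50,000 times.
    CREATE TABLE dbo.SkewTest
    (
        RowID       int IDENTITY(1,1) PRIMARY KEY,
        BatchExecID uniqueidentifier NOT NULL,
        Payload     char(100) NOT NULL DEFAULT ('x')
    );

    -- 50,000 rows, each with its own GUID
    INSERT dbo.SkewTest (BatchExecID)
    SELECT TOP (50000) NEWID()
    FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

    -- One massively duplicated GUID: 50,000 copies of the same value
    DECLARE @dup uniqueidentifier = NEWID();
    INSERT dbo.SkewTest (BatchExecID)
    SELECT TOP (50000) @dup
    FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

    CREATE NONCLUSTERED INDEX IX_SkewTest_BatchExecID
        ON dbo.SkewTest (BatchExecID);

    -- With a local variable the optimizer estimates from average density
    -- (~2 rows here), which is fine for the 50,000 unique values but far
    -- off for the duplicated one - the skew is what makes the plan risky.
    DECLARE @id uniqueidentifier = @dup;
    SELECT * FROM dbo.SkewTest WHERE BatchExecID = @id;
    ```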

  • stefan.gustafsson 60897 (3/18/2010)


    Just curious: What do you expect to find in the actual execution plan that might affect the solution? In this case we already know that he gets a table scan.

    Estimated and actual rowcounts, compile-time and run-time parameter values.

    I also think this is parameter sniffing due to data skew. Want confirmation before I give a solution as, if I'm wrong, the solution I give will be totally useless.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
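
    For anyone following along: one way to capture the details Gail is asking for is to run the statement with the actual plan enabled, then compare the estimated row count against the actual one and, for parameterized statements, the compiled versus runtime parameter values recorded in the plan XML. The object names below are placeholders only:

    ```sql
    -- Returns the actual execution plan XML alongside the results, so that
    -- EstimateRows vs. the actual row count (and, for parameterized statements,
    -- ParameterCompiledValue vs. ParameterRuntimeValue) can be inspected.
    SET STATISTICS XML ON;

    DECLARE @BatchExecID nvarchar(50) = N'{00000000-0000-0000-0000-000000000000}';

    SELECT *
    FROM dbo.BatchLoad            -- placeholder table name
    WHERE BatchExecID = @BatchExecID;

    SET STATISTICS XML OFF;
    ```
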
  • stefan.gustafsson 60897 (3/18/2010)


    And it does not matter if you use uniqueidentifier or varchar(50)

    It is NVARCHAR(50).

    Do you really think there is no difference between indexing a GUID at 16 bytes per row, and indexing its Unicode string representation (72 bytes per row)?

    Four and a half times fewer index records will fit on a single 8KB page. This misinformation will do nothing to help the optimizer choose the correct plan, and will cause it to switch to a table scan for fewer records than it would do given accurate information.

    It makes sense to wait for the plan details before announcing 'the only possible' solution. My point regarding data types and index sizes stands regardless, of course.
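
    The size arithmetic above is easy to verify; this check does not depend on any particular table:

    ```sql
    -- A uniqueidentifier stores in 16 bytes; its string form is 36 characters,
    -- which is 72 bytes as NVARCHAR (2 bytes per character).
    DECLARE @g uniqueidentifier = NEWID();
    DECLARE @s nvarchar(50)     = CONVERT(nvarchar(50), @g);

    SELECT  DATALENGTH(@g)                        AS guid_bytes,      -- 16
            DATALENGTH(@s)                        AS nvarchar_bytes,  -- 72
            DATALENGTH(@s) * 1.0 / DATALENGTH(@g) AS size_ratio;      -- 4.5
    ```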

  • Again sorry for the delay, was putting out fires yesterday...

    Attached is the profiler output.

    I included all batches this time. The event is Showplan XML Statistics Profile, but it looks the same as before.

    I greatly appreciate everyone's help with this - Tom

  • When this table is loaded there are about 800,000 rows that all have the same batchExecID; the next run loads about the same number but with a new batch GUID. Normally only about 7 loads are retained at any given time, so there are generally 7 unique batch GUIDs. The load performance is acceptable. I would be very concerned about a clustered index unless I knew the GUIDs would always be sequential each time a batch is run.

    The GUID is generated by SSIS; it is the ExecutionInstanceGuid system variable in SSIS.

  • Sorry Tom, one question.

    You say there'll be 800 000 rows with the same batchID, but in the exec plan, actual row count was 0. Should it have been 0?

    Also, if you're retrieving 800 000 rows out of 5.6 million, I would expect SQL to go for a table scan rather than using a non-covering, nonclustered index. The tipping point at which SQL goes from NC index seek with bookmark lookup to table scan is usually around (or below) 1% of the rows in the table. You're retrieving 14%.

    In this situation, I would fully expect a table scan.

    Demo of the tipping point:

    http://sqlinthewild.co.za/index.php/2009/01/09/seek-or-scan/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
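
    As a rough illustration of the tipping point described above (the linked post has the full demo), the sketch below compares the scan the optimizer chooses on its own with a forced use of the nonclustered index. The table, index name, and GUID value are hypothetical:

    ```sql
    SET STATISTICS IO ON;

    -- Left to itself, at ~14% selectivity the optimizer should pick a table scan.
    SELECT *
    FROM dbo.BatchLoad
    WHERE BatchExecID = N'{11111111-1111-1111-1111-111111111111}';

    -- Forcing the non-covering nonclustered index means one RID lookup per
    -- matching row, so the logical reads climb far above the scan's.
    SELECT *
    FROM dbo.BatchLoad WITH (INDEX (IX_BatchLoad_BatchExecID))
    WHERE BatchExecID = N'{11111111-1111-1111-1111-111111111111}';

    SET STATISTICS IO OFF;
    ```
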
  • Tom Van Harpen (3/18/2010)


    When this table is loaded there are about 800,000 rows that all have the same batchExecID; the next run loads about the same number but with a new batch GUID. Normally only about 7 loads are retained at any given time, so there are generally 7 unique batch GUIDs. The load performance is acceptable. I would be very concerned about a clustered index unless I knew the GUIDs would always be sequential each time a batch is run.

    The GUID is generated by SSIS; it is the ExecutionInstanceGuid system variable in SSIS.

    Hmm

    So, the table normally only has 7 different values of batchExecID, but about 800,000 duplicates of each.

    If that is the case, what are you really trying to accomplish with your query?

    Do you expect that query to return 800,000 rows, or do you normally run the query with a value that is not supposed to return anything?

    If you use a value that returns 800,000 rows, a table scan is the optimal plan. If you use a value that does not exist in the table, an index seek is the optimal plan.

    Using an index seek with a RID lookup would be a disaster if the query returned 800,000 rows. Performing 800,000 RID lookups would take forever.

    So, please explain what you are trying to accomplish here.
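
    To make Stefan's contrast concrete (placeholder names and values again): with literal values the optimizer can read the statistics and pick the right plan for each case.

    ```sql
    -- A value held by ~800,000 rows: a table scan is the cheaper plan.
    SELECT *
    FROM dbo.BatchLoad
    WHERE BatchExecID = N'{11111111-1111-1111-1111-111111111111}';

    -- A value that does not exist in the table: the seek touches almost
    -- nothing, so the nonclustered index is by far the cheaper plan.
    SELECT *
    FROM dbo.BatchLoad
    WHERE BatchExecID = N'{00000000-0000-0000-0000-000000000000}';
    ```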

  • Paul White (3/18/2010)


    stefan.gustafsson 60897 (3/18/2010)


    And it does not matter if you use uniqueidentifier or varchar(50)

    It is NVARCHAR(50).

    Do you really think there is no difference between indexing a GUID at 16 bytes per row, and indexing its Unicode string representation (72 bytes per row)?

    Four and a half times fewer index records will fit on a single 8KB page. This misinformation will do nothing to help the optimizer choose the correct plan, and will cause it to switch to a table scan for fewer records than it would do given accurate information.

    It makes sense to wait for the plan details before announcing 'the only possible' solution. My point regarding data types and index sizes stands regardless, of course.

    Of course there is some difference between 72 bytes per row and 16 bytes per row.

    What I am saying is that this difference is irrelevant for this problem.

    The solution to the problem is NOT to change datatype.

  • So that answers the original question, albeit somewhat trivially.

    The hard-coded value N'{00000000-0000-0000-0000-000000000000}' allows SQL Server to use the statistics to realize that the value specified falls completely outside the range of values in the string column. It therefore estimates 1 row (it never estimates zero rows!) making the NC index absolutely the right choice.

    When faced with an unknown quantity in a variable, it uses a guess, or 'magic number'. The guess in this case comes to 970,440 rows - plenty to make a table scan the right choice.

    Add OPTION (RECOMPILE) to the statement with the variable to see the difference knowing the value in the variable makes. This generates a new, throw-away, plan optimized for the particular value in the variable at execution time.
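
    A sketch of the three variants Paul is contrasting, with placeholder object names:

    ```sql
    -- 1) Hard-coded literal: the histogram shows the value falls outside the
    --    range stored in the column, so the estimate is 1 row and the
    --    nonclustered index is the obvious choice.
    SELECT *
    FROM dbo.BatchLoad
    WHERE BatchExecID = N'{00000000-0000-0000-0000-000000000000}';

    -- 2) Local variable: the value is unknown at compile time, so the optimizer
    --    falls back to a guess (the ~970,440 rows mentioned above) and scans.
    DECLARE @id nvarchar(50) = N'{00000000-0000-0000-0000-000000000000}';
    SELECT *
    FROM dbo.BatchLoad
    WHERE BatchExecID = @id;

    -- 3) Same variable with OPTION (RECOMPILE): the statement is recompiled at
    --    run time with the actual value, producing a throw-away plan that can
    --    seek again.
    SELECT *
    FROM dbo.BatchLoad
    WHERE BatchExecID = @id
    OPTION (RECOMPILE);
    ```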

  • stefan.gustafsson 60897 (3/18/2010)


    Of course there is some difference between 72 bytes per row and 16 bytes per row.

    What I am saying is that this difference is irrelevant for this problem.

    The solution to the problem is not to change datatype.

    But it might have been 😉 and it is a change that does nothing but good things. Sweet.

  • Paul White (3/18/2010)


    stefan.gustafsson 60897 (3/18/2010)


    Of course there is some difference between 72 bytes per row and 16 bytes per row.

    What I am saying is that this difference is irrelevant for this problem.

    The solution to the problem is not to change datatype.

    But it might have been 😉 and it is a change that does nothing but good things. Sweet.

    Well, I disagree. From the problem description it was quite clear that the problem was not related to the datatype used.

    Changing the datatype might cause all kinds of problems depending on what this table is used for. There might be joins with other tables, there might be queries with hard coded string values, there might be code that assumes a certain format of the data, there might be code that inserts non-guid compatible strings into the column, and so on.

    Changing datatype changes the semantics and needs to be very carefully tested.

    It is not something you want to do unless it is really necessary.

  • Stefan_G (3/18/2010)


    Well, I disagree.

    And that's perfectly fine.

  • BWAA-HAAA!!! "It depends". 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Stefan_G (3/18/2010)


    Changing datatype changes the semantics and needs to be very carefully tested.

    It is not something you want to do unless it is really necessary.

    This is a point that I agree on - the change must be evaluated to determine its impact (reciprocal changes). However, it (the change) probably should be done anyway.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events
