Index and Optimiser question.

  • ScottPletcher (10/30/2015)


    My understanding is also that if the table is a heap, SQL can "intersect" multiple nonclustered indexes to generate the desired result, without having to do a lookup back to the heap. It may not do seeks (keyed lookups) on other nonclustered indexes, but it can at least scan them and join the results.

    You can get index intersections regardless of the underlying architecture of the table. They're very inefficient though, so they aren't used very often.
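    As a sketch of what an index intersection looks like (hypothetical table and index names; the optimizer may well prefer a single scan instead):

    ```sql
    -- Hypothetical heap with two single-column nonclustered indexes.
    -- For a query filtering on both columns, the optimizer CAN join the
    -- two index results on the row identifier (an index intersection)
    -- instead of touching the heap -- but, as noted above, it rarely
    -- judges this cheaper than a single scan.
    CREATE TABLE dbo.HeapDemo (a int NOT NULL, b int NOT NULL, c int NULL);
    CREATE INDEX ix_HeapDemo_a ON dbo.HeapDemo (a);
    CREATE INDEX ix_HeapDemo_b ON dbo.HeapDemo (b);

    -- Both the predicates and the select list are covered by the union
    -- of the two indexes, so no heap access is strictly required:
    SELECT a, b
    FROM dbo.HeapDemo
    WHERE a = 10 AND b = 20;
    ```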

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ben.brugman (10/30/2015)


    Technically it would be possible to use the index containing the F column and then the covering index. (Fields A and B are present in the F index; they could be used to seek into the covering index.) As Gail explained, this does not happen.

    That's actually quite an interesting question. After the records are found using the index on column F, we can get the values of A and B, which form our clustered index key; and given that A and B are the first columns in the IX_Large_Table_ABDFHJ index, we could easily perform a seek to get the rest of the columns. Why doesn't the server even consider such an obvious optimization? Technically it's absolutely possible. There is only one way to force the optimizer to do what we want, but we have to use the table twice in our query, which is cumbersome and not nice at all, but it works:

    SELECT T1.B, T1.D, T1.F, T1.H, T1.J
    FROM large_table T1
    CROSS APPLY (SELECT *
                 FROM large_table T2
                 WHERE T2.a = T1.a
                   AND T2.b = T1.b
                   AND T2.f > 15 AND T2.f < 131) T2

    -- OR

    SELECT T1.B, T1.D, T1.F, T1.H, T1.J
    FROM large_table T1
    INNER JOIN large_table T2 ON T2.a = T1.a AND T2.b = T1.b
    WHERE T2.f > 15 AND T2.f < 131

    For sure, in order to be able to do this, columns A and B must be set to NOT NULL and the clustered index must be defined as unique.

    ben.brugman (10/30/2015)


    (With 6 fields in the index, there are 15 combinations of 2 fields, 20 combinations of 3 fields, and 15 combinations of 4 fields. Altogether too many to create an index for each situation: 64 (63) combinations in total.)

    (For 8 fields the number of combinations is 256 (255)).

    Your math is wrong.

    • Let's say you already have an index on (H, F, J); then why would you need another index on (H)? Or on (H, F)? Or on (H, F, J, X)?
    • If you have a clustered index on (A, B), and I assume these 2 columns together are already quite selective (if not unique), then why would you ever need extra indexes on any combination starting with (A, B)? For example: (A, B, D); (A, B, F); (A, B, H); (A, B, J); etc.
    • If, let's say, column H is not selective at all (e.g. Gender: M, F, NULL), then you definitely don't want to create an index on it.
    • Also, some columns are used more often in a WHERE clause than others; if they are also selective enough, they are the first candidates to be the leading column of an index, followed by the other columns from the WHERE clauses of the same queries. After that, if there are still some useful columns, but not useful for all the queries, you can always put them into INCLUDE.

    And actually your covering index doesn't make a lot of sense, because how many queries do you have that use all 6 columns (ABDFHJ) as conditions in the WHERE clause? Use INCLUDE instead.


    Alex Suprun

  • GilaMonster (10/30/2015)


    Unlikely. It's fairly rare for SQL to scan a non-covering index and then do lookups, because it's hard for it to get accurate estimations. It's prone to scan the cluster in those cases.

    OK, this is something to consider. In the table where we used it, this technique did work. We came to this solution after a lot of trial and error.

    But we should be careful with this technique; thank you for the remark.

    To do every combination, yes, but that's usually not necessary. In most systems some combinations of columns are more common than others. What usually works well is to tailor indexes for the more common cases, then have a 'fall-back' index for queries that are using less common sets of columns.

    The number of combinations used was more than 15 or 20, and potentially even more than that (I do not have the actual count). The 'normal' indexes were not fast enough, so we reverted to a wide index, but still a lot less wide than the 'main' record.

    Thanks for the remark that this rarely works.

    Ben

  • Alexander Suprun (10/30/2015): There is only one way to force the optimizer to do what we want, but we have to use the table twice in our query, which is cumbersome and not nice at all, but it works:

    Thanks for the script; I hadn't thought of that possibility, so many thanks for that. Brilliant. Although cumbersome (as you said), it is still what I asked for, and you provided a 'solution' within the 'constraints'. (This is thinking outside the box 🙂 )

    (In the example the clustered index was not unique; this is fairly common for the clustered indexes we use in our tables. The clustered index is highly selective in most cases, but often we want to find multiple rows using the clustered index.)

    Your math is wrong.

    For myself I think the math is totally correct: I described the possible number of combinations. But you are totally correct that the number of useful combinations is far less than the possible number of combinations.

    But we came to 15 to 20 combinations which were needed (at least), and possibly more later on, so the number of different combinations we needed to index became too large.

    Use INCLUDE instead.

    I am not too familiar with the advantages of INCLUDE columns. I always thought that if INCLUDE is used the column has to be in the index anyway, so it might just as well be used in the index key itself.

    (For the clustered table everything is included, so it does not apply there; but there you do not add too many columns to the 'key'.)

    Could you please explain to me the advantages of INCLUDE?

    Ben

  • ben.brugman (11/1/2015)


    Could you please explain to me the advantages of INCLUDE?

    Columns included are in the leaf level only, not the key, so the index is smaller than if they were in the key.

    They also don't count towards the 900 byte, 16 column limit for an index key.

    Include is a critical part of good indexing these days, and it's needed for what I recommended (which I think I may not have been clear about):

    Target indexes (covering indexes, using include as necessary) for your more frequent queries

    Then have the massive covering index as a fall-back for other queries.

    The 'normal' indexes will definitely be fast enough if you do this, as they're covering indexes that don't have to go back to the cluster for their additional columns.
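    As a sketch of the difference (hypothetical index names, using the columns from the opening post), compare an all-key index with one that uses INCLUDE:

    ```sql
    -- All columns in the key: every key column is stored at every level
    -- of the b-tree and counts towards the 900-byte / 16-column key limit.
    CREATE INDEX ix_all_key
        ON dbo.Large_Table (F, B, D, H, J);

    -- Same covering ability for "WHERE F = @x" queries, but only F is in
    -- the key; B, D, H, J are stored in the leaf level only, so the upper
    -- levels stay narrow and the index is smaller overall.
    CREATE INDEX ix_F_include
        ON dbo.Large_Table (F)
        INCLUDE (B, D, H, J);
    ```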

  • GilaMonster (11/1/2015)

    Columns included are in the leaf level only, not the key, so the index is smaller than if they were in the key.

    They also don't count towards the 900 byte, 16 column limit for an index key.

    Thank you for refreshing my memory. This one got lost behind all the other clutter.

    Thx,

    Ben

  • ben.brugman (11/2/2015)


    GilaMonster (11/1/2015)

    Columns included are in the leaf level only, not the key, so the index is smaller than if they were in the key.

    They also don't count towards the 900 byte, 16 column limit for an index key.

    Thank you for refreshing my memory. This one got lost behind all the other clutter.

    Thx,

    Ben

    So, revisiting your opening post in this thread:

    Large table.

    Large table, with fields A through Z.

    Clustered index on items A (int), B (int).

    Indexes on the items B, C, D, E, F, G, H (7 indexes).

    One Covering index on A,B,D,F,H,J

    Query,

    Select B,D,F,H,J from Large_Table where F=@a_value

    The ideal covering index for this exact query would be:

    CREATE INDEX ix_F_covering ON [Large table] (F ASC) INCLUDE (B,D,H,J)

    Understanding how INCLUDE columns make covering indexes is a significant step up - you're now in a position to design indexes to match your workload, rather than attempting to second-guess what indexes might or might not work.

    Now, assuming the db is live, or even under some kind of load from testing, you could obtain index usage stats using e.g. Glenn Berry's scripts and work from them, manipulating indexes to match requirements.
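    For example, a minimal sketch against the standard usage-stats DMV (Glenn Berry's actual diagnostic scripts are considerably more thorough):

    ```sql
    -- Which indexes are read (seeks/scans/lookups) versus only written?
    -- Rarely-read, frequently-updated indexes are candidates for removal.
    SELECT OBJECT_NAME(us.object_id)  AS table_name,
           i.name                     AS index_name,
           us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
    FROM sys.dm_db_index_usage_stats AS us
    JOIN sys.indexes AS i
        ON i.object_id = us.object_id AND i.index_id = us.index_id
    WHERE us.database_id = DB_ID()
    ORDER BY us.user_updates DESC;
    ```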

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Just did a checkup on the tables which inspired this thread.

    The actual number of columns in the 'covering index' was 10. And yes, they are all used in the actual queries, in different combinations. (They are 1 code, 1 datetime, and 8 four-byte keys. The average length of this is 46 bytes.)

    Using INCLUDE I would probably lose one level in the index. 😀

    And the size of this index would probably be reduced by about 0.5 percent.

    (The size of the non-leaf levels was 0.006 of the leaf level.)

    So there would be an improvement.

    But I am guessing here that the improvement will be very small.

    So at implementation time INCLUDE must be considered; but changing it this late in the game is hardly worth the effort (I am guessing here).

    Thanks for enlightening me,

    Ben

    Code used to determine the sizes of the different levels:

    SELECT DB_NAME(database_id) AS databasename,
           object_id,
           OBJECT_NAME(object_id, database_id) AS table_name,
           forwarded_record_count,
           index_type_desc,
           *
    FROM sys.dm_db_index_physical_stats(DB_ID(N'DB_Ben'),
             OBJECT_ID(N'dbo.Table_Ben'), NULL, NULL, 'DETAILED')

  • ben.brugman (11/2/2015)


    Just did a checkup on the tables which inspired this thread.

    The actual number of columns in the 'covering index' was 10. And yes, they are all used in the actual queries, in different combinations. (They are 1 code, 1 datetime, and 8 four-byte keys. The average length of this is 46 bytes.)

    Using INCLUDE I would probably lose one level in the index. 😀

    And the size of this index would probably be reduced by about 0.5 percent.

    (The size of the non-leaf levels was 0.006 of the leaf level.)

    So there would be an improvement.

    But I am guessing here that the improvement will be very small.

    So at implementation time INCLUDE must be considered; but changing it this late in the game is hardly worth the effort (I am guessing here).

    Thanks for enlightening me,

    Ben

    Code used to determine the sizes of the different levels:

    SELECT DB_NAME(database_id) AS databasename,
           object_id,
           OBJECT_NAME(object_id, database_id) AS table_name,
           forwarded_record_count,
           index_type_desc,
           *
    FROM sys.dm_db_index_physical_stats(DB_ID(N'DB_Ben'),
             OBJECT_ID(N'dbo.Table_Ben'), NULL, NULL, 'DETAILED')

    Can you share a couple of your queries (obfuscated, of course)? Execution plans would be better but are harder to obfuscate, unless you have a tool such as SQL Sentry Plan Explorer

    Cheers


  • ChrisM@Work (11/2/2015)

    Can you share a couple of your queries (obfuscated, of course)?

    Sorry, at the moment I do not see a feasible way to do this within a limited amount of time.

    Of course more tables are involved in most queries.

    Of course the content (specifically the distribution) is important to the optimiser.

    (The size of the actual table is at the moment over 7 GB, including all indexes.)

    So I do not see a realistic way to obfuscate and publish the results. Sorry.

    Ben

  • ben.brugman (11/2/2015)


    ChrisM@Work (11/2/2015)

    Can you share a couple of your queries (obfuscated, of course)?

    Sorry, at the moment I do not see a feasible way to do this within a limited amount of time.

    Of course more tables are involved in most queries.

    Of course the content (specifically the distribution) is important to the optimiser.

    (The size of the actual table is at the moment over 7 GB, including all indexes.)

    So I do not see a realistic way to obfuscate and publish the results. Sorry.

    Ben

    Something like this wouldn't be perfect, but it would be better than nothing at all, provided that obfuscated column names are consistent between queries - and it's a method you're familiar with:

    Select B, D, F, H, J from Large_Table where F=@a_value


Viewing 11 posts - 16 through 25 (of 25 total)
