Help with speeding up a query causing timeouts.

  • Hi Gila

    The reason that the filter is on JobPk and SiteFk is that there is only 1 database that handles multiple sites so its to ensure that only jobs belonging to the correct site are shown otherwise someone could tamper with the jobPK to view another sites job. This is the same case for lots of other tables and the reason a lot of tables have been denormalised to contain SiteFk so that we can do the appropriate filter on SIteFK without having to join to X tables just to get the SitePK value.

    I will have a look at those suggestions and get back to you.

    Thanks for your time and help

  • Also you say include JobDescription as an included column in that first suggestion is it ok to include a column thats nvarchar(max) in the index? I didn't think that you should put large object data into indexes. I know included columns are not at the leaf level but I don't know enough about how LOB data and indexes go together to know if its a good idea or not and I haven't mentioned that jobDesc was an nvarchar(max) yet so maybe you didn't know.

  • Also regarding Live and TempMode.

    Live is a tinyint -1, 0, 1

    and Tempmode is a bit.

    I was always under the impression that putting bits or other very non selective columns in the index

    was not a good idea especially if the index is ordered by those columns first.

    You say change the index to Live_TempMode_etc etc so I would just like your opinion on why thats a good option considering the datatypes of those columns.

    Thanks

  • In general it isn't a good idea to put non-selective columns first as the optimizer might just decide to ignore them .... but if its say 99% vs 1% on a bit column and you are filtering on an equality on the 1% then it will be useful. From memory, the only exception I can think of is when the other indexed columns are very selective and an index scan plus lookups will be more efficient than a clustered index scan.

  • Rob Reid (2/4/2008)


    I was always under the impression that putting bits or other very non selective columns in the index

    was not a good idea especially if the index is ordered by those columns first.

    It depends. The reason in this case is that you have a query "WHERE Live=1 And SiteFk=X AND TempMode=0 Order by LiveDate"

    Now, you already have an index starting with SiteFK, so it makes more sense to start the index to cover this query with another column.

    Having the index on Live, TempMode, SiteFK, LiveDate ensures that both the filter and the order by are taken care of by the index. The first 3 columns take care of the where clause (which is exact match), and the resulting rows are then ordered by LiveDate, so a sort doesn't have to be done

    If the filter or index was only on one of the bits/tinyint, I would agree that the index won't be used, but this is a 3 column (covering) index and a 3 column where clause and, as I mentioned earlier, SQL can and does do multi-column seeks.

    The JobDescription been varchar(max) does suggest not putting it in the index. Having a max datatype in the include means that the entire LOB is duplicated, which isn't a good idea.

    As with all suggestions, test carefully. I can't try out these index suggestions, so they are just that - suggestions.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ok so having bits/tinyints in a covering index is okay then its just when you create an index on one or a couple of columns that are very non-selective that it makes it not worth having.

    I have just queried the DB for the spread of live values across the JOB table and the results are:

    1 (live)278057.42%

    0 (deleted)3910410.43%

    -1 (expired)30800682.15%

    A lot of companies will expire jobs instead of deleting them as they are still viewable in their management tool and then renew, copy, re-activate them at a later date.

  • Rob Reid (2/5/2008)


    Ok so having bits/tinyints in a covering index is okay then its just when you create an index on one or a couple of columns that are very non-selective that it makes it not worth having.

    100%. Noncovereing indexes need to be very selective, because the lookup to the clustered index is so expensive. Generally, if you have a noncovering index it needs to be > 95% selective for SQL to consider it. (ie a seek on all the possible columns needs to eliminate 95% of the rows in the table)

    Covering indexes don't need to be so selective (for the queries they cover) because there's no need to lookup back to the cluster.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ok thanks for all your help today I dropped 2 of those older indexes and I've been hunting through the many many procs and queries that access the JOB table to see what other indexes could be useful.

    I dropped these 2 as looking at the stats table they were either hardly used at all or had lots of scans.

    CREATE NONCLUSTERED INDEX [nclidx_JOBS_SiteFK_LiveDate_FeatureScore] ON [dbo].[JOBS]

    (

    [SiteFK] ASC,

    [liveDate] ASC,

    [featureScore] ASC

    )

    INCLUDE ( [tempMode],

    [JobPK],

    [live],

    [featuredAdvert]) WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [nclidx_JOBS_SiteFK_ClientFK_UserFK] ON [dbo].[JOBS]

    (

    [SiteFK] ASC,

    [ClientFK] ASC,

    [UserFK] ASC

    )WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

    Then I created these 2

    This one is primarily for some reports that are done that look at jobs posted (created/edited/renewed)

    between 2 dates and a mixture of Live and Create Date are used for the date range.

    CREATE NONCLUSTERED INDEX [nclidx_JOBS_CreateDate_LiveDate_ClientFK_SiteFK] ON [dbo].[JOBS]

    (

    [createDate] ASC,

    [liveDate] ASC,

    [ClientFK] ASC,

    [SiteFK] ASC

    )

    INCLUDE ( [JobPK],

    [jobRef],

    [jobTitle]) WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

    This one is primarily for a nightly job that runs to expire a sites worth of jobs that have reached their expiry date naturally and also through the job manger when a client user expires a job manually. Also there for viewing expired jobs as apposed to live jobs (different menu options)

    also there are some reports that list the jobs current status eg user who created it, current live status and expiry date.

    CREATE UNIQUE NONCLUSTERED INDEX [nclidx_JOBS_Live_ExpiryDate_SiteFK_ClientFK_UserFK_JobPK] ON [dbo].[JOBS]

    (

    [live] ASC,

    [expiryDate] ASC,

    [SiteFK] ASC,

    [ClientFK] ASC,

    [UserFK] ASC,

    [JobPK] ASC

    )WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

    What I would like to know if its not too much trouble is the following.

    Is it possible using some of SQL 2005s new features to log somehow the actual query that was run when the stats in the sys.dm_db_index_usage_stats get updated. It is useful to see per index the stats for seeks, scans, updates but it would be very useful if I could actually pinpoint the actual query that was being executed so I could see those queries that were under performing. Like an extension to the missing index system that offers recommendations it could also store/log those queries that it believes the missing indexes would be useful for. Do you think theres a way to do that?

    Also I have a few queries that use a CASE statement in the WHERE clause like so

    UPDATE

    JOBS

    SET

    ExpiryDate = getdate(),

    Live = -1

    WHERE

    SiteFK= @SitePK AND

    ClientFK= CASE WHEN @ClientPK <> 0 THEN @ClientPK ELSE ClientFK END AND

    JobPK= @JobPK

    These are usually contained in stored procs so that there is no need for an IF statement or dynamic SQL and its all in one place. Now I know having functions in the WHERE prevents indexes being used on those columns and I have actually previously rewritten a similar query that had a CASE and it improved the performance. Would you suggest breaking it up to have branched IF statements in the stored proc or use dynamic SQL (web end so the plan is cached/re-used) or sp_executeSQL in the proc or separate stored procs per Branch ie. one for ClientFK=@ClientFK and one for ClientFK=0. Whats the best for index use and plan re-use and performance without being too much too maintain (e.g lots of stored procs that all do very very similar selects).

    Thanks

  • Also you mentioned in an earlier post about the index recommender offering similar recommendations

    e.g

    col1_co2_col3

    and

    col1_col2 include(col3)

    or something similar.

    Now surely there must be a difference between having the column as a main column in the index and having it as an included column or why would they have introduced included columns.

    When would you suggest putting a column in the main part of the index and when would you put it as an included column and whats the difference in your eyes.

  • I was actually listening to a podcast on that last night (SQLDownUnder - Interview with Kimberley Tripp, episode 15)

    The index key (the main part of the index) are limited to 16 columns or 900 bytes, whichever comes first. There are no limitations on include columns. You can stick a varchar(max) in the include columns if you wish.

    The key columns are found in all levels of the index, both the leaf and the index tree. (If you're fuzzy on the structure of an index, Books Online has a few good sections)

    The include columns are only found in the index's leaf pages.

    The main functional difference is that columns in the key can be searched on, columns in the include can't.

    To take your example, say there's a query "SELECT col1, col2, col3 where col1 = x and col2 = y and col3 = z"

    If the index is on Col1, col2, col3, then SQL can do a seek on all three columns and go striaght to the required rows.

    If the index is on col1, col2 INCLUDE col3, then SQL can only do a seek on col1, col2 (since col3 is not present in the tree portion of the index). Once if finds those rows, then it can filter out rows based on col3.

    Should a column go in the index key or in the include? It depends how the column is used in the query. If it's in the where or join clauses (or for more advanced indxing, order by or group by) then you want it in the index key. If it's only in the select clause, you want it in the include.

    Make sense?

    If you can, listen to that podcast. Kimberley goes into a lot more detail on all areas of indexing.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply