Long running query - out of ideas

  • Created a query that selects about 400,000 rows.

    Initially the query was set up to select a couple of columns and completed in about 7 seconds. The query used a non-clustered index seek. Row size was less than 20 bytes.

    The key column and the included columns on the index covered the query.

    I need 33 of the 36 columns in that table. I created a clustered index which included, as part of the key, the column I am using in my where clause. Now the query runs in about 17 seconds. It is doing a clustered index seek. The returned row size is about 100 bytes.

    Not sure if the increase in row size should account for this much of a time increase.

    I feel the query should still run faster, but I'm not sure what to change at this point.
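
    For reference, here is a minimal sketch of the two index designs described above. The table and column names are hypothetical, since the real schema was not posted.

        -- Original design: a narrow covering non-clustered index.
        -- The key column plus the INCLUDEd columns cover the few selected columns.
        CREATE NONCLUSTERED INDEX IX_Orders_LoadDate_Covering
            ON dbo.Orders (LoadDate)
            INCLUDE (OrderID, CustomerID);           -- hypothetical columns

        -- Revised design: a clustered index keyed on the where-clause column.
        -- The leaf level now holds the full ~100-byte rows.
        CREATE CLUSTERED INDEX CIX_Orders_LoadDate
            ON dbo.Orders (LoadDate);

        -- The query in question (33 of the 36 columns requested).
        SELECT OrderID, CustomerID /* , ...the remaining 31 columns... */
        FROM   dbo.Orders
        WHERE  LoadDate = '20090902';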

  • tvanharp (9/2/2009)


    Created a query that selects about 400,000 rows.

    Initially the query was set up to select a couple of columns and completed in about 7 seconds. The query used a non-clustered index seek. Row size was less than 20 bytes.

    The key column and the included columns on the index covered the query.

    I need 33 of the 36 columns in that table. I created a clustered index which included, as part of the key, the column I am using in my where clause. Now the query runs in about 17 seconds. It is doing a clustered index seek. The returned row size is about 100 bytes.

    Not sure if the increase in row size should account for this much of a time increase.

    I feel the query should still run faster, but I'm not sure what to change at this point.

    You're returning 4 times more data in only double the time. Seems equitable.

    BUT, I have to ask... where are you returning those 400,000 rows to?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • tvanharp (9/2/2009)


    I created a clustered index which included, as part of the key, the column I am using in my where clause.

    This could be your problem. If the column(s) in your where clause are not the "left-most" columns in the index, then the index won't be used effectively.

    For instance, if the index is on ColA, ColB, ColC and your where clause is on ColB (or ColC), then this index is worthless for your query. To be useful, the where clause needs to use the left-most columns in the index without skipping (a short sketch follows this post).

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2
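
    To illustrate the left-most-column rule described above, here is a minimal sketch using hypothetical names:

        -- Composite index keyed on (ColA, ColB, ColC).
        CREATE NONCLUSTERED INDEX IX_Demo_A_B_C
            ON dbo.Demo (ColA, ColB, ColC);

        -- Seek is possible: the predicates start at the left-most key column.
        SELECT ColA, ColB, ColC FROM dbo.Demo WHERE ColA = 42;
        SELECT ColA, ColB, ColC FROM dbo.Demo WHERE ColA = 42 AND ColB = 7;

        -- No seek on this index: ColA is skipped, so at best the index is
        -- scanned, and it may be ignored entirely.
        SELECT ColA, ColB, ColC FROM dbo.Demo WHERE ColB = 7;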

  • Hey tvanharp,

    Like many of the good folks on here, I'm willing to put some effort into this to help you toward a definitive answer.

    It would be most helpful if you could upload a saved actual execution plan for the run with the clustered index at least.

    To do this, run the query in SSMS with actual execution plan on, right click on the resulting graphical plan, and save it somewhere.

    You will need to zip it up (e.g. send to compressed folder in Windows) and attach the zip file to your next post (a T-SQL alternative to the GUI steps is sketched below).

    Thanks

    Paul
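
    If saving the plan through the SSMS GUI is awkward, one alternative (a sketch, using a placeholder query) is to capture the actual plan as XML directly from T-SQL:

        -- Returns an extra result set containing the actual execution plan as XML.
        SET STATISTICS XML ON;

        SELECT TOP (10) name, object_id      -- placeholder; run the real query here
        FROM   sys.objects;

        SET STATISTICS XML OFF;
        -- Save the plan XML with a .sqlplan extension to open it graphically
        -- in SSMS, then zip and attach it as described above.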

  • You're returning 4 times more data in only double the time. Seems equitable.

    BUT, I have to ask... where are you returning those 400,000 rows to?

    Yes, that was my thought, but overall it seemed too long.

    Those rows are being generated by an OLE DB Source component in SSIS, using a SQL query as the source.

  • This could be your problem. If the column(s) in your where clause are not the "left-most" columns in the index, then the index won't be used effectively.

    For instance, if the index is on ColA, ColB, ColC and your where clause is on ColB (or ColC), then this index is worthless for your query. To be useful, the where clause needs to use the left-most columns in the index without skipping.

    This is good to know. I have checked my clustered index, and the column used in my where clause is the first column listed.

  • Paul, I appreciate the offer, but I can't upload a plan. I did attach a screenshot.

    I would be willing to give you any info you want though.

    Also, since I have been messing with this, the clustered index seek has turned into a clustered index scan and the execution time has increased dramatically.

    I updated my clustered index and removed a few fields but the field used by the where clause is still the first field.

    I don't know why it switched to a scan from a seek.

  • Paul,

    I just realized why it switched to a scan. The date field in the table (and the first field of the index) is a character column, and in my where clause I didn't have the value in quotes.

    Now I have the query back to where it was: 400,000 rows in about 19 seconds (100 bytes per row).

    Attached is the new exec plan screenshot.
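
    For anyone who hits the same thing, here is a minimal sketch of why the unquoted literal forces a scan (table and column names are hypothetical):

        -- The "date" column is actually character data, e.g.:
        --   CREATE TABLE dbo.Loads (LoadDate varchar(8) NOT NULL /* , ... */);
        --   CREATE CLUSTERED INDEX CIX_Loads_LoadDate ON dbo.Loads (LoadDate);

        -- Unquoted literal: int outranks varchar in data-type precedence, so the
        -- COLUMN gets wrapped in CONVERT_IMPLICIT. The predicate is no longer
        -- sargable and the plan falls back to a clustered index scan.
        SELECT * FROM dbo.Loads WHERE LoadDate = 20090902;

        -- Quoted literal: the comparison stays character-to-character, so the
        -- clustered index seek is used.
        SELECT * FROM dbo.Loads WHERE LoadDate = '20090902';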

  • This article goes over how to save and attach execution plans. There's not enough information in just a screenshot; most of the important details are in the properties of the various operators.

    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • Judging from the screenshots, I imagine the issue is one of privacy, rather than how to upload a file.

    Without an execution plan (or a crystal ball), I'm going to have to give this one a miss...

  • I know how; it's just that the metadata is too sensitive for certain reasons.

    I have attached some screenshots that may be more helpful.

  • It is the privacy. It's a one-in-a-million shot that the wrong person will see it and know what it means, but I can't take that chance.

  • It's exceedingly difficult to say anything useful about a performance problem without the exec plan and/or the table and index definitions and the query. From what I can see, the query's as optimal as it can get with just a clustered index seek. It may just be that retrieving almost half a million rows will take that long.

    Unfortunately the stuff that you blacked out in the properties is the stuff that I'd need to see, along with the schema, to make any useful suggestions.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • I appreciate all your help. I apologize for the lack of information.

    In the future I will try to post more relevant information if possible.

  • tvanharp (9/2/2009)


    Created a query that selects about 400,000 rows.

    Initially the query was set up to select a couple of columns and completed in about 7 seconds. The query used a non-clustered index seek. Row size was less than 20 bytes.

    The key column and the included columns on the index covered the query.

    I need 33 of the 36 columns in that table. I created a clustered index which included, as part of the key, the column I am using in my where clause. Now the query runs in about 17 seconds. It is doing a clustered index seek. The returned row size is about 100 bytes.

    Not sure if the increase in row size should account for this much of a time increase.

    I feel the query should still run faster, but I'm not sure what to change at this point.

    Let me try to make a useful suggestion, given what we have:

    Generally, a covering non-clustered index is the fastest possible access path for a query it can satisfy on its own. This is because the NC index will usually pack the most rows per 8K page (assuming nothing daft is done with the FILLFACTOR!).

    The leaf level of the clustered index contains the data, so it is as wide as a complete row. This makes it less dense (fewer rows per 8K page) so more page operations will be required than if the narrower covering NC index was used as the access path.

    There are all sorts of caveats around the above, but that is the general picture. All things being otherwise equal, the NC index will be more efficient, and complete faster.

    Let's assume that each NC index row is 25 bytes (including all overheads) per row and the clustered index (the whole row, including all overheads, and not just the 100-ish bytes of returned data) is 150 bytes wide. Very roughly, six times more rows will fit in the NC structure, compared to the clustered index. So if the NC index seek required 10,000 8K pages to be read, the clustered index method would fetch around 60,000 pages.

    So, taking 2.5 times as long seems quite good.

    As far as making it run fast is concerned, the key thing is to identify the bottleneck. The most common thing is to ensure that the indexes (clustered and non-clustered) have as little logical and physical fragmentation as possible. Rebuild or reorganize as appropriate and consider setting a FILLFACTOR which suits your overall requirements (a starting point for that check is sketched below).

    I could go on...but no execution plan :w00t:
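
    As a hedged starting point for the fragmentation check suggested above (the table name is a placeholder; adjust the scan mode and thresholds to suit your system):

        -- Page density and fragmentation for all indexes on one table.
        SELECT  i.name                              AS index_name,
                ips.index_type_desc,
                ips.page_count,
                ips.avg_page_space_used_in_percent,
                ips.avg_fragmentation_in_percent
        FROM    sys.dm_db_index_physical_stats(
                    DB_ID(), OBJECT_ID('dbo.YourTable'), NULL, NULL, 'SAMPLED') AS ips
        JOIN    sys.indexes AS i
                  ON  i.object_id = ips.object_id
                  AND i.index_id  = ips.index_id;

        -- Common guideline: REORGANIZE for light fragmentation, REBUILD for heavy.
        ALTER INDEX ALL ON dbo.YourTable REORGANIZE;
        -- ALTER INDEX ALL ON dbo.YourTable REBUILD WITH (FILLFACTOR = 90);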
