Nested Select Statement Optimization

  • Hi All,

    I was hoping to get some input on how to tune this query to get the optimal performance possible. I will be pivoting the result set in my client app, so I would like the query to be as snappy as possible. The code sample below contains some sample data for testing purposes.

    DECLARE@raw TABLE (TestListID INT, Feature VARCHAR(256), RunStatus VARCHAR(64))

    INSERT@raw

    SELECT1, 'Sample', 'FAIL' UNION ALL

    SELECT2, 'trial', 'PASS' UNION ALL

    SELECT3, 'Stride', 'Error' UNION ALL

    SELECT4, 'Regression', 'PASS' UNION ALL

    SELECT5, 'Regression', 'PASS' UNION ALL

    SELECT6, 'Setup', 'Waived' UNION ALL

    SELECT7, 'Setup', 'Defered' UNION ALL

    SELECT8, 'Setup', 'Error' UNION ALL

    SELECT9, 'Regression', 'Error' UNION ALL

    SELECT10, 'Setup', 'Error' UNION ALL

    SELECT11, 'Unit', 'Ready' UNION ALL

    SELECT12, 'Unit', 'Ready' UNION ALL

    SELECT13, 'Unit', 'Ready' UNION ALL

    SELECT14, 'Unit', 'Ready'

    SET NOCOUNT ON

    SELECT TV.Feature,

    TV.RunStatus,

    100.00 * FeatureStatusSum.DistinctSum /

    (SELECT COUNT(TestListID) AS FeatureCount

    FROM @raw AS tstvw

    WHERE (TV.Feature = Feature)) AS Percentage

    FROM @raw AS TV INNER JOIN

    (SELECT DISTINCT Feature, RunStatus, COUNT(TestListID) AS DistinctSum

    FROM @raw AS TV

    GROUP BY Feature, RunStatus) AS FeatureStatusSum ON

    TV.Feature = FeatureStatusSum.Feature AND

    TV.RunStatus = FeatureStatusSum.RunStatus

    GROUP BY TV.Feature, TV.RunStatus, FeatureStatusSum.DistinctSum

    ORDER BY TV.Feature, TV.RunStatus

    IF @@ROWCOUNT = 0

    RETURN

    The query will be going into a stored procedure. In my production DB, the @raw variable will be replaced with an existing view. In one of my databases for a project which is almost finished, this view contains ~950,000 rows with ~1700 distinct "feature" column values and ~60 distinct "runstatus" column values. I ran the query on this database several times, and it seems to be consistently taking ~10 seconds.

    If anyone has any input or suggestions on how to make this better, it would be greatly appreciated.

    Thanks,

    Justin

  • I can help on the Pivot... but your real performance problem may be the view. If you do a SELECT TOP 10 from the view, how long does it take before it returns the 10 rows?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ... and are you saying that the query you posted is returning the data you want in the format you want it in?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, thanks for you response. Select top 10 from the view gives the following (in ms i believe):

    [font="Courier New"]Time Statistics

    Client processing time0

    Total execution time968

    Wait time on server replies968[/font]

    The query does produce the desired output format. I tried to do the pivot in sql, but I kept running into the 8000 character limit for my dynamic sql strings. I created another thread on these forums at http://www.sqlservercentral.com/Forums/Topic421722-169-1.aspx, but i didn't get very far there. Actually the work that I did on that attempt was based on a pivot example that you posted, so thanks for your assistance there.

    In some searching, I came across a blog post with some sample code to do pivots on the client in C# (which is what my tools are written in). I thought I would give that a try since I had no luck with the dynamic sql approach. If you want to give that blog a look, it can be found at: http://weblogs.sqlteam.com/jeffs/articles/5091.aspx.

    -Justin

  • Are you saying that you'd rather have the run status in columns instead of rows?

    In other words, each row would be a Feature and each column would be a RunStatus percentage?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What I am looking for as the final result which will be displayed to the users of my app should look like this:

    Feature Defered Error FAIL PASS Ready Waived

    ------- ------- ----- ---- ---- ----- ------

    Regression 0.0 33.3 0.0 66.6 0.0 0.0

    Sample 0.0 0.0 100.0 0.0 0.0 0.0

    Setup 25.0 50.0 0.0 0.0 0.0 25.0

    Stride 0.0 100.0 0.0 0.0 0.0 0.0

    trial 0.0 0.0 0.0 100.0 0.0 0.0

    Unit 0.0 0.0 0.0 0.0 100.0 0.0

    This is, however; after I perform a pivot in my client app. What I want the sql server to return should look like this:

    Feature Status Percent

    ------- ------ -------

    Regression Error 33.33

    Regression PASS 66.66

    Sample FAIL 100.00

    Setup Defered 25.00

    Setup Error 50.00

    Setup Waived 25.00

    Stride Error 100.00

    trial PASS 100.00

    Unit Ready 100.00

    As I said before, I previously tried to do the pivot with some dynamic sql in the server, but was unable to because I kept running into the 8000 character limit of strings.

    -Justin

  • ...and, you're saying this didn't work?

    CREATE TABLE Raw (TestListID INT, Feature VARCHAR(256), RunStatus VARCHAR(64),

    CONSTRAINT PK_Raw_TestListID PRIMARY KEY CLUSTERED (TestListID))

    INSERTRaw

    SELECT1, 'Sample', 'FAIL' UNION ALL

    SELECT2, 'trial', 'PASS' UNION ALL

    SELECT3, 'Stride', 'Error' UNION ALL

    SELECT4, 'Regression', 'PASS' UNION ALL

    SELECT5, 'Regression', 'PASS' UNION ALL

    SELECT6, 'Setup', 'Waived' UNION ALL

    SELECT7, 'Setup', 'Defered' UNION ALL

    SELECT8, 'Setup', 'Error' UNION ALL

    SELECT9, 'Regression', 'Error' UNION ALL

    SELECT10, 'Setup', 'Error' UNION ALL

    SELECT11, 'Unit', 'Ready' UNION ALL

    SELECT12, 'Unit', 'Ready' UNION ALL

    SELECT13, 'Unit', 'Ready' UNION ALL

    SELECT14, 'Unit', 'Ready'

    CREATE INDEX IX_RAW_Feature_RunStatus ON Raw (Feature,RunStatus)

    DECLARE @SQL1 VARCHAR(8000)

    DECLARE @SQL2 VARCHAR(8000)

    DECLARE @SQL3 VARCHAR(8000)

    SELECT @SQL1 = ' SELECT Feature,'+CHAR(10)

    SELECT @SQL2 = ISNULL(@SQL2+','+CHAR(10),'')

    + 'STR(SUM(CASE WHEN RS=' + QUOTENAME(rs.RunStatus,'''')

    +' THEN P ELSE 0 END),6,2)' + QUOTENAME(rs.RunStatus)

    FROM (SELECT DISTINCT RunStatus FROM Raw) rs

    ORDER BY rs.RunStatus

    SELECT @SQL3 =

    '

    FROM (SELECT r.Feature,r.RunStatus AS RS,

    100.0*COUNT(*)/MAX(fc.FeatureCount) AS P

    FROM Raw r

    INNER JOIN

    (

    SELECT Feature, COUNT(*) AS FeatureCount

    FROM Raw

    GROUP BY Feature

    )fc

    ON r.Feature = fc.Feature

    GROUP BY r.Feature,r.RunStatus

    )d

    GROUP BY Feature'

    --PRINT @SQL1+@SQL2+@SQL3

    EXEC (@SQL1+@SQL2+@SQL3)

    Post a list of your Distinct RunStatuses in the form of SELECT 'xxxxxxxx' UNION ALL please... lemme see what I can do...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's a slightly better rendition based on what you just posted...

    Should be good for about 132 columns of RunStatuses not including Feature.

    DROP TABLE Raw

    GO

    CREATE TABLE Raw (TestListID INT, Feature VARCHAR(256), RunStatus VARCHAR(64),

    CONSTRAINT PK_Raw_TestListID PRIMARY KEY CLUSTERED (TestListID))

    INSERTRaw

    SELECT1, 'Sample', 'FAIL' UNION ALL

    SELECT2, 'trial', 'PASS' UNION ALL

    SELECT3, 'Stride', 'Error' UNION ALL

    SELECT4, 'Regression', 'PASS' UNION ALL

    SELECT5, 'Regression', 'PASS' UNION ALL

    SELECT6, 'Setup', 'Waived' UNION ALL

    SELECT7, 'Setup', 'Defered' UNION ALL

    SELECT8, 'Setup', 'Error' UNION ALL

    SELECT9, 'Regression', 'Error' UNION ALL

    SELECT10, 'Setup', 'Error' UNION ALL

    SELECT11, 'Unit', 'Ready' UNION ALL

    SELECT12, 'Unit', 'Ready' UNION ALL

    SELECT13, 'Unit', 'Ready' UNION ALL

    SELECT14, 'Unit', 'Ready'

    CREATE INDEX IX_RAW_Feature_RunStatus ON Raw (Feature,RunStatus)

    DECLARE @MaxFeatureLen VARCHAR(10)

    SELECT @MaxFeatureLen = MAX(LEN(Feature)) FROM Raw

    DECLARE @SQL1 VARCHAR(8000)

    DECLARE @SQL2 VARCHAR(8000)

    DECLARE @SQL3 VARCHAR(8000)

    SELECT @SQL1 = ' SELECT CAST(Feature AS VARCHAR('+ @MaxFeatureLen +')) AS Feature,'+CHAR(10)

    SELECT @SQL2 = ISNULL(@SQL2+','+CHAR(10),'')

    + 'STR(SUM(CASE WHEN RS=' + QUOTENAME(rs.RunStatus,'''')

    +' THEN P ELSE 0 END),5,1)' + QUOTENAME(rs.RunStatus)

    FROM (SELECT DISTINCT RunStatus FROM Raw) rs

    ORDER BY rs.RunStatus

    SELECT @SQL3 =

    '

    FROM (SELECT r.Feature,r.RunStatus AS RS,

    100.0*COUNT(*)/MAX(fc.FeatureCount) AS P

    FROM Raw r

    INNER JOIN

    (

    SELECT Feature, COUNT(*) AS FeatureCount

    FROM Raw

    GROUP BY Feature

    )fc

    ON r.Feature = fc.Feature

    GROUP BY r.Feature,r.RunStatus

    )d

    GROUP BY Feature'

    --PRINT @SQL1+@SQL2+@SQL3

    EXEC (@SQL1+@SQL2+@SQL3)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for taking the time to look at this Jeff, I really appreciate it. Your example works for the small sample data set in this post, but it doesn't work for the large dataset in my production database. In the production DB, I have ~1700 features and ~60 run statuses resulting in ~5700 distinct feature/status combinations. With this number of combinations, the sql strings get filled up extremely quickly. Also, the features and run statuses can grow or shrink dynamically. The numbers above are from a project that we are basically done with, and all future projects have ever increasing features.

    Would you still like a snapshot of the actual data that I'll be using? It will probably be too large to post here, but I can email it to you. It won't be the actual data anyway, I'll have to fudge it a bit (work is pretty strict on what we are allowed to make externally visible).

    -Justin

  • Did you actually try the code I wrote? Like I said, it should be good for about 132 run statuses... number of features don't matter in this code...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ...and all I really need is a distinct list of what you currently have for RunStatuses...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The original query can be optimized as:

    SELECT TV.Feature, TV.RunStatus, 100.00 * COUNT(*) / B.FeatureCount

    FROM @raw AS TV

    INNER JOIN (SELECT Z.Feature, FeatureCount = COUNT(*) FROM @raw Z GROUP BY Z.Feature) B ON B.Feature = TV.Feature

    GROUP BY TV.Feature, TV.RunStatus, B.FeatureCount

    ORDER BY TV.Feature, TV.RunStatus

  • Exactly what I did in my cross-tab code... nice, job, Koji...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/28/2007)


    Exactly what I did in my cross-tab code... nice, job, Koji...

    Jeff,

    with SET STATISTICS IO ON

    SELECT TV.Feature, TV.RunStatus, 100.00 * COUNT(*) / B.FeatureCount

    :

    GROUP BY TV.Feature, TV.RunStatus, B.FeatureCount

    :

    Mine above shows

    Table '#398D8EEE'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0.

    SELECT TV.Feature, TV.RunStatus, 100.00 * COUNT(*) / MAX(B.FeatureCount)

    :

    GROUP BY TV.Feature, TV.RunStatus

    :

    Yours above shows

    Table 'Worktable'. Scan count 5, logical reads 21, physical reads 0, read-ahead reads 0.

    Table '#398D8EEE'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0.

    Execution plans are also different.

    I wonder if it makes a difference when data volume is large.

    Regards

  • Jeff, your cross tab code does not work in my production DB. @sql2 gets filled before it is done consuming all of the run statuses. I have attached a list of all of the run statuses in this DB. @sql2 gets filled up on the 54th of 61 run statuses. Other than that, it works great.

    Koji, thanks for your reply. I tried your code, and it cut the execution time to about 6 seconds from 10 seconds in my production DB. Thanks!!!

    -Justin

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply