Nested Select Statement Optimization

Question

Nested Select Statement Optimization

jones.justinw

Say Hey Kid

Points: 708
More actions
December 27, 2007 at 5:29 pm

#154037

Hi All,
I was hoping to get some input on how to tune this query to get the optimal performance possible. I will be pivoting the result set in my client app, so I would like the query to be as snappy as possible. The code sample below contains some sample data for testing purposes.
DECLARE@Raw TABLE (TestListID INT, Feature VARCHAR(256), RunStatus VARCHAR(64))
INSERT@Raw
SELECT1, 'Sample', 'FAIL' UNION ALL
SELECT2, 'trial', 'PASS' UNION ALL
SELECT3, 'Stride', 'Error' UNION ALL
SELECT4, 'Regression', 'PASS' UNION ALL
SELECT5, 'Regression', 'PASS' UNION ALL
SELECT6, 'Setup', 'Waived' UNION ALL
SELECT7, 'Setup', 'Defered' UNION ALL
SELECT8, 'Setup', 'Error' UNION ALL
SELECT9, 'Regression', 'Error' UNION ALL
SELECT10, 'Setup', 'Error' UNION ALL
SELECT11, 'Unit', 'Ready' UNION ALL
SELECT12, 'Unit', 'Ready' UNION ALL
SELECT13, 'Unit', 'Ready' UNION ALL
SELECT14, 'Unit', 'Ready'
SET NOCOUNT ON
SELECT TV.Feature,
TV.RunStatus,
100.00 * FeatureStatusSum.DistinctSum /
(SELECT COUNT(TestListID) AS FeatureCount
FROM @Raw AS tstvw
WHERE (TV.Feature = Feature)) AS Percentage
FROM @Raw AS TV INNER JOIN
(SELECT DISTINCT Feature, RunStatus, COUNT(TestListID) AS DistinctSum
FROM @Raw AS TV
GROUP BY Feature, RunStatus) AS FeatureStatusSum ON
TV.Feature = FeatureStatusSum.Feature AND
TV.RunStatus = FeatureStatusSum.RunStatus
GROUP BY TV.Feature, TV.RunStatus, FeatureStatusSum.DistinctSum
ORDER BY TV.Feature, TV.RunStatus
IF @@ROWCOUNT = 0
RETURN
The query will be going into a stored procedure. In my production DB, the @Raw variable will be replaced with an existing view. In one of my databases for a project which is almost finished, this view contains ~950,000 rows with ~1700 distinct "feature" column values and ~60 distinct "runstatus" column values. I ran the query on this database several times, and it seems to be consistently taking ~10 seconds.
If anyone has any input or suggestions on how to make this better, it would be greatly appreciated.
Thanks,
Justin

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 1

I can help on the Pivot... but your real performance problem may be the view. If you do a SELECT TOP 10 from the view, how long does it take before it returns the 10 rows?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 2

... and are you saying that the query you posted is returning the data you want in the format you want it in?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

jones.justinw Say Hey Kid Points: 708 More actions · Answer 3

Jeff, thanks for you response. Select top 10 from the view gives the following (in ms i believe):

[font="Courier New"]Time Statistics

Client processing time0

Total execution time968

Wait time on server replies968[/font]

The query does produce the desired output format. I tried to do the pivot in sql, but I kept running into the 8000 character limit for my dynamic sql strings. I created another thread on these forums at http://www.sqlservercentral.com/Forums/Topic421722-169-1.aspx, but i didn't get very far there. Actually the work that I did on that attempt was based on a pivot example that you posted, so thanks for your assistance there.

In some searching, I came across a blog post with some sample code to do pivots on the client in C# (which is what my tools are written in). I thought I would give that a try since I had no luck with the dynamic sql approach. If you want to give that blog a look, it can be found at: http://weblogs.sqlteam.com/jeffs/articles/5091.aspx.

-Justin

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

Are you saying that you'd rather have the run status in columns instead of rows?

In other words, each row would be a Feature and each column would be a RunStatus percentage?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

jones.justinw Say Hey Kid Points: 708 More actions · Answer 5

What I am looking for as the final result which will be displayed to the users of my app should look like this:

Feature Defered Error FAIL PASS Ready Waived

------- ------- ----- ---- ---- ----- ------

Regression 0.0 33.3 0.0 66.6 0.0 0.0

Sample 0.0 0.0 100.0 0.0 0.0 0.0

Setup 25.0 50.0 0.0 0.0 0.0 25.0

Stride 0.0 100.0 0.0 0.0 0.0 0.0

trial 0.0 0.0 0.0 100.0 0.0 0.0

Unit 0.0 0.0 0.0 0.0 100.0 0.0

This is, however; after I perform a pivot in my client app. What I want the sql server to return should look like this:

Feature Status Percent

------- ------ -------

Regression Error 33.33

Regression PASS 66.66

Sample FAIL 100.00

Setup Defered 25.00

Setup Error 50.00

Setup Waived 25.00

Stride Error 100.00

trial PASS 100.00

Unit Ready 100.00

As I said before, I previously tried to do the pivot with some dynamic sql in the server, but was unable to because I kept running into the 8000 character limit of strings.

-Justin

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 6

...and, you're saying this didn't work?

CREATE TABLE Raw (TestListID INT, Feature VARCHAR(256), RunStatus VARCHAR(64),

CONSTRAINT PK_Raw_TestListID PRIMARY KEY CLUSTERED (TestListID))

INSERTRaw

SELECT1, 'Sample', 'FAIL' UNION ALL

SELECT2, 'trial', 'PASS' UNION ALL

SELECT3, 'Stride', 'Error' UNION ALL

SELECT4, 'Regression', 'PASS' UNION ALL

SELECT5, 'Regression', 'PASS' UNION ALL

SELECT6, 'Setup', 'Waived' UNION ALL

SELECT7, 'Setup', 'Defered' UNION ALL

SELECT8, 'Setup', 'Error' UNION ALL

SELECT9, 'Regression', 'Error' UNION ALL

SELECT10, 'Setup', 'Error' UNION ALL

SELECT11, 'Unit', 'Ready' UNION ALL

SELECT12, 'Unit', 'Ready' UNION ALL

SELECT13, 'Unit', 'Ready' UNION ALL

SELECT14, 'Unit', 'Ready'

CREATE INDEX IX_RAW_Feature_RunStatus ON Raw (Feature,RunStatus)

DECLARE @SQL1 VARCHAR(8000)

DECLARE @SQL2 VARCHAR(8000)

DECLARE @SQL3 VARCHAR(8000)

SELECT @SQL1 = ' SELECT Feature,'+CHAR(10)

SELECT @SQL2 = ISNULL(@SQL2+','+CHAR(10),'')

+ 'STR(SUM(CASE WHEN RS=' + QUOTENAME(rs.RunStatus,'''')

+' THEN P ELSE 0 END),6,2)' + QUOTENAME(rs.RunStatus)

FROM (SELECT DISTINCT RunStatus FROM Raw) rs

ORDER BY rs.RunStatus

SELECT @SQL3 =

'

FROM (SELECT r.Feature,r.RunStatus AS RS,

100.0*COUNT(*)/MAX(fc.FeatureCount) AS P

FROM Raw r

INNER JOIN

(

SELECT Feature, COUNT(*) AS FeatureCount

FROM Raw

GROUP BY Feature

)fc

ON r.Feature = fc.Feature

GROUP BY r.Feature,r.RunStatus

)d

GROUP BY Feature'

--PRINT @SQL1+@SQL2+@SQL3

EXEC (@SQL1+@SQL2+@SQL3)

Post a list of your Distinct RunStatuses in the form of SELECT 'xxxxxxxx' UNION ALL please... lemme see what I can do...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

Here's a slightly better rendition based on what you just posted...

Should be good for about 132 columns of RunStatuses not including Feature.

DROP TABLE Raw

GO

CREATE TABLE Raw (TestListID INT, Feature VARCHAR(256), RunStatus VARCHAR(64),

CONSTRAINT PK_Raw_TestListID PRIMARY KEY CLUSTERED (TestListID))

INSERTRaw

SELECT1, 'Sample', 'FAIL' UNION ALL

SELECT2, 'trial', 'PASS' UNION ALL

SELECT3, 'Stride', 'Error' UNION ALL

SELECT4, 'Regression', 'PASS' UNION ALL

SELECT5, 'Regression', 'PASS' UNION ALL

SELECT6, 'Setup', 'Waived' UNION ALL

SELECT7, 'Setup', 'Defered' UNION ALL

SELECT8, 'Setup', 'Error' UNION ALL

SELECT9, 'Regression', 'Error' UNION ALL

SELECT10, 'Setup', 'Error' UNION ALL

SELECT11, 'Unit', 'Ready' UNION ALL

SELECT12, 'Unit', 'Ready' UNION ALL

SELECT13, 'Unit', 'Ready' UNION ALL

SELECT14, 'Unit', 'Ready'

CREATE INDEX IX_RAW_Feature_RunStatus ON Raw (Feature,RunStatus)

DECLARE @MaxFeatureLen VARCHAR(10)

SELECT @MaxFeatureLen = MAX(LEN(Feature)) FROM Raw

DECLARE @SQL1 VARCHAR(8000)

DECLARE @SQL2 VARCHAR(8000)

DECLARE @SQL3 VARCHAR(8000)

SELECT @SQL1 = ' SELECT CAST(Feature AS VARCHAR('+ @MaxFeatureLen +')) AS Feature,'+CHAR(10)

SELECT @SQL2 = ISNULL(@SQL2+','+CHAR(10),'')

+ 'STR(SUM(CASE WHEN RS=' + QUOTENAME(rs.RunStatus,'''')

+' THEN P ELSE 0 END),5,1)' + QUOTENAME(rs.RunStatus)

FROM (SELECT DISTINCT RunStatus FROM Raw) rs

ORDER BY rs.RunStatus

SELECT @SQL3 =

'

FROM (SELECT r.Feature,r.RunStatus AS RS,

100.0*COUNT(*)/MAX(fc.FeatureCount) AS P

FROM Raw r

INNER JOIN

(

SELECT Feature, COUNT(*) AS FeatureCount

FROM Raw

GROUP BY Feature

)fc

ON r.Feature = fc.Feature

GROUP BY r.Feature,r.RunStatus

)d

GROUP BY Feature'

--PRINT @SQL1+@SQL2+@SQL3

EXEC (@SQL1+@SQL2+@SQL3)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

jones.justinw Say Hey Kid Points: 708 More actions · Answer 8

Thanks for taking the time to look at this Jeff, I really appreciate it. Your example works for the small sample data set in this post, but it doesn't work for the large dataset in my production database. In the production DB, I have ~1700 features and ~60 run statuses resulting in ~5700 distinct feature/status combinations. With this number of combinations, the sql strings get filled up extremely quickly. Also, the features and run statuses can grow or shrink dynamically. The numbers above are from a project that we are basically done with, and all future projects have ever increasing features.

Would you still like a snapshot of the actual data that I'll be using? It will probably be too large to post here, but I can email it to you. It won't be the actual data anyway, I'll have to fudge it a bit (work is pretty strict on what we are allowed to make externally visible).

-Justin

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 9

Did you actually try the code I wrote? Like I said, it should be good for about 132 run statuses... number of features don't matter in this code...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 10

...and all I really need is a distinct list of what you currently have for RunStatuses...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Koji Matsumura SSCrazy Points: 2241 More actions · Answer 11

The original query can be optimized as:

SELECT TV.Feature, TV.RunStatus, 100.00 * COUNT(*) / B.FeatureCount

FROM @Raw AS TV

INNER JOIN (SELECT Z.Feature, FeatureCount = COUNT(*) FROM @Raw Z GROUP BY Z.Feature) B ON B.Feature = TV.Feature

GROUP BY TV.Feature, TV.RunStatus, B.FeatureCount

ORDER BY TV.Feature, TV.RunStatus

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 12

Exactly what I did in my cross-tab code... nice, job, Koji...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Koji Matsumura SSCrazy Points: 2241 More actions · Answer 13

Jeff Moden (12/28/2007)
Exactly what I did in my cross-tab code... nice, job, Koji...

Jeff,

with SET STATISTICS IO ON

SELECT TV.Feature, TV.RunStatus, 100.00 * COUNT(*) / B.FeatureCount

:

GROUP BY TV.Feature, TV.RunStatus, B.FeatureCount

:

Mine above shows

Table '#398D8EEE'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0.

SELECT TV.Feature, TV.RunStatus, 100.00 * COUNT(*) / MAX(B.FeatureCount)

:

GROUP BY TV.Feature, TV.RunStatus

:

Yours above shows

Table 'Worktable'. Scan count 5, logical reads 21, physical reads 0, read-ahead reads 0.

Table '#398D8EEE'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0.

Execution plans are also different.

I wonder if it makes a difference when data volume is large.

Regards

jones.justinw Say Hey Kid Points: 708 More actions · Answer 14

Jeff, your cross tab code does not work in my production DB. @sql2 gets filled before it is done consuming all of the run statuses. I have attached a list of all of the run statuses in this DB. @sql2 gets filled up on the 54th of 61 run statuses. Other than that, it works great.

Koji, thanks for your reply. I tried your code, and it cut the execution time to about 6 seconds from 10 seconds in my production DB. Thanks!!!

-Justin