December 27, 2007 at 5:29 pm
Hi All,
I was hoping to get some input on how to tune this query to get the optimal performance possible. I will be pivoting the result set in my client app, so I would like the query to be as snappy as possible. The code sample below contains some sample data for testing purposes.
DECLARE@raw TABLE (TestListID INT, Feature VARCHAR(256), RunStatus VARCHAR(64))
INSERT@raw
SELECT1, 'Sample', 'FAIL' UNION ALL
SELECT2, 'trial', 'PASS' UNION ALL
SELECT3, 'Stride', 'Error' UNION ALL
SELECT4, 'Regression', 'PASS' UNION ALL
SELECT5, 'Regression', 'PASS' UNION ALL
SELECT6, 'Setup', 'Waived' UNION ALL
SELECT7, 'Setup', 'Defered' UNION ALL
SELECT8, 'Setup', 'Error' UNION ALL
SELECT9, 'Regression', 'Error' UNION ALL
SELECT10, 'Setup', 'Error' UNION ALL
SELECT11, 'Unit', 'Ready' UNION ALL
SELECT12, 'Unit', 'Ready' UNION ALL
SELECT13, 'Unit', 'Ready' UNION ALL
SELECT14, 'Unit', 'Ready'
SET NOCOUNT ON
SELECT TV.Feature,
TV.RunStatus,
100.00 * FeatureStatusSum.DistinctSum /
(SELECT COUNT(TestListID) AS FeatureCount
FROM @raw AS tstvw
WHERE (TV.Feature = Feature)) AS Percentage
FROM @raw AS TV INNER JOIN
(SELECT DISTINCT Feature, RunStatus, COUNT(TestListID) AS DistinctSum
FROM @raw AS TV
GROUP BY Feature, RunStatus) AS FeatureStatusSum ON
TV.Feature = FeatureStatusSum.Feature AND
TV.RunStatus = FeatureStatusSum.RunStatus
GROUP BY TV.Feature, TV.RunStatus, FeatureStatusSum.DistinctSum
ORDER BY TV.Feature, TV.RunStatus
IF @@ROWCOUNT = 0
RETURN
The query will be going into a stored procedure. In my production DB, the @raw variable will be replaced with an existing view. In one of my databases for a project which is almost finished, this view contains ~950,000 rows with ~1700 distinct "feature" column values and ~60 distinct "runstatus" column values. I ran the query on this database several times, and it seems to be consistently taking ~10 seconds.
If anyone has any input or suggestions on how to make this better, it would be greatly appreciated.
Thanks,
Justin
December 27, 2007 at 8:00 pm
I can help on the Pivot... but your real performance problem may be the view. If you do a SELECT TOP 10 from the view, how long does it take before it returns the 10 rows?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2007 at 8:02 pm
... and are you saying that the query you posted is returning the data you want in the format you want it in?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2007 at 8:40 pm
Jeff, thanks for you response. Select top 10 from the view gives the following (in ms i believe):
[font="Courier New"]Time Statistics
Client processing time0
Total execution time968
Wait time on server replies968[/font]
The query does produce the desired output format. I tried to do the pivot in sql, but I kept running into the 8000 character limit for my dynamic sql strings. I created another thread on these forums at http://www.sqlservercentral.com/Forums/Topic421722-169-1.aspx, but i didn't get very far there. Actually the work that I did on that attempt was based on a pivot example that you posted, so thanks for your assistance there.
In some searching, I came across a blog post with some sample code to do pivots on the client in C# (which is what my tools are written in). I thought I would give that a try since I had no luck with the dynamic sql approach. If you want to give that blog a look, it can be found at: http://weblogs.sqlteam.com/jeffs/articles/5091.aspx.
-Justin
December 27, 2007 at 9:32 pm
Are you saying that you'd rather have the run status in columns instead of rows?
In other words, each row would be a Feature and each column would be a RunStatus percentage?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2007 at 10:04 pm
What I am looking for as the final result which will be displayed to the users of my app should look like this:
Feature Defered Error FAIL PASS Ready Waived
------- ------- ----- ---- ---- ----- ------
Regression 0.0 33.3 0.0 66.6 0.0 0.0
Sample 0.0 0.0 100.0 0.0 0.0 0.0
Setup 25.0 50.0 0.0 0.0 0.0 25.0
Stride 0.0 100.0 0.0 0.0 0.0 0.0
trial 0.0 0.0 0.0 100.0 0.0 0.0
Unit 0.0 0.0 0.0 0.0 100.0 0.0
This is, however; after I perform a pivot in my client app. What I want the sql server to return should look like this:
Feature Status Percent
------- ------ -------
Regression Error 33.33
Regression PASS 66.66
Sample FAIL 100.00
Setup Defered 25.00
Setup Error 50.00
Setup Waived 25.00
Stride Error 100.00
trial PASS 100.00
Unit Ready 100.00
As I said before, I previously tried to do the pivot with some dynamic sql in the server, but was unable to because I kept running into the 8000 character limit of strings.
-Justin
December 27, 2007 at 10:16 pm
...and, you're saying this didn't work?
CREATE TABLE Raw (TestListID INT, Feature VARCHAR(256), RunStatus VARCHAR(64),
CONSTRAINT PK_Raw_TestListID PRIMARY KEY CLUSTERED (TestListID))
INSERTRaw
SELECT1, 'Sample', 'FAIL' UNION ALL
SELECT2, 'trial', 'PASS' UNION ALL
SELECT3, 'Stride', 'Error' UNION ALL
SELECT4, 'Regression', 'PASS' UNION ALL
SELECT5, 'Regression', 'PASS' UNION ALL
SELECT6, 'Setup', 'Waived' UNION ALL
SELECT7, 'Setup', 'Defered' UNION ALL
SELECT8, 'Setup', 'Error' UNION ALL
SELECT9, 'Regression', 'Error' UNION ALL
SELECT10, 'Setup', 'Error' UNION ALL
SELECT11, 'Unit', 'Ready' UNION ALL
SELECT12, 'Unit', 'Ready' UNION ALL
SELECT13, 'Unit', 'Ready' UNION ALL
SELECT14, 'Unit', 'Ready'
CREATE INDEX IX_RAW_Feature_RunStatus ON Raw (Feature,RunStatus)
DECLARE @SQL1 VARCHAR(8000)
DECLARE @SQL2 VARCHAR(8000)
DECLARE @SQL3 VARCHAR(8000)
SELECT @SQL1 = ' SELECT Feature,'+CHAR(10)
SELECT @SQL2 = ISNULL(@SQL2+','+CHAR(10),'')
+ 'STR(SUM(CASE WHEN RS=' + QUOTENAME(rs.RunStatus,'''')
+' THEN P ELSE 0 END),6,2)' + QUOTENAME(rs.RunStatus)
FROM (SELECT DISTINCT RunStatus FROM Raw) rs
ORDER BY rs.RunStatus
SELECT @SQL3 =
'
FROM (SELECT r.Feature,r.RunStatus AS RS,
100.0*COUNT(*)/MAX(fc.FeatureCount) AS P
FROM Raw r
INNER JOIN
(
SELECT Feature, COUNT(*) AS FeatureCount
FROM Raw
GROUP BY Feature
)fc
ON r.Feature = fc.Feature
GROUP BY r.Feature,r.RunStatus
)d
GROUP BY Feature'
--PRINT @SQL1+@SQL2+@SQL3
EXEC (@SQL1+@SQL2+@SQL3)
Post a list of your Distinct RunStatuses in the form of SELECT 'xxxxxxxx' UNION ALL please... lemme see what I can do...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2007 at 10:27 pm
Here's a slightly better rendition based on what you just posted...
Should be good for about 132 columns of RunStatuses not including Feature.
DROP TABLE Raw
GO
CREATE TABLE Raw (TestListID INT, Feature VARCHAR(256), RunStatus VARCHAR(64),
CONSTRAINT PK_Raw_TestListID PRIMARY KEY CLUSTERED (TestListID))
INSERTRaw
SELECT1, 'Sample', 'FAIL' UNION ALL
SELECT2, 'trial', 'PASS' UNION ALL
SELECT3, 'Stride', 'Error' UNION ALL
SELECT4, 'Regression', 'PASS' UNION ALL
SELECT5, 'Regression', 'PASS' UNION ALL
SELECT6, 'Setup', 'Waived' UNION ALL
SELECT7, 'Setup', 'Defered' UNION ALL
SELECT8, 'Setup', 'Error' UNION ALL
SELECT9, 'Regression', 'Error' UNION ALL
SELECT10, 'Setup', 'Error' UNION ALL
SELECT11, 'Unit', 'Ready' UNION ALL
SELECT12, 'Unit', 'Ready' UNION ALL
SELECT13, 'Unit', 'Ready' UNION ALL
SELECT14, 'Unit', 'Ready'
CREATE INDEX IX_RAW_Feature_RunStatus ON Raw (Feature,RunStatus)
DECLARE @MaxFeatureLen VARCHAR(10)
SELECT @MaxFeatureLen = MAX(LEN(Feature)) FROM Raw
DECLARE @SQL1 VARCHAR(8000)
DECLARE @SQL2 VARCHAR(8000)
DECLARE @SQL3 VARCHAR(8000)
SELECT @SQL1 = ' SELECT CAST(Feature AS VARCHAR('+ @MaxFeatureLen +')) AS Feature,'+CHAR(10)
SELECT @SQL2 = ISNULL(@SQL2+','+CHAR(10),'')
+ 'STR(SUM(CASE WHEN RS=' + QUOTENAME(rs.RunStatus,'''')
+' THEN P ELSE 0 END),5,1)' + QUOTENAME(rs.RunStatus)
FROM (SELECT DISTINCT RunStatus FROM Raw) rs
ORDER BY rs.RunStatus
SELECT @SQL3 =
'
FROM (SELECT r.Feature,r.RunStatus AS RS,
100.0*COUNT(*)/MAX(fc.FeatureCount) AS P
FROM Raw r
INNER JOIN
(
SELECT Feature, COUNT(*) AS FeatureCount
FROM Raw
GROUP BY Feature
)fc
ON r.Feature = fc.Feature
GROUP BY r.Feature,r.RunStatus
)d
GROUP BY Feature'
--PRINT @SQL1+@SQL2+@SQL3
EXEC (@SQL1+@SQL2+@SQL3)
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2007 at 10:42 pm
Thanks for taking the time to look at this Jeff, I really appreciate it. Your example works for the small sample data set in this post, but it doesn't work for the large dataset in my production database. In the production DB, I have ~1700 features and ~60 run statuses resulting in ~5700 distinct feature/status combinations. With this number of combinations, the sql strings get filled up extremely quickly. Also, the features and run statuses can grow or shrink dynamically. The numbers above are from a project that we are basically done with, and all future projects have ever increasing features.
Would you still like a snapshot of the actual data that I'll be using? It will probably be too large to post here, but I can email it to you. It won't be the actual data anyway, I'll have to fudge it a bit (work is pretty strict on what we are allowed to make externally visible).
-Justin
December 27, 2007 at 11:01 pm
Did you actually try the code I wrote? Like I said, it should be good for about 132 run statuses... number of features don't matter in this code...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2007 at 11:04 pm
...and all I really need is a distinct list of what you currently have for RunStatuses...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 28, 2007 at 12:41 am
The original query can be optimized as:
SELECT TV.Feature, TV.RunStatus, 100.00 * COUNT(*) / B.FeatureCount
FROM @raw AS TV
INNER JOIN (SELECT Z.Feature, FeatureCount = COUNT(*) FROM @raw Z GROUP BY Z.Feature) B ON B.Feature = TV.Feature
GROUP BY TV.Feature, TV.RunStatus, B.FeatureCount
ORDER BY TV.Feature, TV.RunStatus
December 28, 2007 at 1:11 am
Exactly what I did in my cross-tab code... nice, job, Koji...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 28, 2007 at 1:41 am
Jeff Moden (12/28/2007)
Exactly what I did in my cross-tab code... nice, job, Koji...
Jeff,
with SET STATISTICS IO ON
SELECT TV.Feature, TV.RunStatus, 100.00 * COUNT(*) / B.FeatureCount
:
GROUP BY TV.Feature, TV.RunStatus, B.FeatureCount
:
Mine above shows
Table '#398D8EEE'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0.
SELECT TV.Feature, TV.RunStatus, 100.00 * COUNT(*) / MAX(B.FeatureCount)
:
GROUP BY TV.Feature, TV.RunStatus
:
Yours above shows
Table 'Worktable'. Scan count 5, logical reads 21, physical reads 0, read-ahead reads 0.
Table '#398D8EEE'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0.
Execution plans are also different.
I wonder if it makes a difference when data volume is large.
Regards
December 28, 2007 at 10:06 am
Jeff, your cross tab code does not work in my production DB. @sql2 gets filled before it is done consuming all of the run statuses. I have attached a list of all of the run statuses in this DB. @sql2 gets filled up on the 54th of 61 run statuses. Other than that, it works great.
Koji, thanks for your reply. I tried your code, and it cut the execution time to about 6 seconds from 10 seconds in my production DB. Thanks!!!
-Justin
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply