April 20, 2010 at 4:48 am
I'm a bit stuck - statistics isn't my strong point.
I want to create a report of a single column attribute (which can have 5 values) over a sequence of a fixed number of rows, then move the sequence on by one row and recalculate, then move on again to the end of the data.
So the output will be a percentage of each of the 5 attributes
lets say N is the number of rows in the window, then get the average of the number of each attribute (NoVals/N)*100 to get the percentage in that set of N
Example output required.
/*
StartSeq attr1 attr2 attr3 attr4 attr5 Description
-------- ----- ----- ----- ----- ----- -----------------------
1 20% 20% 20% 40% 0% averages of rows 1 to N
2 21% 20% 19% 40% 0% averages of rows 2 to N+1
3 21% 20% 19% 39% 1% averages of rows 3 to N+2
...
X 11% 10% 29% 39% 11% averages of rows X to N+X-1
*/
I hope this makes sense.
Any pointers, references to articles or scripts or even the proper statistical terms for this would help.
April 20, 2010 at 5:46 am
April 20, 2010 at 8:10 am
Thanks exactly what I needed, I've read and inwardly digested the article.
I created test data and worked out both methods - and on the test data I found the performance improvement on method #2 as described in the article.
However, on the live data - method #2 takes just as long as method #2, there is the expected reduction in number of reads, but it still runs just as slowly as method #1
I've attached my Test Data and the plan for both test and live - Live has more columns different column names, and foreign keys
The major difference between test and live, is the clustered index on [Seqeunce Number] on test, and a plain index on [Sequence Number] on live - but surely that can't account for the huge performance difference.
Edit: I've had to delete the LivePlan as it contains sensitive information. I'll try and put a screenshot up instead.
April 20, 2010 at 8:39 am
Attached screenshot of execution plans for
Top: Live (with nonclustered index)
Middle: Test (with clustered index )
End: Live (with clustered index)
Showing that the clustered index makes little or no difference to the plan on live.
Is there some way of finding out why this plan is chosen over the more efficient test version?
April 20, 2010 at 7:40 pm
Hey Tom,
The article has a small error when it comes to using Method 2: a clustered index is *required* to take advantage of the Quirky Update method. This is noted in the discussion thread for the article, but the article text was never corrected to add the required index.
I'm not saying that is the reason for the behaviour you observed - but it is a very important point.
I had a quick look at your script and the execution plans, and I believe the cause to be the table spool. Please try the following:
SET NOCOUNT ON;
GO
-- moving aggregates method #1 --------------------
DECLARE @intervals INTEGER;
SET @intervals = 20;
SET STATISTICS IO ON;
IF OBJECT_ID(N'tempdb..#DataTable', N'U')
IS NOT NULL
DROP TABLE #DataTable;
CREATE TABLE #DataTable
(
Seq INTEGER NOT NULL PRIMARY KEY,
A INTEGER NOT NULL,
B INTEGER NOT NULL,
C INTEGER NOT NULL,
D INTEGER NOT NULL
);
INSERT #DataTable (Seq, A, B, C, D)
SELECT [Sequence Number], A, B, C, D
FROM dbo.MonitorData
PIVOT (
COUNT(VModel)
FOR VModel IN ([A], , [C], [D])
) AS P;
WITH Totals
AS (
SELECT D.Seq AS StartSeq,
CAST(SUM(T.B) AS DECIMAL(10,4)) AS TotB,
CAST(SUM(T.A) AS DECIMAL(10,4)) AS TotA,
CAST(SUM(T.C) AS DECIMAL(10,4)) AS TotC,
CAST(SUM(T.D) AS DECIMAL(10,4)) AS TotD,
CAST(SUM(T.B) + SUM(T.A) + SUM(T.C) + SUM(T.D) AS DECIMAL(10,4)) AS Tot
FROM #DataTable AS D
JOIN #DataTable AS T
ON T.Seq >= D.Seq
AND T.Seq < D.Seq + @intervals
GROUP BY D.Seq
)
SELECT StartSeq,
CAST((TotA * 100.0)/ Tot AS DECIMAL(10,4)) AS PcntA,
CAST((TotB * 100.0)/ Tot AS DECIMAL(10,4)) AS PcntB,
CAST((TotC * 100.0)/ Tot AS DECIMAL(10,4)) AS PcntC,
CAST((TotD * 100.0)/ Tot AS DECIMAL(10,4)) AS PcntD
FROM Totals
ORDER BY
StartSeq
OPTION (RECOMPILE);
DROP TABLE #DataTable;
GO
-- moving aggregates method #2 --------------------
DECLARE @intervals INTEGER;
SET @intervals = 20;
DECLARE @anchor INTEGER;
DECLARE @init_B DECIMAL(10,4),
@init_A DECIMAL(10,4),
@init_C DECIMAL(10,4),
@init_D DECIMAL(10,4);
DECLARE @moving_B DECIMAL(10,4),
@moving_A DECIMAL(10,4),
@moving_C DECIMAL(10,4),
@moving_D DECIMAL(10,4);
IF OBJECT_ID(N'tempdb..#DataTable', N'U')
IS NOT NULL
DROP TABLE #DataTable;
IF OBJECT_ID(N'tempdb..#Tmp1', N'U')
IS NOT NULL
DROP TABLE #Tmp1;
CREATE TABLE #DataTable
(
Seq INTEGER NOT NULL PRIMARY KEY,
A INTEGER NOT NULL,
B INTEGER NOT NULL,
C INTEGER NOT NULL,
D INTEGER NOT NULL
);
CREATE TABLE #Tmp1
(
-- CLUSTERED INDEX REQUIRED ON Seq FOR
-- THE QUIRKY UPDATE!
Seq INTEGER NOT NULL
PRIMARY KEY CLUSTERED,
A INTEGER NOT NULL,
B INTEGER NOT NULL,
C INTEGER NOT NULL,
D INTEGER NOT NULL,
pcntA DEC (10,4) NULL,
prevNA DEC (10,4) NULL,
pcntB DEC (10,4) NULL,
prevNB DEC (10,4) NULL,
pcntC DEC (10,4) NULL,
prevNC DEC (10,4) NULL,
pcntD DEC (10,4) NULL,
prevND DEC (10,4) NULL,
);
INSERT #DataTable (Seq, A, B, C, D)
SELECT [Sequence Number], A, B, C, D
FROM dbo.MonitorData
PIVOT (
COUNT(VModel)
FOR VModel IN ([A], , [C], [D])
) AS P;
INSERT #Tmp1
(
Seq,
A, B, C, D,
pcntA, pcntB, pcntC, pcntD,
prevNA, prevNB, prevNC, prevND
)
SELECT A.Seq,
A.A,
A.B,
A.C,
A.D,
CAST(NULL as DECIMAL(10,4)) [PcntA],
CAST(B.A AS DECIMAL(10,4)) [PrevNA],
CAST(NULL as DECIMAL(10,4)) [PcntB],
CAST(B.B AS DECIMAL(10,4)) [PrevNB],
CAST(NULL as DECIMAL(10,4)) [PcntC],
CAST(B.C AS DECIMAL(10,4)) [PrevNC],
CAST(NULL as DECIMAL(10,4)) [PcntD],
CAST(B.D AS DECIMAL(10,4)) [PrevND]
FROM #DataTable AS A
LEFT
JOIN #DataTable B ON A.Seq - @intervals = B.Seq;
-- Get initial values
SELECT @init_B = SUM(B),
@init_A = SUM(A),
@init_C = SUM(C),
@init_D = SUM(D)
FROM #tmp1
WHERE Seq <= @intervals;
UPDATE T1
SET @moving_A = CASE
WHEN Seq < @intervals then NULL
WHEN Seq = @intervals then @init_A
WHEN Seq > @intervals then @moving_A + [A] - [PrevNA]
END,
PcntA = (@moving_A * 100.0)/Cast(@intervals as DECIMAL(10,4)),
@moving_B = CASE
WHEN Seq < @intervals then NULL
WHEN Seq = @intervals then @init_B
WHEN Seq > @intervals then @moving_B + - [PrevNB]
END,
PcntB = (@moving_B * 100.0)/Cast(@intervals as DECIMAL(10,4)),
@moving_C = CASE
WHEN Seq < @intervals then NULL
WHEN Seq = @intervals then @init_C
WHEN Seq > @intervals then @moving_C + [C] - [PrevNC]
END,
PcntC = (@moving_C * 100.0)/Cast(@intervals as DECIMAL(10,4)),
@moving_D = CASE
WHEN Seq < @intervals then NULL
WHEN Seq = @intervals then @init_D
WHEN Seq > @intervals then @moving_D + [D] - [PrevND]
END,
PcntD = (@moving_D * 100.0)/Cast(@intervals as DECIMAL(10,4)),
@anchor = Seq
FROM #Tmp1 AS T1 WITH (TABLOCKX)
OPTION (MAXDOP 1);
SELECT Seq - @intervals +1 AS StartSeq,
PcntA,
PcntB,
PcntC,
PcntD
FROM #Tmp1
WHERE Seq >= @intervals;
DROP TABLE #Tmp1;
DROP TABLE #DataTable;
SET STATISTICS IO OFF;
April 21, 2010 at 2:15 am
This is excellent Paul, Thank-you.
The clustered primary key improved performance from a cost of 1.8 to 0.08,
I must remember the discussions on SSC are a vital part of the articles
April 21, 2010 at 2:45 am
Tom Brown (4/21/2010)
This is excellent Paul, Thank-you.
No problem. You took the time to check out the article I referenced, so it is only fair that I should return the favour.
The clustered primary key improved performance from a cost of 1.8 to 0.08
Ah. I see I was too brief in my previous reply. The performance improvement does not stem from the clustered index (alone) - the clustered index is required for correct operation of the update method used in query 2.
The query cost was dominated by the table spool, caused by the CTE self join. My primary improvement was to materialize the intermediate result in the #DataTable temporary table.
Second point: query plans only ever include the cost estimate calculated by the optimiser when producing the plan - it never reflects the true cost of running the query. Please do not rely on it for performance comparisons. CPU time used, elapsed time, and reads and writes are all valid performance metrics (at least to some extent).
I must remember the discussions on SSC are a vital part of the articles
Often better than the articles! (Not in this case though, I hasten to add)
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply