November 20, 2019 at 2:57 am
Jonathan AC Roberts wrote:Jeff Moden wrote:I do have a way to avoid that problem that will blow the doors off of most other methods for calculating running totals but it does take a bit of setup and purists will insist that it will blow up someday (even when there's a built in test-for-success added to the code). Let me know if you' like to use that.
Jeff, It's not a cursor is it? 😛
PS: that was a joke.
Heh... ironically, it is... sort of... it's a "Pseudo Cursor". I think you're probably already familiar with the "Quirky Update", yes?
I have some recollection of seeing it once, but i think it went over my head .
Maybe we could generate some test data in a table and compare it with any methods anyone can think of?
November 20, 2019 at 6:03 am
No problem... Here's the code (below) to build a simple table with 1,000 accounts of 10,000 rows each of constrained random data for a total of 10 million rows. I also took the time to ensure that it's massively fragmented with an average segment size of just one page on the expected clustered index for a bit of extra fun.
The goal would be to populate the two obviously named columns that currently contain NULLs. There are no restrictions on what can be done/used so long as it's native to SQL Server, is done using only T-SQL, and does NOT make use of stuff like SQLCLR. If you need to change the indexes, that won't be counted against your time. Of course, the code must create the correct answers. I'll be happy to act as the common point for testing so that we're not fighting differences between machines. I'm running SQL Server 2017 Developer Edition with 24GB of RAM allocated to SQL Server.
My machine is a Dell Alienware 17R with 32GB of RAM, 2TB of NVME SSDs (yeah, I over-provisioned it a wee bit), and has an Intel I7-8750H CPU with 6 core threaded to 12. It's one of "those" systems that runs between 2.2 to 4GHz (wish they'd actually make servers like that). I'll do the tests at 4GHz. I also have a SEAGATE ST1000LM049 BARRACUDA PRO 1TB 7200RPM SATA-6GBPS 128MB BUFFER bit of spinning rust if we need to test with that.
/*************************************************************************************
Create the test table with a non-clustered Primary Key and a separate clustered index
*************************************************************************************/
--===== If the test table already exists, drop it in case we need to rerun.
-- The 3 part naming is overkill, but prevents accidents on real tables.
IF OBJECT_ID('dbo.TransactionDetail') IS NOT NULL
DROP TABLE dbo.TransactionDetail
;
--===== Create the test table (TransactionDetail) with a NON clustered PK
CREATE TABLE dbo.TransactionDetail
(
TransactionDetailID INT IDENTITY(1,1), --Temporal "tie-breaker"
Date DATETIME,
AccountID INT,
Amount MONEY,
AccountRunningTotal MONEY, --Running total across each account
AccountRunningCount INT, --Like "Rank" across each account
CONSTRAINT PK_TransactionDetail_TransactionDetailID
PRIMARY KEY NONCLUSTERED (TransactionDetailID)
WITH FILLFACTOR = 100
)
;
--===== Add a clustered index that will easily cause page splits because
-- of the randomized data being inserted. This index also represents
-- the expected sort order of most of the code examples.
CREATE UNIQUE CLUSTERED INDEX IXC_Transaction_AccountID_Date_TransactionDetailID
ON dbo.TransactionDetail (AccountID, Date, TransactionDetailID)
;
-----------------------------------------------------------------------------------------------------------------------
/*************************************************************************************
Populate the table using a rather slow method but one that's sure to cause lots of
Page splits and that will fragment the table with over 99% fragmentation.
*************************************************************************************/
--===== Preset the environment for appearance
SET NOCOUNT ON
;
--===== Populate the table in "segments" to force page splits.
-- Normally this would NOT have a While loop in it.
-- Because the While loop is there and page splits are happening,
-- this takes about 65 seconds to create on my box.
WHILE (ISNULL(IDENT_CURRENT('TransactionDetail'),0)) < 10000000 --Yeah... 10 Million rows.
BEGIN
INSERT INTO dbo.TransactionDetail
(Date, AccountID, Amount)
SELECT TOP 10000 --Small batch size to drive fragmentation through the roof.
--5 years worth of dates with times from 1/1/2015 to 12/31/2019
RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2015','2020')+CONVERT(DATETIME,'2015') AS Date,
--1000 different account numbers
ABS(CHECKSUM(NEWID()))%1000+1,
--Dollar amounts from -9999.99 to + 9999.99
CAST(CHECKSUM(NEWID())%1000000 /100.0 AS MONEY)
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
END
;
--===== Display the data limits and counts summary.
SELECT AccountIDs = COUNT(DISTINCT AccountID)
,MinAccountID = MIN(AccountID)
,MaxAccountID = MAX(AccountID)
,MinAmount = MIN(Amount)
,MaxAmount = MAX(Amount)
,[RowCount] = COUNT(*)
,AvgRowsPerAcct = COUNT(*)/COUNT(DISTINCT AccountID)
FROM dbo.TransactionDetail
;
--===== Display the detailed fragmentation data to show that the CI is 99% fragmented.
SELECT *
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('dbo.TransactionDetail'),NULL,NULL,'DETAILED')
;
--===== Show what 10 of the rows look like.
SELECT TOP 10 *
FROM dbo.TransactionDetail
ORDER BY AccountID, Date, TransactionDetailID
;
--Jeff Moden
Change is inevitable... Change for the better is not.
November 20, 2019 at 4:07 pm
I didn't mess with any indexes on the table you provided.
Here are a few methods:
Update Windowed SUM/COUNT using ROWS BETWEEN
/*-- ****************************************************************************************
-- Update Windowed SUM/COUNT using ROWS BETWEEN
-- ******************************************************************************************/
;WITH CTE AS
(
SELECT SUM(Amount) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) NewAccountRunningTotal,
COUNT(*) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) NewAccountRunningCount,
AccountRunningTotal,
AccountRunningCount
FROM dbo.TransactionDetail td
)
UPDATE CTE
SET AccountRunningTotal=NewAccountRunningTotal,
AccountRunningCount=NewAccountRunningCount
GO
Update Windowed SUM/COUNT using RANGE BETWEEN
/*-- ****************************************************************************************
-- Update Windowed SUM/COUNT using RANGE BETWEEN
-- *****************************************************************************************/
;WITH CTE AS
(
SELECT SUM(Amount) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) NewAccountRunningTotal,
COUNT(*) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) NewAccountRunningCount,
AccountRunningTotal,
AccountRunningCount
FROM dbo.TransactionDetail td
)
UPDATE CTE
SET AccountRunningTotal=NewAccountRunningTotal,
AccountRunningCount=NewAccountRunningCount
GO
Update Windowed SUM/COUNT
/*-- ****************************************************************************************
-- Update Windowed SUM/COUNT
-- *****************************************************************************************/
;WITH CTE AS
(
SELECT SUM(Amount) OVER (PARTITION BY AccountId ORDER BY Date ASC) NewAccountRunningTotal,
COUNT(*) OVER (PARTITION BY AccountId ORDER BY Date ASC) NewAccountRunningCount,
AccountRunningTotal,
AccountRunningCount
FROM dbo.TransactionDetail td
)
UPDATE CTE
SET AccountRunningTotal=NewAccountRunningTotal,
AccountRunningCount=NewAccountRunningCount
GO
Update Cross Apply
/*-- ****************************************************************************************
-- Update Cross Apply
-- *****************************************************************************************/
UPDATE td
SET td.AccountRunningTotal=td2.AccountRunningTotal,
td.AccountRunningCount=td2.AccountRunningCount
FROM dbo.TransactionDetail td
CROSS APPLY(SELECT SUM(td2.Amount) AccountRunningTotal,
COUNT(*) AccountRunningCount
FROM dbo.TransactionDetail td2
WHERE td2.AccountID=td.AccountID
AND td2.Date<=td.Date) td2
GO
Updatable Cursor
/*-- ****************************************************************************************
-- Updatable cursor
-- *****************************************************************************************/
SET NOCOUNT ON
DECLARE @TransactionDetailID int,
@Date datetime,
@AccountID int,
@Amount money,
@AccountRunningTotal money,
@AccountRunningCount int
DECLARE @SaveAccountID int=-1
DECLARE @cursor cursor
SET @cursor = cursor FORWARD_ONLY
FOR SELECT td.[TransactionDetailID],
td.[Date],
td.[AccountID],
td.[Amount],
td.[AccountRunningTotal],
td.[AccountRunningCount]
FROM [dbo].[TransactionDetail] td
ORDER BY td.AccountId, td.Date, td.TransactionDetailID
FOR UPDATE OF td.[AccountRunningTotal], td.[AccountRunningCount]
OPEN @cursor
FETCH NEXT FROM @cursor INTO @TransactionDetailID, @Date, @AccountID, @Amount, @AccountRunningTotal, @AccountRunningCount
WHILE @@FETCH_STATUS = 0 BEGIN
IF @AccountID <> @SaveAccountID BEGIN
SELECT @AccountRunningTotal = @Amount,
@AccountRunningCount = 1
END ELSE BEGIN
SELECT @AccountRunningTotal += @Amount,
@AccountRunningCount += 1
END
UPDATE [dbo].[TransactionDetail]
SET AccountRunningTotal = @AccountRunningTotal,
AccountRunningCount = @AccountRunningCount
WHERE CURRENT OF @cursor
SELECT @SaveAccountID=@AccountID
FETCH NEXT FROM @cursor INTO @TransactionDetailID, @Date, @AccountID, @Amount, @AccountRunningTotal, @AccountRunningCount
END
CLOSE @cursor
DEALLOCATE @cursor
GO
Temporary Table method with ROWS BETWEEN and MERGE JOIN
/*-- ****************************************************************************************
-- Temporary Table method with ROWS BETWEEN and MERGE JOIN
-- *****************************************************************************************/
IF OBJECT_ID('tempdb..#TransactionDetail','U') IS NOT NULL DROP TABLE #TransactionDetail
SELECT AccountID,
Date,
TransactionDetailID,
SUM(Amount) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AccountRunningTotal,
COUNT(*) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AccountRunningCount
INTO #TransactionDetail
FROM [dbo].[TransactionDetail] td
CREATE UNIQUE CLUSTERED INDEX IX_#TransactionDetail_1 ON #TransactionDetail(AccountID, Date, TransactionDetailID)
UPDATE td
SET td.AccountRunningTotal=td2.AccountRunningTotal,
td.AccountRunningCount=td2.AccountRunningCount
FROM [dbo].[TransactionDetail] td
INNER JOIN #TransactionDetail td2
ON td2.AccountID = td.AccountID
AND td2.Date = td.Date
AND td2.TransactionDetailID = td.TransactionDetailID
DROP TABLE #TransactionDetail
GO
Viewing 3 posts - 16 through 17 (of 17 total)
You must be logged in to reply to this topic. Login to reply