April 7, 2017 at 10:33 pm
Luis Cazares - Friday, April 7, 2017 8:27 PM
robertopmorris - Friday, April 7, 2017 2:04 PM
Jeff:
Thanks for the response.
There are about 16,000 to 20,000 rows in the data set in total currently.
I want the running total to start over on each change of PartNum. There are around 10,000 Parts, and this number is ever increasing, by around 2,000 per annum.
The running total needs to be run once or twice a day - rarely three times.
The current SQL code returns the results very quickly (in less than 5 seconds), but it only gives a running total per date. Because the data set contains multiple entries for the same Part on the same date, we need to view this data row by row and have the running total displayed likewise.
I hope this makes sense. Thank you in advance.
Roberto.
Did you read the article that we mentioned before?
Heh... probably not. He might after my next post, though. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
April 7, 2017 at 11:48 pm
Ok. Here's a test data generator for folks to play with. I normally use a million rows just to get some clue as to how scalable whatever code is being tested will be. Details are in the code.
--=====================================================================================================================
--      Create and populate a test table.
--      Nothing in this section is a part of the solution. We're just creating test data here.
--      Written so that it works in all versions of SQL Server from 2005 and up.
--      Refer to the following URL for tips on "Minimal Logging".
--      https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx
--      To find out more about creating random constrained data in huge quantities quickly, see the following URLs.
--      http://www.sqlservercentral.com/articles/Data+Generation/87901/
--      http://www.sqlservercentral.com/articles/Test+Data/88964/
--=====================================================================================================================
--===== If the test table exists, drop it to make reruns in SSMS easier.
     IF OBJECT_ID('tempdb..#MyTable','U') IS NOT NULL
        DROP TABLE #MyTable
;
--===== Create the test table.
     -- Note that there is nothing to uniquely identify rows in this table and there can be dupes.
 CREATE TABLE #MyTable
        (
         PartNum            VARCHAR(50)
        ,DueDate            DATETIME
        ,RequirementFlag    TINYINT
        ,Quantity           DECIMAL(22,8)
        )
;
--===== I don't know if there are any indexes on the table, but I've included the one that seems most likely.
 CREATE CLUSTERED INDEX CI_PartNum_DueDate
     ON #MyTable (PartNum, DueDate)
;
--===== Obviously-named local variables to control what the test data contains.
DECLARE  @RowCount  INT
        ,@PartCount INT
        ,@StartDate DATETIME
        ,@EndDate   DATETIME
;
 SELECT  @RowCount  = 1000000
        ,@PartCount = 10000
        ,@StartDate = '20160101'
        ,@EndDate   = GETDATE()
;
--===== Populate the test table with random constrained data similar to the real table.
   WITH cte AS
(
 SELECT TOP (@RowCount)
         PartNum         = STUFF(RIGHT(ABS(CHECKSUM(NEWID())%@PartCount)+1000001,6),3,0,'-')
        ,DueDate         = @StartDate + ABS(CHECKSUM(NEWID())%(DATEDIFF(dd,@StartDate,@EndDate)+1))
        ,RequirementFlag = SIGN(ABS(CHECKSUM(NEWID())%10))
        ,Quantity        = ABS(CHECKSUM(NEWID())%2000)+1
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
)
 INSERT INTO #MyTable WITH (TABLOCK) --Allow "Minimal Logging" if not in FULL Recovery Model
        (PartNum, DueDate, RequirementFlag, Quantity)
 SELECT PartNum, DueDate, RequirementFlag, Quantity
   FROM cte
  ORDER BY PartNum, DueDate --Allow "Minimal Logging" when filling an empty Clustered Index without a Trace Flag
 OPTION (RECOMPILE) --Helps "Minimal Logging" happen because the value of variables is unknown until run time.
;
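For a quick sanity check on what the generator produced (this query is just a convenience and, like the rest of the section above, not a part of the solution), something like the following confirms the row count, the distinct part count, and the date range:
--===== Quick sanity check on the test data. Not a part of the solution.
 SELECT  TotalRows     = COUNT(*)
        ,DistinctParts = COUNT(DISTINCT PartNum)
        ,FirstDueDate  = MIN(DueDate)
        ,LastDueDate   = MAX(DueDate)
   FROM #MyTable
;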
Here's my shot at the problem using the technique that Luis and I have both provided links to. It takes only 3 seconds on a million rows, and if you were to convert the Temp Table to a permanent table, you could run it several times a day to keep the table up to date. The code is a bit more complicated, but being able to process a million rows into a table while doing a running total makes it worth it in my book. Additionally, it only touches 2 million rows in total: one million for the insert and one million for the update. The Triangular Join methods have to touch almost 52 million internal rows for this particular test, because each row's total re-reads all of the earlier rows for the same part; that's roughly n(n+1)/2 rows per part, or about 10,000 parts * (100 * 101 / 2) = roughly 50 million rows here.
--=====================================================================================================================
--      Use a "Quirky Update" modified to have a safety check to do the Running Total.
--      Please refer to the following article on how a "Quirky Update" works.
--      http://www.sqlservercentral.com/articles/T-SQL/68467/
--      To find out more about the safety check, please see the following post.
--      https://www.sqlservercentral.com/Forums/802558/Solving-the-quotRunning-Totalquot-quotOrdinal-Rankquot-Problems-Rewritten?PageIndex=5#bm981258
--=====================================================================================================================
--===== If the running total table exists, drop it to make reruns in SSMS easier.
     -- This could be done on a "real" table and updated several times each day.
     IF OBJECT_ID('tempdb..#RunningTotal','U') IS NOT NULL
        DROP TABLE #RunningTotal
;
--===== Create the running total table with both a unique "counter" column and a running total column.
 CREATE TABLE #RunningTotal
        (
         RowNum             INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
        ,PartNum            VARCHAR(50)
        ,DueDate            DATETIME
        ,RequirementFlag    TINYINT
        ,Quantity           DECIMAL(22,8)
        ,RunningTotal       DECIMAL(22,8) DEFAULT (0.0)
        )
;
--===== Populate the table with ALL of the data from the source table.
     -- This is very fast (about 1.5 seconds per million rows on my box).
 INSERT INTO #RunningTotal WITH (TABLOCK) --Allow "Minimal Logging" if not in FULL Recovery Model
        (PartNum, DueDate, RequirementFlag, Quantity)
 SELECT PartNum, DueDate, RequirementFlag, Quantity
   FROM #MyTable
  ORDER BY PartNum, DueDate --Allow "Minimal Logging" when filling an empty Clustered Index without a Trace Flag
 OPTION (RECOMPILE) --Not needed here but I've "developed the habit" to support minimal logging
;
--===== Declare the variables we'll need to support the "Quirky Update" method of calculating Running Totals.
DECLARE  @PrevPartNum   VARCHAR(50)
        ,@RunningTotal  DECIMAL(22,8)
        ,@SafetyCounter INT
;
--===== Presets
 SELECT  @PrevPartNum   = ''
        ,@RunningTotal  = 0.0
        ,@SafetyCounter = 1 --You can test the safety feature by changing this to any number other than 1.
;
--===== Calculate the running total with a safety check.
 UPDATE rt
    SET @RunningTotal = RunningTotal =
            CASE
                WHEN RowNum = @SafetyCounter
                THEN CASE
                        WHEN PartNum = @PrevPartNum
                        THEN @RunningTotal + Quantity
                        ELSE Quantity
                     END
                ELSE 1/0 --Forces an error if the counter doesn't match the row being worked on.
            END
       ,@PrevPartNum   = PartNum
       ,@SafetyCounter = @SafetyCounter + 1
   FROM #RunningTotal rt WITH (TABLOCKX, INDEX(1))
 OPTION (MAXDOP 1)
;
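If you're on SQL Server 2012 or later, you can also verify the results by comparing them against a windowed running SUM (the sketch below assumes 2012+; the Quirky Update itself works on 2005 and up). The query should return zero rows if the two methods agree:
--===== Optional verification. Requires SQL Server 2012 or later for the windowed running SUM.
   WITH Verify AS
(
 SELECT  RowNum
        ,RunningTotal
        ,WindowedTotal = SUM(Quantity) OVER (PARTITION BY PartNum
                                                 ORDER BY RowNum
                                              ROWS UNBOUNDED PRECEDING)
   FROM #RunningTotal
)
 SELECT RowNum, RunningTotal, WindowedTotal
   FROM Verify
  WHERE RunningTotal <> WindowedTotal --Zero rows returned means the two methods agree.
;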
--Jeff Moden
Change is inevitable... Change for the better is not.
April 9, 2017 at 11:57 pm
Jeff Moden - Friday, April 7, 2017 11:48 PM
Here's my shot at the problem using the technique that Luis and I have both provided links to. It only takes 3 seconds on a million rows and if you were to convert it to a permanent table instead of a Temp Table, you could run it several times a day to keep the table up to date. The code is a bit more complicated but being able to process a million rows into a table doing a running total makes it worth it in my book. Additionally, it only touches 2 million rows. The Triangular Join methods have to touch almost 52 million internal rows for this particular test.
Just for clarity ...
Your recommendation for running totals is
SELECT ...
INTO #Temp
FROM ...
WHERE ...
Exec QU on #Temp
SELECT ...
FROM #Temp
April 10, 2017 at 7:43 am
DesNorton - Sunday, April 9, 2017 11:57 PM
Jeff Moden - Friday, April 7, 2017 11:48 PM
Here's my shot at the problem using the technique that Luis and I have both provided links to. It only takes 3 seconds on a million rows and if you were to convert it to a permanent table instead of a Temp Table, you could run it several times a day to keep the table up to date. The code is a bit more complicated but being able to process a million rows into a table doing a running total makes it worth it in my book. Additionally, it only touches 2 million rows. The Triangular Join methods have to touch almost 52 million internal rows for this particular test.
Just for clarity ...
Your recommendation for running totals is
SELECT ...
INTO #Temp
FROM ...
WHERE ...
Exec QU on #Temp
SELECT ...
FROM #Temp
Correct. Most people don't want a "Running Total" column on their main table, and I agree with that, because it can go out of date very quickly, especially if in-row updates are allowed rather than "corrections" in the form of additional rows. On top of that, the clustered index necessary to correctly support the Quirky Update is frequently contrary to what is needed for other things on the main table. Working in a separate table also guarantees exclusive use of the data at the given moment and gives you a point-in-time snapshot of the data as of when the running total was calculated.
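For example, here's a minimal sketch of the permanent-table variant of that pattern (dbo.PartRunningTotal and dbo.SourceTable are hypothetical names for illustration only; the running total table would have the same definition as #RunningTotal above, including the IDENTITY column):
--===== Minimal sketch of the permanent-table variant. dbo.PartRunningTotal and
     -- dbo.SourceTable are hypothetical names for illustration only.
TRUNCATE TABLE dbo.PartRunningTotal --Also reseeds the IDENTITY column back to 1 for the safety counter.
;
 INSERT INTO dbo.PartRunningTotal WITH (TABLOCK)
        (PartNum, DueDate, RequirementFlag, Quantity)
 SELECT PartNum, DueDate, RequirementFlag, Quantity
   FROM dbo.SourceTable
  ORDER BY PartNum, DueDate
;
--===== ...then run the same "Quirky Update" shown earlier against dbo.PartRunningTotal.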
--Jeff Moden
Change is inevitable... Change for the better is not.