February 8, 2008 at 4:04 pm
Radovan Jablonovsky (2/8/2008)
Hi,If it is possible in your production environment to dynamically change number of processors, then it is OK. In our environment it is more conservative, therefore I am looking for solution which is independent from server setup, type of storage, number of files in tablespace, etc ... RDBMS database/server is more about consistency.
Sincerely,
Radovan Jablonovsky
Well, kind of... OPTION (MAXDOP 1) forces parallelism to go away...
--Jeff Moden
Change is inevitable... Change for the better is not.
February 13, 2008 at 10:15 pm
Woooow....Noone mentioned guaranteed order via ROW_NUMBER()??
I see that you guys really focus'd hard on matching current and previous rows.
Nice thread though. I found Jeff's article extremely useful for what I was trying to do, but I read the thread and...okay, here's my thoughts.
If you're so concerned about ORDER, then use a CTE with Row_Number(), you're guaranteed order of your results. Upside, you don't have to construct an index.
Downside, many millions of rows might take a bit to "load" up. If the table is being used, then no worries, most of the rows should already be in the buffer and it would only be logical reads, bonus points if that's the case.
I ran this on my system after doing a DBCC DROPCLEANBUFFERS. The code ran for 21 seconds. Here's the stats.
I also ran this on my system after running a select on the table, no fancy code. The update statement ran in 1 second. So, big speed differences if the table is already somewhat in use. =)
Table 'JBMTest'. Scan count 1, logical reads 10363, physical reads 3, read-ahead reads 5541, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1000000 row(s) affected)
Here's the query:
DECLARE @PrevGrpBal MONEY
SET @PrevGrpBal = 0
DECLARE @PrevAcctID INT
SET @PrevAcctID = 0;
With Running_Total AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY AccountID ORDER BY AccountID, Date) as rn,
AccountID, Date, Amount, GrpBal
FROM dbo.JBMTest
)
UPDATE r
SET @PrevGrpBal = r.GrpBal = CASE
WHEN r.AccountID = @PrevAcctID
THEN @PrevGrpBal + r.Amount
ELSE r.Amount -- Restarts total at "0 + current amount"
END,
@PrevAcctID = AccountID
FROM Running_Total r
Is there another example we are trying to solve here that I missed? I use the heck out of the ranking functions in SQL.
February 13, 2008 at 10:16 pm
The previous solution also works with parallel IO on 64 bit boxes quite well, because you're updating the results of your CTE which get pushed back to the underlying table.
Things get tricky if you do table joins or sub-queries in the CTE.
From some of the code postings it looks like I might have missed a critical piece of information on the problem that's trying to be solved.
But the code above solves the running total for each account over a specified period of time.
February 13, 2008 at 10:51 pm
Well stated in both posts, Christopher... But, you've been spoiled by SS 2k5... none of that works in 2k and, I don't know about other folks, but the companies I've had the fortune/misfortune to work with since the year 2000, have all used the 2k version... finally found 1 that actually did use 2k5 and had to give them up because they were slaughtering the database, the data, and the server.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 13, 2008 at 11:24 pm
Earlier Jeff posted a question on using SUM(SAL) OVER()
Since OVER() doesn't support ORDER BY when used with an aggregation.
How about this one?
SELECT J1.AccountID,
J1.Date,
J1.Amount AS AmountThisDate,
SUM(J2.Amount) AS RunningBalance,
CAST(AVG(1.*J2.Amount) AS DECIMAL(12, 2)) AS AvgRunningBalance
FROM dbo.JBMTest AS J1
JOIN dbo.JBMTest AS J2
ON J2.AccountID = J1.AccountID
AND (J2.Date >= '01/01/2000' --Anchor Date from example
AND J2.Date <= J1.Date)
GROUP BY J1.AccountID, J1.Date, J1.Amount
ORDER BY J1.AccountID, J1.Date;
Not quite as elegant as SUM(SAL) OVER()
But I'm willing to bet that the query plans generated for both would be similiar to this query.
Credit doesn't go to me for this method. Learned this, like many other things from other people. This method gets credited to Itzik, since I learned it from one of his many articles, but it's useful in so many circumstances.
And this one works on SQL 2000. =)
February 13, 2008 at 11:30 pm
Jeff Moden (2/13/2008)
Well stated in both posts, Christopher... But, you've been spoiled by SS 2k5... none of that works in 2k and, I don't know about other folks, but the companies I've had the fortune/misfortune to work with since the year 2000, have all used the 2k version... finally found 1 that actually did use 2k5 and had to give them up because they were slaughtering the database, the data, and the server.
I can't comment on that. π I'm sure someone will see this post and who I work for and fire me if I did. :w00t:
February 14, 2008 at 1:23 am
Christopher Ford (2/13/2008)
Earlier Jeff posted a question on using SUM(SAL) OVER()Since OVER() doesn't support ORDER BY when used with an aggregation.
How about this one?
SELECT J1.AccountID,
J1.Date,
J1.Amount AS AmountThisDate,
SUM(J2.Amount) AS RunningBalance,
CAST(AVG(1.*J2.Amount) AS DECIMAL(12, 2)) AS AvgRunningBalance
FROM dbo.JBMTest AS J1
JOIN dbo.JBMTest AS J2
ON J2.AccountID = J1.AccountID
AND (J2.Date >= '01/01/2000' --Anchor Date from example
AND J2.Date <= J1.Date)
GROUP BY J1.AccountID, J1.Date, J1.Amount
ORDER BY J1.AccountID, J1.Date;
Not quite as elegant as SUM(SAL) OVER()
But I'm willing to bet that the query plans generated for both would be similiar to this query.
Credit doesn't go to me for this method. Learned this, like many other things from other people. This method gets credited to Itzik, since I learned it from one of his many articles, but it's useful in so many circumstances.
And this one works on SQL 2000. =)
Sadly that set based solution in my system over several million rows has a duration of about a half hour. The update variable nips through it in 10 seconds.
February 14, 2008 at 7:40 am
Sadly that set based solution in my system over several million rows has a duration of about a half hour. The update variable nips through it in 10 seconds.
Yeah, I can understand that, after I ran it, I ran it with Statistics IO on.
because of the way the JOIN works, it basically rescans the table quite a few times on the right side of the JOIN, J2 gets scanned like 10000 times.
BUT, it works on SQL 2000. π
February 14, 2008 at 8:06 am
Christopher Ford (2/13/2008)
Earlier Jeff posted a question on using SUM(SAL) OVER()Since OVER() doesn't support ORDER BY when used with an aggregation.
How about this one?
SELECT J1.AccountID,
J1.Date,
J1.Amount AS AmountThisDate,
SUM(J2.Amount) AS RunningBalance,
CAST(AVG(1.*J2.Amount) AS DECIMAL(12, 2)) AS AvgRunningBalance
FROM dbo.JBMTest AS J1
JOIN dbo.JBMTest AS J2
ON J2.AccountID = J1.AccountID
AND (J2.Date >= '01/01/2000' --Anchor Date from example
AND J2.Date <= J1.Date)
GROUP BY J1.AccountID, J1.Date, J1.Amount
ORDER BY J1.AccountID, J1.Date;
Not quite as elegant as SUM(SAL) OVER()
This is precisely the Triangular Join mentioned in the first paragraph of the article. This is what Jeff is trying to get rid of.
--
JimFive
February 14, 2008 at 8:23 am
Christopher -
First off - nice CTE solution. However - the order by inside of the ROW_NUMBER() only guarantees the order in which the numbers are assigned, and not the physical order in which things are returned. So - unless you use the same parlor room tricks Jeff is advocating, you're likely to run into the same issues everyone else is talking about.
Without guaranteeing the order, you have no guarantees that the "previous row" is the row you should be pulling the running total from.
I'll refer you to this: http://blogs.msdn.com/queryoptteam/archive/2006/05/02/588731.aspx
So again - without the clustered index in there to help you, this solution STILL can fall apart....
If you want to see - drop the clustered index, or change it to a non-clustered index.
Of course - if you DO take into account those caveats previously mentioned (clustered index, WITH(INDex(), TabLock) OPTION (MAXDOP 1), no partitioned tables), it's very elegant.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 14, 2008 at 8:41 am
VERY nice find on that blog post! It sums up the ordered-output issue(s) quite explicitly.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 14, 2008 at 8:53 am
Conor had a lot of interesting stuff in the few blog posts he has made - like rules of thumb when queries will recompile, why Top 100 PERCENT is evil, etc... I'm disappointed that he seems to have given up on it (nothing new in a few years).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 14, 2008 at 11:07 am
You're talking about this Conor? π
http://www.sqlskills.com/blogs/conor/
Best Regards,
Chris BΓΌttner
February 14, 2008 at 11:35 am
Cool - something to add to my reading list....I'll have to send him a stern talking to about orphaning blog readers like that! (I'm so sure he'll care about that.....:P)
Thanks!
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 14, 2008 at 2:19 pm
Fine...
So...solutions which guaratee order in which rows are processed is it? Man, you guys are hard to please.
Here's 2 solutions I can think of. One for SQL 2005 and one for SQL 2000. I'm quite sure there's a much simpler solution that also guarantee's the row output. These weren't quite as speedy...in fact, the sql 2000 one really chewed the I/O a bit. Can't imagine why. =)
This is what happens when you think too hard as well. =) Or get nit picky on something like order... bah...who cares about order?? π It's too hard to write queries that require order and updates and in order and accumulate. π
I will say that I haven't had quite as much fun coding as I have trying to come up with alternative ways to do Running Totals since finding this article.
SET STATISTICS IO ON;
---- Guarantee Order Returned ----
WITH RunningTotal AS
(
SELECT AccountID, Date, Amount, GrpBal,
ROW_NUMBER() OVER(PARTITION BY AccountID
ORDER BY AccountID, Date) AS rn
FROM dbo.JBMTest
)
SELECT Cur.AccountID,
Cur.Date AS Date,
Cur.Amount AS Amount,
ISNULL(Cur.Amount + Prv.Amount,cur.Amount) AS grpBal
FROM RunningTotal AS Cur
LEFT OUTER JOIN RunningTotal AS Prv
ON Cur.AccountID = Prv.AccountID
AND Cur.rn - 1 = Prv.rn
ORDER BY Cur.AccountID, Cur.Date;
---- Really thinking too hard and order guarantee -----
SELECT a.AccountID,
a.RowNum,
a.Date AS Date,
a.Amount AS Amount,
ISNULL(a.Amount + b.Amount,a.Amount) AS grpBal
FROM (SELECT AccountID, RowNum, Date, Amount,
(SELECT TOP(1) RowNum
FROM dbo.JBMTest AS A2
WHERE A2.AccountID = A1.AccountID
AND (A2.Date < A1.Date
OR A2.Date = A1.Date
AND A2.RowNum < A1.RowNum)
ORDER BY Date DESC, RowNum DESC) AS PrvRowNum
FROM dbo.JBMTest AS A1) AS a
LEFT OUTER JOIN dbo.JBMTest AS b
ON a.PrvRowNum = b.RowNum
ORDER BY a.AccountID, a.date, a.rowNum;
Viewing 15 posts - 106 through 120 (of 250 total)
You must be logged in to reply to this topic. Login to reply