May 12, 2008 at 10:06 am
Hi,
I have a problem which appears simple on the surface and I am hoping there is a common design solution which I can apply.
I won't initially go into great detail to avoid any confusion. For the sake of debate I have two tables, here are simplified versions:
Transaction{ Id, Type, UserId, State, Description, Amount, EntryDate }
(Again for the sake of discussion Transaction Types are Deposit and Withdrawal)
Ledger{ Id, UserId, TransactionId, Amount, Balance, EntryDate }
When I perform an Insert into the Transaction table I update the Ledger. For the most part this is a simple case of inserting the required data; the issue comes with the Balance column.
I want the balance to be correct as it should be post-transaction. I can think of two approaches to this:
a) Calculate the balance using a SUM on the Amount column of the Transaction table to determine the new aggregated value. My issue with this is performance and locking to avoid concurrency issues.
b) Somehow perform this incrementally using some form of TOP / ORDER BY combination. I prefer an incremental approach from a performance perspective, but with respect to data integrity it makes me shudder: an incorrect calculation for any reason would cause a ripple effect. Having said this, it's obviously not acceptable for any solution to yield incorrect results :)
I have implemented (a) and have found that I need to serialise the operation via transaction isolation levels / hints. This works; however, when more than a handful of concurrent transactions are running I cannot even read from the Transaction table without READUNCOMMITTED/NOLOCK, which is not something I want to start adding to the rest of the application unless I have to.
I have played around with the various isolation levels, although I will admit I am not entirely confident which I should be using just yet. Also, I am currently calculating the balance in two phases:
SET @DepositBalance =
(SELECT SUM(Amount) FROM [Banking].[Transaction] --with (rowlock)
WHERE UserId = @userid AND Type = 2)
SET @WithdrawalBalance =
(SELECT SUM(Amount) FROM [Banking].[Transaction] --with (rowlock)
WHERE UserId = @userid AND Type = 3)
I could complement the stored transaction value with its negative counterpart if it's a withdrawal, so this calculation could become a single SUM rather than two. I believe I could experience many problems with the above if an insert operation occurs between my deposit and withdrawal calculations, but then again the same could be said of the gap between the balance calculation and the Ledger insert operation.
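For illustration, if the amounts were stored pre-signed, the calculation would collapse to something like this (just a sketch; SignedAmount is a hypothetical column holding withdrawals as negatives):
SET @Balance =
(SELECT SUM(SignedAmount) -- single aggregate, no deposit/withdrawal split
FROM [Banking].[Transaction] --with (rowlock)
WHERE UserId = @userid)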
Any thoughts and ideas would be appreciated with regard to a safe solution to my problem. I hope the above makes sense; it's been a long day :)
Regards,
Mark
May 12, 2008 at 10:37 am
When I perform an Insert into the Transaction table I update the Ledger. For the most part this is a simple case of inserting the required data, the issue comes with the Balance column
Is the Ledger table used to track the point-in-time balance, for historical purposes? Since you are updating it, I don't think it is, so it might be a step you can eliminate completely.
It kind of looks like you are updating it just so you have a balance (grouped by UserId, etc.).
I'm thinking you could simply replace the Ledger table with a view, where the view is the SELECT SUM() ... GROUP BY of whatever is appropriate, instead of updating a static table.
for example:
SELECT UserId, Type AS TransactionType, SUM(Amount) AS Balance FROM [Banking].[Transaction] GROUP BY UserId, Type
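So as a view, something along these lines (the view name is just an example):
CREATE VIEW Banking.vUserBalance
AS
SELECT UserId, Type AS TransactionType, SUM(Amount) AS Balance
FROM [Banking].[Transaction]
GROUP BY UserId, Type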
Lowell
May 12, 2008 at 10:43 am
Now about to leave work; yeah, thanks for the reply. I will think about this and post again tomorrow. With the table design I have posted, yes, the only useful additional column would be the balance, but there are a few other columns that may present themselves in the future, most notably a TransactionStateId for that ledger entry.
Mark
May 12, 2008 at 10:37 pm
Please read the following for the solution to the Running Total (Balance) problem including Grouped Running Totals...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
--Jeff Moden
Change is inevitable... Change for the better is not.
May 13, 2008 at 5:25 am
Hi all,
Firstly, thanks for the responses
I have read the suggested article and, although it is very informative, it is geared towards calculating the total after the data is populated. I want to perform this iteratively as each transaction is stored. I have a trigger which fires when a Transaction is stored (I am using a trigger due to my use of C#3/Linq for my DAL). The purpose of this trigger is to update the Ledger table; here is my most recent attempt:
ALTER TRIGGER [Banking].[trUpdateLedgerAfterTransactionInsert]
ON [Banking].[Transaction]
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
-- WARNING this isn't designed to work with bulk inserts....
DECLARE @Type AS SMALLINT
SET @Type = (SELECT [Type] FROM Inserted)
DECLARE @TransactionId AS UNIQUEIDENTIFIER
SET @TransactionId = (SELECT Id FROM Inserted)
DECLARE @Amount AS MONEY
SET @Amount = (SELECT Amount FROM Inserted)
DECLARE @EntryDate DATETIME
SET @EntryDate = (SELECT EntryDate FROM Inserted)
DECLARE @userid UNIQUEIDENTIFIER
SET @userid = (SELECT UserId FROM Inserted)
DECLARE @LedgerType SMALLINT
SET @LedgerType = 0 -- For now always 0 (Company ledger will be 1)
DECLARE @AddMoney AS MONEY
IF(@Type = 2) -- DEPOSIT
SET @AddMoney = @Amount
ELSE IF(@Type = 3) -- WITHDRAWAL
SET @AddMoney = 0.0 - @Amount
ELSE
RAISERROR(60000,16,1)
DECLARE @Balance MONEY
SET @Balance =
(SELECT TOP 1 Balance
FROM Banking.Ledger WITH (INDEX(IDX_Ledger_UserId_EntryDate_Id), TABLOCKX)
WHERE UserId = @userid
ORDER BY EntryDate DESC, Id DESC) -- without an ORDER BY, TOP 1 is not guaranteed to return the latest row
IF @Balance IS NULL
SET @Balance = @AddMoney
ELSE
SET @Balance = @Balance + @AddMoney
INSERT INTO Banking.Ledger WITH(TABLOCKX)
(Type, UserId, TransactionId, Amount, Balance, EntryDate)
VALUES(
@LedgerType,
@userid,
@TransactionId,
@Amount,
@Balance,
@EntryDate)
END
Previous efforts have included playing with different isolation levels and hints, and performing a SUM rather than an iterative approach. Whatever I do I can cause the balance to become incorrect due to concurrency.
This works pretty well for a while: I have a quad-core machine and ran 10 threads all inserting new transactions. At some point the balance becomes incorrect, and as it's iterative, all balances are incorrect from that point onwards.
I assume that the problem here is that I calculate the balance and, although I specify a table lock, this is released and then re-obtained before the insert operation occurs. If another process writes a Transaction between these two statements then the invariant will be broken. Is this correct? Or will SQL Server maintain the lock across both statements / to the end of the transaction?
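If the lock really is being released between the two statements, I wonder whether UPDLOCK/HOLDLOCK hints on the read would hold it to the end of the transaction instead. A sketch of what I mean (not tested):
SET @Balance =
(SELECT TOP 1 Balance
FROM Banking.Ledger WITH (UPDLOCK, HOLDLOCK) -- update/range locks are held until the transaction ends
WHERE UserId = @userid
ORDER BY EntryDate DESC, Id DESC)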
Regards,
Mark
May 13, 2008 at 5:59 am
As an update, also setting SET TRANSACTION ISOLATION LEVEL SERIALIZABLE appears to have potentially solved the problem; I have now successfully run 50,000 concurrent transactions and will keep it executing.
UPDATE: OK, it made it to around 60k transactions and then the balance was incorrect.
May 13, 2008 at 8:05 am
quortex (5/13/2008)
I have read the suggested article and although it is very informative it is geared towards calculating the total after the data is populated.
That's EXACTLY what you are doing with the trigger... calculating the balance after the data has been populated.
I want to perform this iteratively as a transaction is stored.
That's RBAR no matter how you shake it... and almost all RBAR is terribly slow compared to set-based solutions :)
-- WARNING this isn't designed to work with bulk inserts....
That, in itself, is a terrible mistake... most developers never check for, never mind read, what's in a trigger. You cannot prevent bulk inserts from happening... you need to write code that will handle that eventuality.
The code from the article CAN be incorporated into a trigger, and it can be controlled to affect only the customers that have suffered an update. The code from the article will do a grouped running balance on 50,000 accounts of 200 transactions each (1 million rows) in seven seconds flat... all you have to do is isolate the accounts affected by a transaction and use the same code. No cursor will come close to the performance.
7 seconds for 1 million rows or transactions... how long does your code take to process just the 60K transactions, only to be wrong? :)
As an update, also setting SET TRANSACTION ISOLATION LEVEL SERIALIZABLE appears to have potentially solved the problem; I have now successfully run 50,000 concurrent transactions and will keep it executing.
UPDATE: OK, it made it to around 60k transactions and then the balance was incorrect.
The problem is likely parallelism... you need to set MAXDOP 1 for either solution. I recommend you drop the RBAR solution and go for the gold. :hehe: If you do it correctly, it will also process a single account should a single row ever be inserted. If an account has 50,000 rows, it will only take about 35 milliseconds to process.
The only thing that might beat the method in the article would be to maintain a separate table with a list of accounts and the current balance (1 row each) of each. Even then, you should use set based processing in the trigger instead of RBAR so that you can handle the eventuality of someone doing more than a 1 row insert and you can do it with some good performance.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 13, 2008 at 8:14 am
The only thing that might beat the method in the article would be to maintain a separate table with a list of accounts and the current balance (1 row each) of each. Even then, you should use set based processing in the trigger instead of RBAR so that you can handle the eventuality of someone doing more than a 1 row insert and you can do it with some good performance.
... and, before you do that, you must plan carefully for how to keep that balance in sync if the server suffers a catastrophic failure during transaction(s)...
--Jeff Moden
Change is inevitable... Change for the better is not.
May 13, 2008 at 8:29 am
You might consider modifying the Transaction table to include the balance at the end of each transaction. Then you will only have inserts to worry about. Add a column, Previous_Transaction_ID, with an FK reference to the prior transaction and a unique constraint, to prevent two transactions from using the same row as their prior transaction and to make sure the balances are correctly calculated.
Transaction { Id, Type, UserId, State, Description, Amount, EntryDate, Balance_Amount, Previous_Transaction_ID }
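A rough sketch of that change (constraint names chosen only for illustration):
ALTER TABLE [Banking].[Transaction]
ADD Balance_Amount MONEY NULL,
Previous_Transaction_ID UNIQUEIDENTIFIER NULL
GO
ALTER TABLE [Banking].[Transaction]
ADD CONSTRAINT FK_Transaction_Previous
FOREIGN KEY (Previous_Transaction_ID) REFERENCES [Banking].[Transaction] (Id),
CONSTRAINT UQ_Transaction_Previous UNIQUE (Previous_Transaction_ID)
GO
-- Note: a UNIQUE constraint allows only one NULL, i.e. one chain-starting row
-- table-wide, so the first transaction per user would need special handling.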
May 13, 2008 at 9:27 am
Hi everyone,
Thanks again. I have gone back to the running-total article; the reason I initially shied away from it is that the UPDATE recalculates every row, which seemed a little OTT considering I am only inserting one entry.
The warning comment in my code is simply because it is test code; it is more of a TODO note to myself :) Don't fret about that...
Firstly, with my previously posted code which looked at the TOP entry and added the new amount, I would like to try OPTION(MAXDOP 1). However, I can't work out how to do this in a SET operation, e.g.:
SET @Balance =
(SELECT TOP 1 Balance
FROM Banking.Ledger WITH (INDEX(IDX_Ledger_UserId_EntryDate_Id), TABLOCKX)
WHERE UserId = @userid)
Or perhaps this doesn't make sense here? I do not know enough about SQL Server to understand how the query plan parallelises.
I have also had a go at updating all balances for a user on an insert, here is my Trigger code:
(Note: I have added a RelativeAmount column, so a withdrawal is stored as a negative amount and a deposit as a positive one)
ALTER TRIGGER [Banking].[trUpdateLedgerAfterTransactionInsert]
ON [Banking].[Transaction]
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
-- WARNING this isn't designed to work with bulk inserts....
DECLARE @Type AS SMALLINT
SET @Type = (SELECT [Type] FROM Inserted)
DECLARE @TransactionId AS UNIQUEIDENTIFIER
SET @TransactionId = (SELECT Id FROM Inserted)
DECLARE @Amount AS MONEY
SET @Amount = (SELECT Amount FROM Inserted)
DECLARE @EntryDate DATETIME
SET @EntryDate = (SELECT EntryDate FROM Inserted)
DECLARE @userid UNIQUEIDENTIFIER
SET @userid = (SELECT UserId FROM Inserted)
DECLARE @LedgerType SMALLINT
SET @LedgerType = 0
DECLARE @RelativeAmount AS MONEY
IF(@Type = 2) -- DEPOSIT
SET @RelativeAmount = @Amount
ELSE IF(@Type = 3) -- WITHDRAWAL
SET @RelativeAmount = 0.0 - @Amount
ELSE
RAISERROR('Transaction Type not supported',16,1)
DECLARE @Balance MONEY
SET @Balance = 0.0
INSERT INTO Banking.Ledger
(Type, UserId, TransactionId, Amount, RelativeAmount, Balance, EntryDate)
VALUES(
@LedgerType, -- Ledger Type 0=User, 1=Company this is temporary
@userid,
@TransactionId,
@Amount,
@RelativeAmount,
@Balance, -- placeholder; recalculated by the UPDATE below
@EntryDate)
----------------------------------------------
-- Recalculate *ALL* balances for this user
----------------------------------------------
DECLARE @PrevBalance MONEY
SET @PrevBalance = 0
DECLARE @PrevUserId UNIQUEIDENTIFIER
UPDATE Banking.Ledger
SET @PrevBalance = Balance = @PrevBalance + RelativeAmount,
@PrevUserId = UserId
FROM Banking.Ledger
WITH (INDEX(IDX_Ledger_UserId_EntryDate_Id), TABLOCKX)
WHERE UserId = @userid
OPTION(MAXDOP 1)
END
This works perfectly fine in a non-concurrent test. However, if I start creating lots of Transactions in parallel, most of my requests deadlock. I assume that this is due to the table lock in the UPDATE operation. From my, albeit somewhat limited ;), knowledge I can't see how this can be resolved with this technique.
On the other hand, with my previous attempt at iteratively adding the new amount to the previous amount (however much that makes me shudder for some reason), the lock duration was much shorter and I didn't encounter any deadlock problems.
Thanks again for your help with this,
Mark
May 13, 2008 at 10:13 am
Another update...
I have got around the concurrency problem by using an application lock via sp_getapplock and sp_releaseapplock. I never even knew this could be accomplished in SQL Server, and I have to say I don't feel very comfortable with it.
That said, I will run it overnight, but I can't see the results being incorrect :) I obviously want to avoid this approach if possible.
Regards,
Mark
May 13, 2008 at 3:06 pm
quortex (5/13/2008)
Firstly with my previously pasted in code which looked at the TOP entry and added the new amount I would like to try this with OPTION(MAXDOP 1). However I can't work out how to do this in a SET operation e.g.:
SET @Balance =
(SELECT TOP 1 Balance
FROM Banking.Ledger WITH (INDEX(IDX_Ledger_UserId_EntryDate_Id), TABLOCKX)
WHERE UserId = @userid)
Or perhaps this doesn't make sense here? I do not know enough about SQL Server to understand how the query plan parallelises.
You didn't use the MAXDOP option there.
Why are you hell bent on processing a single row at a time in the trigger? Are you expecting a GUI to make the entries one at a time? And why does the update to the Ledger have to be done at the same virtual instant? Are you showing the customer their new balance right away?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2008 at 2:44 am
Hi,
Using application locks (sp_getapplock, sp_releaseapplock) along with the SET-based update trigger shown last, I have had a concurrent test run all night without failure, which is a first. It would be nice to be able to do this without an application lock, as they have their obvious disadvantages. To recap: without the application lock the code deadlocks regularly due to the table lock on the Ledger table. Even if it takes circa 35ms to do an update, 35ms is a long time when there is a lot of contention.
You didn't use the MAXDOP option there.
No, sorry, I think you missed my question: I can't get MAXDOP to work in a SET @Var = scenario. For example:
SELECT TOP 1 Balance
FROM Banking.Ledger WITH (INDEX(IDX_Ledger_UserId_EntryDate_Id), TABLOCKX)
WHERE UserId = @userid OPTION(MAXDOP 1)
Works fine, but if I try and do anything along the lines of:
SET @Balance = (SELECT TOP 1 Balance
FROM Banking.Ledger WITH (INDEX(IDX_Ledger_UserId_EntryDate_Id), TABLOCKX)
WHERE UserId = @userid OPTION(MAXDOP 1))
I get the error: Incorrect syntax near the keyword 'OPTION'.
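Presumably OPTION is only valid at the statement level, so the assignment would have to be written as a SELECT rather than a SET. A sketch of what I mean (not tested):
SELECT TOP 1 @Balance = Balance
FROM Banking.Ledger WITH (INDEX(IDX_Ledger_UserId_EntryDate_Id), TABLOCKX)
WHERE UserId = @userid
OPTION(MAXDOP 1)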
Why are you hell bent on processing a single row at a time in the trigger?
I am not at all; I prefer the SET-based approach. At the moment I am just trying to get this to work correctly in a multi-threaded environment. However, I would also like to find out why my original single-row approach doesn't work; this is also a learning exercise.
Are you expecting a GUI to make the entries one at a time?
We have an n-tier, service-oriented system; transactions will always be handled one at a time via a service call, but obviously many, many calls could be happening concurrently. I don't really like to code with only a single-row insert in mind; in an ideal world it would obviously work with multiple inserts. That said, if I could gain performance or a more stable implementation by specifying that only one row be handled at a time, I would accept that and would throw an exception in the trigger if the condition were not met. But I don't think that's likely to be the case.
And why does the update to the Ledger have to be done at the same virtual instant? Are you showing the customer their new balance right away?
Yes; if the update to the ledger didn't have to be committed at the same time as the transaction insert then things would become much, much easier. After a transaction has occurred it will be common for the customer to see their new balance straight away, but of more concern is the system using the balance calculation; it really needs to be in sync with the transactions. The implementation is much cleaner if they are updated at the same time.
Although I have this working with application locks and the SET-based update, I really would like to remove the need for the application locks. My problem then is that the update of all of a user's balances, even though it is quick, is still fairly intensive in a concurrent environment and causes many deadlocks. Obviously you want to avoid deadlocks as much as possible in any system, most notably a banking one; for example, I wouldn't want to take a card payment and then discover the transaction won't commit due to deadlocks. Yes, I could change the order, I could retry, I could use a queue, but for the sake of debate let's just focus on the trigger. My current code is as follows:
ALTER TRIGGER [Banking].[trUpdateLedgerAfterTransactionInsert]
ON [Banking].[Transaction]
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
-- WARNING this isn't designed to work with bulk inserts....
DECLARE @Type AS SMALLINT
SET @Type = (SELECT [Type] FROM Inserted)
DECLARE @TransactionId AS UNIQUEIDENTIFIER
SET @TransactionId = (SELECT Id FROM Inserted)
DECLARE @Amount AS MONEY
SET @Amount = (SELECT Amount FROM Inserted)
DECLARE @EntryDate DATETIME
SET @EntryDate = (SELECT EntryDate FROM Inserted)
DECLARE @userid UNIQUEIDENTIFIER
SET @userid = (SELECT UserId FROM Inserted)
DECLARE @LedgerType SMALLINT
SET @LedgerType = 0
DECLARE @RelativeAmount AS MONEY
IF(@Type = 2) -- DEPOSIT
SET @RelativeAmount = @Amount
ELSE IF(@Type = 3) -- WITHDRAWAL
SET @RelativeAmount = 0.0 - @Amount
ELSE
RAISERROR('Transaction Type not supported',16,1)
DECLARE @res INT
EXEC @res = sp_getapplock
@Resource = 'trUpdateLedgerAfterTransactionInsertLock',
@LockMode = 'Exclusive',
@LockOwner = 'Transaction',
@LockTimeout = 60000,
@DbPrincipal = 'public'
IF @res NOT IN (0, 1)
RAISERROR('Unable to acquire Lock', 16, 1 )
INSERT INTO Banking.Ledger
(Type, UserId, TransactionId, Amount, RelativeAmount, Balance, EntryDate)
VALUES(
@LedgerType, -- Ledger Type 0=User, 1=Company this is temporary
@userid,
@TransactionId,
@Amount,
@RelativeAmount,
0.0, -- Balance placeholder; recalculated by the UPDATE below
@EntryDate)
----------------------------------------------
-- Recalculate *ALL* balances for this user
----------------------------------------------
DECLARE @PrevBalance MONEY
SET @PrevBalance = 0
DECLARE @PrevUserId UNIQUEIDENTIFIER
UPDATE Banking.Ledger
SET @PrevBalance = Balance = @PrevBalance + RelativeAmount,
@PrevUserId = UserId
FROM Banking.Ledger
WITH (INDEX(IDX_Ledger_UserId_EntryDate_Id), TABLOCKX)
WHERE UserId = @userid
OPTION(MAXDOP 1)
EXEC @res = sp_releaseapplock
@Resource = 'trUpdateLedgerAfterTransactionInsertLock',
@DbPrincipal = 'public',
@LockOwner = 'Transaction'
END
I know that the above code should also handle multiple-row inserts; I will do this, but let's not worry about that for now.
Regards,
Mark
May 21, 2008 at 4:58 pm
Hi Mark,
If I read your posts correctly, you are storing both balances AND activity in your Ledger table, right?
It might make sense to split these up (maintain the balances separately from the activity).
This should speed up the maintenance.
Also, make sure you have indexes on any foreign key columns (e.g. TransactionId).
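For example (the index name is just an example):
CREATE NONCLUSTERED INDEX IX_Ledger_TransactionId
ON Banking.Ledger (TransactionId)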
As a sidenote, you could combine your multiple variable initialisations into a single SELECT initialization
(SELECT @a = fielda, @b = fieldb ... FROM inserted)
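Applied to your trigger, that would look something like this (still single-row only, the same caveat as the original SETs):
SELECT @Type = [Type],
@TransactionId = Id,
@Amount = Amount,
@EntryDate = EntryDate,
@userid = UserId
FROM Inserted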
Best Regards,
Chris BΓΌttner
May 24, 2008 at 1:00 am
Initially you wrote that the ledger contains the balance, which is actually the sum of the transactions.
Your latest trigger inserts a record into the ledger for each transaction, so you have several balance rows to update, and that count keeps growing. The Transaction table has all the data necessary for historical analysis, so why copy data to the ledger table?
If you change the logic a bit and store a negative value in the amount column when the transaction is a withdrawal, the balance is simply SUM(amount).
Actually, the ledger is calculable at any time, so it's unnecessary:
select sum(amount) from xx where Key=something
If xx has a clustered index on Key, this query takes less than 100ms on a billion-record table if the subset is ~50k records.
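e.g. (names are placeholders):
create clustered index IX_xx_Key on xx ([Key])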
If you need this to check whether the user can withdraw the requested amount, it's as simple as:
1. An insert trigger on Transaction:
select userid, sum(amount) as delta into #temp from inserted group by userid;
update L set balance = balance + T.delta from ledger L join #temp T on T.userid = L.userid;
insert into ledger (userid, balance) select I.userid, I.delta from #temp I left outer join ledger L on I.userid = L.userid where L.userid is null;
2. An insert/update trigger on Ledger raises an exception if the balance goes negative:
if exists (select 1 from inserted where balance < 0.0) begin
rollback transaction
raiserror ....
end