June 3, 2009 at 9:56 am
Hi,
I'm trying to allocate some payments to charges on a LIFO (last in first out) basis. I'm struggling to find a set based way to do this and was hoping someone may be able to help.
My tables look like this:
tblExampleToBeAllocated
AccountId ChargeDate ChargeValue Ordinal
----------- ----------------------- ----------- -----------
1 2006-08-02 00:00:00.000 10000 1
1 2006-12-02 00:00:00.000 20000 2
1 2007-01-02 00:00:00.000 1000 3
1 2007-03-02 00:00:00.000 5000 4
1 2007-06-02 00:00:00.000 12000 5
tblExampleToAllocate
AccountId TxDate TxValue
----------- ----------------------- -----------
1 2007-07-02 00:00:00.000 15000
1 2007-02-02 00:00:00.000 26000
Code to create these is below.
The payments in tblExampleToAllocate need to be allocated to charges in tblExampleToBeAllocated, based on the payment date. E.g.
Payment 1 is for 15000 and is dated July 2007. The first charge with a date less than or equal to the payment date is charge 5 (June 2007) and so 12000 is allocated, leaving 3000 remaining.
This 3000 is then allocated to the preceding charge, which is charge 4 (March 2007). As the charge is more than the unallocated remainder, I net these off, leaving a net charge of 2000.
I then do the same for the second charge, starting at the first charge with a date <= the payment date (in this case charge 3), following the same logic.
The results of the exercise should look like this:
[code]
AccountId ChargeDate ChargeValue Ordinal NetValueRemaining
----------- ----------------------- ----------- ----------- -----------------
1 2006-08-02 00:00:00.000 10000 1 5000
1 2006-12-02 00:00:00.000 20000 2 0
1 2007-01-02 00:00:00.000 1000 3 0
1 2007-03-02 00:00:00.000 5000 4 2000
1 2007-06-02 00:00:00.000 12000 5 0
[/code]
Easy enough to do with a loop, can anyone show me how to do this in a set? I've been banging my head against this for some time now :w00t:
Thanks, Iain
Create tables:
[code]
create table tblExampleToBeAllocated(
AccountId int
, ChargeDate datetime
, ChargeValue int
, Ordinal int)
go
insert tblExampleToBeAllocated
values (1, '2 Aug 2006', 10000, 1)
insert tblExampleToBeAllocated
values (1, '2 Dec 2006', 20000, 2)
insert tblExampleToBeAllocated
values (1, '2 Jan 2007', 1000, 3)
insert tblExampleToBeAllocated
values (1, '2 Mar 2007', 5000, 4)
insert tblExampleToBeAllocated
values (1, '2 Jun 2007', 12000, 5)
go
create table tblExampleToAllocate (
AccountId int
, TxDate datetime
, TxValue int)
go
insert tblExampleToAllocate
values (1, '2 Jul 2007', 15000)
insert tblExampleToAllocate
values (1, '2 Feb 2007', 26000)
go
select * from tblExampleToBeAllocated
select * from tblExampleToAllocate
[/code]
June 3, 2009 at 10:45 am
Joe Celko has written 2 article on implimenting FIFO and LIFO with SQL.
http://www.dbazine.com/ofinterest/oi-articles/celko32
http://www.dbazine.com/ofinterest/oi-articles/celko33
SQL = Scarcely Qualifies as a Language
July 28, 2009 at 10:02 am
Hey all,
Sorry to post in an old topic, but I figured out a solution that works for my situation and thought I'd share.
First, a bit of background:
I'm allocating 'payments' to 'transactions' on a LIFO (Last In First Out) basis.
The payments could allocate to multiple transactions.
The payments could have a date that is in the middle of a range already allocated to another transaction, e.g.
tran 1 12-Dec-01 £100
tran 2 20-Dec-01 £200
tran 3 31-Dec-01 £100
pay 1 10-Jan-02 £300
pay 2 12-Jan-02 £100
There is no relationship between payment amount and transaction amount.
There is no relationship between payment date and transaction date.
I've implemented a running sum based solution, which is given below. Hope this helps someone, as I've had a lot of 'fun' getting it to work 😛
Note that this is a stepwise implementation, which is somewhat different to the way my actual sproc works, but I figured this was much easier to understand.
Critiques welcomed and appreciated.
Regards, Iain
use SQL_Tips_Examples
-- drop test tables
--drop table dbo.test_tran
--drop table dbo.test_alloc
-- create test tables
-- transactions table - these are the values to allocate against
create table dbo.test_tran (
acc_id int
, tran_id int
, tran_date datetime
, tran_value int check (tran_value > 0)
)
go
-- allocation table - these are the values to be allocated
create table dbo.test_alloc (
acc_id int
, alloc_id int
, alloc_date datetime
, alloc_value int check (alloc_value > 0)
)
go
-- add some test data that covers all of the scenarios
insert dbo.test_tran values (1,1,'01-January-2007',100)
insert dbo.test_tran values (1,2,'01-February-2007',200)
insert dbo.test_tran values (1,3,'01-March-2007',400)
insert dbo.test_tran values (1,4,'01-April-2007',200)
insert dbo.test_tran values (1,5,'01-May-2007',100)
insert dbo.test_tran values (1,6,'01-June-2007',200)
insert dbo.test_tran values (1,7,'01-July-2007',100)
insert dbo.test_tran values (1,8,'01-August-2007',200)
insert dbo.test_tran values (1,9,'01-September-2007',400)
insert dbo.test_tran values (1,10,'01-October-2007',200)
insert dbo.test_tran values (1,11,'01-November-2007',100)
insert dbo.test_tran values (1,12,'01-December-2007',200)
insert dbo.test_tran values (1,13,'01-January-2008',100)
insert dbo.test_tran values (1,14,'01-February-2008',200)
insert dbo.test_tran values (1,15,'01-March-2008',400)
insert dbo.test_tran values (1,16,'01-April-2008',200)
insert dbo.test_tran values (2,1,'01-January-2007',100)
insert dbo.test_tran values (2,2,'01-February-2007',200)
insert dbo.test_tran values (2,3,'01-March-2007',400)
insert dbo.test_tran values (2,4,'01-April-2007',200)
insert dbo.test_tran values (2,5,'01-May-2007',100)
insert dbo.test_tran values (2,6,'01-June-2007',200)
insert dbo.test_tran values (2,7,'01-July-2007',100)
insert dbo.test_tran values (2,8,'01-August-2007',200)
insert dbo.test_tran values (2,9,'01-September-2007',400)
insert dbo.test_tran values (2,10,'01-October-2007',200)
insert dbo.test_tran values (2,11,'01-November-2007',100)
insert dbo.test_tran values (2,12,'01-December-2007',200)
insert dbo.test_tran values (2,13,'01-January-2008',100)
insert dbo.test_tran values (2,14,'01-February-2008',200)
insert dbo.test_tran values (2,15,'01-March-2008',400)
insert dbo.test_tran values (2,16,'01-April-2008',200)
insert dbo.test_alloc values (1,10,'01-October-2007',1000)
insert dbo.test_alloc values (1,14,'01-February-2008',600)
insert dbo.test_alloc values (1,17,'20-January-2008',50)
insert dbo.test_alloc values (1,15,'01-March-2008',100)
insert dbo.test_alloc values (1,5,'01-May-2007',500)
insert dbo.test_alloc values (1,7,'01-July-2007',200)
insert dbo.test_alloc values (2,4,'01-April-2007',100)
insert dbo.test_alloc values (2,7,'01-July-2007',300)
insert dbo.test_alloc values (2,10,'01-October-2007',200)
insert dbo.test_alloc values (2,12,'01-December-2007',1000)
insert dbo.test_alloc values (2,15,'01-March-2008',200)
insert dbo.test_alloc values (2,16,'01-April-2008',100)
go
-- take a look at the data
select * from dbo.test_tran
select * from dbo.test_alloc
-- add an ordinal column to the tran table
-- this will be used to offset join later in the script
alter table dbo.test_tran add tran_ordinal int
go
-- use row_number() to populate the ordinal sequentially based on date
-- uses a cte to get round not being able to update from row_number() directly
;with cte as (
select acc_id
, tran_id
, row_number() over (partition by acc_id order by tran_date) as tran_ordinal
from dbo.test_tran
)
update dbo.test_tran
set tran_ordinal = cte.tran_ordinal
from dbo.test_tran t
join cte
on t.acc_id = cte.acc_id
and t.tran_id = cte.tran_id
go
-- check the result
select * from dbo.test_tran
-- add a column to the tran table to hold the allocatable value
alter table dbo.test_tran add sum_alloc_value int
go
-- cluster on acc_id and date on alloc table to speed the correlated subquery
create clustered index ix_acc_id_ordinal on dbo.test_alloc(acc_id, alloc_date)
-- cluster on acc_id and ordinal on tran table to speed the offset join
create clustered index ix_acc_id_ordinal on dbo.test_tran(acc_id, tran_ordinal)
go
/* update sum_alloc_value by summing all alloc_values where the date is
equal to or earlier than the current row date but is after the previous row date
e.g.
dbo.test_tran
acc_id tran_id tran_date tran_value tran_ordinal sum_alloc_value
----------- ----------- ----------------------- ----------- ------------ ---------------
1 1 2007-01-01 00:00:00.000 100 1 NULL
1 2 2007-02-01 00:00:00.000 200 2 NULL
1 3 2007-03-01 00:00:00.000 400 3 NULL
1 4 2007-04-01 00:00:00.000 200 4 NULL
for transaction 2 i need to sum all of the alloc transactions:
that are on or before 2007-02-01
but are also after 2007-01-01 */
-- self join the table, offsetting by one using the tran_ordinal
-- this puts the current and previous tran_date onto the same row
-- making processing simpler
-- this select demonstrates what is going on
-- note use of left join to avoid dropping the unmatched value
select t1.* ,t2.*
from dbo.test_tran t1
left join dbo.test_tran t2
on t1.acc_id = t2.acc_id
and t1.tran_ordinal = t2.tran_ordinal + 1
-- this leaves us a problem with the unmatched row
-- as this now has a null t2.tran_date value
-- get round this by using coalesce and '01-January-1900'
-- can do this as the null value will always be the earliest in the sequence
-- and we won't have any alloc rows that are dated pre '01-January-1900'
select t1.* ,t2.*, coalesce(t2.tran_date,'01-January-1900') as previous_tran_date
from dbo.test_tran t1
left join dbo.test_tran t2
on t1.acc_id = t2.acc_id
and t1.tran_ordinal = t2.tran_ordinal + 1
-- now we use a correlated subquery to sum the alloc_value
-- not strictly neccessary, but put this into a cte
-- to make the sum criteria a bit clearer
-- note join on both acc_id and tran_id in case tran_id
-- is not unique across different accounts
-- show the results
;with cte as (
select t1.acc_id
, t1.tran_id
, t1.tran_date
, coalesce(t2.tran_date,'01-January-1900') as previous_tran_date
from dbo.test_tran t1
left join dbo.test_tran t2
on t1.acc_id = t2.acc_id
and t1.tran_ordinal = t2.tran_ordinal + 1
)
select cte.*
, (select sum(alloc_value) from dbo.test_alloc
where acc_id = cte.acc_id
and alloc_date previous_tran_date) as sum_alloc_value
from cte
-- this is the actual update
;with cte as (
select t1.acc_id
, t1.tran_id
, t1.tran_date
, coalesce(t2.tran_date,'01-January-1900') as previous_tran_date
from dbo.test_tran t1
left join dbo.test_tran t2
on t1.acc_id = t2.acc_id
and t1.tran_ordinal = t2.tran_ordinal + 1
)
update dbo.test_tran
set sum_alloc_value = (select sum(alloc_value) from dbo.test_alloc
where acc_id = cte.acc_id
and alloc_date cte.previous_tran_date)
from dbo.test_tran t
join cte
on t.acc_id = cte.acc_id
and t.tran_id = cte.tran_id
-- check the result
select * from dbo.test_tran
/* we'll calculate the allocation based on a running sum
of both the sum_alloc_value and the tran_value
the running sum of the tran value needs to start at each point
a new sum_alloc_value is found, but only if the later
sum_alloc_value is wholly allocated to transactions
e.g.
acc_id tran_id tran_date tran_value tran_ordinal sum_alloc_value
----------- ----------- ----------------------- ----------- ------------ ---------------
1 1 2007-01-01 00:00:00.000 100 1 NULL
1 2 2007-02-01 00:00:00.000 200 2 NULL
1 3 2007-03-01 00:00:00.000 400 3 NULL
1 4 2007-04-01 00:00:00.000 200 4 NULL
1 5 2007-05-01 00:00:00.000 100 5 500
1 6 2007-06-01 00:00:00.000 200 6 NULL
1 7 2007-07-01 00:00:00.000 100 7 200
1 8 2007-08-01 00:00:00.000 200 8 NULL
1 9 2007-09-01 00:00:00.000 400 9 NULL
1 10 2007-10-01 00:00:00.000 200 10 1000
1 11 2007-11-01 00:00:00.000 100 11 NULL
1 12 2007-12-01 00:00:00.000 200 12 NULL
1 13 2008-01-01 00:00:00.000 100 13 NULL
1 14 2008-02-01 00:00:00.000 200 14 650
1 15 2008-03-01 00:00:00.000 400 15 100
1 16 2008-04-01 00:00:00.000 200 16 NULL
sum_alloc_value 15 allocates wholly to tran_id 15
therefore, we restart the running sums from position 14
sum_alloc_value 14 does not wholly allocate to the transactions between 14
and the next sum_alloc_value at 10 (200+100+200+100=600)
so we don't restart the running sum from 10 */
-- start by adding an ordinal to the sum_alloc_value
-- we'll use this to calculate the gap between sum_alloc_values
alter table dbo.test_tran add alloc_ordinal int
go
-- update the alloc_ordinal
;with cte as (
select acc_id
, tran_id
, row_number() over (partition by acc_id order by tran_date) as alloc_ordinal
from dbo.test_tran
where sum_alloc_value is not null
)
update dbo.test_tran
set alloc_ordinal = cte.alloc_ordinal
from dbo.test_tran t
join cte
on t.acc_id = cte.acc_id
and t.tran_id = cte.tran_id
-- check the result
select * from dbo.test_tran
-- add a column to hold the tran_ordinal at the other end of the gap
alter table dbo.test_tran add gap_end_ordinal int
go
-- update the gap_end_ordinal by offset joining
-- coalesce to 1 (must be the lowest possible value) to avoid a null
update t1
set gap_end_ordinal = coalesce(t2.tran_ordinal + 1,1)
from dbo.test_tran t1
left join dbo.test_tran t2
on t1.acc_id = t2.acc_id
and t1.alloc_ordinal = t2.alloc_ordinal + 1
where t1.alloc_ordinal is not null
-- check the result
select * from dbo.test_tran
-- find the allocatable tran_value within the gap
alter table dbo.test_tran add allocateable_tran_value int
-- update using the two ordinal positions
update dbo.test_tran
set allocateable_tran_value = (select sum(tran_value) from dbo.test_tran
where acc_id = t.acc_id
and tran_ordinal between t.gap_end_ordinal
and t.tran_ordinal)
from dbo.test_tran t
where t.alloc_ordinal is not null
-- check the result
select * from dbo.test_tran
-- add an indicator to show if the previous sum_alloc_value wholly allocates
-- to the transactions within the gap
alter table dbo.test_tran add allocates_wholly bit
-- update allocates_wholly
update dbo.test_tran
set allocates_wholly = case when allocateable_tran_value > sum_alloc_value
then 1
else 0
end
from dbo.test_tran
where alloc_ordinal is not null
-- check the result
select * from dbo.test_tran
-- add the running sums based on the allocates_wholly flag
-- the running sum will restart at any position where
-- alloc_ordinal is not null and allocates_wholly = 1
-- for the next sum_alloc_value row that is later than the current row
-- add a column to tell us if we should restart the running sum
alter table dbo.test_tran add is_restart bit
-- update is_restart
-- again use a self join and offset to find the next value
update t1
set is_restart = 1
--select t1.*, t2.alloc_ordinal, t2.allocates_wholly
from dbo.test_tran t1
left join dbo.test_tran t2
on t1.acc_id = t2.acc_id
and t1.alloc_ordinal = t2.alloc_ordinal - 1
where t1.alloc_ordinal is not null
and coalesce(t2.allocates_wholly,1) = 1
-- check the result
select * from dbo.test_tran
-- add a column to hold the tran_ordinal that defines the end of the range to sum
alter table dbo.test_tran add running_sum_end_ordinal int
go
-- update running_sum_end_ordinal
-- note - need to treat the first row in a group differently
-- because this will always be the same as tran_ordinal
-- but the logic of the corr subquery would populate the next in the sequence
-- populate the last null caused by the left join with 10000
-- as don't expect more than 10000 transactions per acc_id
update dbo.test_tran
set running_sum_end_ordinal = coalesce(
case when is_restart = 1 then tran_ordinal else null end
,(select min(tran_ordinal) from dbo.test_tran
where acc_id = t.acc_id
and tran_ordinal > t.tran_ordinal
and is_restart = 1)
, 10000
)
from dbo.test_tran t
-- check the result
select * from dbo.test_tran
-- add columns to hold the running sums
alter table dbo.test_tran add tran_value_running_sum int
alter table dbo.test_tran add alloc_value_running_sum int
-- update running sums
update dbo.test_tran
set tran_value_running_sum = (select sum(tran_value) from dbo.test_tran
where acc_id = t.acc_id
and tran_ordinal between t.tran_ordinal
and t.running_sum_end_ordinal)
, alloc_value_running_sum = (select sum(sum_alloc_value) from dbo.test_tran
where acc_id = t.acc_id
and tran_ordinal between t.tran_ordinal
and t.running_sum_end_ordinal)
from dbo.test_tran t
-- check the result
select * from dbo.test_tran
-- where a sum_alloc_value is larger than the allocateable_tran_value it can allocate to
-- and this happens to push the allocateable value over the allocateable_tran_value total
-- **for the next gap group** as well, this causes a miscalc as in row 5 below
-- where we have 150 (1850-1700) of unallocated value still to allocate:
/*
acc_id tran_id tran_date tran_value tran_ordinal sum_alloc_value alloc_ordinal gap_end_ordinal allocateable_tran_value allocates_wholly is_restart running_sum_end_ordinal tran_value_running_sum alloc_value_running_sum
----------- ----------- ----------------------- ----------- ------------ --------------- ------------- --------------- ----------------------- ---------------- ---------- ----------------------- ---------------------- -----------------------
1 1 2007-01-01 00:00:00.000 100 1 NULL NULL NULL NULL NULL NULL 5 1000 500
1 2 2007-02-01 00:00:00.000 200 2 NULL NULL NULL NULL NULL NULL 5 900 500
1 3 2007-03-01 00:00:00.000 400 3 NULL NULL NULL NULL NULL NULL 5 700 500
1 4 2007-04-01 00:00:00.000 200 4 NULL NULL NULL NULL NULL NULL 5 300 500
1 5 2007-05-01 00:00:00.000 100 5 500 1 1 1000 1 1 5 100 500
1 6 2007-06-01 00:00:00.000 200 6 NULL NULL NULL NULL NULL NULL 14 1700 1850
1 7 2007-07-01 00:00:00.000 100 7 200 2 6 300 1 NULL 14 1500 1850
*/
-- we clean these up by running a check and removing the is_restart flag
-- where there is an unallocated portion of sum_alloc_value remaining
-- from the prior allocations even though the is_restart flag = 1
-- run this in a loop to bubble update the values
-- this covers situations where removing the is_restart flag causes another overlap
declare @rows_updated int
-- explicitly set the rows_updated value to ensure we process at least once
set @rows_updated = 1
while @rows_updated > 0
begin
-- *** debug
print '@rows_updated = ' + cast(@rows_updated as varchar)
update t1
set is_restart = null
--select *
from dbo.test_tran t1
left join dbo.test_tran t2
on t1.acc_id = t2.acc_id
and t1.tran_ordinal = t2.tran_ordinal - 1
where t1.is_restart = 1
and t2.tran_value_running_sum 0
begin
-- update running_sum_end_ordinal
update dbo.test_tran
set running_sum_end_ordinal = coalesce(
case when is_restart = 1 then tran_ordinal else null end
,(select min(tran_ordinal) from dbo.test_tran
where acc_id = t.acc_id
and tran_ordinal > t.tran_ordinal
and is_restart = 1)
, 10000
)
from dbo.test_tran t
-- update running sums
update dbo.test_tran
set tran_value_running_sum = (select sum(tran_value) from dbo.test_tran
where acc_id = t.acc_id
and tran_ordinal between t.tran_ordinal
and t.running_sum_end_ordinal)
, alloc_value_running_sum = (select sum(sum_alloc_value) from dbo.test_tran
where acc_id = t.acc_id
and tran_ordinal between t.tran_ordinal
and t.running_sum_end_ordinal)
from dbo.test_tran t
end -- if
end -- while
-- check the result
select * from dbo.test_tran
-- run the allocation
-- values are allocated where:
-- tran_value can wholly allocate:
-- tran_value_running_sum < alloc_value_running_sum
-- OR
-- alloc_value_running_sum 0
-- add a field to hold the final allocated value
alter table dbo.test_tran add final_allocation int
-- update final_allocation
update dbo.test_tran
set final_allocation =
case when tran_value_running_sum < alloc_value_running_sum
then tran_value
when alloc_value_running_sum 0
then tran_value - (tran_value_running_sum - alloc_value_running_sum)
else null
end
-- check the result
select * from dbo.test_tran
July 29, 2009 at 9:44 am
Triangular joins anyone? Search this site for stuff about that.
Have you tried this with 1M or even 100K rows in the table? Just scanned quickly so if I missed scalability numbers my apologies!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 29, 2009 at 10:13 am
Hi Jeff,
Interesting point, not something that had even entered my mind. Definitely interested in seeing how I can root these out.
I've taken a look at your article on triangular joins, get the concept, but not the solution as per your other article "Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5". Primarily (I hope) because I'm not familiar with the @var = value = @var + value syntax so can't quite follow what is going on.
Can you point me in the direction of an explanation?
On scalability, the above runs on 50k account batches (at an average of ~100 tran rows / 50 alloc rows per account) at about 80s / batch. Not stellar, but acceptable for my purposes.
Thanks, Iain
July 29, 2009 at 11:03 am
Hey - if it is acceptable for your purposes that is all that truly matters. Well, at least until your data volumes grow a bit! 🙂
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 30, 2009 at 2:46 am
Exactly 😀
Thankfully it's also a one off job, with a fixed data volume. The brief explicitly states "Performance is not an issue".
Until someone decides it isn't fast enough that is 😀
July 30, 2009 at 8:03 am
irobertson (7/30/2009)
Exactly 😀Thankfully it's also a one off job, with a fixed data volume. The brief explicitly states "Performance is not an issue".
Until someone decides it isn't fast enough that is 😀
Heck, I would have just done it with cursors then - much easier to program! :w00t:
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 30, 2009 at 8:12 am
Could this be the first ever 'we recommend a cursor' post on SSC?
Think I might print this page and frame it 🙂
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply