September 8, 2009 at 12:30 am
Hi.
I'm looking for any suggestions on how to accomplish the following:
I’ve got a table with receipts and transactions in with about 350 million records, which all have a receipt or transaction code (about 50 different codes in all). In separate tables, there is a list of mailed documents and SMS’s which is sent out to the clients for reminders of payments. The idea is to track which option (mail or SMS) we get the most payments in again.
If would start off with a mailed reminder 2 months before they should pay, then a mailed document in the next month, followed by various SMS, a letter of demand, and finally a letter of handing over for collection. (Very much simplified; but you get the idea)
For each of these letters and SMS’s they would like to know when do the people actually pay their accounts, in order to find out which is the most successful campaign to follow. If that they find that the first letter 2 months before the time, receives the least payments, they might decide to stop these letters in order to save costs in the long run. (PS, these accounts are paid yearly, that’s why 2 months before the time)
What the idea is with the current DW is that I should have an events table, for instance,
CREATE TABLE Fact_Events
ClientNumber int NOT NULL,
DateOfEvent smalldatetime NOT NULL,
EventType smallint NOT NULL
)
This events table would also have in other events of people phoning in for instance of any other events which might take affect the client or account.
Sample Data
Client NumberDate of EventEvent Type
1012009/05/0111 = First Reminder Mailed letter
1022009/05/011
1012009/06/0122 = Second Reminder Letter
1022009/06/012
1012009/07/0133 = First SMS
1022009/07/013
1012009/07/031010 = Cash payment
1022009/07/1544 = Second SMS
1022009/07/3144 = Final SMS
1022009/08/312121 = Penalty
1022009/09/302121 = Penalty
1022009/10/312121 = Penalty
1022009/11/051111 = Bank Deposit
Following from this data, it would be that the customer 101 paid after receiving the first SMS (3) and customer 102 after the final SMS (4).
What would be the best way of retrieving the data of payments (Numbers 10 & 11) and then finding the corresponding event (3 & 4) which lead to the specific payments? Keep in mind that I have 350 million records and that during a query which will be about once a week, about 50,000 records will have to be queried to find the campaign which worked the best.
Many thanks for any suggestions
September 8, 2009 at 1:26 pm
Someone else may come up with a better solution but this works:
WITH ctePayments AS
(
SELECT
ClientNumber,
MAX(DateOfEvent) AS PaymentDate
FROM
Fact_Events AS FE
WHERE
FE.EventType IN (10, 11)
GROUP BY
FE.ClientNumber
),
cteNonPayments AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY FE.ClientNumber ORDER BY EventType desc) AS row_id,
FE.ClientNumber,
FE.EventType,
FE.DateOfEvent AS NotificationDate,
CP.PaymentDate
FROM
Fact_Events AS FE LEFT JOIN
ctePayments AS CP ON
FE.ClientNumber = CP.ClientNumber
WHERE
FE.EventType <= 4
)
SELECT * FROM cteNonPayments WHERE row_Id = 1;
You can change the LEFT JOIN to an INNER if you don't care about instances where there has not been a payment yet.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
September 8, 2009 at 1:30 pm
Oh and then I remembered this article[/url] which may help you.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
September 8, 2009 at 1:36 pm
If I understand your requirement and your sample data correctly, it can be rephrased as follows:
There is a table holding approx 50k rows, based on a table with 350M rows (whereas the number of rows of the source table becomes irrelevant...).
Events stored in that table, excluding event types larger than 11 (e.g. 21 = penalty), can be ordered by client and either event type or date, resulting in an identical order of rows.
There is only one "payment history" per client.
If the statements above are valid, I'd perform a three step operation:
1) Number the rows per client ordered by EventType DESC
2) Perform a PIVOT (or Cross-Tab) to get payment and corresponding event per customer in separate columns
3) Aggregate the number of payments and corresponding events.
Note: I'd assume, the 50k rows source table has an index on ClientNumber and EventType.
In T-SQL:
CREATE TABLE #Fact_Events
(ClientNumber int NOT NULL,
DateOfEvent smalldatetime NOT NULL,
EventType smallint NOT NULL
)
INSERT INTO #Fact_Events
SELECT 101 ,'2009/05/01', 1 UNION all
SELECT 102 ,'2009/05/01', 1 UNION all
SELECT 101 ,'2009/06/01', 2 UNION all
SELECT 102 ,'2009/06/01', 2 UNION all
SELECT 101 ,'2009/07/01', 3 UNION all
SELECT 102 ,'2009/07/01', 3 UNION all
SELECT 101 ,'2009/07/03', 10 UNION all
SELECT 102 ,'2009/07/15', 4 UNION all
SELECT 102 ,'2009/07/31', 4 UNION all
SELECT 102 ,'2009/08/31', 21 UNION all
SELECT 102 ,'2009/09/30', 21 UNION all
SELECT 102 ,'2009/10/31', 21 UNION all
SELECT 102 ,'2009/11/05', 11
;WITH
-- step 1
cte AS (
SELECT
ClientNumber,
EventType,
ROW_NUMBER () OVER (PARTITION BY ClientNumber ORDER BY EventType DESC) AS row
FROM #Fact_Events
WHERE EventType < 12
),
-- step 2
cte2 AS (
SELECT
clientnumber,
max(CASE WHEN row = 1 THEN EventType ELSE '' END) AS payment,
max(CASE WHEN row = 2 THEN EventType ELSE '' END) AS EventType
FROM cte
GROUP BY clientnumber
)
-- step 3
SELECT payment, EventType, count(*) AS cnt
FROM cte2
GROUP BY payment, EventType
DROP TABLE #Fact_Events
/* result
paymentEventTypecnt
1031
1141
*/
Edit: It looks like you've got two different points of view...
Jacks solution will give you actual dates per customer as per your request
What would be the best way of retrieving the data of payments (Numbers 10 & 11) and then finding the corresponding event (3 & 4) which lead to the specific payments?
and mine refers to your requirement
For each of these letters and SMS’s they would like to know when do the people actually pay their accounts, in order to find out which is the most successful campaign to follow.
The solutions proposed should help you either way....
September 9, 2009 at 4:47 am
Hi guys
Thanks the response! I'll try the suggestions tomorrow, at present I'm struggling with an SSIS import. Will keep you posted
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply