Finding the next record after a certain date time

  • Hi.

    I'm looking for any suggestions on how to accomplish the following:

    I’ve got a table with receipts and transactions in with about 350 million records, which all have a receipt or transaction code (about 50 different codes in all). In separate tables, there is a list of mailed documents and SMS’s which is sent out to the clients for reminders of payments. The idea is to track which option (mail or SMS) we get the most payments in again.

    If would start off with a mailed reminder 2 months before they should pay, then a mailed document in the next month, followed by various SMS, a letter of demand, and finally a letter of handing over for collection. (Very much simplified; but you get the idea)

    For each of these letters and SMS’s they would like to know when do the people actually pay their accounts, in order to find out which is the most successful campaign to follow. If that they find that the first letter 2 months before the time, receives the least payments, they might decide to stop these letters in order to save costs in the long run. (PS, these accounts are paid yearly, that’s why 2 months before the time)

    What the idea is with the current DW is that I should have an events table, for instance,

    CREATE TABLE Fact_Events

    ClientNumber int NOT NULL,

    DateOfEvent smalldatetime NOT NULL,

    EventType smallint NOT NULL

    )

    This events table would also have in other events of people phoning in for instance of any other events which might take affect the client or account.

    Sample Data

    Client NumberDate of EventEvent Type

    1012009/05/0111 = First Reminder Mailed letter

    1022009/05/011

    1012009/06/0122 = Second Reminder Letter

    1022009/06/012

    1012009/07/0133 = First SMS

    1022009/07/013

    1012009/07/031010 = Cash payment

    1022009/07/1544 = Second SMS

    1022009/07/3144 = Final SMS

    1022009/08/312121 = Penalty

    1022009/09/302121 = Penalty

    1022009/10/312121 = Penalty

    1022009/11/051111 = Bank Deposit

    Following from this data, it would be that the customer 101 paid after receiving the first SMS (3) and customer 102 after the final SMS (4).

    What would be the best way of retrieving the data of payments (Numbers 10 & 11) and then finding the corresponding event (3 & 4) which lead to the specific payments? Keep in mind that I have 350 million records and that during a query which will be about once a week, about 50,000 records will have to be queried to find the campaign which worked the best.

    Many thanks for any suggestions

  • Someone else may come up with a better solution but this works:

    WITH ctePayments AS

    (

    SELECT

    ClientNumber,

    MAX(DateOfEvent) AS PaymentDate

    FROM

    Fact_Events AS FE

    WHERE

    FE.EventType IN (10, 11)

    GROUP BY

    FE.ClientNumber

    ),

    cteNonPayments AS

    (

    SELECT

    ROW_NUMBER() OVER(PARTITION BY FE.ClientNumber ORDER BY EventType desc) AS row_id,

    FE.ClientNumber,

    FE.EventType,

    FE.DateOfEvent AS NotificationDate,

    CP.PaymentDate

    FROM

    Fact_Events AS FE LEFT JOIN

    ctePayments AS CP ON

    FE.ClientNumber = CP.ClientNumber

    WHERE

    FE.EventType <= 4

    )

    SELECT * FROM cteNonPayments WHERE row_Id = 1;

    You can change the LEFT JOIN to an INNER if you don't care about instances where there has not been a payment yet.

  • Oh and then I remembered this article[/url] which may help you.

  • If I understand your requirement and your sample data correctly, it can be rephrased as follows:

    There is a table holding approx 50k rows, based on a table with 350M rows (whereas the number of rows of the source table becomes irrelevant...).

    Events stored in that table, excluding event types larger than 11 (e.g. 21 = penalty), can be ordered by client and either event type or date, resulting in an identical order of rows.

    There is only one "payment history" per client.

    If the statements above are valid, I'd perform a three step operation:

    1) Number the rows per client ordered by EventType DESC

    2) Perform a PIVOT (or Cross-Tab) to get payment and corresponding event per customer in separate columns

    3) Aggregate the number of payments and corresponding events.

    Note: I'd assume, the 50k rows source table has an index on ClientNumber and EventType.

    In T-SQL:

    CREATE TABLE #Fact_Events

    (ClientNumber int NOT NULL,

    DateOfEvent smalldatetime NOT NULL,

    EventType smallint NOT NULL

    )

    INSERT INTO #Fact_Events

    SELECT 101 ,'2009/05/01', 1 UNION all

    SELECT 102 ,'2009/05/01', 1 UNION all

    SELECT 101 ,'2009/06/01', 2 UNION all

    SELECT 102 ,'2009/06/01', 2 UNION all

    SELECT 101 ,'2009/07/01', 3 UNION all

    SELECT 102 ,'2009/07/01', 3 UNION all

    SELECT 101 ,'2009/07/03', 10 UNION all

    SELECT 102 ,'2009/07/15', 4 UNION all

    SELECT 102 ,'2009/07/31', 4 UNION all

    SELECT 102 ,'2009/08/31', 21 UNION all

    SELECT 102 ,'2009/09/30', 21 UNION all

    SELECT 102 ,'2009/10/31', 21 UNION all

    SELECT 102 ,'2009/11/05', 11

    ;WITH

    -- step 1

    cte AS (

    SELECT

    ClientNumber,

    EventType,

    ROW_NUMBER () OVER (PARTITION BY ClientNumber ORDER BY EventType DESC) AS row

    FROM #Fact_Events

    WHERE EventType < 12

    ),

    -- step 2

    cte2 AS (

    SELECT

    clientnumber,

    max(CASE WHEN row = 1 THEN EventType ELSE '' END) AS payment,

    max(CASE WHEN row = 2 THEN EventType ELSE '' END) AS EventType

    FROM cte

    GROUP BY clientnumber

    )

    -- step 3

    SELECT payment, EventType, count(*) AS cnt

    FROM cte2

    GROUP BY payment, EventType

    DROP TABLE #Fact_Events

    /* result

    paymentEventTypecnt

    1031

    1141

    */

    Edit: It looks like you've got two different points of view...

    Jacks solution will give you actual dates per customer as per your request

    What would be the best way of retrieving the data of payments (Numbers 10 & 11) and then finding the corresponding event (3 & 4) which lead to the specific payments?

    and mine refers to your requirement

    For each of these letters and SMS’s they would like to know when do the people actually pay their accounts, in order to find out which is the most successful campaign to follow.

    The solutions proposed should help you either way....



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hi guys

    Thanks the response! I'll try the suggestions tomorrow, at present I'm struggling with an SSIS import. Will keep you posted

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply