How to model credit card transaction process?

  • Hi,

    I am involved in the creation of a data mart to model credit card transactions. The grain of our Transaction_Fact fact table is "one row per transaction processed". The dimensions surrounding this fact table are: Date, Merchant, Response, Card (one row per card processed), etc.

    The problem we are having is that the card number has a very high cardinality. We could say that one same card is processed no more than 3 times. It means that I will need to insert a new row into Card dimension every 3 rows I insert into Transaction_Fact. That makes my Card dimension a monster dimension that grows nearly at the same speed of the fact table. We could forget the Card dimension and bring the card number into the fact table, but we have several other attributes in the Card dimension (expiration date, card holder data, etc.) and I don't know what to do with them.

    How would you recommend we change the design to accommodate the rapidly changing Card dimension? Also, could you please point me to an article discussing the topic, other than the classic order header-detail case of study?

    Thanks,

    jcoira

    // jcoira

  • If you have to tract all that other stuff about a card, why worry about how many rows you have... it's gonna have to be stored somewhere. Perhaps a separate table (fact table?) for all of that?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for replying Jeff.

    I am worried because of everything I've read about the performance issues of monster dimensions. I know it is not good to have a dimension that grows along with your fact table. I just dont know how to avoid it in this case.

    jcoira

    // jcoira

  • Guess my other question would be... why do you even need this type of info in a fact table or dimension? This should be simple lookup data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As others have said, the data has to exist somewhere relationally. As for dimension/fact design, it all depends upon what type of analytics you want to do. Would you really want to browse a dimension all the way down to the card level? My guess is the card itself is not important to regular OLAP analytics. I doubt you would expose the card numbers themselves for browsing, CISP/security requirements notwithstanding. The card is almost a customer dimension.

    There are different types of analytics you can do, each with its own potential designs and frequency of use, and source.

    OLAP Analytics by card type can be done daily/weekly/monthly for gross reconciliation. They can be done monthly or ad hoc for analysis of processing fees or cost projections for promotions by card type. Reconciliation to the transaction would likely be at the relational level only.

    OLAP analytics by customer can be done weekly/monthly for evaluation of loyalty patterns. Data mining queries can be done monthly/ad hoc to identify customers who may churn or be open to promotions.

    Each of these is potentially a different dimension design. While you do want to be concerned about performance, in my opinion your first concern should be meeting your business requirements.

    Larry

    Larry

  • I would agree with Larry, the key here is business requirements - what question are you trying to answer?

    Joe

  • Lets analyze two possible scenarios:

    1- Analysts do need to browse to the card number or cardholder level when creating or modifying anti-fraud rules. Hence, I need to store the data at the most granular level, which is the card number (this is not my scenario, but I am very interested in how it could be designed)

    2- Analysts do NOT need to go down to those levels. Cart type would do for them. However, I still need to support relational reports involving the card number and cardholder data (this is my scenario)

    How would you suggest I deal with each of them?

    I really appreciate your help on this. I have been struggling to come up with a "clean" design for some time now.

    // jcoira

  • With each fact row representing a credit card tender, there is nothing wrong with storing both a tender type FK and a tender (card number) id FK on the fact row. Whether or not you create an OLAP dimension for the card depends upon usage. I personally recommend storing a surrogate key only and not the card number on the fact row. My reasons have to do with PCI compliance - let whatever cardnumber masking/encryption/hashing you are doing happen on a narrow table that is not heavily used.

    CREATE TABLE TenderFact (

    TranId int NOT NULL, --could be surrogate key or

    --whatever fields it takes to uniquely id a tran

    TranDateId int NOT NULL, --HHMM-level analysis beyond scope of this reply

    TenderId int NOT NULL, --the credit card FK

    TenderTypeId int NOT NULL, --Visa, MC, etc

    TenderExpirationDateId int NOT NULL,

    TenderAmount money NOT NULL,

    ServicerTranId varchar(xx) NULL,

    AuthCode varchar(xx) NOT NULL,

    AuthType int NOT NULL, --automated, online, outsourced,

    --manual, none (accepted without auth)

    IsReturnFlag tinyint NOT NULL,

    TaxExemptTranFlag tinyint NOT NULL,

    DebitCreditOptionFlag tinyint NOT NULL,

    --other flags of interest

    --Processing fee-related fields

    AddDate datetime,

    AddSourceKey ...

    )

    For scenario 1, I claim you do not need to use a standard OLAP dimension to browse to the card number. What you are talking about is data mining. You are going to set up your "cases" specific to mining and train your model accordingly. DMX queries would then be used to ask the likelihood a particular transaction is fraudulent. For users manually "reviewing fraud rules" I can see browsing some transactions relationally, but not needing an OLAP dimension to be able to peruse all cards. The fact rows having a surrogate foreign key for the card number likely works just fine, unless you are trying to detect fraud for "cards with the following digits in them" or other similar analysis. I don't see enough value for random, occasional searches to justify a dimension with millions of members exposed. YMMV.

    For scenario 2, the fact row design could be the same and still support the stated reporting need. But your other relational uses of the data may push your design in a different direction. Again - what are your other requirements?

    I guess I don't see the issue. Yes, I would worry about end-user performance browsing a credit card dimension in ProClarity/pivot tables, etc. But considering that PCI compliance issues would likely prevent a normal end-user from seeing the entire card number anyway, the point may be moot. You might as well just display the surrogate key. Should you choose to display a high cardinality dimension in general, see if there are ways to use groupings (manual in 2000, somewhat automated in 2005).

    Am I missing something else in your question?

    Larry

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply