Beginner - Splitting a source table into Dimension and Fact tables

  • I am in a situation similar to below, where a source table has both measures and context, where the amount paid columns are used heavily in reporting:

    PAYMENT

    PAYMENT_ID (INT, PK)

    CLIENT_ID (INT foreign key to client table)

    ACCOUNT_ID (INT)

    DATE_OF_PAYMENT (DATETIME)

    INVOICE_ID (INT)

    AMT_PAID_BY_CHEQUE (DECIMAL, used in aggregations)

    AMT_PAID_BY_CASH (DECIMAL, used in aggregations)

    AMT_PAID_BY_CREDIT (DECIMAL, used in aggregations)

    If this were transformed to a snowflake or hybrid schema table, where one of the main functions would be measuring payments by payment type, is there a better way than below? Is the below a decent solution?

    DIM_PAYMENT

    PAYMENT_ID (INT, PK)

    CLIENT_ID (INT foreign key to client dimension)

    DATE_OF_PAYMENT (Foreign key to date table, or datetime)

    INVOICE_ID (INT, foreign key to invoice dimension)

    FACT_PAYMENT_COMPONENT

    PAYMENT_ID INT

    PAYMENT_COMPONENT_IND (Indicator for Cash, Cheque or Credit)

    AMOUNT (DECIMAL)

  • In my opinion, you should keep the payment related info only in the fact, not in the dimension. And you should create a dimension for all the entities(dimensions for client, account and invoice)

    And in the fact to keep the reference for these three dimensions and your measures: the amount and the payment date.

    You should have all the references to the dimensions in the fact table, not in the dimension.

    For the payment type you could create a separate dimension to contain all three values, but the easiest way would be the way you implement it - with a degenerate dimension.

    Hope this helps.

  • The payment table looks almost as a fact table to me. Just remove the payment_id and invoice_id, and group by the other columns. You need to create a dimension for client, one for account (or merge both into a single dimension depending on how you relate them) and one for time (payment_date). You need to define if cheque, cash and credit should be a dimension or are separate measures (which can be added up). I'd leave them as separate measures.

    This can go more complex if you want to include your invoice date in your payments fact. If each account represents a different product, or something else that we can't see.

    It could be something like this:

    PAYMENT

    ACCOUNT_ID (INT)

    DATE_OF_PAYMENT (DATE)

    AMT_PAID_BY_CHEQUE (DECIMAL, used in aggregations)

    AMT_PAID_BY_CASH (DECIMAL, used in aggregations)

    AMT_PAID_BY_CREDIT (DECIMAL, used in aggregations)

    ACCOUNT

    ACCOUNT_ID

    CLIENT_NAME

    LOCATION

    SOME_MORE_COLUMNS_TO_DESCRIBE_ACCOUNT

    TIME

    DATE

    LONG_DATE

    SHORT_DATE

    WEEK

    MONTH

    QUARTER

    SEMESTER

    YEAR

    DAY_OF_WEEK

    ETC

    EDIT: Try to get The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling book

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • In my opinion, the first design you've had wasn't far off. I would do the following:

    PAYMENT

    PAYMENT_ID (INT, PK)

    CLIENT_ID (INT foreign key to client table)

    ACCOUNT_ID (INT)

    DATE_OF_PAYMENT (DATETIME)

    INVOICE_ID (INT)

    PAYMENT_TYPE_ID (INT)

    AMT_PAID (DECIMAL)

    PAYMENT_TYPE

    PAYMENT_TYPE_ID (INT, PK)

    PAYMENT_TYPE_NAME (VARCHAR)

    ...

    (Assuming that you already have a dimension for date, account and client)

    Add a dimension for payment type. Your fact table will have multiple entries per invoice...once for each payment type. Doing it that way will allow you to analyze the data by both invoice and payment type.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply