July 20, 2015 at 9:33 pm
I am in a situation similar to below, where a source table has both measures and context, where the amount paid columns are used heavily in reporting:
PAYMENT
PAYMENT_ID (INT, PK)
CLIENT_ID (INT foreign key to client table)
ACCOUNT_ID (INT)
DATE_OF_PAYMENT (DATETIME)
INVOICE_ID (INT)
AMT_PAID_BY_CHEQUE (DECIMAL, used in aggregations)
AMT_PAID_BY_CASH (DECIMAL, used in aggregations)
AMT_PAID_BY_CREDIT (DECIMAL, used in aggregations)
If this were transformed to a snowflake or hybrid schema table, where one of the main functions would be measuring payments by payment type, is there a better way than below? Is the below a decent solution?
DIM_PAYMENT
PAYMENT_ID (INT, PK)
CLIENT_ID (INT foreign key to client dimension)
DATE_OF_PAYMENT (Foreign key to date table, or datetime)
INVOICE_ID (INT, foreign key to invoice dimension)
FACT_PAYMENT_COMPONENT
PAYMENT_ID INT
PAYMENT_COMPONENT_IND (Indicator for Cash, Cheque or Credit)
AMOUNT (DECIMAL)
July 21, 2015 at 6:07 am
In my opinion, you should keep the payment related info only in the fact, not in the dimension. And you should create a dimension for all the entities(dimensions for client, account and invoice)
And in the fact to keep the reference for these three dimensions and your measures: the amount and the payment date.
You should have all the references to the dimensions in the fact table, not in the dimension.
For the payment type you could create a separate dimension to contain all three values, but the easiest way would be the way you implement it - with a degenerate dimension.
Hope this helps.
July 21, 2015 at 6:35 am
The payment table looks almost as a fact table to me. Just remove the payment_id and invoice_id, and group by the other columns. You need to create a dimension for client, one for account (or merge both into a single dimension depending on how you relate them) and one for time (payment_date). You need to define if cheque, cash and credit should be a dimension or are separate measures (which can be added up). I'd leave them as separate measures.
This can go more complex if you want to include your invoice date in your payments fact. If each account represents a different product, or something else that we can't see.
It could be something like this:
PAYMENT
ACCOUNT_ID (INT)
DATE_OF_PAYMENT (DATE)
AMT_PAID_BY_CHEQUE (DECIMAL, used in aggregations)
AMT_PAID_BY_CASH (DECIMAL, used in aggregations)
AMT_PAID_BY_CREDIT (DECIMAL, used in aggregations)
ACCOUNT
ACCOUNT_ID
CLIENT_NAME
LOCATION
SOME_MORE_COLUMNS_TO_DESCRIBE_ACCOUNT
TIME
DATE
LONG_DATE
SHORT_DATE
WEEK
MONTH
QUARTER
SEMESTER
YEAR
DAY_OF_WEEK
ETC
EDIT: Try to get The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling book
July 21, 2015 at 9:32 am
In my opinion, the first design you've had wasn't far off. I would do the following:
PAYMENT
PAYMENT_ID (INT, PK)
CLIENT_ID (INT foreign key to client table)
ACCOUNT_ID (INT)
DATE_OF_PAYMENT (DATETIME)
INVOICE_ID (INT)
PAYMENT_TYPE_ID (INT)
AMT_PAID (DECIMAL)
PAYMENT_TYPE
PAYMENT_TYPE_ID (INT, PK)
PAYMENT_TYPE_NAME (VARCHAR)
...
(Assuming that you already have a dimension for date, account and client)
Add a dimension for payment type. Your fact table will have multiple entries per invoice...once for each payment type. Doing it that way will allow you to analyze the data by both invoice and payment type.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply