Data Warehouse Fact Table Design Stumbling Block

  • I'm looking for a good idea on the way to layout my fact table and still follow best practice. This looked like a good forum for that. Here's what I have:

    I'm designing a DW for the convenience store industry. This is just for transactional data that happens on the point of sale system. Each transaction may have multiple items and also multiple payment types. To explain in a real world scenario: Say the customer buys a coke and bag of chips and $12 in fuel. Say the total is $14. The customer then hands the cashier a gift card with $10 on it and pays cash for the rest. This leaves us with 3 items sold and 2 payment types.

    One of the business rules is to be able to report on what type of items were purchased with a certain payment type. ie. Gift Card, Credit Card, Debit Card, Cash, etc. They also want to be able to report on the dollar amount of each payment type. And, of course, there is a need to report on item sales. Here's where I'm stumped. I have this many to many relationship between Item Sales and Payment types with the organization, date, transaction number being the one thing unique between the two. I have tried to find any kind of example of this type of design, but every design I've found only contains the Sales and doesn't address the payment of the sale in this fashion.

    I have come up with a couple of ideas, but I'm not sure which way to go with this. I have attached an image of the fact table layout for the sales transaction not including the payment data. One option is to add the following columns to the FactSales table: PaymentKey1, PaymentKey1Amount, PaymentKey2, PaymentKey2Amount, PaymentKey3, PaymentKey3Amount, PaymentKey4, PaymentKey4Amount, PaymentKey5, PaymentKey5Amount. I really don't like this idea because it goes against all I've ever learned about good database design. This will also cause a challenge in reporting in having to pivot the data and then breaking it out into a separate data set to prevent duplication of amounts.

    Another option I came up with would be to create a completely separate fact table for payment data and duplicate the organization, date and transaction number in this fact table so I can join the two fact tables together when reporting. I don't know if that will effect performance though.

    Any suggestions or ideas on this are greatly appreciated!

    Sumac

  • Sumac (3/24/2011)


    I'm looking for a good idea on the way to layout my fact table and still follow best practice. This looked like a good forum for that. Here's what I have:

    I'm designing a DW for the convenience store industry. This is just for transactional data that happens on the point of sale system. Each transaction may have multiple items and also multiple payment types. To explain in a real world scenario: Say the customer buys a coke and bag of chips and $12 in fuel. Say the total is $14. The customer then hands the cashier a gift card with $10 on it and pays cash for the rest. This leaves us with 3 items sold and 2 payment types.

    One of the business rules is to be able to report on what type of items were purchased with a certain payment type. ie. Gift Card, Credit Card, Debit Card, Cash, etc. They also want to be able to report on the dollar amount of each payment type. And, of course, there is a need to report on item sales. Here's where I'm stumped. I have this many to many relationship between Item Sales and Payment types with the organization, date, transaction number being the one thing unique between the two. I have tried to find any kind of example of this type of design, but every design I've found only contains the Sales and doesn't address the payment of the sale in this fashion.

    I have come up with a couple of ideas, but I'm not sure which way to go with this. I have attached an image of the fact table layout for the sales transaction not including the payment data. One option is to add the following columns to the FactSales table: PaymentKey1, PaymentKey1Amount, PaymentKey2, PaymentKey2Amount, PaymentKey3, PaymentKey3Amount, PaymentKey4, PaymentKey4Amount, PaymentKey5, PaymentKey5Amount. I really don't like this idea because it goes against all I've ever learned about good database design. This will also cause a challenge in reporting in having to pivot the data and then breaking it out into a separate data set to prevent duplication of amounts.

    Another option I came up with would be to create a completely separate fact table for payment data and duplicate the organization, date and transaction number in this fact table so I can join the two fact tables together when reporting. I don't know if that will effect performance though.

    Any suggestions or ideas on this are greatly appreciated!

    Sumac

    Hi Sumac,

    I agree with you that the best approach would probably be to create a separate fact table for payments. By doing this, you would be able to create a payment type dimension and join to it for the necessary.

    As you pointed out, joining two fact tables will affect performance. Although joining two fact tables are appropriate for certain types of cubes/reports, it wouldn't be ideal to do that just for the transation information. I would therefore recommend that you take it one step further and create a "Transaction" dimension.

    This would be a little unconventional because of the fact that the dimension would almost have as many records as the fact table...but I think it is appropriate in this case and would allow you not only ultimate flexibility, but also good performance (you would not have to join the two fact tables every time).

    I hope this helps.

    Martin.

  • My approach would have a Payment Type dimension with a Payment Type attribute that has the specific type when just one payment method was used (i.e. Cash, Debit Card, Gift Card), and something else meaningful when multiple was used (Gift Card and Cash, or a generic Multiple Types) that I'd discuss with the business on their needs.

    Then I'd have a few flags like Gift Card Used, Cash Used, Debit Card Used.

    This could then be applied at both Transaction and Transaction Line level.

  • I'm designing a DW for the convenience store industry. This is just for transactional data that happens on the point of sale system. Each transaction may have multiple items and also multiple payment types. To explain in a real world scenario: Say the customer buys a coke and bag of chips and $12 in fuel. Say the total is $14. The customer then hands the cashier a gift card with $10 on it and pays cash for the rest. This leaves us with 3 items sold and 2 payment types.

    I'm wondering how you know how the gift card should be applied. Does the 10 gift card get applied completely to the gas? The most obvious answer is that these items would be prorated. All the item costs would be multiplied by 10/14ths to get the gift card cost and the rest is cash.

    This would create a fact table with a level of granularity of by item by payment type. This would allow you to use a single fact table. Each transaction would have the number of records corresponding to the number of items X number of payment types. For most transactions, that would be x * 1, as most people don't pay with multiple types, but many will have two times the items, and there's no upper limit.

    Whatever you do, stay away from the multiple column solution which rightly concerns you.

    Establishing the grain is THE crucial first step in fact table design. I think this proposal gives you everything you would want, as long as you can pro rate the payment types.

  • I agree totally on identifying the grain. The first thing I thought when reading this last week was you need a Transaction. Your case may be different, but when working with retail in the past, we weren't allowed to arbitrarily assign payment types to products. You 'could' do the allocation (ie 4 payment types, 10 products, each payment type allocates 1/10 of it's total to each of the products) but this isn't 'true' per se. If you work with a transaction (ie the 'docket'), then the payment types are many to many, and the products to transaction are also many to many. The transaction is what relates the customer (assuming they'e known/identifiable) to the product.

    I haven't looked at it in a while, but i wouldn't mind betting that the white paper by Marco Russo - Many to Many revolution - might even have an example/sample that you could work from to give you ideas on how to solve this.

    Steve.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply