September 26, 2022 at 11:33 am
Hi all,
I have a couple of modelling questions using several star schemas.
There are different entities which I consider facts (they all share similar dimensions, dimcustomer, dimdate, dimshop, etc.), for example, orders, payments, inventory, credit cards movements and loans.
Most of these have thousands of entries every day and each one has metrics which I would like to report from.
If all of these are independent facts though, and thinking about facts shouldn’t be joined between them, a few problems present:
Actually, payments and orders are related because payments are (sometimes) made for the orders. Both have a different granularity so I think they can’t be together in the same fact table.
For reporting purposes, I am going to need to join all facts together to get the full revenue. I assume this is not something strange to do, but again I am joining facts together, is that ok?
Below there’s a rough schema of this example. I hope it helps.
Thank you so much in advance.
September 26, 2022 at 12:10 pm
Joining fact tables isn't a great idea - people may get confused, performance won't be great, etc.
Do payments always relate to an order or can there be another source?
I'd be looking at creating a DimOrder table that can be used against order and payment facts.
The other concern I have with your design is the nullable columns in the payments fact. You probably have more work to do on the design/meaning of the payments table, i.e. is it for order payments only or a multi-purpose payments table?
September 26, 2022 at 12:25 pm
Hi Mark,
thanks for your reply 🙂
Payments can be related to an order or an account/customer, that's the reason order_id and customer_id can be NULL as it is one or the other. I would really like to keep all type of payments together for reporting purposes.
Would you suggest a dimOrder table as a bridge table between these two facts then? Although ultimately the join would be the same and I am not sure it would be easier to understand for the analysts.
September 26, 2022 at 12:34 pm
I'd suggest DimOrder as a normal conformed dimension.
I'd also suggest that you have negative surrogate keys that will allow you to report smarter:
It's common practice to have -1 as an unknown surrogate key, which I am assuming you're aware of.
In your example, I'd add a -1 and a -2 row to your DimOrder table for loading facts and the same to DimCustomer.
-1 will be for where it is genuinely unknown (an payment is against an order but you haven't captured which one).
-2 will be for where you know it is a payment that is not against an order.
Rinse and repeat for the vice versa situation with customers.
You can then have NOT NULL surrogate keys in your payment fact table and easily filter out non-order payments (order_id=-2) and also only order payments (customer_id = -2)
This also keeps with your requirement to, quite reasonably, keep all the payments together.
Does that make sense and satisfy your reporting requirement?
September 26, 2022 at 12:57 pm
You are absolutely right, I would use -1 but I've written not null so that is incorrect. I love the -2 idea, I was just going to use -1 for everything.
About joining the facts which worries me quite a lot. Is this what you suggested? I understand this dimension will have multiple rows of the same order (as many as payments made for it).
September 26, 2022 at 1:29 pm
I'd have one DimOrder row per order. Say the order total is £10,000, made on June 1st and they pay in three instalments of £5,000, £3,000 and £2,000, made on the first of each month in June, July and August this year.
You would have one DimOrder record with order_id 123.
You would have one FactOrderLine record with order_id 123
You would have three FactPayment records with order_id 123, each with a date key related to their date of payment.
The key bit here is understanding that order lines and payments may not be related. You might have six order lines for order 123, with a grand total of the £10,000 mentioned earlier, four of £2,000 and two of £1,000. There's no realistic way to relate the payment to order lines, even if your accountants choose to attribute the payments on a pro rata basis.
Make sure that you mentally separate orders and payments as two different processes - one is in effect a contract for you to deliver goods and the other is a contract for the customer to pay for the goods. Remember it's not a retail shop model, where you pay for your cola, bread, milk and beans on the same day you "order" them by putting them in your shopping cart, taking them to the till, paying for the whole lot and leaving with them.
However, (and remember I don't know your business model) you may have customers that do walk-ups to your trade counter and pay on the day for urgent items, i.e. no order id, just scan the bar code, pay and go. These items would be the food retail equivalent.
For your model, I'd have an order fact (which may have the order total on for easier reporting) and order line fact with each separate line on (no need for a separate order line dim, just denormalise items like order_item_id, order_price, etc. because they only relate to that individual order item). Model that separately to the payments side because an order could be paid off in instalments and you can then have multiple payments for one order.
I appreciate it's a lot to take in - I'd consider investing in Ralph Kimball's book The Data Warehouse Toolkit; I first got my copy in 2003 and have used it regularly since. It has most of the concepts we've discussed here, aside from the ETL tips and tricks around using -2 as a more informative surrogate key than -1 for all supposed unknowns.
September 27, 2022 at 7:36 am
Thanks Mark for your help.
I am not sure I understand 100% what you mean but I will think about it.
I bought the book yesterday 🙂 Glad to hear another recommendation about it.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply