Factless Fact and Slowly Changing Dimensions

  • I'm working on the same bakery problem... <g>

    Product--(1,M)--ProductIngredient--(M,1)---Ingredient

    Both Product and Ingredient are slowly changing dimensions, because ingredient costs change over time, and so sometimes Product prices have to as well. So I add a new primary key to both Ingredient and Product, so that the new primary key points at a cost during a given period of time (Between StartDate and EndDate).

    My measure for calculating the unit cost for a single Product used to be really simple:

    Per Recipe Cost = CALCULATE(SUMX('ProductIngredient', [IngredientWeight_Kg] *
            RELATED('Ingredient'[Price per Kg]) )

    Does the "ProductIngredient" table remain static, and the Ingredient and Product dimension tables become slowly changing dimensions (because the prices are only good for a range of dates)?  I'm watching Ken Puls' presentation on Implementing Slowly Changing Dimensions "Quickly Solving the SCD Challenge".

    In his presentation, he basically creates multiple copies of a customer (One will be at one address, one at another... that's fine) So if I follow the pattern, when the price of an ingredient changes, a new Ingredient will be added with a start date and an end date (IngredientPK, IngredientSK, Name, UnitPrice, PricePerKg, StartDate, EndDate).

    If I do the same for Product, how do I relate the Product to the Ingredient? Just use the Surrogate key? But then how do I know if I'm pointed at the right version?

    Product would be very similar to Ingredient

    (ProductPK, ProductSK, Name, SellPrice, [MakePrice], StartDate, EndDate)

    ... so now ProductPK would point to a Product selling at given price for a period of time.

    If I were doing this in purely SQL, I guess I could do something like CROSS APPLY with a (ProductKey, Date) and retrieve the latest price for each ingredient, and then use that in [IngredientUnitPrice] * [RecipeWeight], and sum it to get the total cost, but this is making my head hurt. Is there a good article on Slowly Changing dimensions that does more than cover "this is what it is"? (I'm doing this in DAX, but you gotta start somewhere!... and I gotta start with understanding this better!

    Thanks!

    Pieter

     

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • I notice that there's no mention of product sales … hence your reference to 'factless', I assume.

    Are you therefore simply building a 'Product cube'?

    What sort of reporting/analysis are you intending to do, once it's in place? Unless there's a massive amount of data, building out dims and facts might be overkill.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • There's a Sales fact, but that part is easy, once all the dimension stuff is fixed, so I left it out.  Product is related to Sales too. I'm just having trouble figuring out how to relate the Ingredients and the Recipe.

    Ingredient is fairly simple

    (IngredientPK, IngredientSK, Name, UnitPrice, UnitSize, PricePerKg, StartDate, EndDate)

    so is Product

    (ProductPK, ProductSK, Name, SellPrice, {unitCost}, StartDate, EndDate)

    But if "recipe" is static (IngredientSK, ProductSK, Weight), then how do I sort out the changing unitcost?  (Doesn't the Recipe table remain static?... it's just describing the relationship between ingredients (well, weight proportions). And once I get that, how do I get the current price of an ingredient (I'd almost always have it on a chart with a date on the X-axis. Yeah, use CALCULATE() and maybe SUMX()?

    This is the DAX if the price isn't changing... so I can just use RELATED() to get the price, because there's only one.

    SUMX('Recipe', [Weight] * RELATED(Ingredient[PricePerKg]) )

    I know I have to merge the Date and the Purchase and fill down. (gets {Date, UnitPrice} so that each Ingredient has a {IngredientKey, DateKey, UnitPrice} for each Date. Do I just use FILTER() inside the RELATED?..

    RELATED(FILTER(Ingredient,[Date]=MAX(DimDate[Date]),[UnitPrice]) ?

    Is that the answer to the riddle?

  • It is, indeed, a tricky one. Surrogate keys + date logic looks like the way forward.

    It's not an elegant star schema model.

    I hope that others respond with their ideas.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi Pieter,

    can you share the model you are working on?

    😎

    There are several ways of approaching this problem, some are good, some are bad and some are ugly!

    For performance, one might have to dig into advanced DAX concepts and "scopes"!

  • This is a tricky modeling question, and one that is difficult to answer without a lot of context and spending some time considering all the options (and side-effects of those options). That's probably why you cannot find any references to help out.

    I'll try to give you some general ideas, but please weigh it up against the context of your requirements. The design will largely depend on the questions you need to be able to answer.

    In my opinion you've already answered your own question to an extent, but you need to slightly change your perspective in order for it to "feel right" 🙂

    Relationships between dimensions make analytics in a star-schema difficult, and it is usually much easier if the relationships exist between dimensions and a fact table. I usually try to model it that way, even in a M:M scenario. Sometimes, as it is the case here, you will benefit from a fact table that describes the relationship between two dimensions, even if there are no real transactions or facts. I've blogged about a similar scenario, and although you may not have the volatility I describe there, I think the concepts still fit the use case to an extent: https://martinschoombee.com/2011/11/16/temporal-fact-tables/

    If you think about this as a fact-less fact you would be halfway there, but in my opinion it would be much better to see your ProductIngredient table as a semi-additive fact table...almost like an inventory snapshot, but only containing a record whenever there is a change in one of the dimension records, or the relationship between the two dimensions.

    For example, let's say you have product X with ingredients A, B & C and the following dimension structures (using your terminology here to make it relatable, and assuming the PK is the business key while the SK is a true surrogate key):

    • Product: ProductPK, ProductSK, Name, UnitPrice, Weight, SellingPrice, StartDate, EndDate
    • Ingredient: IngredientPK, IngredientSK, Name, UnitPrice, UnitSize, PricePerKg, StartDate, EndDate

     

    Assuming Product X's SK is 1 and ingredients A, B & C's SK's are 1, 2 & 3 respectively, the entries in your ProductIngredient fact table could look something like this:

    ----------------------------------------------------------------------------------------------------
    DateSK  |  ProductSK  |  IngredientSK | ProductPrice | IngredientAmount | IngredientUnitCost
    ----------------------------------------------------------------------------------------------------
    20240101 1 1 12.49 0.1 0.12
    20240101 1 2 12.49 0.01 0.05
    20240101 1 3 12.49 0.23 1.59

     

    The trick here is to add a new set of records that represents the relationship between the product and its ingredients every time anything changes...in other words when there is a type-2 change in either dimension. For example, if the cost of ingredient A changes and you insert a new dimension record with SK=4, then that would also drive new entries to the fact table for all products that use that ingredient. The fact table could look something like this after the change:

    ----------------------------------------------------------------------------------------------------
    DateSK  |  ProductSK  |  IngredientSK | ProductPrice | IngredientAmount | IngredientUnitCost
    ----------------------------------------------------------------------------------------------------
    20240101 1 1 12.49 0.1 0.12
    20240101 1 2 12.49 0.01 0.05
    20240101 1 3 12.49 0.23 1.59

    20240501 1 4 12.49 0.1 0.15
    20240501 1 2 12.49 0.01 0.05
    20240501 1 3 12.49 0.23 1.59

     

    This design works assuming that you may want to add the costs of a product's ingredients, but only ever want to aggregate the selling price of products by using an average across time...or the latest value which is similar to what you would have done if this was an inventory snapshot table. The difference between this and a periodic snapshot table is the fact that you only ever insert records when the relationship changes, not every day, week or month like you would with a snapshot.

     

    A few other things that may be useful, albeit not included in my examples:

    • Adding Date keys for both the Start & End dates could be useful as well, if you want to be able to calculate for how long a given price and/or relationship was valid.
    • Consider adding all additive and semi-additive attributes from the dimensions to the fact table, as it would make any aggregations or calculations a lot simpler.
    • Although you could argue that by creating a fact table like this you would no longer need type-2 dimensions, I would still keep both as type-2 because it would be useful for other fact tables like Sales, etc.
    • The same argument can be made for the additive and semi-additive attributes in both dimensions. I would keep them in the dimension tables as well, because it may be useful to have access to those metrics for other calculations.
  • Eirikur,

    (Oh wait, before I go too far, all of this mess is in DAX/PowerBI...  "Run away, run away!" --King Arthur)

    What do you mean by Scopes? ISINSCOPE() kinds of scopes?  the "Product" SCD2 isn't bad.

    add a new Primary Key that points to "the price/attribute values of a product for a given period of time" (So it would have an Effective Start Date and Effective End Date). While I could store the computed unit cost, the point of having the Ingredient/Purchases as a slowly changing dimension is so that I can create a measure to calculate the cost of a loaf over time (so time would be on the X-axis of my graph).

    If I can post a big terrible file up here, I will. But I doubt it. And I doubt anybody would want to wade through it.

    How do you want the model? I could do it as a series of CREATE TABLE statements... not perfect, but it would give you an idea of what the tables look like. It *was* working "until things changed" (Ken Puls, talking to the Vancouver PowerBI User Group and explaining how to handle slowly changing dimensions... except the only thing changing in his model was "parent" dimensions. His Customers were moving around.) The part that got me was "well, what if that slowly changing dimension is participating in a relationship with a factless fact (Recipe) table?

  • Wow, Martin! Thanks! I understand most of it. The part I was waaaay wrong on is that YES, the changing prices DO get somewhat merged into the "Recipe"/ProductIngredient table.  (I can cheat and just scale the recipe linearly, which isn't quite true, but who's counting? And besides, I can only deal with so much complexity at once!  Like, one starter consists of something like 7 parts water, 4 parts flour, 3 parts potato starch by volume). And starter is listed as an ingredient... Do I create a CTE/table-valued function to "explode" the ratio to an actual weight? (fairly simple) and then add the rest of the ingredients and do the same old Price * Quantity?

    (I swear, I had no clue this thing would get so complicated so fast!)

    thanks a million!

    Pieter

  • Martin,

    Oh, like thaaaaaat!!!  That's a lot to digest, but I have my Ralph Kimball secret decoder ring working. Makes sense!  now to figure out how to create a table function that returns the latest price for each ingredient (relative to the date in the evaluation context). It shouldn't be too bad, because the only time I would use a Slowly Changing Dimension in my report would be when I'm explaining the change in manufacturing cost.

    (While I'm at it, say I have a bunch of recurring business costs, like rent and gas and the like. I can sum those by quarter (say) and then divide by 13 to get a reasonable estimate of a weekly amount. Is that the right way to do it? I understand the numbers won't be absolutely perfect, but it'll be a lot closer than what he currently has.)

    "The design in itself is pretty simple, but your ETL transformations become very complex when tracking multiple changing attributes like this" (Martin's Blog - Temporal Fact Tables (martinschoombee.com).

    If I'm only tracking Product[SellPrice], Product[unitCost], Ingredient[PurchasePrice], I don't need crazy ETL...  all I need is a way to get the "current" price (according to the evaluation context, and if there isn't one, return the last price). (right?)

  • pietlinden wrote:

    Like, one starter consists of something like 7 parts water, 4 parts flour, 3 parts potato starch by volume). And starter is listed as an ingredient... Do I create a CTE/table-valued function to "explode" the ratio to an actual weight? (fairly simple) and then add the rest of the ingredients and do the same old Price * Quantity?

    It's difficult to say without spending more time on the context, but you will benefit from standardizing on the UOM as far as possible because you will not be able to aggregate across UOMs. In my mind "starter" needs to be the product and the constituent parts the ingredients.

  • pietlinden wrote:

    (While I'm at it, say I have a bunch of recurring business costs, like rent and gas and the like. I can sum those by quarter (say) and then divide by 13 to get a reasonable estimate of a weekly amount. Is that the right way to do it? I understand the numbers won't be absolutely perfect, but it'll be a lot closer than what he currently has.)

    That's a question for the business, because you may need to take seasonality into account. If there's a lot of seasonality, they may want to take the prior full year and divide by 52 to get a weekly.

     

    pietlinden wrote:

    If I'm only tracking Product[SellPrice], Product[unitCost], Ingredient[PurchasePrice], I don't need crazy ETL...  all I need is a way to get the "current" price (according to the evaluation context, and if there isn't one, return the last price). (right?)

    I would not attempt to do all of this in Power BI. The logic will become too complex too quickly, and my recommendation would be to use something outside of Power BI to perform the logic and persist the data.

    I'll say to you what I said to somebody else the other day: "You don't have a DAX problem, you have a modeling problem". At some point it's not worth trying to fake a complex ETL process in Power BI...it's not an ETL tool and will only get you so far.

    Hope this helps 🙂

  • Martin,

    I think I understand your answer. (Took a couple days!) Basically, the ProductIngredient table (ProductID, IngredientID, Weight...) factless fact handles the slowly changing dimension by pointing each new version of a Product to a new set of ingredients (at a new set of prices).

    Is there another really good data modeling book besides Data Warehouse Toolkit? (Or did I just gloss over important chapters without understanding them? Maybe I need a workbook to go with that book!)

    Thanks!

    Pieter

  • I'm not sure about the detail in the physical book as it's been a while since I've read it, but here is the official on the Kimball website: https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/timespan-fact-table/

    As you can see, there isn't much detail there and that's the difficult part. It's not always that easy to bring the theory & application together, and the lines are sometimes blurred between the different concepts.

    In my opinion, it's easier to see your scenario as Bridge table that explains a M:M relationship...but because of the factual attributes it's better to treat it like a fact table and not just a Bridge table. That slight change in perspective makes everything easier to understand.

    And if we change the name of that Bridge table to "Recipe" maybe, it may be a little easier to understand from a conceptual point of view.

    The main point to understand is this: If you have a table that explains a relationship between two tables, and one of those tables (or both) is a type-2 dimension, then each type-2 change has to drive a change (new record(s)) in the table that explains the relationship as well.

    Second to that: Power BI is not the place to try and mimic type-2 dimensions. Your time and effort will be much better spent doing the logic outside of Power BI and persisting the data somewhere, whether that is a database or parquet file (if you're one of the cool kids). Recreating the history of a type-2 dimension during every refresh in Power BI is not efficient nor is it a good design.

    Hope this helps 🙂

     

  • but only ever want to aggregate the selling price of products by using an average across time...or the latest value which is similar to what you would have done if this was an inventory snapshot table

    Really the only ways I would want to look at selling price is to use it to determine simple margin, [SellPrice] - [TotalIngredientCost], and to basically compare it to Total Ingredient Cost (or Cost per Loaf) across time. The part that was giving me fits before that I *think* I understand now was how to graph the unit cost per ingredient across time for a loaf. (well, not exactly, but [unit cost of ingredient] * [recipe weight]). Then any costs where the ingredient price will be relatively flat and the ones that change more quickly will have a non-zero slope. So I'd have a line graph of each unit cost ( Price/Unit * Recipe Weight ), so I can see which ingredients are contributing most to the cost and how fast they're changing over time.

    The "recipe" table is basically a bridge table between Product and Ingredient with one attribute, Weight. That makes sense.

    How do I track for changes in Cost/Price? Just add a trigger to Product[UnitPrice] and Purchase[UnitCost] and maybe call a stored procedure from the trigger to (1) find all recipes that use that Ingredient, (2) generate a new version of the Product (increment primary key), (3) generate a new time-stamped version of the recipe to point to the new prices (so new {productkey, ingredientkey, weight} records in the bridge table).

    (Or did I completely miss something?)

    Thanks!

    Would reading "The Data Warehouse ETL Toolkit" help me? (Since his stuff is largely theoretical, it should still apply, right?)

    • This reply was modified 5 months, 1 week ago by  pietlinden.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply