Generating (semi-)fake purchase data

  • I know how to generate fake data (thanks to Jeff Moden's articles!)... the problem is that I'm trying to generate purchase data where the unit price of a product increases on some products over time.  Normally, I'd get the changing price information from all the purchase invoices, but I don't have those. So I have to make it up. I was thinking I could do something like select random(ish) records and use a systems table to get just bulk records (that part is fine), but how do I fake price increases (where the changes aren't completely fake, because grains, for example, rise by 50% or more over time.)

    I can create dummy values... maybe my multiplier is screwy?

    use bakery;
    go

    DECLARE @Range INT = 500;
    DECLARE @StartValue DECIMAL(5,2);

    /* some ingredients are only purchased every 6 weeks (grains) */
    /* add a 1/0 random number to determine whether to apply the RT_FakePct */

    /* JOIN ALL OF THIS TO vwBaseIngredientRecipe */

    DECLARE @FirstBakeEventDate DATE;
    SELECT @FirstBakeEventDate = MIN(BakeDate) FROM BakeEvents;

    WITH cteWeeklyPrices(IngredientID, IngredientName, WeekNo, NewPrice)
    AS (
    SELECT x.IngredientID
    ,x.IngredientName
    ,x.WeekNo
    --,x.UnitPrice
    --,x.FakePct
    --,RT_fakePct = 1 + SUM(x.NewChangePct) OVER (PARTITION BY x.IngredientID ORDER BY x.WeekNo)
    --,x.ChangeY
    --,x.NewChangePct
    ,NewPrice2 = CAST(ROUND(x.UnitPrice * (1 + SUM(x.FakePct) OVER (PARTITION BY x.IngredientID ORDER BY x.WeekNo)),2) AS DECIMAL(5,2))
    FROM
    (
    SELECT i.IngredientID
    ,i.IngredientName
    ,i.UnitPrice
    ,PerKgPrice = i.UnitPrice/(i.IngredientWeight * 1000) /* convert from Kg to g */
    ,t.N AS WeekNo
    --,t.Num AS WeekNo
    ,FakePct = ABS(CHECKSUM(NEWID()))%100 / 10000.0
    ,ChangeY = ABS(CHECKSUM(NEWID())%2) /* basically T/F to decide whether to apply the change */
    ,NewChangePct = IIF(ABS(CHECKSUM(NEWID())%2) = 0, 0, ABS(CHECKSUM(NEWID()))%10 / 10000.0 ) --was%100
    --, NewPrice = (1 + ABS(CHECKSUM(NEWID()))%100 / 10000.0) * i.UnitPrice
    , NewPrice = i.UnitPrice * (1 + IIF(ABS(CHECKSUM(NEWID())%2) = 0, 0, ABS(CHECKSUM(NEWID()))%100 / 10000.0 ))
    FROM Ingredient i
    --CROSS JOIN Scratch.dbo.Tally t
    CROSS APPLY Scratch.dbo.fnTally(1,76) t
    WHERE i.IngredientID NOT IN (17,62) /* Foil pans, plastic bags */
    ) x
    )
    SELECTbw.WeekNo
    ,bw.LoafKey
    ,bw.LoafQty
    ,ri.Yield
    ,ri.IngredientName
    ,ri.RecipeWeight
    ,wp.NewPrice
    FROM
    (SELECT LoafKey,
    LoafQty,
    WeekNo = 1 + DATEDIFF(week,@FirstBakeEventDate,BakeDate)
    FROM BakeEvents be) bw
    INNER JOIN cteWeeklyPrices wp
    ON bw.WeekNo = wp.WeekNo
    INNER JOIN vwBaseIngredientRecipe ri
    ON ri.RecipeID = bw.LoafKey
    ORDER BY LoafKey, WeekNo;

    (I was trying to isolate the fake increases in the Common Table Expression... just to keep things from being hideous. The problem isn't that they don't increase, it's that they increase kind of monotonically instead of more randomly as I would like (for things besides grains, that's okay, but for grains the price changes are more dramatic).

    I was thinking I could just generate a set of positive random values (easy enough) and then do a running total over Weeks and then multiply, but how do I increase the randomness?  ("That's unpossible! There's no such thing as "semi-random"! Something is true or it's not!)  Well, increase for some groups of ingredients. (Like the "flour increases in price by 50% in 18 months part.)

    Sorry, but I think I'm just missing the blindingly obvious. =(

    Pieter

  • You're already generating a series using dbo.fnTally so maybe raise fn.N by an exponent power.  Either POWER or EXP function

    declare @increase_pct       decimal(4,3)=0.025;

    select *,
    power(1+@increase_pct, fn.n) compound_trend,
    exp(@increase_pct * fn.n) AS continuous_compound_trend
    from dbo.fnTally(1, 10) fn;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Thanks Steve! Like I said, I felt like I was missing something that should have been blatantly obvious! Thanks for clarifying!

    So this is my plan (after staring at this for a while). Since each (IngredientID, PurchaseDate) combination returns a table of  new price values, I can just use TOP(1) ORDER BY NEWID() in my APPLY to get a sort of random new price.

    I can basically use one of the random number generator tricks with @increase_pct to get a new value within a constricted range, so I may do that. I just don't want all of the increases to be perfectly linear.

  • Hi pietlinden, maybe you want to generate random prices for a base price of 50, plus or minus 5.  Or maybe you want a base price of 50, plus or minus 5 where the +/- increases over time.  Or maybe you want a base price of 50, plus or minus 5 where +/- increases over time and the base price increases over time as well.  I'm not sure why you need to generate the rows tho

    declare
    @base_price decimal(14,4)=50.0000,
    @random_plus_minus decimal(14,4)=05.0000,
    @weeks int=52,
    @increase_pct decimal(5,4)=0.12/52;

    select fn.n, v.random_price_change, v.trend, v.trend-v.trend_lag diff_trend,
    cast((v.random_price_change-@random_plus_minus)*v.trend as decimal(14,4)) random_price_diff,
    cast(@base_price+(v.random_price_change-@random_plus_minus) as decimal(14,4)) constant_baseline_with_with_constant_random_difference_in_price,
    cast(@base_price+(v.random_price_change-@random_plus_minus)*v.trend as decimal(14,4)) constant_baseline_with_increasing_random_difference_in_price,
    cast(@base_price*v.trend+(v.random_price_change-@random_plus_minus)*v.trend as decimal(14,4)) increasing_baseline_with_increasing_random_difference_in_price
    from dbo.fnTally(1, @weeks) fn
    cross apply (values (1 + (abs(checksum(newid())) % (cast(@random_plus_minus * 2 as int))),
    exp(@increase_pct * fn.n),
    exp(@increase_pct * (fn.n - 1)))) v(random_price_change, trend, trend_lag);

     

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • I'm trying to simulate price changes of some of my ingredients over time. (Not really necessary to change the prices of more than just some of the most commonly used ingredients to see how the changes ripple through all the products being made.) And that's what I'm trying to do -- like simulating the 50% increase in flour prices in the space of 18 months. I guess I could just make it somewhat linear...

    So basically, I'd do that and then if I'm smart, maybe use a system-versioned table, and use that to build my slowly changing Ingredient dimension, and then use LAG() to find ingredients that have changed price, and then create new versions of the "recipe" or "ProductIngredient" table and new versions of Product so that I can track the change in ingredient cost and the corresponding change in Product cost and corresponding drop in margin. (Does that answer your question?)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply