Little challenge for Algorithm gurus

  • Hello everyone,

    To make my story shorter I will spare you the 'why' or the background info on the topic...

    I have several tables that contain the log of every price variation for different item types. For my example I will use 3 tables and limit the number of columns to what we need to worry about for this problem... In real life these table each have few million rows but we don't need that many right now... here is the script to create/populate the sample tables

    USE tempdb

    GO

    CREATE TABLE dbo.ItemAPriceLog(

    ItemAPriceID INT IDENTITY(1,1) NOT NULL PRIMARY KEY

    , Entered DATETIME NOT NULL DEFAULT (GETDATE())

    , Cost money NOT NULL

    -- , other columns are present, but no need to elaborate here

    )

    GO

    CREATE TABLE dbo.ItemBPriceLog(

    ItemBPriceID INT IDENTITY(1,1) NOT NULL PRIMARY KEY

    , Entered DATETIME NOT NULL DEFAULT (GETDATE())

    , Cost money NOT NULL

    -- , other columns are present, but no need to elaborate here

    )

    GO

    CREATE TABLE dbo.ItemCPriceLog(

    ItemCPriceID INT IDENTITY(1,1) NOT NULL PRIMARY KEY

    , Entered DATETIME NOT NULL DEFAULT (GETDATE())

    , Cost money NOT NULL

    -- , other columns are present, but no need to elaborate here

    )

    GO

    -- Insert dummy entries

    INSERT INTO dbo.ItemAPriceLog (Entered, Cost)

    SELECT DISTINCT DATEADD(mi, ROW_NUMBER() OVER(ORDER BY o.ID * c.Type DESC), o.crdate)

    , CONVERT(money, o.ID) * CONVERT(money, c.Type)

    FROM dbo.sysobjects o

    cross join dbo.syscolumns c

    WHERE o.id < 1000 -- Needed otherwise the script will return prices in the billions...

    GO

    -- Insert dummy entries

    INSERT INTO dbo.ItemBPriceLog (Entered, Cost)

    SELECT DISTINCT DATEADD(mi, ROW_NUMBER() OVER(ORDER BY o.ID * c.Type DESC) * 2, o.crdate)

    , CONVERT(money, o.ID) * CONVERT(money, o.UID) * CONVERT(money, c.xprec)

    FROM dbo.sysobjects o

    cross join dbo.syscolumns c

    WHERE o.id < 1000 -- Needed otherwise the script will return prices in the billions...

    GO

    -- Insert dummy entries

    INSERT INTO dbo.ItemCPriceLog (Entered, Cost)

    SELECT DISTINCT DATEADD(mi, ROW_NUMBER() OVER(ORDER BY o.ID * c.TYPE * o.UID DESC) + o.ID + o.UID, o.crdate)

    , CONVERT(money, o.ID) * CONVERT(money, c.Type) + CONVERT(money, o.UID) * CONVERT(money, c.Type)

    FROM dbo.sysobjects o

    cross join dbo.syscolumns c

    WHERE o.id < 1000 -- Needed otherwise the script will return prices in the billions...

    GO

    I am working on a script that will populate some sort of "price mapping" table.

    Each row will have a foreign key to the PriceID of each one of the price log tables, as well as an IDENTITY column and a datetime column containing the date/time the row prices were effective as of.

    The table should contain one row for each time the price of one of the item changed.

    Once the table is fully populated someone can easily query it to find out what were the prices of each item for any given time. (the alternative being to query each one of the individual table with a WHERE clause based on the 'Entered' value...)

    Many reports will depend on the IDENTITY column of that mapping table...

    Long story short the mapping table would look like something like this:

    CREATE TABLE dbo.PriceInstances(

    InstanceID INT IDENTITY(1,1) NOT NULL PRIMARY KEY

    , ItemAPriceID INT NOT NULL CONSTRAINT FK_PriceInstances_ItemAPriceLog FOREIGN KEY REFERENCES dbo.ItemAPriceLog(ItemAPriceID)

    , ItemBPriceID INT NOT NULL CONSTRAINT FK_PriceInstances_ItemBPriceLog FOREIGN KEY REFERENCES dbo.ItemBPriceLog(ItemBPriceID)

    , ItemCPriceID INT NOT NULL CONSTRAINT FK_PriceInstances_ItemCPriceLog FOREIGN KEY REFERENCES dbo.ItemCPriceLog(ItemCPriceID)

    , EffectiveDateTime DATETIME NOT NULL

    )

    GO

    The challenge is to figure out an algorithm that will populate that table without relying on slow cursors or loops... It will run nightly as a job on a reporting database server.

  • Is the "EffectiveDate" in PriceInstances the same as the "DateEntered" in the other three tables?

    If so, and you have a DateEntered of 17 July 08 for ItemA, and a DateEntered of 16 July for ItemB, and a DateEntered of 1 Jan 08 for ItemC, and various other dates between 1 Jan and 17 July for ItemA and ItemB, what does it need to do?

    Does PriceInstance end up with an entry with an EffectiveDate of 17 July 08, with the ItemA price from 17 July, the ItemB price from 16 July, and the ItemC price from 1 Jan?

    On the 16th, when ItemB changed prices, does it get an entry with 16 July EffectiveDate and the price for ItemB from 16 July, the price for ItemA from whatever is most recent before that, and the price for ItemC from 1 Jan?

    Or, alternately, does it get an entry at the beginning of each day, with the then-current price for each item?

    On another note, why separate price log tables for each item? Why not an "ItemID" column in a generic price log all in one table?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • FYI I just changed the script that populates the 3 price table a little bit that way it generates entries that are a little bit more realistic...

    Thanks for your reply,

    The reason why the price log table are all different is a long story... and it's outside my control. I could import all the data in a single table for the purpose of my project but it wouldn't help me in finding what all the prices were for each item type at any given time.

    The EffectiveDateTime in PriceInstances will end up being equal to the most recent 'Entered' value between the 3 PriceLog tables... I know what I am explaining sounds like crazy non sense... so here is a concrete example... It should all make more sense after you read through it...

    -- Populate the tables

    delete from dbo.ItemAPriceLog

    delete from dbo.ItemBPriceLog

    delete from dbo.ItemCPriceLog

    delete from dbo.PriceInstances

    SET IDENTITY_INSERT dbo.ItemAPriceLog ON

    INSERT INTO dbo.ItemAPriceLog (ItemAPriceID, Entered, Cost)

    SELECT 1, '6/6/2004 10:28:34', 11.22

    UNION ALL SELECT 2, '6/6/2004 11:05:44', 11.11

    SET IDENTITY_INSERT dbo.ItemAPriceLog OFF

    SET IDENTITY_INSERT dbo.ItemBPriceLog ON

    INSERT INTO dbo.ItemBPriceLog (ItemBPriceID, Entered, Cost)

    SELECT 1, '6/6/2004 10:25:44', 7.70

    UNION ALL SELECT 2, '6/6/2004 10:26:11', 7.60

    UNION ALL SELECT 3, '6/6/2004 10:44:22', 7.88

    UNION ALL SELECT 4, '6/6/2004 11:00:55', 7.77

    UNION ALL SELECT 5, '6/6/2004 11:45:12', 7.10

    SET IDENTITY_INSERT dbo.ItemBPriceLog OFF

    SET IDENTITY_INSERT dbo.ItemCPriceLog ON

    INSERT INTO dbo.ItemCPriceLog (ItemCPriceID, Entered, Cost)

    SELECT 1, '6/6/2004 10:25:48', 10.25

    UNION ALL SELECT 2, '6/6/2004 11:25:14', 10.33

    UNION ALL SELECT 3, '6/6/2004 13:5:22', 10.10

    SET IDENTITY_INSERT dbo.ItemCPriceLog OFF

    -- Populate PriceInstances table based on what we've got in the individual price log tables

    -- FYI: Modify the table structure of PriceInstances to allow for NULL values in the ItemPrice columns...

    INSERT INTO dbo.PriceInstances (ItemAPriceID, ItemBPriceID, ItemCPriceID, EffectiveDateTime)

    -- At first all we've got is a price for ItemB

    SELECT NULL, 1, NULL, (SELECT Entered FROM dbo.ItemBPriceLog WHERE ItemBPriceID = 1)

    -- A new price instance is inserted once the price of Item C changes

    UNION ALL SELECT NULL, 1, 1, (SELECT Entered FROM dbo.ItemCPriceLog WHERE ItemCPriceID = 1)

    -- Price of Item B changes... no change with Item C and we still don't have a price for ItemA

    UNION ALL SELECT NULL, 2, 1, (SELECT Entered FROM dbo.ItemBPriceLog WHERE ItemBPriceID = 2)

    -- Price for ItemA has changed...

    UNION ALL SELECT 1, 2, 1, (SELECT Entered FROM dbo.ItemAPriceLog WHERE ItemAPriceID = 1)

    -- Price for ItemB has changed

    UNION ALL SELECT 1, 3, 1, (SELECT Entered FROM dbo.ItemBPriceLog WHERE ItemBPriceID = 3)

    -- Price for ItemB has changed again

    UNION ALL SELECT 1, 4, 1, (SELECT Entered FROM dbo.ItemBPriceLog WHERE ItemBPriceID = 4)

    -- Price for ItemA has changed

    UNION ALL SELECT 2, 4, 1, (SELECT Entered FROM dbo.ItemAPriceLog WHERE ItemAPriceID = 2)

    -- Price for ItemC has changed

    UNION ALL SELECT 2, 4, 2, (SELECT Entered FROM dbo.ItemCPriceLog WHERE ItemCPriceID = 2)

    -- Price for ItemB has changed

    UNION ALL SELECT 2, 5, 2, (SELECT Entered FROM dbo.ItemBPriceLog WHERE ItemBPriceID = 5)

    -- Price for ItemC has changed

    UNION ALL SELECT 2, 5, 3, (SELECT Entered FROM dbo.ItemCPriceLog WHERE ItemCPriceID = 3)

  • Here's how I'd solve this:

    Create a Numbers table.

    Use that to create a Calendar table, with all dates in the range you're querying against.

    Do a From-To self-join for each log table, by selecting the date and the next entry's date.

    Do a join between the From-To record and the calendar table, so you have the entry effective for all dates for each item.

    Join those together, and join that to the original data. That will give you a list of all items, what prices were effective, on the dates in each table.

    Insert that into the final table.

    Is that clear enough, or should I try to write up samples of what I mean?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Select Entered as EffectiveDateTime

    , (select top (1) ItemCPriceID from ItemCPriceLog C where dte.Entered >= C.Entered order by c.Entered ) IDC

    , (select top (1) ItemBPriceID from ItemBPriceLog B where dte.Entered >= B.Entered order by B.Entered ) IDB

    , (select top (1) ItemAPriceID from ItemAPriceLog A where dte.Entered >= A.Entered order by A.Entered ) IDA

    FROM(

    select Entered from ItemCPriceLog union

    select Entered from ItemBPriceLog union

    select Entered from ItemAPriceLog

    ) dte


    * Noel

  • Note: To speed this up you should have indices on all Datetime Columns!

    CREATE NONCLUSTERED INDEX IX_ItemCPriceLog ON ItemCPriceLog( Entered )

    CREATE NONCLUSTERED INDEX IX_ItemBPriceLog ON ItemBPriceLog( Entered )

    CREATE NONCLUSTERED INDEX IX_ItemAPriceLog ON ItemAPriceLog( Entered )

    You might as well retrieve Cost insted of the ID and then no joins will be necessary.


    * Noel

  • Thanks to both of you, very clever solutions 🙂 I knew there would be an easier way then what I started doing on my side before posting this...

    clustered indexes -> The tables have those already, both I haven't mentioned them up there

    as for getting the 'costs' instead of the ID and then do the joins... this would be good but in fact I have few other columns in each one of these sub tables (i.e. the UserID of the person who entered the new price value, etc...) so I need to maintain the relationship

    thanks again for the solution!

  • small change to the script proposed,

    to get what I need in the right order the code has to be...

    SELECT Entered AS EffectiveDateTime

    , (SELECT TOP (1) ItemAPriceID FROM ItemAPriceLog A WHERE A.Entered <= dte.Entered ORDER BY A.Entered DESC) IDA

    , (SELECT TOP (1) ItemBPriceID FROM ItemBPriceLog B WHERE B.Entered <= dte.Entered ORDER BY B.Entered DESC) IDB

    , (SELECT TOP (1) ItemCPriceID FROM ItemCPriceLog C WHERE C.Entered <= dte.Entered ORDER BY C.Entered DESC) IDC

    FROM(

    SELECT Entered FROM ItemCPriceLog UNION

    SELECT Entered FROM ItemBPriceLog UNION

    SELECT Entered FROM ItemAPriceLog

    ) dte

    ORDER BY Entered

    Works perfectly!

    thanks again

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply