Log Table - Determining Old Value vs New Value

  • Hi guys

    Wondering if someone can help me with performance tuning a query I've written.

    My query

    1. pulls a dataset from a log table into a temp table

    2. applies a row number for each record set to determine the order that the data arrived

    3. uses a while loop to iterate through each field that is different from the next row (this includes a nested while loop as there could be more than one difference within each row)

    4. using dynamic sql, gets the value of each field that was different from the previous and stores these values in another temp table

    I'm just wondering if I'm missing a much simpler way to run this using windowing functions as opposed to the dynamic sql. This code was working fine for small datasets but it's currently crawling when it's hitting anything over 30k records.

    The only other option I can think of is to change the format of the log table to store the data as Old Value and New Value for each of the fields we want to check on (5 fields altogether).

  • Are you familiar with the LAG and LEAD functions? I think you could use them to avoid the loops.

    John

  • First I've heard of them john, any good articles you'd recommend or would a google suffice?

  • The examples in Books Online are a good place to start. Here's an illustration (not tested) that I cobbled together. You might also want to bear in mind that storing the old values will double your disk requirements and probably affect performance as well. You can get the old value from the previous row in the table. Write a view if you need to display old and new values side by side.

    WITH OldandNew AS (

    SELECT

    RowNo AS RowNumber

    ,LAG(RowNo,1) OVER (ORDER BY RowNo) AS PreviousRowNumber

    ,Col1 AS Newcol1

    ,LAG(col1,1) OVER (ORDER BY RowNo) AS OldCol1

    ,Col2 AS Newcol2

    ,LAG(col2,1) OVER (ORDER BY RowNo) AS OldCol2

    FROM MyTable

    )

    SELECT

    RowNumber

    ,PreviousRowNumber

    ,Newcol1

    ,OldCol1

    ,Newcol2

    ,OldCol2

    FROM OldandNew

    WHERE NewCol1 <> OldCol1

    OR NewCol2 <> OldCol2

    John

  • 1) Get a copy of Itzik Ben-Gan's Windowing Function book.

    2) Fabiano Amorim did a great blog series on Simpletalk.com on WFs

    3) You shouldn't need to put the data into a temp table. Huge waste of effort. Just do your work on the actual table, which has to have either an identity or a datetime added (or both) that you can use to process the records in proper order.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks guys for the replies

    John - I got it working using the method you outlined, I used LEAD instead of LAG

    Massive improvement on the performance too

    Now I just need to check the data....oh the joys

    SQLGuru - Great minds think alike, you were the second person to suggest Itzik Ben-Gan's stuff on this, I hadn't done my research but he looks like the authority on all such things

  • mitzyturbo (11/1/2016)


    Thanks guys for the replies

    John - I got it working using the method you outlined, I used LEAD instead of LAG

    Massive improvement on the performance too

    Now I just need to check the data....oh the joys

    SQLGuru - Great minds think alike, you were the second person to suggest Itzik Ben-Gan's stuff on this, I hadn't done my research but he looks like the authority on all such things

    This part of John's suggested code does not handle NULLs:

    WHERE NewCol1 <> OldCol1

    OR NewCol2 <> OldCol2

    If your data may contain NULLs, I suggest that you consider using this construct instead, which does:

    WHERE EXISTS

    (

    select NewCol1, NewCol2

    except

    select OldCol1, OldCol2

    )

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Cluster the log table on the relevant arrival datetime that you are comparing by (rather than, presumably, a meaningless identity value). Then query directly from the table, joining to itself using the clustering key to do a seek to find the row to join to.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thanks Phil

    I've tweaked the code slightly so that there are no NULL values as I was thinking the exact same thing.

    I'm currently working on getting rid of the temp tables altogether and wrapping it all into one select.

    Unfortunately the original developers of the database, in their wisdom, split one entity into two different log tables so I need to join / merge my log data set and apply a row number per modified date

    For now, using the temp tables isn't too bad, 60k rows being returned in less than 20 seconds which is a huge improvement

  • Properly clustering should still allow a merge join, which should be much faster than anything you're having to do now.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • As I have 2 log tables to pull data from, how would this work?

    do you mean by just having a clustered index by the modified date on both log tables that the select will still run quicker?

  • mitzyturbo (11/1/2016)


    As I have 2 log tables to pull data from, how would this work?

    do you mean by just having a clustered index by the modified date on both log tables that the select will still run quicker?

    Yes, exactly. I would think the "date added" would be sequential on each table, so fragmentation would not be an issue. [But some minor fragmentation is not that big a deal anyway.] Be sure to review the fillfactor on the table also, and make sure it is at least 98 or 99, assuming since this is a history table that you don't modify rows.

    But by being clustered on the date, SQL would "know" the rows were in order, and it could do a very efficient join on the two tables, including to a prior row, if needed.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply