Create a delta from downloaded info

  • I'm looking for some suggestions on how to create a delta file that will some me differences from yesterdays downloaded information

    versus today's. The table has about 25 fields, but no mod_date in the data, and any one of 10-15 could have changed. We are currently doing a full replace from Staging table to the Live table, now they are wanting to only see changes.

    Thanks.

  • Bruin wrote:

    I'm looking for some suggestions on how to create a delta file that will some me differences from yesterdays downloaded information versus today's.

    1. Write a query to identify the differences between the data in the tables
    2. Output the results of the query to a file

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Can that be driven from information schema tables so I don't have to hard code field names do the compares, how

    about the decimal and float values?

    any examples?

     

    Thanks.

     

     

  • Yeah, EXCEPT should to that just fine:

    SELECT * /*...list_of_cols...*/

    FROM dbo.today

    EXCEPT

    SELECT * /*...list_of_cols...*/

    FROM dbo.yesterday

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • What does your desired output look like? Is a list of all the source rows which are different good enough? Or do you want to see differences at the data-item level (eg, row id <x>, column <n> old value: <old val>, new value: <new val>)?

    It may be obvious, but Scott's SELECT * EXCEPT idea works only as long as the columns in both tables match exactly.

    • This reply was modified 3 years, 5 months ago by  Phil Parkin.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • I would like to see diffs at the data-item level, this would help in the validation process.

    Thanks.

  • compare table values and create a file with diff's.

  • any examples of a compare script and output diff's?

     

    Thanks.

  • Bruin wrote:

    any examples of a compare script and output diff's?

    Thanks.

    There are many ways to display the "Diffs".  Please provide and explicit example of what you'd like to see for your "Diff" report.  And, when you do, remember that there really needs to be some form of key that guarantees uniqueness of each row or this is going to be mostly impossible to accomplish.  Remember that EXCEPT only show you where there are no matches and so does not show you a Before'n'After.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My table as a PK(unique) of:

    [InvoiceNbr] ASC,

    [Line] ASC,

    [SubLine] ASC,

    [InvoiceDate] ASC,

    [ShipDate] ASC

    The process that creates the file to load to SQL does a FULL refresh of the data, meaning it does a MTD pull each time it executes. The diff file needs to display the PK along with any values on the record that differ from yesterday's pull of the information. It needs to be dynamic in case fields are added or deleted from the table.

    Thanks.

  • Bruin wrote:

    My table as a PK(unique) of:

    [InvoiceNbr] ASC, [Line] ASC, [SubLine] ASC, [InvoiceDate] ASC, [ShipDate] ASC

    The process that creates the file to load to SQL does a FULL refresh of the data, meaning it does a MTD pull each time it executes. The diff file needs to display the PK along with any values on the record that differ from yesterday's pull of the information. It needs to be dynamic in case fields are added or deleted from the table.

    Thanks.

    It seems like you're telling me that an invoice date can change?  That's not going to make for a Martha Stuart moment.  I seems to me that the InvoiceNbr, Line, and Subline would be the only things need to make a unique row.

    Heh... and you're description of what is needed to find the differences has always been understood but you really need to describe how you want them displayed.  A graphic would be good for that.  It would really be nice if you also provided some readily consumable data even if it's just for the PK and a couple of other columns.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Bruin wrote:

    My table as a PK(unique) of:

    [InvoiceNbr] ASC, [Line] ASC, [SubLine] ASC, [InvoiceDate] ASC, [ShipDate] ASC

    The process that creates the file to load to SQL does a FULL refresh of the data, meaning it does a MTD pull each time it executes. The diff file needs to display the PK along with any values on the record that differ from yesterday's pull of the information. It needs to be dynamic in case fields are added or deleted from the table.

    Thanks.

    I would move away from the idea that this should be dynamic - if a new column is added in the source system and you want that data to be considered in the destination, then the column must be created in the destination and populated *exactly* the same way it was populated in the source system.  That is a required change and as part of that change request you can and should update any downstream processes (like this one).

    That means you can easily setup and use EXCEPT to identify changes from the source to the target - specifying each column to be compared.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • small sample of data, and the output desired.

    Row 2 is written as a delta to a new table since acctgrp and channel changed.

    Output:

    Old:display yesterday's record

    New:display today's record

    Just so I have an audit of data changes..

    Invoicenbr,line,subline,invoicedate,shipdate,acctgrp,channel,qtyshipped
    78123,1,0,8-13-2021,8-10-2021,1002,rrb,800 -- Yesterday's data
    78123,1,0,8-13-2021,8-10-2021,1004,xzb,800 -- Today's data

     

     

     

  • Bruin wrote:

    small sample of data, and the output desired.

    Row 2 is written as a delta to a new table since acctgrp and channel changed.

    Output: Old:display yesterday's record New:display today's record

    Just so I have an audit of data changes..

    Invoicenbr,line,subline,invoicedate,shipdate,acctgrp,channel,qtyshipped
    78123,1,0,8-13-2021,8-10-2021,1002,rrb,800 -- Yesterday's data
    78123,1,0,8-13-2021,8-10-2021,1004,xzb,800 -- Today's data

    Heh... ok.  That's just "whole row" auditing and it's pretty simple to do.  It would be nice if you had done the rest to make this easy.  Please see the first link (again... I've asked you before) in my signature line below for what I'm talking and how it would really save you and the people that are trying to help you a shedload of time... especially YOU. 😉

    So post some readily consumable data to help us help you so we can write some tested code instead of playing multiple "tag you're it" posts.  I recommend you post a couple of lines that haven't changed in both, some that have, and some that have only been added to one, and some that have only been deleted from one.

     

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is confusing - it really isn't clear what you want to accomplish.  I am thinking you might want a temporal table - but since you haven't provided the requested data it really isn't clear.

    With a temporal table - you would extract the full data from the source system, then use EXCEPT to identify rows that have changed or rows that are new.  Using that set of data you can then update the existing rows with the new data and insert the new rows.  The temporal table will create a history row when you update the existing data and you can then query for old/new values and when they changed.

    If that is what you are trying to accomplish - please provide the requested data.  This will need some existing data - some new rows to be inserted and some rows to be updated, in the form of create and insert statements.  A final table and a staging table...

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 15 posts - 1 through 15 (of 44 total)

You must be logged in to reply to this topic. Login to reply