How to create a delta result-set by looking at a date_edited column

  • Hello all,

    I have a stored procedure that looks at a user profile table, to create a list of users and their preferences in a single row

    The format is: 


    user_id    instrument_preferences    style_preferences
    811LR3GJUTH3FSSS90VTGWHCU6C0     Clarinet, Voice     Show/Broad, Folk

    Then using SSIS, I call the stored procedure and dump the data in a flat file to then transfer via sftp.

    I have now to figure out, after the first run, to only add to the first flat file only the user records that have been changed. The user profile table contains a column named: date_edited, this column can be either NULL, or have a date value in it. I am not concerned about how far in the past this value is, it is a pretty simple requirement since the actual preferences table does not have a date changed column, so I am only using the value from the user table. 

    I was wondering if I could get a recommendation on how to do this? Is it something I should do inside the SSIS package, or is it something I should do in the stored procedure. 

    If someone could offer a helping hand, that would be awesome.

    I am including a copy of my stored procedure as is:


    ALTER PROCEDURE [dbo].[UserPreferences]
    AS
    BEGIN

            -- Create main table
            CREATE TABLE [#UserPreferences]
            (
             [ID] INT IDENTITY(1, 1) NOT NULL
            , [user_id] CHAR(32) NOT NULL
            , [instrument] VARCHAR(100) NULL                        
            , [style] VARCHAR(100) NULL                        
            , [time_added] DATETIME
            )
        
      SET NOCOUNT ON;
     
            INSERT INTO[#UserPreferences] ([user_id], [instrument], [style], [time_added])    
     
            SELECT DISTINCT
             [p].[user_id]
            , [pi].[instrument]
            , [ps].[style]
            , GETDATE()
            FROM [dbo].[profile] [p]
            INNER JOIN (SELECT DISTINCT
                         .[PID]
                        , .[instrument_id]
                        , [ir].[description] [instrument]
                        , ROW_NUMBER() OVER ( PARTITION BY .[PID] ORDER BY .[instrument_id] ) [RN]
                        FROM [dbo].[profile_instruments]
                        INNER JOIN [dbo].[instrument_ref] [ir]
                            ON [ir].[instrument_id] = .[instrument_id]
                        WHERE .[customer_data] = 1) [pi]
                ON [pi].[PID] = [p].[PID]
            FULL OUTER JOIN (SELECT DISTINCT
                             .[PID]
                             , .[style_id]
                             , [sr].[description] [style]
                             , ROW_NUMBER() OVER ( PARTITION BY .[PID] ORDER BY .[style_id] ) [RN]
                             FROM [dbo].[profile_styles]
                             INNER JOIN [dbo].[styles_ref] [sr]
                                ON [sr].[style_id] = .[style_id]
                            WHERE .[customer_data] = 1) [ps]
                ON [ps].[PID] = [p].[PID]
                AND [pi].[RN] = [ps].[RN]
            WHERE [p].[date_created] > DATEADD(yy, -2, GETDATE())
                AND [p].[customer_data] = 1
            ORDER BY [p].[user_id];
            
            -- Now reformat to single record per user
            SELECT
             rsp.[user_id]
            , STUFF(( SELECT ', ' + CAST([rsp_i].[instrument] AS VARCHAR(10))
                             FROM [#UserPreferences] [rsp_i] WITH (NOLOCK)
                             WHERE [rsp_i].[user_id] = [rsp].[user_id]
                  FOR
                      XML PATH('')), 1, 1, '') [instrument_preferences]
            , STUFF(( SELECT ', ' + CAST([rsp_s].[style] AS VARCHAR(10))
                             FROM [#UserPreferences] [rsp_s]
                             WHERE [rsp_s].[user_id] = [rsp].[user_id]
                  FOR
                      XML PATH('')), 1, 1, '') [style_preferences]
            FROM [#UserPreferences] [rsp]
            GROUP BY [rsp].[user_id];

    AND a bit more output data :


    shopper_id    instrument_preferences    style_preferences
    81WR3GJUTH3FSSS90VTGW1HCU6C0     Clarinet, Voice     Show/Broad, Folk
    J5CM9V4L4H6SH1RBGRJNRJPRUSB7     Guitar, Piano     Jazz, Pop
    QJKXNV484SGYZYXVN35WCTV3UD59     Guitar     Country
    3C15HYT9R6MFUVHZ76UUDUV8LYZX     Guitar, Piano, Voice     Movie/TV, Pop, Show/Broad
    3324CUY6ZNFY3NDW547H42BXGN6P     Guitar, Piano     Classical, Movie/TV

    Thanks a lot for your help.

  • If you're going to perform this function on a regular basis, then you need to have an initial run that will likely be different than the ongoing runs.   As an example, if the first time you run it, you simply limit the records coming from the user profile table to those with a NOT NULL value for the date edited field, then you will get a record for all user profile table records that have changed since the table was created.   However, having done that once, you may then need to be sure that you don't include those records again until they make another change.   The only way to do that is to then change the sproc to only look for records where the date edited field is > the last run date of the procedure.   If this is scheduled to run, then you just need to keep track of the run date and keep that value somewhere that your sproc can get to so that it can be used in the query.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Monday, July 10, 2017 12:21 PM

    If you're going to perform this function on a regular basis, then you need to have an initial run that will likely be different than the ongoing runs.   As an example, if the first time you run it, you simply limit the records coming from the user profile table to those with a NOT NULL value for the date edited field, then you will get a record for all user profile table records that have changed since the table was created.   However, having done that once, you may then need to be sure that you don't include those records again until they make another change.   The only way to do that is to then change the sproc to only look for records where the date edited field is > the last run date of the procedure.   If this is scheduled to run, then you just need to keep track of the run date and keep that value somewhere that your sproc can get to so that it can be used in the query.

    Hello Steve, and thank you for your explanation. Where would I need to store the value of 'last_run_date'. Also if in the first original run I only select the records that are not null on the date_edited field, when will I add the other records, I also need to include the ones that date_edited is null. I hope this makes sense, once again, thank you.

  • itortu - Monday, July 10, 2017 1:20 PM

    sgmunson - Monday, July 10, 2017 12:21 PM

    If you're going to perform this function on a regular basis, then you need to have an initial run that will likely be different than the ongoing runs.   As an example, if the first time you run it, you simply limit the records coming from the user profile table to those with a NOT NULL value for the date edited field, then you will get a record for all user profile table records that have changed since the table was created.   However, having done that once, you may then need to be sure that you don't include those records again until they make another change.   The only way to do that is to then change the sproc to only look for records where the date edited field is > the last run date of the procedure.   If this is scheduled to run, then you just need to keep track of the run date and keep that value somewhere that your sproc can get to so that it can be used in the query.

    Hello Steve, and thank you for your explanation. Where would I need to store the value of 'last_run_date'. Also if in the first original run I only select the records that are not null on the date_edited field, when will I add the other records, I also need to include the ones that date_edited is null. I hope this makes sense, once again, thank you.

    You could create a table to hold the "last_run_date" value, and when the stored procedure runs, it could update that value.   As to when to include records that have a date_edited value that is NULL, I made an assumption based on your initial post that if the record hadn't changed, there would be no need to ever provide those records, but if that's not the case, then providing those on the initial run might make the most sense, as then you could just not filter out ANY of the records, and only the ongoing process would filter based on the date_edited being > last_run_date.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I would recommend storing the date_edited column in the target data too.
    Then you can run a query to get max(date_edited) from the target table and use that date to select only those rows from source where date_edited >= max(date_edited).
    Run those results into a staging table and MERGE into your destination.

    But, this assumes that either
    1) No additional rows are created in the source data, or
    2) Any additional rows created in the source data will have date_edited set to creation date.

    Otherwise the new rows will never get selected.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • sgmunson - Monday, July 10, 2017 1:31 PM

    itortu - Monday, July 10, 2017 1:20 PM

    sgmunson - Monday, July 10, 2017 12:21 PM

    If you're going to perform this function on a regular basis, then you need to have an initial run that will likely be different than the ongoing runs.   As an example, if the first time you run it, you simply limit the records coming from the user profile table to those with a NOT NULL value for the date edited field, then you will get a record for all user profile table records that have changed since the table was created.   However, having done that once, you may then need to be sure that you don't include those records again until they make another change.   The only way to do that is to then change the sproc to only look for records where the date edited field is > the last run date of the procedure.   If this is scheduled to run, then you just need to keep track of the run date and keep that value somewhere that your sproc can get to so that it can be used in the query.

    Hello Steve, and thank you for your explanation. Where would I need to store the value of 'last_run_date'. Also if in the first original run I only select the records that are not null on the date_edited field, when will I add the other records, I also need to include the ones that date_edited is null. I hope this makes sense, once again, thank you.

    You could create a table to hold the "last_run_date" value, and when the stored procedure runs, it could update that value.   As to when to include records that have a date_edited value that is NULL, I made an assumption based on your initial post that if the record hadn't changed, there would be no need to ever provide those records, but if that's not the case, then providing those on the initial run might make the most sense, as then you could just not filter out ANY of the records, and only the ongoing process would filter based on the date_edited being > last_run_date.

    Your last understanding of my likely poorly explained original post, is correct. I need to include all records in the initial run,and then after that, use the date_edited > last_run_date to filter.

  • itortu - Monday, July 10, 2017 3:02 PM

    sgmunson - Monday, July 10, 2017 1:31 PM

    itortu - Monday, July 10, 2017 1:20 PM

    sgmunson - Monday, July 10, 2017 12:21 PM

    If you're going to perform this function on a regular basis, then you need to have an initial run that will likely be different than the ongoing runs.   As an example, if the first time you run it, you simply limit the records coming from the user profile table to those with a NOT NULL value for the date edited field, then you will get a record for all user profile table records that have changed since the table was created.   However, having done that once, you may then need to be sure that you don't include those records again until they make another change.   The only way to do that is to then change the sproc to only look for records where the date edited field is > the last run date of the procedure.   If this is scheduled to run, then you just need to keep track of the run date and keep that value somewhere that your sproc can get to so that it can be used in the query.

    Hello Steve, and thank you for your explanation. Where would I need to store the value of 'last_run_date'. Also if in the first original run I only select the records that are not null on the date_edited field, when will I add the other records, I also need to include the ones that date_edited is null. I hope this makes sense, once again, thank you.

    You could create a table to hold the "last_run_date" value, and when the stored procedure runs, it could update that value.   As to when to include records that have a date_edited value that is NULL, I made an assumption based on your initial post that if the record hadn't changed, there would be no need to ever provide those records, but if that's not the case, then providing those on the initial run might make the most sense, as then you could just not filter out ANY of the records, and only the ongoing process would filter based on the date_edited being > last_run_date.

    Your last understanding of my likely poorly explained original post, is correct. I need to include all records in the initial run,and then after that, use the date_edited > last_run_date to filter.

    If I want to follow your recommendation, will that imply that I have to replace using a temporary table and rather use a permanent table? Thank you.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply