Stop Duplicates - SQL Task - SSIS 2010

  • hi

    fairly new to SSIS

    I have to run an excel sheet every quarter (every 3 months)

    there is no unique ID column

    there is a column that will have multiple entries of a OrderDate  

    for example the Dates for the first excel document it will have a OrderDate date range from 01/01/2001 to 31/03/2001

    the next Excel Document will have OrderDate date range from 01/04/2001 to 30/06/2001

    is there an SQL  task that can be added to stop if someone trys to e.g re run the second Excel document , stop adding / avoid the duplicate data

    please help

  • I assume that you mean SSIS 2012? There is no 2010, however, when launched SSDT 2012 used VS2010.

    Personally, my way, is by having a column for the file name and feeding this into your dataflow. This means you can see if a file has already been loaded by checking if the filename exists in the table.
    In simple steps:

    1. Store the filename of the file your're going to load in a variable/parameter
    2. Create a variable called something like "ExistingRecords" of data type int.
    3. Use an Execute T-SQL Task and run some SQL along the lines of: SELECT COUNT(*) AS Records FROM YourTable YT WHERE YT.FileName = ?; Pass the value of your filename variable/parameter in the SQL, and store the return result in your "ExistingRecords" variable.
    4. Change the Precedence Constraint from your Execute T-SQL task to your next task (your dataflow task?) to Expression and Constraint.
    5. Leave the value as Success.
    6. In the expression enter: @[User::ExistingRecords] == 0

    This will mean that if any existing records are found, any tasks after your Execute T-SQL Task will not be run (provided that they have no Logical OR precedent constraints that evaluate to TRUE).

    This does depend on your filename being different each time, and you may need to go back and put the data your already loaded data if it is.

    If not, you could load the data into a staging table on your SQL server. Then check the min/max dates in the staging data against against the dateswithin your production table; if they exist don't load the data. Then, regardless of it you load or not, delete/truncate the data from the staging table.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Hi

    I am using Visual Studio 2010, and my file name is not different

    file name is always called

    OrderProcessData.xlsx

    please help

  • joanna.seldon - Tuesday, June 27, 2017 5:07 AM

    hi

    fairly new to SSIS

    I have to run an excel sheet every quarter (every 3 months)

    there is no unique ID column

    there is a column that will have multiple entries of a OrderDate  

    for example the Dates for the first excel document it will have a OrderDate date range from 01/01/2001 to 31/03/2001

    the next Excel Document will have OrderDate date range from 01/04/2001 to 30/06/2001

    is there an SQL  task that can be added to stop if someone trys to e.g re run the second Excel document , stop adding / avoid the duplicate data

    please help

    Presumably your target table has some sort of unique key ... some combination of columns which cannot be duplicated? Can you not enforce uniqueness at the table level using an appropriate constraint? If not, this sounds like your table design needs to be revisited.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • hi

    there is no unique id for these records

    the data looks like

    OrderDate   Department   OrderProcessedCheckPersonel  PackedDate    Unit

    01/01/2001    Catering       John Smith                                02/01/2001        1
    01/01/2001    Kitchen        John Smith                                 01/01/2001        3
    01/01/2001    Kitchen        John Smith                                 01/01/2001        3
    02/01/2001    Catering      John Smith                                 02/01/2001        1
    02/01/2001    Home          Peter Jones                                02/01/2001        1

    is there a way of if row 2 (row 1 contains the column names) of the excel file fully matches a row in a SQL table then fail ..if not add?

    please help

  • joanna.seldon - Tuesday, June 27, 2017 6:08 AM

    hi

    there is no unique id for these records

    the data looks like

    OrderDate   Department   OrderProcessedCheckPersonel  PackedDate    Unit

    01/01/2001    Catering       John Smith                                02/01/2001        1
    01/01/2001    Kitchen        John Smith                                 01/01/2001        3
    01/01/2001    Kitchen        John Smith                                 01/01/2001        3
    02/01/2001    Catering      John Smith                                 02/01/2001        1
    02/01/2001    Home          Peter Jones                                02/01/2001        1

    is there a way of if row 2 (row 1 contains the column names) of the excel file fully matches a row in a SQL table then fail ..if not add?

    please help

    Use Thom's solution. If rows 2 and 3 of your sample data are truly valid, I can't see a way. What is the point of allowing data such as this ... it seems meaningless to have two fully duplicated rows of data?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • If the names are always the same, and there is no unique reference, i would suggest a staging table. Then you check for existing records and either not load the file, or only load "new" records.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • hi

    I got to use above really well with a fixed file name

    how do you use the above if the file name is dynamic ?  I have added a foreach loop container to help pass the file name into a variable

    but the difficulty then is using the variable

    please help

  • What are you trying to use the variable to do exactly? In SSIS you reference a variable by using the following expression:
    @[User::YourVariableName]

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • brill

    all working , great

    thanks team

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply