Stop Duplicates - SQL Task - SSIS 2010

Question

Stop Duplicates - SQL Task - SSIS 2010

joanna.seldon

Ten Centuries

Points: 1188
More actions
June 27, 2017 at 5:07 am

#343352

hi
fairly new to SSIS
I have to run an excel sheet every quarter (every 3 months)
there is no unique ID column
there is a column that will have multiple entries of a OrderDate
for example the Dates for the first excel document it will have a OrderDate date range from 01/01/2001 to 31/03/2001
the next Excel Document will have OrderDate date range from 01/04/2001 to 30/06/2001
is there an SQL task that can be added to stop if someone trys to e.g re run the second Excel document , stop adding / avoid the duplicate data
please help

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Thom A SSC Guru Points: 99130 More actions · Answer 1

I assume that you mean SSIS 2012? There is no 2010, however, when launched SSDT 2012 used VS2010.

Personally, my way, is by having a column for the file name and feeding this into your dataflow. This means you can see if a file has already been loaded by checking if the filename exists in the table.
In simple steps:

Store the filename of the file your're going to load in a variable/parameter
Create a variable called something like "ExistingRecords" of data type int.
Use an Execute T-SQL Task and run some SQL along the lines of: SELECT COUNT(*) AS Records FROM YourTable YT WHERE YT.FileName = ?; Pass the value of your filename variable/parameter in the SQL, and store the return result in your "ExistingRecords" variable.
Change the Precedence Constraint from your Execute T-SQL task to your next task (your dataflow task?) to Expression and Constraint.
Leave the value as Success.
In the expression enter: @[User::ExistingRecords] == 0

This will mean that if any existing records are found, any tasks after your Execute T-SQL Task will not be run (provided that they have no Logical OR precedent constraints that evaluate to TRUE).

This does depend on your filename being different each time, and you may need to go back and put the data your already loaded data if it is.

If not, you could load the data into a staging table on your SQL server. Then check the min/max dates in the staging data against against the dateswithin your production table; if they exist don't load the data. Then, regardless of it you load or not, delete/truncate the data from the staging table.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

joanna.seldon Ten Centuries Points: 1188 More actions · Answer 2

Hi

I am using Visual Studio 2010, and my file name is not different

file name is always called

OrderProcessData.xlsx

please help

Phil Parkin SSC Guru Points: 246962 More actions · Answer 3

joanna.seldon - Tuesday, June 27, 2017 5:07 AM
hi
fairly new to SSIS
I have to run an excel sheet every quarter (every 3 months)
there is no unique ID column
there is a column that will have multiple entries of a OrderDate
for example the Dates for the first excel document it will have a OrderDate date range from 01/01/2001 to 31/03/2001
the next Excel Document will have OrderDate date range from 01/04/2001 to 30/06/2001
is there an SQL task that can be added to stop if someone trys to e.g re run the second Excel document , stop adding / avoid the duplicate data
please help

Presumably your target table has some sort of unique key ... some combination of columns which cannot be duplicated? Can you not enforce uniqueness at the table level using an appropriate constraint? If not, this sounds like your table design needs to be revisited.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

joanna.seldon Ten Centuries Points: 1188 More actions · Answer 4

hi

there is no unique id for these records

the data looks like

OrderDate Department OrderProcessedCheckPersonel PackedDate Unit

01/01/2001   Catering       John Smith                                02/01/2001        1
01/01/2001    Kitchen        John Smith                                 01/01/2001        3
01/01/2001    Kitchen        John Smith                                 01/01/2001        3
02/01/2001    Catering      John Smith                                 02/01/2001        1
02/01/2001    Home          Peter Jones                                02/01/2001        1

is there a way of if row 2 (row 1 contains the column names) of the excel file fully matches a row in a SQL table then fail ..if not add?

please help

Phil Parkin SSC Guru Points: 246962 More actions · Answer 5

joanna.seldon - Tuesday, June 27, 2017 6:08 AM
hi
there is no unique id for these records
the data looks like
OrderDate Department   OrderProcessedCheckPersonel PackedDate    Unit
01/01/2001   Catering       John Smith                                02/01/2001        1
01/01/2001    Kitchen        John Smith                                 01/01/2001        3
01/01/2001    Kitchen        John Smith                                 01/01/2001        3
02/01/2001    Catering      John Smith                                 02/01/2001        1
02/01/2001    Home          Peter Jones                                02/01/2001        1
is there a way of if row 2 (row 1 contains the column names) of the excel file fully matches a row in a SQL table then fail ..if not add?
please help

Use Thom's solution. If rows 2 and 3 of your sample data are truly valid, I can't see a way. What is the point of allowing data such as this ... it seems meaningless to have two fully duplicated rows of data?

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

Thom A SSC Guru Points: 99130 More actions · Answer 6

If the names are always the same, and there is no unique reference, i would suggest a staging table. Then you check for existing records and either not load the file, or only load "new" records.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

joanna.seldon Ten Centuries Points: 1188 More actions · Answer 7

hi

I got to use above really well with a fixed file name

how do you use the above if the file name is dynamic ? I have added a foreach loop container to help pass the file name into a variable

but the difficulty then is using the variable

please help

Thom A SSC Guru Points: 99130 More actions · Answer 8

What are you trying to use the variable to do exactly? In SSIS you reference a variable by using the following expression:
@[User::YourVariableName]

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

joanna.seldon Ten Centuries Points: 1188 More actions · Answer 9

joanna.seldon

Ten Centuries

Points: 1188

June 30, 2017 at 5:07 am

#1948901

brill

all working , great

thanks team