April 27, 2006 at 2:14 pm
I want to set up a DTS package using the SQL Server 2000 Enterprise Manager that will import data from a fixed-width text file into a database table. I know how to do that part, but what I need is to make it import a record only if it is not a duplicate. How would I do that?
April 27, 2006 at 2:17 pm
Three possibilities come to mind:
1) Insert trigger on the database to check for dupes and reject.
2) Make the transform task a VB script: run a SQL query to check for dupes and only DTS those that aren't.
3) DTS to a "staging" table, then run a SQL command / stored proc to copy from the staging table to the desired table (a rough sketch follows below).
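For option 3, a rough sketch of the copy step (the table and column names here are just placeholders):

-- Copy only the staging rows whose key doesn't already exist in the destination.
INSERT INTO dbo.Destination (ID, Col1, Col2)
SELECT s.ID, s.Col1, s.Col2
FROM dbo.Staging s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Destination d WHERE d.ID = s.ID)

-- Clear the staging table ready for the next import run.
TRUNCATE TABLE dbo.Staging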
April 28, 2006 at 8:12 am
You have two problems when you import text. SQL Server sees null values because of the nature of text, so if you import into a table with a primary key, your import will fail. So first import into a temp table, then into the destination; make sure the destination table is set up with a unique constraint, or even better a unique index, because you can use IGNORE_DUP_KEY with your CREATE INDEX statement. Your import will continue, but the duplicates will not be inserted. Run a search for UNIQUE constraint and IGNORE_DUP_KEY (with limitations) in the BOL (Books Online). One more thing: a unique constraint is nullable, while a unique index is not. Hope this helps.
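For example, the general shape of the CREATE INDEX statement (the table and index names are just placeholders):

-- Duplicate key values are discarded with a warning instead of
-- failing the whole bulk insert.
CREATE UNIQUE INDEX IX_Destination_ID
ON dbo.Destination (ID)
WITH IGNORE_DUP_KEY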
Kind regards,
Gift Peddie
May 11, 2006 at 7:49 am
Interesting idea for the trigger - but what would the code look like to check for dupes and reject?
May 16, 2006 at 10:20 am
anyone?
May 16, 2006 at 10:29 am
Have a look at the examples here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_create2_7eeq.asp
Essentially, you'd do a select on inserted and the table in question, joining on the PK / unique data.
Very rough pseudo-code:
If EXISTS(select.....)
begin
RAISERROR ('Non-unique insert.', 16, 1)
ROLLBACK TRANSACTION
end
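Fleshed out, a minimal sketch might look something like this (the table and column names are only placeholders; note that because an AFTER INSERT trigger fires once the new rows are already in the table, the check counts key values that now appear more than once rather than just joining):

CREATE TRIGGER trg_SomeTable_NoDupes
ON dbo.SomeTable
FOR INSERT
AS
-- A key value occurring more than once means an incoming row duplicated
-- an existing row (or another row in the same batch).
IF EXISTS (SELECT t.KeyCol
           FROM dbo.SomeTable t
           INNER JOIN inserted i ON t.KeyCol = i.KeyCol
           GROUP BY t.KeyCol
           HAVING COUNT(*) > 1)
BEGIN
    RAISERROR ('Non-unique insert.', 16, 1)
    ROLLBACK TRANSACTION
END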
May 16, 2006 at 10:41 am
"Essentially, you'd do a select on inserted and the table in question, joining on the PK / unique data. "
Thanks for the reply, but that's the part I am really having trouble with.
Let's just say my data looked like this, both in the database and the text file I am importing from:
ID PK
FNAME
LNAME
Then how would I make my SELECT statement?
I know that the following would find duplicates:
SELECT ID
FROM my_table
WHERE ID= the_id_being_imported
But how do I access the ID of the record it's trying to import? Is there a special variable or something for that?
May 16, 2006 at 10:51 am
inserted is the name of the "table" in the insert trigger.
An example from the link:
SELECT @min_lvl = min_lvl,
@max_lvl = max_lvl,
@emp_lvl = i.job_lvl,
@job_id = i.job_id
FROM employee e INNER JOIN inserted i ON e.emp_id = i.emp_id
May 17, 2006 at 7:53 am
OK, I'm sorry if I am being dumb, but I don't get what you're doing there:
SELECT @min_lvl = min_lvl,
@max_lvl = max_lvl,
@emp_lvl = i.job_lvl,
@job_id = i.job_id
FROM employee e INNER JOIN inserted i ON e.emp_id = i.emp_id
I don't really get what you are selecting here. Could you please explain it?
May 17, 2006 at 8:19 am
"employee" is the table in the database
"inserted" is the 'table' containing the data to be inserted.
so, for your example:
SELECT ID
FROM my_table
WHERE ID= the_id_being_imported
it would look like:
SELECT ID
FROM my_table t
INNER JOIN inserted i
on t.ID = i.ID
May 17, 2006 at 8:19 am
OK, you must have posted as I was writing this post - so it looks like I am on the right track. Thank you so much for your help!
May 17, 2006 at 8:26 am
OK, now one more question: the data I am importing into my table from the text file does not have a single column that is unique, so I am appending an identity int column to each record inserted. That identity int is the PK.
But I think that a combination of 3 fields from the incoming data could be used to determine whether the record is unique or not.
So the question is, is there a way to do the inner join on 3 fields?
May 22, 2006 at 11:48 am
sure
INNER JOIN inserted i ON
t.Col1 = i.col1
and
t.col2 = i.col2
and
t.col3 = i.col3
There's also the CHECKSUM function that you might want to look into.
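For example, a sketch of the duplicate check using CHECKSUM over the three fields (the column names are placeholders, and the WHERE clause assumes the identity PK from your earlier post so a new row doesn't match itself; note that CHECKSUM can occasionally return the same value for different rows, so treat it as a shortcut rather than a guarantee):

SELECT t.ID
FROM my_table t
INNER JOIN inserted i
    ON CHECKSUM(t.Col1, t.Col2, t.Col3) = CHECKSUM(i.Col1, i.Col2, i.Col3)
WHERE t.ID <> i.ID   -- exclude the newly inserted row matching itself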
June 2, 2006 at 2:12 pm
If you are importing via a DTS package and have a text field as PK but no identity column, the trigger solution WON'T WORK.
Here is what you can do to reject duplicate rows during the bulk insert (assuming you have a text primary key or no PK column). Select the column you want to function as a unique identifier, or the PK column. Create an index on the PK column and a UNIQUE constraint with IGNORE_DUP_KEY on. This will eliminate all duplicate rows during the DTS bulk copy. (You won't need transformation logic, etc. to handle this. Yes, it will throw an error, but it will not stop at the first duplicate and will run through all the records.)
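A sketch of that setup for the earlier three-field case (the names are placeholders; in SQL Server 2000 the IGNORE_DUP_KEY option goes on a CREATE INDEX statement):

-- Unique index over the fields that identify a row; duplicate rows are
-- dropped with a warning while the DTS bulk copy keeps running.
CREATE UNIQUE INDEX IX_Destination_NoDupes
ON dbo.Destination (Col1, Col2, Col3)
WITH IGNORE_DUP_KEY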