April 5, 2009 at 4:41 pm
I have a table whose schema is as follows:
id bigint identity(1,1)
sn varchar(50)
alter table [table] add constraint pk_id primary key (id)
create unique index ui_sn on [table] (sn)
Now, if I use the Import/Export Wizard, I would like to import a text file, but if there is a duplicate, I would like it to ignore that row and keep going. The wizard even has a drop-down to continue on error, but it always seems to stop as soon as it finds a duplicate.
I'd rather not have to import the text file into a temp table and then do an insert statement with a left outer join, saying WHERE table2.sn IS NULL. (One time I even ran this and it still said there was a duplicate, which is freaking impossible!!! There were no other inserts going on against the table at the time!)
Just to clear up, my SQL statement was:
insert into table1 t1 (select t2 from table2 left outer joing table1 t1 on t1.sn = t2.sn where t1 is null)
Did I do the statement wrong? I mean, I ran this before successfully with no problems.
I'm sorry for the cross-post; I found out later this might be more relevant in this forum.
April 5, 2009 at 5:32 pm
I think the statement you pasted is missing which column in t1 is null; was that a copy/paste error?
It looks a little like pseudocode.
This is how I would do it:
insert into table1 (col1, col2, col3, ...)
select t2.col1, t2.col2, t2.col3
from table2 t2
left outer join table1 t1 on t2.sn = t1.sn
where t1.sn is null
Lowell
April 5, 2009 at 6:08 pm
Yes, it was a copy and paste error.
April 5, 2009 at 7:20 pm
An alternative method (albeit less satisfactory) would be to define the UNIQUE index with IGNORE_DUP_KEY = ON.
Another method would be to use WHERE NOT EXISTS or the EXCEPT operator.
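For example, a minimal sketch of both ideas against the original poster's table and index names (the staging table name is an assumption):

-- Recreate the unique index so duplicate keys are discarded with a warning
-- instead of aborting the whole insert.
create unique index ui_sn on [table] (sn)
with (ignore_dup_key = on, drop_existing = on);

-- Or filter the duplicates out explicitly with NOT EXISTS...
insert into [table] (sn)
select s.sn
from staging s
where not exists (select 1 from [table] t where t.sn = s.sn);

-- ...or with EXCEPT, which also removes duplicates within the staging data itself.
insert into [table] (sn)
select sn from staging
except
select sn from [table];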
Cheers,
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
April 6, 2009 at 7:59 am
xgcmcbain (4/5/2009)
I'd rather not have to import the text file into a temp table and then do an insert statement with a left outer join, saying WHERE table2.sn IS NULL. (One time I even ran this and it still said there was a duplicate, which is freaking impossible!!! There were no other inserts going on against the table at the time!) Did I do the statement wrong? I mean, I ran this before successfully with no problems.
Most probably the problem was caused by the offending row appearing twice in the file you were importing.
What you did is not a self join; you joined two different tables to find rows that already exist in the target table. What you did not do was check whether there were duplicates within the new data in the temporary table.
IMHO, importing the data into a temp table without any constraints, and then moving it into the target table with all necessary checks (for duplicates and other things), is the preferred way to do this. In many cases you need to insert rows that do not exist and update those that already exist; this approach lets you do that easily. Of course, you could also use OPENROWSET or some similar method to open the source file instead of importing it into a temp table, but the rest, including the checks, would be the same.
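As a sketch of that insert-or-update pattern (the staging table name and the some_col payload column are assumptions; on SQL Server 2008 you could use MERGE instead):

-- Update rows whose key already exists in the target...
update t
set t.some_col = s.some_col -- some_col is a hypothetical non-key column
from [table] t
join staging s on s.sn = t.sn;

-- ...then insert the rows that do not exist yet.
insert into [table] (sn)
select s.sn
from staging s
left outer join [table] t on t.sn = s.sn
where t.sn is null;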
April 6, 2009 at 9:22 am
Vladan (4/6/2009)
xgcmcbain (4/5/2009)
I'd rather not have to import the text file into a temp table and then do an insert statement with a left outer join, saying WHERE table2.sn IS NULL. (One time I even ran this and it still said there was a duplicate, which is freaking impossible!!! There were no other inserts going on against the table at the time!) Did I do the statement wrong? I mean, I ran this before successfully with no problems.
Most probably the problem was caused by the offending row appearing twice in the file you were importing.
What you did is not a self join; you joined two different tables to find rows that already exist in the target table. What you did not do was check whether there were duplicates within the new data in the temporary table.
IMHO, importing the data into a temp table without any constraints, and then moving it into the target table with all necessary checks (for duplicates and other things), is the preferred way to do this. In many cases you need to insert rows that do not exist and update those that already exist; this approach lets you do that easily. Of course, you could also use OPENROWSET or some similar method to open the source file instead of importing it into a temp table, but the rest, including the checks, would be the same.
I'm sorry, I forgot to mention that the list I was importing was a distinct list. I was thinking maybe it was a null value or something, and made sure that wasn't the case either.
April 6, 2009 at 9:25 am
It just seems stupid to me that I have to write some VB or C# script to loop over every record and insert one record at a time. I would think the Import/Export Wizard should have this functionality.
April 6, 2009 at 9:37 am
Hi
I had a similar requirement, wherein I disabled the unique index on the column, loaded everything, deleted the duplicate records, and then re-enabled the index.
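Roughly, what I did was something like this, sketched against the original poster's table (ROW_NUMBER requires SQL Server 2005 or later, and keeping the lowest id per sn is just one possible choice):

-- Disable the unique index so the load cannot fail on duplicates.
alter index ui_sn on [table] disable;

-- ... bulk load the file into [table] here ...

-- Delete all but one copy of each duplicated sn.
with dupes as (
    select sn, row_number() over (partition by sn order by id) as rn
    from [table]
)
delete from dupes where rn > 1;

-- Rebuilding the index re-enables it (and will fail if any duplicates remain).
alter index ui_sn on [table] rebuild;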
Please let me know, gurus, if this is the right way or not!
April 6, 2009 at 9:43 am
I do it very similarly, Vijaya Kadiyala.
I always use a staging table, because once the data is in a table, it's so easy to analyze.
BULK INSERT or bcp can be your friend there for big data files.
...GROUP BY [what should be the primary key columns] HAVING COUNT(*) > 1
identifies problems like this instantly for me, and then it's a simple insert from my staging table.
If THAT fails, I can compare against the staging table and see what went wrong.
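For example, a rough sketch for the sn table in this thread (the file path and staging table name are assumptions):

-- Load the raw file into a staging table first (the path is hypothetical).
create table #staging (sn varchar(50));

bulk insert #staging
from 'C:\import\sn_list.txt'
with (rowterminator = '\n');

-- Spot duplicates inside the file itself.
select sn, count(*) as cnt
from #staging
group by sn
having count(*) > 1;

-- Insert only the rows that are new to the target, deduplicating as we go.
insert into [table] (sn)
select s.sn
from #staging s
left outer join [table] t on t.sn = s.sn
where t.sn is null
group by s.sn;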
Lowell
April 6, 2009 at 9:46 am
Hi Lowell
Thank you for your confirmation of my approach 🙂
April 7, 2009 at 12:37 am
xgcmcbain (4/6/2009)
It just seems stupid to me that I have to write some VB or C# script to loop over every record and insert one record at a time. I would think the Import/Export Wizard should have this functionality.
I didn't suggest importing individual rows; what I meant was something like Lowell posted. Import the file into a temporary table, eliminate any duplicates inside the list using GROUP BY (and check anything else that needs checking), and then insert into the target table using a left join with IS NULL.
This lets you decide how to handle situations where the imported data has a problem: e.g. insert all rows with errors into another table to be processed manually, aggregate multiple rows with the same primary key into one row, or ignore the older of the duplicate rows, or whatever it is you need. No loops, standard set-based processing.
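For instance, routing the problem rows aside might look like this (error_rows and #staging are assumed names):

-- Park every sn that appears more than once in the file for manual review.
insert into error_rows (sn)
select sn
from #staging
group by sn
having count(*) > 1;

-- Load only the unambiguous new rows.
insert into [table] (sn)
select s.sn
from #staging s
left outer join [table] t on t.sn = s.sn
where t.sn is null
group by s.sn
having count(*) = 1;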
April 7, 2009 at 12:55 pm
Vladan (4/7/2009)
xgcmcbain (4/6/2009)
It just seems stupid to me that I have to write some VB or C# script to loop over every record and insert one record at a time. I would think the Import/Export Wizard should have this functionality.
I didn't suggest importing individual rows; what I meant was something like Lowell posted. Import the file into a temporary table, eliminate any duplicates inside the list using GROUP BY (and check anything else that needs checking), and then insert into the target table using a left join with IS NULL.
This lets you decide how to handle situations where the imported data has a problem: e.g. insert all rows with errors into another table to be processed manually, aggregate multiple rows with the same primary key into one row, or ignore the older of the duplicate rows, or whatever it is you need. No loops, standard set-based processing.
This is already the process I follow 99% of the time.
Oddly, one time it didn't work and said there was a duplicate, but I know that was impossible.
My main point is that I can't believe the Import/Export Wizard doesn't allow this, which I could code up in 5 minutes, after however many years Microsoft has been working on that wizard.
Alternatively I could use SSIS, but that's confusing and a pain; I'm a code person and sometimes have no idea what to click. I need to take a class in that.
April 8, 2009 at 4:55 pm
xgcmcbain (4/7/2009)
My main point is that I can't believe the Import/Export Wizard doesn't allow this, which I could code up in 5 minutes, after however many years Microsoft has been working on that wizard. Alternatively I could use SSIS, but that's confusing and a pain; I'm a code person and sometimes have no idea what to click. I need to take a class in that.
🙂
The Import/Export Wizard allows you to choose how you'd like the source to be imported by specifying a query. This query can use DISTINCT/JOIN/etc.
I'm not sure if it works for flat files, but it does work between two tables.
Unfortunately, "remove duplicates" means different things to different people. That's why you need to write a specific SQL query (or SSIS data flow) to implement what YOU mean.
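For instance, a query as simple as this in the wizard's "Write a query" step would collapse exact duplicates at the source (source_table stands in for whatever source the wizard exposes):

select distinct sn
from source_table;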