June 27, 2011 at 10:43 am
Is there a way in t-sql or ssis to go through a table row by row and insert each record into another table and if it gets to a row where it violates the primary key constraint to insert it into another table?
Basically I have a holding table with all records in it...
I need something that will read each row and insert into a good table...But there are dups based on 8 columns in the good table so if it reads a row and it violates that constraint, I need it to insert into a bad table...
I tried doing this with ssis for each loop container but when it tries to insert into the good table, it just fails the whole process because it sees there are dups in the table...
Sorry if this comes off confusing, let me know if you are and I can try to explain better what Im trying to accomplish...
Thanks in advance!
June 27, 2011 at 10:48 am
why not just use a set base operation regardless,and exclude anything that might duplicate? by joining the source to the original table, you can automatically exclude them, or find the ones that already exist.
--find items that already exist.
SELECT ColumnList
FROM AnotherTable
INNER JOIN PrimaryTable
ON AnotherTable.PK = PrimaryTable.PK
--reread, and saw you said dupes are based on 8 columns
AND AnotherTable.col1 = PrimaryTable.col1
AND AnotherTable.col2 = PrimaryTable.col2
AND AnotherTable.col3 = PrimaryTable.col3
AND AnotherTable.col4 = PrimaryTable.col4
--andd more joins as needed
--insert everything that does not exist.
INSERT INTO PrimaryTable(ColumnList)
SELECT ColumnList
FROM AnotherTable
LEFT OUTER JOIN PrimaryTable
ON AnotherTable.PK = PrimaryTable.PK
--reread, and saw you said dupes are based on 8 columns
AND AnotherTable.col1 = PrimaryTable.col1
AND AnotherTable.col2 = PrimaryTable.col2
AND AnotherTable.col3 = PrimaryTable.col3
AND AnotherTable.col4 = PrimaryTable.col4
WHERE PrimaryTable.PK IS NULL
Lowell
June 27, 2011 at 10:56 am
There are no records in the other table yet...Here is the breakdown of what I am trying to do:
3 Tables:
Primary Table: all records - no constraints
Good Table : currently no records, primary key based on 8 columns
Bad Table: currently no records, no constraints
I want to take the primary table and read it through row by row and insert each row into the good table...When it gets to a row in the primary table that would be considered a duplicate record based off those 8 columns in the good table, I need to insert that row into the bad table?
Is there a way to do this through T-SQL or SSIS?
June 27, 2011 at 11:13 am
still easy to do via TSQL;
you can use the row_number() function to partition the data by those 8 columns.
wrap teh results of that with an outer select where rowNumberAlais = 1 and those are the "good" records, and everything greater than 1 is the "bad"/duplicates:
assuming my same example Table:
--stuff that goes in the 'Good' table
SELECT ColumnList
FROM (
--put all 8 columns that make the data "unique"
select row_number() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3 ) AS RW,
ColumnList
FROM MainTable
) MyAlias
WHERE RW = 1
--stuff that goes int he "bad"/dupe table
SELECT ColumnList
FROM (
--put all 8 columns that make the data "unique"
select row_number() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3 ) AS RW,
ColumnList
FROM MainTable
) MyAlias
WHERE RW <> 1
Lowell
June 27, 2011 at 11:15 am
Look at using the MERGE statement. For example at
http://technet.microsoft.com/en-us/library/bb510625.aspx
or with some example code at:
http://www.codeproject.com/KB/database/SQL-2008-Merge-Statement.aspx
June 27, 2011 at 1:24 pm
Thank you so much Lowell...
That did it for me! Just for giggles: Can you explain that Row Number function and OVER PARTITION BY clause as it relates to these queries? I tried reading about it from the msdn site, but got confused...
June 27, 2011 at 1:35 pm
asm1212 (6/27/2011)
Thank you so much Lowell...That did it for me! Just for giggles: Can you explain that Row Number function and OVER PARTITION BY clause as it relates to these queries? I tried reading about it from the msdn site, but got confused...
sure I'll try;
the row_number() function lets you create an incrementing number for each row in a set or results, based on an ORDER BY.
SELECT row_number() OVER (ORDER BY name) AS RW,name from sys.tables
there are similar, sister functions like RANK and DENSE RANK you can read up on as well.
one of the powerful options to the row_number() function is the PARTITION BY statement...it lets you create a "running count" (based on some ORDER BY) for each time a grouping/PARTITION BY column changes
SELECT row_number() OVER (PARTITION BY TYPE ORDER BY type,name) AS RW,type,name from sys.objects
in your specific case we grouped by a bunch of columns (PARTITION BY actually) so if any item had a RW value of greater than 1, it was a "duplicate" based on that partitioning.
you may have tripped over a situation like "i need the data grouped like {this}, but i need the other column data that is part of the rows that match that group;
before this function was added to SQL (it was around in Oracle and PLSQL long before we go it), that was a lot harder to do.
I'd recommend searching for a few examples here on SSC and test driving them to see how handy that thing is...it's a good tool to put in the SQL toolbox.
Lowell
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply