May 25, 2007 at 1:40 pm
Oh brilliant minds, young grasshopper seeks your guidance:
Nightly, data is exported from a non-SQL Server database to a text file. Then, via DTS, the data is imported and appended to an existing table in a SQL Server 2000 database. Sometimes the data contains new records; sometimes the data is meant to update existing records. However, the import merely appends to the existing data, which results in duplicate records or multiple records with the same primary field. (The SQL Server database does not have any primary keys.)
How can I set up the import so:
1) If the primary field exists, then the existing data is updated.
2) If the primary field does not exist, then the new data is appended (inserted) to the table.
Can this be done directly through the import job? Or should the data be first imported into a disposable (temp) table and then imported into the existing table?
David
May 29, 2007 at 4:34 am
Hi David,
Create a stored procedure that checks whether the primary key exists and then does the update or the insert on the table. Then in the DTS designer, import the text file into a temp table and call your stored procedure. It would be a different story if you had ETL software such as DataMirror or Informatica.
Hopefully this answers your question; it may not be the only solution.
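A minimal sketch of such a procedure, assuming hypothetical table and column names (dbo.ImportTarget, dbo.ImportStaging, KeyID, Col1 are placeholders, not from the original post):

```sql
-- Sketch only: all object names here are assumptions for illustration.
CREATE PROCEDURE dbo.UpsertFromStaging
AS
SET NOCOUNT ON

-- Update rows whose key already exists in the target
UPDATE T
SET    T.Col1 = S.Col1
FROM   dbo.ImportTarget AS T
INNER JOIN dbo.ImportStaging AS S
       ON T.KeyID = S.KeyID

-- Insert rows whose key is new
INSERT INTO dbo.ImportTarget (KeyID, Col1)
SELECT S.KeyID, S.Col1
FROM   dbo.ImportStaging AS S
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.ImportTarget AS T
                   WHERE  T.KeyID = S.KeyID)

SET NOCOUNT OFF
GO
```

The key check is done set-based (one UPDATE, one INSERT) rather than row by row, which is the usual approach for a nightly batch.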
Cheers
May 29, 2007 at 7:33 am
You might be able to do this with a data pump, but I'd follow Lanny's advice. It's easy and quick. Import into a staging table and set up a proc for the last step.
DTS:
- truncate staging table
- import data
- call proc
The proc should do an update on PK matches and then an insert of what doesn't match. Or delete the matches after the update and insert everything else.
May 29, 2007 at 9:44 am
A Data Driven Query task will do what you want, but it's slow because it processes the source data row by row. I prefer the staging table/stored procedure approach mentioned above and have used it many times.
Greg
May 29, 2007 at 10:44 am
If the nightly extract contains all the information you need in the table, then rather than writing procs to apply all the changes (insert, update, delete), just wipe out the table and insert all the new data. That way any changes that need to be applied are done in one step.
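A full-refresh sketch of that idea, again with placeholder names (dbo.Target, dbo.Staging):

```sql
-- Wipe-and-reload sketch; object names are assumptions.
TRUNCATE TABLE dbo.Target

INSERT INTO dbo.Target (KeyID, C1, C2)
SELECT KeyID, C1, C2
FROM   dbo.Staging
```

This only works if the nightly extract really is the complete data set; if the extract is incremental, truncating would lose rows.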
Ben Sullins
bensullins.com
Beer is my primary key...
May 29, 2007 at 1:34 pm
Thanks for the help. Can you provide any how-to steps?
May 29, 2007 at 11:31 pm
Add an Identity column and delete the dupes when the import is done... much faster than trying to build an "UPSERT" (merge) in SQL Server 2000.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 30, 2007 at 2:21 pm
I can delete the duplicate records, but what I need to do is delete the current record in favor of the imported record when the primary field already exists in the table. The imported record is not necessarily a duplicate; the import may just result in a second record with a duplicate primary field and different values in the remaining fields. Perhaps I could add an IDENTITY column and simply delete a record where the primary field of the current table equals the primary field of the record to be imported and where the IDENTITY value is less than that of the incoming record.
However, the imported data does not have an IDENTITY column and is originating from a non-SQL Server database. So would this then have to be a two part process – 1st import the data and then delete the older record? Or can this all be incorporated into DTS or one script?
May 30, 2007 at 5:12 pm
The best answer was the three-stage DTS procedure. The proc would actually be simple -- an update and an insert statement.
Let's say your SQL Server table has an identity column and four data columns (ID, C1, C2, C3, C4) and the staging table just has the data columns. Also assume that C1 and C2 form the natural key -- these fields identify a duplicate row. So the proc would first update the non-key fields from duplicates:
update dbo.OriginalTable
set    C3 = ST.C3,
       C4 = ST.C4
from   dbo.OriginalTable as OT
inner join dbo.StagingTable as ST
       on  OT.C1 = ST.C1
       and OT.C2 = ST.C2;
Then insert those rows from the staging table that are not duplicates:
insert into OriginalTable (C1, C2, C3, C4)
select C1, C2, C3, C4
from   StagingTable as ST
where  not exists (select *
                   from   OriginalTable as OT
                   where  OT.C1 = ST.C1
                   and    OT.C2 = ST.C2);
That's it. Of course, it would help to index the key field(s) of both tables.
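The suggested indexes might look like this (index names are arbitrary):

```sql
-- Index sketch for the natural-key columns used in the joins above.
CREATE INDEX IX_OriginalTable_Key ON dbo.OriginalTable (C1, C2)
CREATE INDEX IX_StagingTable_Key  ON dbo.StagingTable (C1, C2)
```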
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
May 30, 2007 at 7:26 pm
Tomm's code for an upsert is good... I think I'd do it in a stored procedure rather than DTS, but it'll do the trick either way.
I still use BULK INSERT and a dupe delete, though... speed freak, I guess.
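For reference, a BULK INSERT into a staging table might look like the following; the file path, terminators, and table name are assumptions, not from the original post:

```sql
-- BULK INSERT sketch; path and terminators are hypothetical and depend
-- on how the nightly export file is formatted.
BULK INSERT dbo.Staging
FROM 'C:\nightly\export.txt'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n'
)
```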
--Jeff Moden
Change is inevitable... Change for the better is not.
May 30, 2007 at 7:38 pm
Thanks for all the input. I think I got it! Thanks again everyone.
May 30, 2007 at 7:42 pm
What did you end up with, David?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 30, 2007 at 8:16 pm
I'm just gonna let the nightly import via DTS run and then set up a job to check for and delete the older record. This is the stored procedure that will run after the import.
CREATE PROCEDURE nwDuplicateCheckerAll_Title
AS
SET NOCOUNT ON

BEGIN TRANSACTION DuplicateChecker

-- Delete the older of any two rows sharing the same KeyID/DateAdd,
-- keeping the row with the higher identity value (the newer import)
DELETE A
FROM All_Title A
INNER JOIN All_Title B
    ON  A.KeyID = B.KeyID
    AND A.[DateAdd] = B.[DateAdd]
WHERE A.LastColumn < B.LastColumn

COMMIT TRANSACTION DuplicateChecker

SET NOCOUNT OFF
I'm treating the KeyID and DateAdd as a composite primary key. I added an identity column - "LastColumn." How does it look?
June 1, 2007 at 10:39 am
So, if I read this correctly, you are inserting all the records into the table, then going through and deleting the older duplicates. This seems straightforward but I have a few questions.
You've had at least a couple of suggestions here that don't raise the concerns I've listed (I'm making some assumptions on the Bulk Insert idea). It's your database, of course, but you asked.
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms