February 18, 2005 at 1:31 pm
Hello,
I have an INSERT SP that reads from an SQL Server table and inserts records into table that has primary key defined on it. The SP works fine, but now I am getting the file with duplicate records. This, obviously, is an issue due to primary key constraint. What I need to do is modify my SP to INSERT the first occurance of the duplicated record and ignore the other. Is this possible? If so, any suggestion on how to do this and what approach to take? I am familiar with T-SQL, but limited knowledge of advanced T-SQL and I imagine this requires a bit complex SQL INSERT statement. Following is my current INSERT statement:
SET NOCOUNT ON
SET XACT_ABORT ON
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRAN
INSERT INTO dbo.Prop
(PropGUID, ListingID, SourceID, Address, City, Zip, ListPrice, PropCID, PropTID)
SELECT A.PropGUID, B.ListingID, B.SourceID, B.Address, B.City, B.Zip, B.Price, B.PropCID, PropTID
FROM dbo.tmp_Listings B INNER JOIN dbo.PropR A ON A.ListingID = B.ListingID
AND A.PropCID = B.PropCID AND A.SourceID = B.SourceID
COMMIT TRAN
GO
If you can provide any suggestions, that would help greatly.
Thanks.
JN
February 18, 2005 at 1:37 pm
>>INSERT the first occurance of the duplicated record
Solution to this always starts with a requirements question - how is "First" defined for your business ?
SQL row data is unordered - while it is easy to think of something as coming "first" in a file, once that data is in a SQL table, the concept no longer applies. You need to supply the business rules as to which column or columns define "first"
February 18, 2005 at 1:40 pm
JN, what logic are you using to determine what the first occurrence is? Is it arbitrary?
Also, are you saying that your select statement that feeds the INSERT statement has identical RECORDS, or two records with the same PK for dbo.Prop?
If you have duplicate records, try a select distinct... If you have multple occurrences of the PK but they are distinctly different records, how do you want to determine the first occurrence?
February 18, 2005 at 1:43 pm
Once you've decided on a definition of "first", see this thread:
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=162585
February 18, 2005 at 1:50 pm
Yeah, the first occurance is arbitrary. It doesn't matter whether A is first or B is first, I just want to take whichever record the INSERT reads first.
The records are distince (one field, which we don't care for), but PK is duplicated.
Thanks.
February 18, 2005 at 1:53 pm
Check out PW's thread. His derived table join still applies - only I guess it won't matter what field you aggregate. RH
February 18, 2005 at 2:37 pm
SELECT B.ListingID, B.SourceID, B.PropCID
FROM dbo.tmp_Listings B
Inner Join dbo.PropR A
On A.ListingID = B.MLSID AND A.SourceID = B.SourceID AND A.PropCID = B.PropCID
Inner Join
(
Select MIN(A.ListingID) AS FirstListingID, MIN(A.SourceID) AS FirstSourceID,
MIN(A.PropCID) AS FirstPropCID
From dbo.PropR A
Group By A.ListingID, A.SourceID, A.PropCID
) vtFirstListingByListSourcPropCID
On (vtFirstListingByListSourcPropCID.FirstListingID = A.ListingID And
vtFirstListingByListSourcPropCatID.FirstSourceID = A.SourceID AND
vtFirstListingByListSourcPropCID.FirstPropCID = A.PropCID )
I re-wrote the statement as above, but I get no records. Is that right statement? I don't need to match against any table. Table tmp_Listings has all the dups in it.
Thanks.
February 18, 2005 at 3:10 pm
Nope, you're taking aggregates of 3 different columns - and the Group By in the virtual table is incorrect, because you're grouping on the things you're aggregating.
This method works by taking one non-key attribute as an indicator of "First", you're trying to span it across 3 key attributes which it isn't designed for.
Which 1 of the non-join atributes (B.Address, B.City, B.Zip, B.Price, B.PropCID, PropTID) would be unique, where taking the Min() or Max() of it would give a unique value within each keyset ?
February 18, 2005 at 3:23 pm
This is going to be tricky, it seems! The field that makes the duplicated record unique is a UserID, which I am not even importing, but if I need to I can, I guess. All the other fields have same values repeated inlcuding key fields. Basically, it's like two agents are working on the same case so case is duplicated, though really it's the same case with two different agents assigned to it.
Thanks.
February 18, 2005 at 4:01 pm
JN, this brings us back to the distinct idea. You just stated that "all other fields have same values repeated including key fields". This means that you do not have multiple unique combinations per primary key. Do a select distinct to get just one row per unique combination.
February 18, 2005 at 4:13 pm
I think I might have confused you. Yes, I do have multiple unique combination per primary key. See below:
ID1 ID2 ID3 Adress City Zip
1 A T 123 My St. My 12345
ID1 ID2 ID3 Adress City Zip
1 A T 123 My St. My 12345
Now add a field User with values ME and YOU to above records and they become unique, but I am not importing the field User, as I don't need that for my import purspose.
I hope I am not misunderstanding you?!
I tried using Distinct, but it brought the same # records with dups.
Thanks.
February 18, 2005 at 4:35 pm
Never mind the previous post... I was working with wrong statement so DISTINCT wasn't working. Anyway, I opened my original statement and added back the DISTINCT and it worked!! I am getting all records except 126 records, which are duplicated.
Thanks all for your help.
JN
February 22, 2005 at 10:46 am
Hello,
Well, it turns out I do need to import the UserID field, which is the only field that makes a record unique, though key fields are duplicated. If I wasn't importing the UserID then a SELECT DISTINCT ... statement worked fine importing only distinct records by LSID regardless of UserID, but now that UserID is being imported I get error that 'records cannot be inserted due to primary key constraint.' As adding a UserID makes a record unique, but key fields are duplicated, which is not allowed by the constraint and I don't really care for multiple UserIDs for a key field - I just want whichever record is listed first during the import (arbitrary). As per PW's suggestion, I wrote the following statement, but it still brings all records with duplicate key field values. Can anyone tell why I am still getting duplicate key fields instead of the first occurrence of UserID? Does this solution not work for my scenario or do I need to further modify it? Please suggest whatever you think might help resolve this duplicate key fields import.
Thanks.
********************************
SELECT A.PropeGUID, B.LSID, B.SourceID, B.Address, B.City, B.Zip, B.Price, B.PropCID, B.PropTID, B.UserID
FROM dbo.PropR A
INNER JOIN dbo.tmp_Listings B
ON B.LSID = A.ListingID AND B.PropCID = A.PropCID AND B.SourceID = A.SourceID
INNER JOIN
(
Select LSID, SourceID, PropCID, MIN(UserID) AS FirstUserID
From dbo.tmp_Listings
Group By LSID, SourceID, PropCID
) vtFirstListingByUserID
On (vtFirstListingByUserID.LSID = A.LISTINGID AND
vtFirstListingByUserID.SourceID = A.SourceID AND
vtFirstListingByUserID.PropCategoryID = A.PropCID )
********************************
February 22, 2005 at 10:55 am
Sure, based on your example + the user ID you have:
ID1 ID2 ID3 Adress City Zip UserID
1 A T 123 My St. My 12345 ME
ID1 ID2 ID3 Adress City Zip UserID
1 A T 123 My St. My 12345 You
In this case, if you group by ID1, ID2, ID3, Address, City, and Zip while performing a MIN (userID) you will have all indentical information plust "FirstUserID" = ME.
Therefore, you can't just join on the original columns because you have more than on match per unique ID1, ID2, ID3, Address, City, and Zip.
You will need to add a join condition on FirstUserID (which is 'ME') so you only select one of the two (or more) records.
Based on PW's example, you should be looking at something like (I have only changed the last line):
SELECT A.PropeGUID, B.LSID, B.SourceID, B.Address, B.City, B.Zip, B.Price, B.PropCID, B.PropTID, B.UserID
FROM dbo.PropR A
INNER JOIN dbo.tmp_Listings B
ON B.LSID = A.ListingID AND B.PropCID = A.PropCID AND B.SourceID = A.SourceID
INNER JOIN
(
Select LSID, SourceID, PropCID, MIN(UserID) AS FirstUserID
From dbo.tmp_Listings
Group By LSID, SourceID, PropCID
) vtFirstListingByUserID
On (vtFirstListingByUserID.LSID = A.LISTINGID AND
vtFirstListingByUserID.SourceID = A.SourceID AND
vtFirstListingByUserID.PropCategoryID = A.PropCID
vtFirstListingByUserID.FirstUserID=A.UserID )
February 22, 2005 at 11:06 am
Thanks, Ryan! That was it!
JN
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply