INSERT Question

Question

Post reply

INSERT Question

JN-161845

SSCrazy

Points: 2052
More actions
February 18, 2005 at 1:31 pm

#74863

Hello,
I have an INSERT SP that reads from an SQL Server table and inserts records into table that has primary key defined on it. The SP works fine, but now I am getting the file with duplicate records. This, obviously, is an issue due to primary key constraint. What I need to do is modify my SP to INSERT the first occurance of the duplicated record and ignore the other. Is this possible? If so, any suggestion on how to do this and what approach to take? I am familiar with T-SQL, but limited knowledge of advanced T-SQL and I imagine this requires a bit complex SQL INSERT statement. Following is my current INSERT statement:
SET NOCOUNT ON
SET XACT_ABORT ON
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRAN
INSERT INTO dbo.Prop
                      (PropGUID, ListingID, SourceID, Address, City, Zip, ListPrice, PropCID, PropTID)
SELECT      A.PropGUID, B.ListingID, B.SourceID, B.Address, B.City, B.Zip, B.Price, B.PropCID, PropTID
FROM dbo.tmp_Listings B   INNER JOIN dbo.PropR A ON A.ListingID = B.ListingID
     AND A.PropCID = B.PropCID AND A.SourceID = B.SourceID
COMMIT TRAN
GO
If you can provide any suggestions, that would help greatly.
Thanks.
JN

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

PW-201837 SSC-Insane Points: 20805 More actions · Answer 1

>>INSERT the first occurance of the duplicated record

Solution to this always starts with a requirements question - how is "First" defined for your business ?

SQL row data is unordered - while it is easy to think of something as coming "first" in a file, once that data is in a SQL table, the concept no longer applies. You need to supply the business rules as to which column or columns define "first"

rhunt SSCertifiable Points: 5868 More actions · Answer 2

JN, what logic are you using to determine what the first occurrence is? Is it arbitrary?

Also, are you saying that your select statement that feeds the INSERT statement has identical RECORDS, or two records with the same PK for dbo.Prop?

If you have duplicate records, try a select distinct... If you have multple occurrences of the PK but they are distinctly different records, how do you want to determine the first occurrence?

PW-201837 SSC-Insane Points: 20805 More actions · Answer 3

Once you've decided on a definition of "first", see this thread:

http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=162585

JN-161845 SSCrazy Points: 2052 More actions · Answer 4

Yeah, the first occurance is arbitrary. It doesn't matter whether A is first or B is first, I just want to take whichever record the INSERT reads first.

The records are distince (one field, which we don't care for), but PK is duplicated.

Thanks.

rhunt SSCertifiable Points: 5868 More actions · Answer 5

Check out PW's thread. His derived table join still applies - only I guess it won't matter what field you aggregate. RH

JN-161845 SSCrazy Points: 2052 More actions · Answer 6

SELECT B.ListingID, B.SourceID, B.PropCID

FROM dbo.tmp_Listings B

Inner Join dbo.PropR A

On A.ListingID = B.MLSID AND A.SourceID = B.SourceID AND A.PropCID = B.PropCID

Inner Join

(

Select MIN(A.ListingID) AS FirstListingID, MIN(A.SourceID) AS FirstSourceID,

MIN(A.PropCID) AS FirstPropCID

From dbo.PropR A

Group By A.ListingID, A.SourceID, A.PropCID

) vtFirstListingByListSourcPropCID

On (vtFirstListingByListSourcPropCID.FirstListingID = A.ListingID And

vtFirstListingByListSourcPropCatID.FirstSourceID = A.SourceID AND

vtFirstListingByListSourcPropCID.FirstPropCID = A.PropCID )

I re-wrote the statement as above, but I get no records. Is that right statement? I don't need to match against any table. Table tmp_Listings has all the dups in it.

Thanks.

PW-201837 SSC-Insane Points: 20805 More actions · Answer 7

Nope, you're taking aggregates of 3 different columns - and the Group By in the virtual table is incorrect, because you're grouping on the things you're aggregating.

This method works by taking one non-key attribute as an indicator of "First", you're trying to span it across 3 key attributes which it isn't designed for.

Which 1 of the non-join atributes (B.Address, B.City, B.Zip, B.Price, B.PropCID, PropTID) would be unique, where taking the Min() or Max() of it would give a unique value within each keyset ?

JN-161845 SSCrazy Points: 2052 More actions · Answer 8

This is going to be tricky, it seems! The field that makes the duplicated record unique is a UserID, which I am not even importing, but if I need to I can, I guess. All the other fields have same values repeated inlcuding key fields. Basically, it's like two agents are working on the same case so case is duplicated, though really it's the same case with two different agents assigned to it.

Thanks.

rhunt SSCertifiable Points: 5868 More actions · Answer 9

JN, this brings us back to the distinct idea. You just stated that "all other fields have same values repeated including key fields". This means that you do not have multiple unique combinations per primary key. Do a select distinct to get just one row per unique combination.

JN-161845 SSCrazy Points: 2052 More actions · Answer 10

I think I might have confused you. Yes, I do have multiple unique combination per primary key. See below:

ID1 ID2 ID3 Adress City Zip

1 A T 123 My St. My 12345

ID1 ID2 ID3 Adress City Zip

1 A T 123 My St. My 12345

Now add a field User with values ME and YOU to above records and they become unique, but I am not importing the field User, as I don't need that for my import purspose.

I hope I am not misunderstanding you?!

I tried using Distinct, but it brought the same # records with dups.

Thanks.

JN-161845 SSCrazy Points: 2052 More actions · Answer 11

Never mind the previous post... I was working with wrong statement so DISTINCT wasn't working. Anyway, I opened my original statement and added back the DISTINCT and it worked!! I am getting all records except 126 records, which are duplicated.

Thanks all for your help.

JN

JN-161845 SSCrazy Points: 2052 More actions · Answer 12

Hello,

Well, it turns out I do need to import the UserID field, which is the only field that makes a record unique, though key fields are duplicated. If I wasn't importing the UserID then a SELECT DISTINCT ... statement worked fine importing only distinct records by LSID regardless of UserID, but now that UserID is being imported I get error that 'records cannot be inserted due to primary key constraint.' As adding a UserID makes a record unique, but key fields are duplicated, which is not allowed by the constraint and I don't really care for multiple UserIDs for a key field - I just want whichever record is listed first during the import (arbitrary). As per PW's suggestion, I wrote the following statement, but it still brings all records with duplicate key field values. Can anyone tell why I am still getting duplicate key fields instead of the first occurrence of UserID? Does this solution not work for my scenario or do I need to further modify it? Please suggest whatever you think might help resolve this duplicate key fields import.

Thanks.

********************************

SELECT A.PropeGUID, B.LSID, B.SourceID, B.Address, B.City, B.Zip, B.Price, B.PropCID, B.PropTID, B.UserID

FROM dbo.PropR A

INNER JOIN dbo.tmp_Listings B

ON B.LSID = A.ListingID AND B.PropCID = A.PropCID AND B.SourceID = A.SourceID

INNER JOIN

(

Select LSID, SourceID, PropCID, MIN(UserID) AS FirstUserID

From dbo.tmp_Listings

Group By LSID, SourceID, PropCID

) vtFirstListingByUserID

On (vtFirstListingByUserID.LSID = A.LISTINGID AND

vtFirstListingByUserID.SourceID = A.SourceID AND

vtFirstListingByUserID.PropCategoryID = A.PropCID )

********************************

rhunt SSCertifiable Points: 5868 More actions · Answer 13

Sure, based on your example + the user ID you have:

ID1 ID2 ID3 Adress City Zip UserID

1 A T 123 My St. My 12345 ME

ID1 ID2 ID3 Adress City Zip UserID

1 A T 123 My St. My 12345 You

In this case, if you group by ID1, ID2, ID3, Address, City, and Zip while performing a MIN (userID) you will have all indentical information plust "FirstUserID" = ME.

Therefore, you can't just join on the original columns because you have more than on match per unique ID1, ID2, ID3, Address, City, and Zip.

You will need to add a join condition on FirstUserID (which is 'ME') so you only select one of the two (or more) records.

Based on PW's example, you should be looking at something like (I have only changed the last line):

SELECT A.PropeGUID, B.LSID, B.SourceID, B.Address, B.City, B.Zip, B.Price, B.PropCID, B.PropTID, B.UserID

FROM dbo.PropR A

INNER JOIN dbo.tmp_Listings B

ON B.LSID = A.ListingID AND B.PropCID = A.PropCID AND B.SourceID = A.SourceID

INNER JOIN

(

Select LSID, SourceID, PropCID, MIN(UserID) AS FirstUserID

From dbo.tmp_Listings

Group By LSID, SourceID, PropCID

) vtFirstListingByUserID

On (vtFirstListingByUserID.LSID = A.LISTINGID AND

vtFirstListingByUserID.SourceID = A.SourceID AND

vtFirstListingByUserID.PropCategoryID = A.PropCID

vtFirstListingByUserID.FirstUserID=A.UserID )

JN-161845 SSCrazy Points: 2052 More actions · Answer 14

JN-161845

SSCrazy

Points: 2052

February 22, 2005 at 11:06 am

#543535

Thanks, Ryan! That was it!

JN