INSERT Question

  • Hello,

    I have an INSERT SP that reads from an SQL Server table and inserts records into table that has primary key defined on it. The SP works fine, but now I am getting the file with duplicate records. This, obviously, is an issue due to primary key constraint. What I need to do is modify my SP to INSERT the first occurance of the duplicated record and ignore the other. Is this possible? If so, any suggestion on how to do this and what approach to take? I am familiar with T-SQL, but limited knowledge of advanced T-SQL and I imagine this requires a bit complex SQL INSERT statement. Following is my current INSERT statement:

    SET NOCOUNT ON

    SET XACT_ABORT ON

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    BEGIN TRAN

    INSERT INTO dbo.Prop

                          (PropGUID, ListingID, SourceID, Address, City, Zip, ListPrice, PropCID, PropTID)

    SELECT      A.PropGUID, B.ListingID, B.SourceID, B.Address, B.City, B.Zip, B.Price, B.PropCID, PropTID

    FROM  dbo.tmp_Listings B   INNER JOIN dbo.PropR A  ON A.ListingID = B.ListingID

           AND A.PropCID = B.PropCID AND A.SourceID = B.SourceID

    COMMIT TRAN

    GO

    If you can provide any suggestions, that would help greatly.

    Thanks.

    JN

  • >>INSERT the first occurance of the duplicated record

    Solution to this always starts with a requirements question - how is "First" defined for your business ?

    SQL row data is unordered - while it is easy to think of something as coming "first" in a file, once that data is in a SQL table, the concept no longer applies. You need to supply the business rules as to which column or columns define "first"

  • JN, what logic are you using to determine what the first occurrence is?  Is it arbitrary?

    Also, are you saying that your select statement that feeds the INSERT statement has identical RECORDS, or two records with the same PK for dbo.Prop?

    If you have duplicate records, try a select distinct...  If you have multple occurrences of the PK but they are distinctly different records, how do you want to determine the first occurrence?

  • Once you've decided on a definition of "first", see this thread:

    http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=162585

     

  • Yeah, the first occurance is arbitrary. It doesn't matter whether A is first or B is first, I just want to take whichever record the INSERT reads first.

    The records are distince (one field, which we don't care for), but PK is duplicated.

    Thanks.

  • Check out PW's thread.  His derived table join still applies - only I guess it won't matter what field you aggregate. RH

  • SELECT  B.ListingID, B.SourceID, B.PropCID

    FROM  dbo.tmp_Listings B

    Inner Join dbo.PropR A

      On A.ListingID = B.MLSID AND A.SourceID = B.SourceID AND A.PropCID = B.PropCID

    Inner Join

    (

      Select MIN(A.ListingID) AS FirstListingID, MIN(A.SourceID) AS FirstSourceID,

     MIN(A.PropCID) AS FirstPropCID

      From   dbo.PropR A

      Group By A.ListingID, A.SourceID, A.PropCID

    ) vtFirstListingByListSourcPropCID

      On (vtFirstListingByListSourcPropCID.FirstListingID = A.ListingID And

          vtFirstListingByListSourcPropCatID.FirstSourceID = A.SourceID AND

          vtFirstListingByListSourcPropCID.FirstPropCID = A.PropCID )

    I re-wrote the statement as above, but I get no records. Is that right statement? I don't need to match against any table. Table tmp_Listings has all the dups in it.

    Thanks.

  • Nope, you're taking aggregates of 3 different columns - and the Group By in the virtual table is incorrect, because you're grouping on the things you're aggregating.

    This method works by taking one non-key attribute as an indicator of "First", you're trying to span it across 3 key attributes which it isn't designed for.

    Which 1 of the non-join atributes (B.Address, B.City, B.Zip, B.Price, B.PropCID, PropTID) would be unique, where taking the Min() or Max() of it would give a unique value within each keyset ?

     

  • This is going to be tricky, it seems! The field that makes the duplicated record unique is a UserID, which I am not even importing, but if I need to I can, I guess. All the other fields have same values repeated inlcuding key fields. Basically, it's like two agents are working on the same case so case is duplicated, though really it's the same case with two different agents assigned to it.

    Thanks.

  • JN, this brings us back to the distinct idea.  You just stated that "all other fields have same values repeated including key fields".  This means that you do not have multiple unique combinations per primary key.  Do a select distinct to get just one row per unique combination.

  • I think I might have confused you. Yes, I do have multiple unique combination per primary key. See below:

    ID1   ID2   ID3   Adress         City   Zip

    1      A      T     123 My St.    My    12345

    ID1   ID2   ID3   Adress         City   Zip

    1      A      T     123 My St.    My    12345

    Now add a field User with values ME and YOU to above records and they become unique, but I am not importing the field User, as I don't need that for my import purspose.

    I hope I am not misunderstanding you?!

    I tried using Distinct, but it brought the same # records with dups.

    Thanks.

  • Never mind the previous post... I was working with wrong statement so DISTINCT wasn't working. Anyway, I opened my original statement and added back the DISTINCT and it worked!! I am getting all records except 126 records, which are duplicated.

    Thanks all for your help.

    JN

  • Hello,

    Well, it turns out I do need to import the UserID field, which is the only field that makes a record unique, though key fields are duplicated. If I wasn't importing the UserID then a SELECT DISTINCT ... statement worked fine importing only distinct records by LSID regardless of UserID, but now that UserID is being imported I get error that 'records cannot be inserted due to primary key constraint.' As adding a UserID makes a record unique, but key fields are duplicated, which is not allowed by the constraint and I don't really care for multiple UserIDs for a key field - I just want whichever record is listed first during the import (arbitrary). As per PW's suggestion, I wrote the following statement, but it still brings all records with duplicate key field values. Can anyone tell why I am still getting duplicate key fields instead of the first occurrence of UserID? Does this solution not work for my scenario or do I need to further modify it? Please suggest whatever you think might help resolve this duplicate key fields import.

    Thanks.

    ********************************

    SELECT   A.PropeGUID, B.LSID, B.SourceID, B.Address, B.City, B.Zip, B.Price, B.PropCID, B.PropTID, B.UserID

    FROM  dbo.PropR A

    INNER JOIN dbo.tmp_Listings B

    ON B.LSID = A.ListingID AND B.PropCID = A.PropCID AND B.SourceID = A.SourceID

    INNER JOIN

    (

      Select LSID, SourceID, PropCID, MIN(UserID) AS FirstUserID

      From   dbo.tmp_Listings

      Group By LSID, SourceID, PropCID

    ) vtFirstListingByUserID

      On (vtFirstListingByUserID.LSID = A.LISTINGID AND

          vtFirstListingByUserID.SourceID = A.SourceID AND

          vtFirstListingByUserID.PropCategoryID = A.PropCID )

    ********************************

  • Sure, based on your example + the user ID you have:

    ID1   ID2   ID3   Adress         City   Zip      UserID

    1      A      T     123 My St.    My    12345    ME

    ID1   ID2   ID3   Adress         City   Zip      UserID

    1      A      T     123 My St.    My    12345    You

    In this case, if you group by ID1, ID2, ID3, Address, City, and Zip while performing a MIN (userID) you will have all indentical information plust "FirstUserID" = ME.

    Therefore, you can't just join on the original columns because you have more than on match per unique ID1, ID2, ID3, Address, City, and Zip. 

    You will need to add a join condition on FirstUserID (which is 'ME') so you only select one of the two (or more) records.

    Based on PW's example, you should be looking at something like (I have only changed the last line):

    SELECT   A.PropeGUID, B.LSID, B.SourceID, B.Address, B.City, B.Zip, B.Price, B.PropCID, B.PropTID, B.UserID

    FROM  dbo.PropR A

    INNER JOIN dbo.tmp_Listings B

    ON B.LSID = A.ListingID AND B.PropCID = A.PropCID AND B.SourceID = A.SourceID

    INNER JOIN

    (

      Select LSID, SourceID, PropCID, MIN(UserID) AS FirstUserID

      From   dbo.tmp_Listings

      Group By LSID, SourceID, PropCID

    ) vtFirstListingByUserID

      On (vtFirstListingByUserID.LSID = A.LISTINGID AND

          vtFirstListingByUserID.SourceID = A.SourceID AND

          vtFirstListingByUserID.PropCategoryID = A.PropCID

     vtFirstListingByUserID.FirstUserID=A.UserID )

  • Thanks, Ryan! That was it!

    JN

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply