DISTINCT problem

  • I have data that I am importing from a text file into SQL that has some records that I am unsure how to handle.

    DML/DDL:

    CREATE TABLE [AcctImports]

    (

    [ID] [int] IDENTITY(1,1) Not Null,

    [FirstName] [varchar] (50) Null,

    [LastName] [varchar] (50) Null,

    [ClientName] [varchar] (50) Null,

    [SSN] [varchar] (50) Null,

    [AccountNumber] [varchar] (50) Null,

    GO

    CREATE TABLE [Client]

    (

    [ID] [int] IDENTITY(1,1) Not Null,

    [FirstName] [varchar] (50) Null,

    [LastName] [varchar] (50) Null,

    [ClientName] [varchar] (50) Null,

    [SSN] [varchar] (50) Null

    )

    GO

    CREATE TABLE [Acct]

    (

    [ID] [int] IDENTITY(1,1) Not Null,

    [ClientID] [int] Null,

    [AccountNumber] [varchar] (50) Null,

    )

    GO

    Select 'Joe' as FirstName, 'Client' As LastName, 'Joe Client' As ClientName, '000-00-0000' As SSN, '111111' As AccountNumber

    Into [AcctImports]

    Union

    Select 'Joe' as FirstName, 'Client' As LastName, 'Joe Client Betelgeuse' As ClientName, '000-00-0000' As SSN, '222222' As AccountNumber

    Union

    Select 'Bob' as FirstName, 'Client' As LastName, 'Bob Client' As ClientName, '000-00-0009' As SSN, '333333' As AccountNumber

    The data contains all the information necessary to put both the Client, and their Accounts into their respective tables. However, since one Client may have several accounts in the AcctImports table, it is necessary to only set up a single Client in the Client table, and then add all their accounts to the Acct table by the AccountNumber and ClientID (ID Column from the Client Table)

    My initial method was to grab the DISTINCT SSN from the Import table, insert it to the Client table, then insert the remaining data by joining on SSN back to the Client table, like this:

    --STEP 1 INSERT MISSING SSNs (Only Once!!!)

    INSERT Client(SSN)

    SELECT DISTINCT SSN from AcctImports where SSN not in (select distinct SSN from Client)

    -- STEP 2 ADD DATA THAT MATCHES NEW SSNs

    Insert Client

    (FirstName, LastName, ClientName, SSN)

    SELECT S.firstName, S.LastName, S.ClientName

    FROM AcctImport S

    LEFT JOIN Client C ON s.SSN = C.ssn and c.FirstName is NULL

    As you can see, this will result in ALL the data, including the "Betelgeuse" row being set up as a Client when really all that should be added is the two rows for each distinct SSN to the Client table and the other data gets added to the Acct table. I've tried several different iterations joining on the Name fields etc., but to no avail.

    Can any of you guru's out there point me in the right direction? I don't want you to do the work for me, just a hint on how I might accomplish this would be helpful.

    TIA,

    KK

  • If I understand this correctly you want distinct Clients in Clients table. Why do you want the ClientName field in this table. My point is Client with SSN 000-00-0000 has two client names Joe Client and Joe Client Betelgeuse. Which Client name should be inserted ?

  • Just Joe Client in the Client table, but both Accounts for Joe in the Acct table. (One Account named "Joe Client", and one named "Joe Client Betelgeuse".) The ClientName is really the Account Name and not necessarily a Client. There are, however ridiculous it may seem, columns in the Client table for the clients First, Last and Full Name. But these should only contain "real" first and last names, and not "Account Names" even though they may be synonymous or somewhat similar. Since the import file is specifically geared towards the creation of the Accounts, all the information for each account is available. It is merely an added benefit that the data also contains enough information to also add missing Clients to the table. I just have to have only one of each SSN in the Client table.

    Having said that -

    One possible solution might be to put a constraint on the Client table that would disallow duplicate SSNs but that would entail additional error handling when the insert failed. Nonetheless, it is an option as I have full reign over the DB and can add any indices or constraints necessary based on the recommendations of this outstanding and always helpful forum.

    Thanks for the response!

    KK

  • If your SQL Version is 2005 and above you can use the following query to insert the distinct clients

    ;WITH Client1 AS

    (

    SELECTFirstName,LastName,ClientName,SSN,RANK() OVER (PARTITION BY SSN ORDER BY ClientName) AS Rnk

    FROMAcctImports

    WHERESSN NOT IN (Select SSN From Client)

    )

    INSERT INTO Client

    SELECT FirstName,LastName,ClientName,SSN

    FROM Client1

    WHERE Rnk=1

  • Of course, CTE! Why didn't I think of that! (It's been a long week! LOL)

    Thanks a million!

    Have a great weekend!

    KK

  • The only mod I made was that I had to add DISTINCT to the CTE. It worked flawlessly! Thanks again!

    I have posted final version for the benefit of future generations... 🙂

    WITH Client1 AS

    (

    SELECT DISTINCT FirstName,LastName,ClientName,SSN,RANK() OVER (PARTITION BY SSN ORDER BY ClientName) AS Rnk

    FROM AcctImports

    WHERE SSN NOT IN (Select SSN From Client)

    )

    INSERT INTO Client

    SELECT FirstName,LastName,ClientName,SSN

    FROM Client1

    WHERE Rnk=1

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply