Need help selecting which record to keep for duplicate users

  • Here is some sample data to work with:

    --===== If #MyTable already exists, drop it

    IF OBJECT_ID('TempDB..#MyTable','U') IS NOT NULL

    DROP TABLE #MyTable

    --===== Create #MyTable

    CREATE TABLE [dbo].[#MyTable](

    PositionID VARCHAR(10) NOT NULL,

    Username VARCHAR(50) NULL,

    JobCode varchar(10) NULL,

    PositionTypeCode VARCHAR(10)NULL,

    CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED

    (

    [PositionID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    --===== Insert test data into mytable

    INSERT INTO #MyTable

    ( PositionID ,

    Username ,

    JobCode ,

    PositionTypeCode

    )

    SELECT '1AB','user1','1111','AMP' UNION ALL

    SELECT '2AB','user1','1234','FAC' UNION ALL

    SELECT '3AB','user2','6577','AMP' UNION ALL

    SELECT '4AB','user2','1258','AMP' UNION ALL

    SELECT '5AB','user3','4567','FAC' UNION ALL

    SELECT '6AB','user3','2345','FAC' UNION ALL

    SELECT '7AB','user4','5666','AMP' UNION ALL

    SELECT '8AB','user4','5666','AMP' UNION ALL

    SELECT '9AB','user5','4578','FAC' UNION ALL

    SELECT '10AB','user5','0045','STU' UNION ALL

    SELECT '11AB','user6','5556','CLS' UNION ALL

    SELECT '12AB','user6','2345','AMP'

    --=====Return Results

    SELECT PositionID ,

    Username ,

    JobCode ,

    PositionTypeCode FROM #MyTable

    ORDER BY Username

    --====Drop table

    DROP TABLE #MyTable

    From the dataset there are multiple records with multiple users and I have a set of business rules that tells me which record I need to keep for each user. The business rules are as follows:

    1) If a user has a PositionTypeCode = 'AMP' and 'FAC' I need to return the record of AMP.

    2) If a user has 2 PositionTypeCode ='AMP' and 2 different JobCode's I need to return the record with the lowest job code

    3) If a user has 2 PositionTypeCode = 'FAC' no records should be returned for this user

    4) If a user has 2 PositionTypeCode ='AMP' and 2 of the same JobCodes I need to return the first record SQL sees (I'm assuming Top 1)

    5) If a user has a PositionTypeCode = 'FAC' and PositionTypeCode ='STU' I need to return the 'FAC' record.

    6) If the user doesn't mean any of this criteria, return the first record SQL sees (again I am assuming Top 1)

    So following the above logic I would expect to see the following records from the sample dataset:

    PositionIDUsernameJobCodePositionTypeCode

    1AB user1 1111 AMP --1st business rule

    4AB user2 1258 AMP --2nd business rule

    7AB user4 5666 AMP --4th business rule(assuming top 1 returns this 1)

    9AB user5 4578 FAC --5th business rule

    11AB user6 5556 CLS --6th business rule (again assuming top 1)

    I am assuming I would have some nested if statements, but I am not sure how to scan the entire table applying these rules. Any help is greatly appreciated; I hope my question is clear.

  • Please show us what you've tried so far and where you get stuck.

    Your request sounds like we're supposed to code it for you. That's called consulting.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Should be enough to get you started...

    WITH Summary AS (

    SELECT Username ,

    SUM(CASE WHEN PositionTypeCode = 'AMP' THEN 1 ELSE 0 END) AS AMP,

    SUM(CASE WHEN PositionTypeCode = 'FAC' THEN 1 ELSE 0 END) AS FAC,

    COUNT(DISTINCT JobCode) AS DistinctJobCodes,

    MIN(JobCode) AS MinJobCode,

    COUNT(*) AS Total

    FROM #MyTable

    GROUP BY Username)

    SELECT t.PositionID ,

    t.Username ,

    t.JobCode ,

    t.PositionTypeCode

    FROM #MyTable t

    INNER JOIN Summary s ON t.Username=s.Username

    AND ((s.AMP=1 AND s.FAC=1 AND t.PositionTypeCode = 'AMP') -- Rule1

    OR (s.AMP=2 AND s.DistinctJobCodes=2 AND s.MinJobCode=t.JobCode) -- Rule2

    OR (s.FAC=2 AND 1=0)) -- Rule3

    ORDER BY t.Username;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Thanks Mark, actually I started with a CTE and got a little stuck, thank you for the input.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply