Duplicate records

  • Hi All:
    I appreciate your help with this situation:

    After creating a union between 2 tables in a CTE, there are some contracts duplicates, like these:
    ContractNum    user name
    G0110641    John Doe                        
    G0110641    N/A
        
    G0110648    Jane Doe                            
    G0110648    N/A

    The duplicated should be removed from the query and leave only those rows which the username is not "N/A", as these:
    ContractNum   User name
    G0110641       John Doe                        
    G0110648       Jane Doe                         

    Is it any way to do this?

    Thanks in advance

    Mary

  • WITH CTE_NA AS
    (
    SELECT
     ContractNum
    , Name
    , ROW_NUMBER()OVER(PARTITION BY ContractNum Order BY CASE WHEN Name='N/A' THEN 1 ELSE 2 END) AS seq
    FROM #USER
    )

    SELECT * FROM CTE_NA WHERE seq=2

  • This can be simplified by using ISNULL and GROUP BY
    😎


    USE TEEST;
    GO
    SET NOCOUNT ON;

    ;WITH SAMPLE_DATA(CONTRACT_NUM,USERNAME) AS
    (
      SELECT 'G0110641','John Doe ' UNION ALL
      SELECT 'G0110641','N/A'       UNION ALL
      SELECT 'G0110648','N/A'       UNION ALL
      SELECT 'G0110648','Jane Doe'  UNION ALL
      SELECT 'G0110648','Jane Doe'
    )
    SELECT
      SD.CONTRACT_NUM
     ,MAX(NULLIF(SD.USERNAME,'N/A')) AS USERNAME
    FROM  SAMPLE_DATA  SD
    GROUP BY SD.CONTRACT_NUM;

    Output

    CONTRACT_NUM USERNAME
    ------------ ---------
    G0110641     John Doe
    G0110648     Jane Doe

  • msramos - Friday, August 10, 2018 2:32 PM

    Hi All:
    I appreciate your help with this situation:

    After creating a union between 2 tables in a CTE, there are some contracts duplicates, like these:
    ContractNum    user name
    G0110641    John Doe                        
    G0110641    N/A
        
    G0110648    Jane Doe                            
    G0110648    N/A

    The duplicated should be removed from the query and leave only those rows which the username is not "N/A", as these:
    ContractNum   User name
    G0110641       John Doe                        
    G0110648       Jane Doe                         

    Is it any way to do this?

    Thanks in advance

    Mary


    Why not this?
    WITH SAMPLE_DATA(CONTRACT_NUM,USERNAME) AS
    (
    SELECT 'G0110641','John Doe ' UNION ALL
    SELECT 'G0110641','N/A'   UNION ALL
    SELECT 'G0110648','N/A'   UNION ALL
    SELECT 'G0110648','Jane Doe' UNION ALL
    SELECT 'G0110648','Jane Doe' UNION ALL
    SELECT 'G0110646','Jammy Doe' UNION ALL
    SELECT 'G0110647','Jane Doe'
    )
    SELECT CONTRACT_NUM,USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from SAMPLE_DATA
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP

    Saravanan

  • As we do on this site, we don't assume, we state the facts.
    So, just to put things into perspective, some solutions are less equal than others, there are cardinality induced time bombs and last but not least, deduplication implies that no duplicates exists in the output 😉
    😎
    On that note, here is a test harness to play around with, feel free to comment and improve it, since I'm very busy with work and just threw this together on the fly.

    USE TEEST;
    GO
    SET NOCOUNT ON;
    -- https://www.sqlservercentral.com/Forums/1985803/Duplicate-records

    DECLARE @SAMPLE_SIZE  INT   = 10000000;
    DECLARE @INT_BUCKET  INT   =   0;
    DECLARE @CHAR_BUCKET  VARCHAR(50) =   '';
    DECLARE @TEST_SIZE  INT   =  1000;
    DECLARE @timer TABLE (T_TXT VARCHAR(100) NOT NULL,T_RCOUNT INT NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT(SYSDATETIME()));

    --/* -- TEST HARNESS DATA, UNCOMMENT THIS LINE TO BYPASS THE DATA SET GENERATION
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE_FULL_SET') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE_FULL_SET;
    CREATE TABLE dbo.TBL_TEST_DEDUPE_FULL_SET
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_FULL_SET_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );

    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7)
    INSERT INTO dbo.TBL_TEST_DEDUPE_FULL_SET WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      CONCAT
      (
       STUFF('AAAAAAAAAA'
       ,(ABS(CHECKSUM(NEWID())) % 10) + 1
       ,1
       ,CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
       ,10000 + (ABS(CHECKSUM(NEWID())) % 5000)
      )
    ,ISNULL(CONCAT
      (
       CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26))
       ,LOWER(CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
       ,LOWER(CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
       ,LOWER(CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
       ,CHAR(32)
       ,CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26))
       ,LOWER(CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
       ,LOWER(CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
      ) + CHAR(97 + NULLIF((ABS(CHECKSUM(NEWID())) % 4),0)),'N/A')
    FROM  NUMS NM
    ORDER BY NM.N ASC
    ;
    -- */
    -------------------------------------------------------------------------------
    -- TEST SIZE 1000
    SET @TEST_SIZE = 1000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    -------------------------------------------------------------------------------
    -- TEST SIZE 5000
    SET @TEST_SIZE = 5000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    -- CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 10000
    SET @TEST_SIZE = 10000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 50000
    SET @TEST_SIZE = 50000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 100000
    SET @TEST_SIZE = 100000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 500000
    SET @TEST_SIZE = 500000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 1000000
    SET @TEST_SIZE = 1000000;
    -------------------------------------------------------------------------------

    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 5000000
    SET @TEST_SIZE = 5000000;
    -------------------------------------------------------------------------------

    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 10000000
    SET @TEST_SIZE = 10000000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD
    WHERE SD.USERNAME <> 'N/A'
    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    SELECT
      T.T_TXT
     ,T.T_RCOUNT
     ,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
    FROM @timer T
    GROUP BY T.T_TXT
       ,T.T_RCOUNT
    ORDER BY DURATION;

    And the results on one of my mediocre i5 16Gb laptops with a rusty spinner

    To summarize:

    • The ROW_NUMBER method goes bonkers when reaching 100K entries
    • The COUNT method has a hash match which does not scale too good
    • All methods rely on the right index being in place, otherwise they are all equally slow.
  • Eirikur Eiriksson - Sunday, August 12, 2018 7:35 AM

    As we do on this site, we don't assume, we state the facts.
    So, just to put things into perspective, some solutions are less equal than others, there are cardinality induced time bombs and last but not least, deduplication implies that no duplicates exists in the output 😉
    😎
    On that note, here is a test harness to play around with, feel free to comment and improve it, since I'm very busy with work and just threw this together on the fly.

    USE TEEST;
    GO
    SET NOCOUNT ON;
    -- https://www.sqlservercentral.com/Forums/1985803/Duplicate-records

    DECLARE @SAMPLE_SIZE  INT   = 10000000;
    DECLARE @INT_BUCKET  INT   =   0;
    DECLARE @CHAR_BUCKET  VARCHAR(50) =   '';
    DECLARE @TEST_SIZE  INT   =  1000;
    DECLARE @timer TABLE (T_TXT VARCHAR(100) NOT NULL,T_RCOUNT INT NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT(SYSDATETIME()));

    --/* -- TEST HARNESS DATA, UNCOMMENT THIS LINE TO BYPASS THE DATA SET GENERATION
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE_FULL_SET') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE_FULL_SET;
    CREATE TABLE dbo.TBL_TEST_DEDUPE_FULL_SET
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_FULL_SET_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );

    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7)
    INSERT INTO dbo.TBL_TEST_DEDUPE_FULL_SET WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      CONCAT
      (
       STUFF('AAAAAAAAAA'
       ,(ABS(CHECKSUM(NEWID())) % 10) + 1
       ,1
       ,CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
       ,10000 + (ABS(CHECKSUM(NEWID())) % 5000)
      )
    ,ISNULL(CONCAT
      (
       CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26))
       ,LOWER(CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
       ,LOWER(CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
       ,LOWER(CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
       ,CHAR(32)
       ,CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26))
       ,LOWER(CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
       ,LOWER(CHAR(65 + (ABS(CHECKSUM(NEWID())) % 26)))
      ) + CHAR(97 + NULLIF((ABS(CHECKSUM(NEWID())) % 4),0)),'N/A')
    FROM  NUMS NM
    ORDER BY NM.N ASC
    ;
    -- */
    -------------------------------------------------------------------------------
    -- TEST SIZE 1000
    SET @TEST_SIZE = 1000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    -------------------------------------------------------------------------------
    -- TEST SIZE 5000
    SET @TEST_SIZE = 5000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    -- CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 10000
    SET @TEST_SIZE = 10000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 50000
    SET @TEST_SIZE = 50000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 100000
    SET @TEST_SIZE = 100000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 500000
    SET @TEST_SIZE = 500000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 1000000
    SET @TEST_SIZE = 1000000;
    -------------------------------------------------------------------------------

    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 5000000
    SET @TEST_SIZE = 5000000;
    -------------------------------------------------------------------------------

    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE  TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD

    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    -------------------------------------------------------------------------------
    -- TEST SIZE 10000000
    SET @TEST_SIZE = 10000000;
    -------------------------------------------------------------------------------
    IF OBJECT_ID(N'dbo.TBL_TEST_DEDUPE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DEDUPE;
    CREATE TABLE dbo.TBL_TEST_DEDUPE
    (
      TD_ID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DEDUPE_TD_ID PRIMARY KEY CLUSTERED
     ,CONTRACT_NUM  VARCHAR(20)   NOT NULL
     ,USERNAME   VARCHAR(50)   NOT NULL
    );
    INSERT INTO dbo.TBL_TEST_DEDUPE WITH (TABLOCKX) (CONTRACT_NUM,USERNAME)
    SELECT
      TF.CONTRACT_NUM
     ,TF.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE_FULL_SET  TF
    WHERE TF.TD_ID        <= @TEST_SIZE;
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_CONTRACT_NUM_INCL_USERNAME ON dbo.TBL_TEST_DEDUPE(CONTRACT_NUM ASC) INCLUDE (USERNAME);
    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DEDUPE_USERNAME_INCL_CONTRACT_NUM ON dbo.TBL_TEST_DEDUPE(USERNAME ASC) INCLUDE (CONTRACT_NUM);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = TD.CONTRACT_NUM
     ,@CHAR_BUCKET = TD.USERNAME
    FROM  dbo.TBL_TEST_DEDUPE TD
    ORDER BY TD.USERNAME ASC
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('DRY RUN',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);
    SELECT
    @CHAR_BUCKET = SD.CONTRACT_NUM
    ,@CHAR_BUCKET = MAX(SD.USERNAME)
    FROM dbo.TBL_TEST_DEDUPE SD
    WHERE SD.USERNAME <> 'N/A'
    GROUP BY SD.CONTRACT_NUM
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('GROUP BY',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);
    SELECT
      @CHAR_BUCKET = CONTRACT_NUM
     ,@CHAR_BUCKET = USERNAME FROM
    (
    select CONTRACT_NUM,USERNAME,COUNT(CONTRACT_NUM) OVER(PARTITION BY CONTRACT_NUM) AS CNT
    from dbo.TBL_TEST_DEDUPE
    GROUP BY CONTRACT_NUM,USERNAME
    HAVING USERNAME<>'N/A'
    )DEDUP
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('COUNT',@TEST_SIZE);

    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);
    ;WITH CTE_NA AS
    (
    SELECT
    TD.CONTRACT_NUM
    , TD.USERNAME
    , ROW_NUMBER()OVER(PARTITION BY TD.CONTRACT_NUM Order BY CASE WHEN TD.USERNAME='N/A' THEN 1 ELSE 2 END) AS seq
    FROM dbo.TBL_TEST_DEDUPE TD
    )
    SELECT
      @CHAR_BUCKET = CN.CONTRACT_NUM
     ,@CHAR_BUCKET = CN.USERNAME
    FROM CTE_NA CN
    WHERE CN.seq=2
    OPTION (MAXDOP 1, RECOMPILE);
    INSERT INTO @timer(T_TXT,T_RCOUNT) VALUES ('ROW_NUMBER',@TEST_SIZE);

    SELECT
      T.T_TXT
     ,T.T_RCOUNT
     ,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
    FROM @timer T
    GROUP BY T.T_TXT
       ,T.T_RCOUNT
    ORDER BY DURATION;

    And the results on one of my mediocre i5 16Gb laptops with a rusty spinner

     

    • The ROW_NUMBER method goes bonkers when reaching 100K entries
    • The COUNT method has a hash match which does not scale too good
    • All methods rely on the right index being in place, otherwise they are all equally slow.

    1) First of all thanks for taking your time and posting the performance response of all above mentioned SQL queries.
    2) I do agree now that your query performs better.
    3) Honestly I didn't understand some of the things you written and haven't come across some of the things you written. I use SQL 2008  R2 in my old laptop .Thanks for helping us learn new things and it will take few hours to analyse your query.
    4) If I had any clarification will get back to you 🙂 .

    Saravanan

  • Thanks to All for your help! 🙂 😀🙂

  • saravanatn - Sunday, August 12, 2018 12:19 PM

    1) First of all thanks for taking your time and posting the performance response of all above mentioned SQL queries.
    2) I do agree now that your query performs better.
    3) Honestly I didn't understand some of the things you written and haven't come across some of the things you written. I use SQL 2008  R2 in my old laptop .Thanks for helping us learn new things and it will take few hours to analyse your query.
    4) If I had any clarification will get back to you 🙂 .

    You are very welcome and of course, feel free to ping back with any questions.
    😎

  • msramos - Monday, August 13, 2018 6:40 AM

    Thanks to All for your help! 🙂 😀🙂

    You are very welcome.
    😎

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply