Duplicate record check Left or Inner Join ?

  • Hi,

    I am writing a query to find duplicate records in a table,

    My question is should I use Inner, Left or Left Outer Join ? Which is correct

    select

    A.ID,

    A.NUM,

    A.ECD,

    Col1||Col2||Col3||Col4 as ERR_MSG_TXT

    from Table_A A

    [highlight=#ffff11]Inner join[/highlight]

    (

    SELECT COl1 as Col1,Col2 as COl2,Col3 as Col3,Col4 as Col4

    COUNT(*) AS RECCNT

    FROM Table_A

    GROUP BY Col1,Col2,COl3,COl4

    HAVING RECCNT > 1

    where ID=5628) REC

    on A.Col1=REC.COl1

    AND A.Col2=REC.Col2

    AND A.Col3=REC.Col3

    AND A.COl4=REC.COl4

    where A.ID=5628

    Thanks a lot

  • Neither, use OVER() with COUNT:

    ;WITH Dupechecker AS (

    SELECT

    A.ID,

    A.NUM,

    A.ECD,

    Col1||Col2||Col3||Col4 as ERR_MSG_TXT,

    dupecount = COUNT(*) OVER(PARTITION BY Col1,Col2,COl3,COl4)

    FROM Table_A

    WHERE A.ID = 5628

    )

    SELECT *

    FROM Dupechecker

    WHERE dupecount > 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Well Actually I am doing Insert select and table have only 4 columns

    In fact I started with OVER Function but then I also want my error_msg_taxt to display Concatenation of columns name.

    I dont how will fit Partition by in this case

    Insert into table_b

    A,

    B,

    C,

    D

    select

    A.ID,

    A.NUM,

    A.ECD,

    Col1||Col2||Col3||Col4 as ERR_MSG_TXT

    from Table_A A

    Inner join

    (

    SELECT COl1 as Col1,Col2 as COl2,Col3 as Col3,Col4 as Col4

    COUNT(*) AS RECCNT

    FROM Table_A

    GROUP BY Col1,Col2,COl3,COl4

    HAVING RECCNT > 1

    where ID=5628) REC

    on A.Col1=REC.COl1

    AND A.Col2=REC.Col2

    AND A.Col3=REC.Col3

    AND A.COl4=REC.COl4

    where A.ID=5628

    Thanks for the reply

  • HiralChhaya (3/28/2013)


    table have only 4 columns

    You've listed 7 columns in table A in your query - it would help us if you were to correct the column names.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • They are 4

    select

    A.ID, 1

    A.NUM, 2

    A.ECD, 3

    Col1||Col2||Col3||Col4 as ERR_MSG_TXT 4 this is concatenation of columns of Table_A, the error_mesg_text will display duplicate rows data

    Thanks

  • Like this

    ID ROW CD Error_Msg_txt (concatination of Col1,Col2,COl3Col4 from table_A)

    1 4 DUPBDGT FY12 BudFcst 00704

    2 5 DUPBDGT FY12 BudFcst 00704

  • HiralChhaya (3/28/2013)


    They are 4

    select

    A.ID, 1

    A.NUM, 2

    A.ECD, 3

    Col1||Col2||Col3||Col4 as ERR_MSG_TXT 4 this is concatenation of columns of Table_A, the error_mesg_text will display duplicate rows data

    Thanks

    -- why not retain the original column names instead of changing to col1,col2...?

    -- It makes a simple query look more complicated than it should.

    Insert into table_b

    (A,

    B,

    C,

    D) -- brackets required

    select

    A.ID,

    A.NUM,

    A.ECD,

    Col1+Col2+Col3+Col4 as ERR_MSG_TXT -- double-pipe is not TSQL concatenation operator

    from Table_A A

    Inner join

    (

    SELECT ID AS COl1, NUM AS Col2, ECD AS Col3, ? AS Col4, -- <<<<< what is this column name?

    COUNT(*) AS RECCNT

    FROM Table_A

    WHERE ID = 5628 -- correct position

    --GROUP BY Col1, Col2, COl3, COl4 -- won't work

    GROUP BY ID, NUM, ECD, ? -- <<<<< what is this column name?

    HAVING COUNT(*) > 1 --

    ) REC

    ON A.ID = REC.Col1

    AND A.NUM = REC.Col2

    AND A.ECD = REC.Col3

    AND A.? = REC.COl4 -- <<<<< what is this column name?

    -- where A.ID=5628 -- not needed, it's in the JOIN

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • So Inner Join will work... right? I was confused weather to use Left Outer Join or not.

    Thanks a lot for help

  • HiralChhaya (3/28/2013)


    So Inner Join will work... right? I was confused weather to use Left Outer Join or not.

    Thanks a lot for help

    Yes, it will work.

    You will have to correct the mistakes in the rest of the query.

    Test the inner query first, fix the errors.

    Then test the whole SELECT query without the INSERT.

    Finally test with the INSERT.

    It will almost certainly take twice as long to run as the alternative I suggested and is more complicated code.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Sorry for the confusion here is the actual query

    yeah Table_A have 7 columns 🙂

    ID,ROW_NUM,Error_CD,SRC_YR,SRC_CD,SRC_ID,SRC_NM

    Insert into table_b

    (ID,

    ROW_NUM,

    Error_CD,

    ERR_MSG_TXT) -- brackets required

    select

    A.ID,

    A.ROW_NUM,

    Error_CD,

    SRC_YR+SRC_CD+SRC_ID+SRC_NM as ERR_MSG_TXT -- double-pipe is not TSQL concatenation operator

    from Table_A A

    Inner join

    (

    SELECT SRC_YR AS SRC_YR, SRC_CD AS SRC_CD, SRC_ID AS SRC_ID, SRC_NM AS SRC_NM,

    COUNT(*) AS RECCNT

    FROM Table_A

    WHERE ID = 5628 -- correct position

    GROUP BY SRC_YR, SRC_CD, SRC_ID, SRC_NM

    HAVING COUNT(*) > 1 --

    ) REC

    ON A.SRC_YR = REC.SRC_YR

    AND A.SRC_CD = REC.SRC_CD

    AND A.SRC_ID = REC.SRC_ID

    AND A.SRC_NM = REC.SRC_NM -- <<<<< what is this column name?

    -- where A.ID=5628 -- not needed, it's in the JOIN

  • Try modifying this instead;

    ;WITH Dupechecker AS (

    SELECT ID, NUM, ECD, ?,

    dupecount = COUNT(*) OVER(PARTITION BY Col1,Col2,COl3,COl4)

    FROM Table_A

    WHERE ID = 5628

    )

    INSERT INTO table_b (A,B, C, D)

    SELECT

    ID, NUM, ECD,

    ERR_MSG_TXT =

    CAST(ID AS VARCHAR(?)) +

    CAST(NUM AS VARCHAR(?)) +

    CAST(ECD AS VARCHAR(?)) +

    CAST(? AS 'a text datatype')

    FROM Dupechecker

    WHERE dupecount > 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply