Eliminating Duplicate Rows

  • Ok Vinu,

    Tabel Structure

    doc_header (docno [alphanumeric], docno_i [integer], description [alphanumeric], doc_type [alphanumeric])

    doc_history (docno_i [integer], doc_status [alphanumeric], perf_date [integer])

    doc_signoff_tree (docno_i [integer], ref_docno_i [integer])

    doc_replacement (docno_i [integer], ref_docno_i [integer], replacement_code [alphanumeric])

    Sample Data

    doc_header

    [docno] | docno_i | description| doc_type

    ABC-11-PO| 14027| REMOVE HAT | G

    RRF-98-LL| 7654| UPDATE SHEET | G

    YTH-12-MU| 10189| PLAY OUTSIDE | G

    RET-79-TG| 6174| GOTO SCHOOL| G

    EE-201-AD| 2245| HAT | AD

    TRU-90-JK| 14526| EAT MANGO | G

    HRT-34-II| 7586| MANGO | EAD

    doc_history

    docno_i | doc_status| perf_date

    6174| R| 14578

    10189| R| 14664

    7654| C| 14486

    10189| R| 14742

    6174| R| 14478

    14027| C| 14526

    14526| C| 17589

    6174| R| 14675

    doc_signoff_tree

    docno_i | ref_docno_i

    2245| 10189

    14027| 14027

    7654| 7654

    14526| 14526

    10189| 10189

    6174| 6174

    7586| 14526

    doc_replacement

    docno_i| ref_docno_i| replacement_code

    14027| 337| S

    6174| 88964| P

    And Output is

    Output

    m.docno| m.adref| m.perf_date

    RET-79-TG| -| 14478

    RRF-98-LL| -| 14486

    ABC-11-PO| -| 14526

    RET-79-TG| -| 14578

    YTH-12-MU| EE-201-AD| 14664

    YTH-12-MU| -| 14664

    RET-79-TG| -| 14675

    YTH-12-MU| EE-201-AD| 14742

    YTH-12-MU| -| 14742

    Hoping for positive response very soon.

  • Are you using MSSQLServer??.....Doesn't look like it to me

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] ๐Ÿ˜‰

  • Refer R.P.Rozema post above. It is Sybase adaptive server (SQL J).

    Not MSSQL.

  • Sorry, I fail to understand the logic used for your output. Why isn't "TRU-90-JK" in the output? And why do you have so many rows for "YTH-12-MU"?

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] ๐Ÿ˜‰

  • Ok, lets go about this another way: do you need to remove the duplicates as a one-off task, or are you writing some code that will be used many more times?



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • Vinu,

    1. "TRU-90-JK" is not in output b'coz of this condition -

    and h.perf_date between 14411 and 14755

    and for "TRU-90-JK" perf_date is

    doc_history

    14526 | C | 17589

    2. Query create the join on docno_i

    where d.docno_i = h.docno_i

    and h.docno_i = s.ref_docno_i

    The document "YTH-12-MU" has deen performed twice (refer doc_history table with docno_i - 10189 having perf_date 14664 and 14742 (different date). so appears two bunch of "YTH-12-MU" one for each date (refer output for dates).

    Why two record for "YTH-12-MU" - one with adref value & other without adref ?

    Refer doc_signoff_tree contains two record for this doc.

    docno_i | ref_docno_i

    2245 | 10189

    10189 | 10189

    for docno_i 2245 doc_type is 'AD'. So adref value is present (1st rec) as per query condition,

    x.doc_type in ('AD', 'AAD', 'APPL', 'EAD')

    For docno_i 10189 doc_type is 'G' So adref value is not present (2nd rec).

    Hope you understand.

  • R.P.Rozema,

    I want to remove the duplicates as a one-off task.

  • Hi All,

    No any comments, What happen? I am waiting for answer. I was on leave for a week. Pls give me some solution.

  • prakashp 84206 (6/14/2012)


    Hi All,

    No any comments, What happen? I am waiting for answer. I was on leave for a week. Pls give me some solution.

    You are likely to find your code much easier to maintain if you use standard coding conventions. This query might replace the first part of the UNION query you posted earlier. I'd recommend you test it against your data, modify it if necessary, and post back here.

    SELECT

    d.docno,

    adref = x.docno,

    h.perf_date

    FROM doc_header d

    INNER JOIN doc_history h

    ON h.docno_i = d.docno_i

    AND h.doc_status IN ('C','R')

    AND h.perf_date BETWEEN 14411 AND 14755

    INNER JOIN doc_signoff_tree s

    ON s.ref_docno_i = h.docno_i

    LEFT JOIN doc_replacement t

    ON t.docno_i = s.docno_i

    AND t.replacement_code = 'SB'

    INNER JOIN doc_header x -- INNER is assumed by HAVING filter on x.docno

    ON x.docno_i = ISNULL(t.ref_docno_i, s.docno_i)

    AND x.doc_type IN ('AD', 'AAD', 'APPL', 'EAD')

    WHERE x.docno <> ' '

    GROUP BY d.docno

    HAVING COUNT(*) > 1

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks ChrisM, for your efforts for me.

    But it gives me error "Out of Memory" and not able to execute. What to do?

  • prakashp 84206 (6/6/2012)


    R.P.Rozema,

    I want to remove the duplicates as a one-off task.

    Since you're talking with MS SQL experts here, I would like to suggest you import the data into a SQL server using SSIS. There is a very easy-to-use 'import data...' menu item in Microsoft SQL server Management Studio if you right click a database in the 'Object Explorer' and then choose 'Tasks' in the context menu that pops up. This will start a dialog that will enable you to use SSIS to import the data form your data source.

    Forum members here will be able to provide you with a multitude of methods to remove the duplicates in T-SQL. After that is done you can then replace the original data with the cleaned data using the reverse process, 'Export Data...'.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • prakashp 84206 (6/15/2012)


    Thanks ChrisM, for your efforts for me.

    But it gives me error "Out of Memory" and not able to execute. What to do?

    Fix it - something we are unable to do without access to your tables, or at least some sample data to experiment with.

    Start by changing to ANSI join syntax.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Dear ChrisM,

    Ok I will try to fix it.

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply