Eliminating Duplicate Rows

Question

Post reply

Eliminating Duplicate Rows

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply

prakashp 84206 Old Hand Points: 303 More actions · Answer 1

Ok Vinu,

Tabel Structure

doc_header (docno [alphanumeric], docno_i [integer], description [alphanumeric], doc_type [alphanumeric])

doc_history (docno_i [integer], doc_status [alphanumeric], perf_date [integer])

doc_signoff_tree (docno_i [integer], ref_docno_i [integer])

doc_replacement (docno_i [integer], ref_docno_i [integer], replacement_code [alphanumeric])

Sample Data

doc_header

[docno] | docno_i | description| doc_type

ABC-11-PO| 14027| REMOVE HAT | G

RRF-98-LL| 7654| UPDATE SHEET | G

YTH-12-MU| 10189| PLAY OUTSIDE | G

RET-79-TG| 6174| GOTO SCHOOL| G

EE-201-AD| 2245| HAT | AD

TRU-90-JK| 14526| EAT MANGO | G

HRT-34-II| 7586| MANGO | EAD

doc_history

docno_i | doc_status| perf_date

6174| R| 14578

10189| R| 14664

7654| C| 14486

10189| R| 14742

6174| R| 14478

14027| C| 14526

14526| C| 17589

6174| R| 14675

doc_signoff_tree

docno_i | ref_docno_i

2245| 10189

14027| 14027

7654| 7654

14526| 14526

10189| 10189

6174| 6174

7586| 14526

doc_replacement

docno_i| ref_docno_i| replacement_code

14027| 337| S

6174| 88964| P

And Output is

Output

m.docno| m.adref| m.perf_date

RET-79-TG| -| 14478

RRF-98-LL| -| 14486

ABC-11-PO| -| 14526

RET-79-TG| -| 14578

YTH-12-MU| EE-201-AD| 14664

YTH-12-MU| -| 14664

RET-79-TG| -| 14675

YTH-12-MU| EE-201-AD| 14742

YTH-12-MU| -| 14742

Hoping for positive response very soon.

vinu512 SSCoach Points: 15729 More actions · Answer 2

Are you using MSSQLServer??.....Doesn't look like it to me

Vinu Vijayan

For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

prakashp 84206 Old Hand Points: 303 More actions · Answer 3

Refer R.P.Rozema post above. It is Sybase adaptive server (SQL J).

Not MSSQL.

vinu512 SSCoach Points: 15729 More actions · Answer 4

Sorry, I fail to understand the logic used for your output. Why isn't "TRU-90-JK" in the output? And why do you have so many rows for "YTH-12-MU"?

Vinu Vijayan

For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

R.P.Rozema SSChampion Points: 12317 More actions · Answer 5

Ok, lets go about this another way: do you need to remove the duplicates as a one-off task, or are you writing some code that will be used many more times?

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

prakashp 84206 Old Hand Points: 303 More actions · Answer 6

Vinu,

1. "TRU-90-JK" is not in output b'coz of this condition -

and h.perf_date between 14411 and 14755

and for "TRU-90-JK" perf_date is

doc_history

14526 | C | 17589

2. Query create the join on docno_i

where d.docno_i = h.docno_i

and h.docno_i = s.ref_docno_i

The document "YTH-12-MU" has deen performed twice (refer doc_history table with docno_i - 10189 having perf_date 14664 and 14742 (different date). so appears two bunch of "YTH-12-MU" one for each date (refer output for dates).

Why two record for "YTH-12-MU" - one with adref value & other without adref ?

Refer doc_signoff_tree contains two record for this doc.

docno_i | ref_docno_i

2245 | 10189

10189 | 10189

for docno_i 2245 doc_type is 'AD'. So adref value is present (1st rec) as per query condition,

x.doc_type in ('AD', 'AAD', 'APPL', 'EAD')

For docno_i 10189 doc_type is 'G' So adref value is not present (2nd rec).

Hope you understand.

prakashp 84206 Old Hand Points: 303 More actions · Answer 7

R.P.Rozema,

I want to remove the duplicates as a one-off task.

prakashp 84206 Old Hand Points: 303 More actions · Answer 8

Hi All,

No any comments, What happen? I am waiting for answer. I was on leave for a week. Pls give me some solution.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

prakashp 84206 (6/14/2012)
Hi All,
No any comments, What happen? I am waiting for answer. I was on leave for a week. Pls give me some solution.

You are likely to find your code much easier to maintain if you use standard coding conventions. This query might replace the first part of the UNION query you posted earlier. I'd recommend you test it against your data, modify it if necessary, and post back here.

SELECT

d.docno,

adref = x.docno,

h.perf_date

FROM doc_header d

INNER JOIN doc_history h

ON h.docno_i = d.docno_i

AND h.doc_status IN ('C','R')

AND h.perf_date BETWEEN 14411 AND 14755

INNER JOIN doc_signoff_tree s

ON s.ref_docno_i = h.docno_i

LEFT JOIN doc_replacement t

ON t.docno_i = s.docno_i

AND t.replacement_code = 'SB'

INNER JOIN doc_header x -- INNER is assumed by HAVING filter on x.docno

ON x.docno_i = ISNULL(t.ref_docno_i, s.docno_i)

AND x.doc_type IN ('AD', 'AAD', 'APPL', 'EAD')

WHERE x.docno <> ' '

GROUP BY d.docno

HAVING COUNT(*) > 1

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

prakashp 84206 Old Hand Points: 303 More actions · Answer 10

Thanks ChrisM, for your efforts for me.

But it gives me error "Out of Memory" and not able to execute. What to do?

R.P.Rozema SSChampion Points: 12317 More actions · Answer 11

prakashp 84206 (6/6/2012)
R.P.Rozema,
I want to remove the duplicates as a one-off task.

Since you're talking with MS SQL experts here, I would like to suggest you import the data into a SQL server using SSIS. There is a very easy-to-use 'import data...' menu item in Microsoft SQL server Management Studio if you right click a database in the 'Object Explorer' and then choose 'Tasks' in the context menu that pops up. This will start a dialog that will enable you to use SSIS to import the data form your data source.

Forum members here will be able to provide you with a multitude of methods to remove the duplicates in T-SQL. After that is done you can then replace the original data with the cleaned data using the reverse process, 'Export Data...'.

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 12

prakashp 84206 (6/15/2012)
Thanks ChrisM, for your efforts for me.
But it gives me error "Out of Memory" and not able to execute. What to do?

Fix it - something we are unable to do without access to your tables, or at least some sample data to experiment with.

Start by changing to ANSI join syntax.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

prakashp 84206 Old Hand Points: 303 More actions · Answer 13

prakashp 84206

Old Hand

Points: 303

June 15, 2012 at 3:45 am

#1501488

Dear ChrisM,

Ok I will try to fix it.