Eliminate Duplicate rows

  • How to eliminate duplicate rows.

    EMPID   ENAME   ROLE

    100       Joe       Manager

    100       Joe       Clerk

    101       John      Admin

    101       John      Security

     

    I want to get only one record for 100 and 101. Please help.

    San

     

  • San

    How are you going to choose which row(s) to eliminate and which one to keep?  If you search this site for eliminating duplicates or something similar, you will find lots of threads from people trying to do exactly what you're trying to.

    John

  • It sounds like the problem is with design. To get only one record for each 100 and 101 I would recommend to normalize design of this table - to have 2 tables instead: EMP (EMPID, ENAME) and ROLES(ROLEID, EMPID, ROLENAME).

     

  • I have a PK defined on four columns. I need to eliminate the dups and get the first dup row so forth and so on for some other purpose. Hope this helps.

    Thanks

    San.

  • San

    Which four columns is your PK defined on?  There are only three in your sample data.

    John

  • Thansk for the reply John.

    EMPID   ENAME   ROLE   SAL  COMMENTS MGRID MGRNAME GROUP OTHER

    PK defined on EMPID,ENAME,ROLE,SAL.

    Let me know.

    Thanks

    San

     

  • San

    If the three columns in your sample data are all part of the primary key, then those rows aren't duplicates.  You're going to need to change the design of your table: perhaps only EMPID needs to be the primary key?  If that's the case then you will indeed have duplicates, and then you will have to decide how to eliminate them.  You can't just say you want to keep the "first", because a database table has no concept of order outside the primary key.

    John

  • Thanks John. I need to provide the data with the unique EMPID and hence need to remove the dups. The DB design is correct as far as our app is concerned.

    Thanks

    San.

  • What query are you running to get your results?  The problem is your tables are  not normalized as a person can perform more than one role which is why you are receiving duplicate data. If you only need the ID and name you can use the distinct keyword on those two columns and that will result in just one row for each ID and name combination. 

     

    Marvin Dillard
    Senior Consultant
    Claraview Inc

  •  I need all the columns. My data is something like this.

    PART_ID   DATE       PART_CD  REN_CD   STATE BN_NO BN_CD  BN_DS  DESC   ID

    206544    2006-01-01 DDD      11       TX    45789 TIN    OTHER  NULL   500

    206544    2006-01-01 DEA      11       TX    45789 TIN    OTHER  NULL   500

    206643    2006-01-01 PLI      10       NJ    3300  LPA    NULL   NULL   700

    206643    2006-01-01 TEA      10       NJ    3300  LPA    NULL   NULL   700

    Thans

    San.

  • Try distinct select and not select back the column that has the different data in it?

  • Nope. Its not working that way. Thanks

    San.

  • Which value do you want to keep and which to discard?  How do you know when it is correct?

    By the definition of a duplicate record there are none in the sample you have provided.  At some point you have to decide what to discard or this will not work in any way anyone can solve for you.  If you need to return something (anything) in the column you want to ignore the fact that the value is not duplicate, hard code in a value for it.

    select distinct ColA, 'B', ColC from TableA

    Other way of doing it would be to loop trough the data and discard every record that has the same PK values as the previous record.

  • Any SUggestions how to accomplish this.

    Thanks

    San.

  • Simple way to do it, not neccesarily the best...

     

    create

    table #T (

    C1 int)

    insert

    into #t values (100)

    insert

    into #t values (100)

    insert

    into #t values (200)

    insert

    into #t values (200)

    insert

    into #t values (300)

    Create

    Table #New (

    C1 int)

    declare

    @prev int, @Cur int

    DECLARE

    test_cursor CURSOR

    READ_ONLY

    FOR

    SELECT C1 FROM #T

    OPEN

    test_cursor

    FETCH

    NEXT FROM test_cursor INTO @Cur

    WHILE

    (@@fetch_status <> -1)

    BEGIN

    IF (@@fetch_status <> -2)

    BEGIN

    if @Cur <> @prev

    insert into #New

    select * from #T where C1 = @Cur

    set @prev = @Cur

    END

    FETCH NEXT FROM test_cursor INTO @Cur

    END

    CLOSE

    test_cursor

    DEALLOCATE

    test_cursor

    select

    * from #New

    GO

Viewing 15 posts - 1 through 15 (of 35 total)

You must be logged in to reply to this topic. Login to reply