duplicate data

  • Select count (*) from tower.nbdocdeletedrecords

    Where scandate = '12/13/2008'

    and slevel = 6

    Returns 5798 records

    select distinct towid from tower.NBdocdeletedrecords where slevel=6 and scandate = '12/13/2008' and username = 'SQLKofc'

    returns 2899 records which is correct

    How do I delete the duplicates???

  • 1.

    your first query doesnt contain username condition

    Select count (*) from tower.nbdocdeletedrecords

    Where scandate = '12/13/2008'

    and slevel = 6

    and username = 'SQLKofc'

    2. is towID primary key on the table? OR can it be duplicated?



    Pradeep Singh

  • here is the table:

    CREATE TABLE [tower].[NBDocUpdatedRecords] (

    [PresentDate] [datetime] NULL ,

    [SystemUser] [char] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [npages] [int] NULL ,

    [ifnds] [int] NULL ,

    [ifnid] [int] NULL ,

    [datatype] [int] NULL ,

    [win] [varchar] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [status] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [polnum] [varchar] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [multi] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [slevel] [smallint] NULL ,

    [descript] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [scandate] [datetime] NULL ,

    [boxid] [varchar] (11) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [username] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [batchtyp] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [memfnam] [varchar] (35) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [miscflag] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [spare1] [varchar] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [spare2] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [date1] [datetime] NULL ,

    [agent] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [batchnam] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [Country] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [qaoper] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [indxoper] [nvarchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [claimnum] [varchar] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [memnum] [varchar] (7) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [dob] [datetime] NULL ,

    [spare3] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [spare4] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [date2] [datetime] NULL ,

    [date3] [datetime] NULL ,

    [int1] [int] NULL ,

    [int2] [int] NULL ,

    [towid] [int] NOT NULL ,

    [ReplicationID] [int] IDENTITY (1, 1) NOT NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [tower].[NBDocUpdatedRecords] WITH NOCHECK ADD

    CONSTRAINT [PK_NBDocUpdatedRecords] PRIMARY KEY CLUSTERED

    (

    [ReplicationID]

    ) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    ALTER TABLE [tower].[NBDocUpdatedRecords] ADD

    CONSTRAINT [DF_NBDocUpdatedRecords_PresentDate] DEFAULT (getdate()) FOR [PresentDate]

    GO

    towid is not primary key

  • delete from nbdocdeletedrecords where replicationID in (

    select min(replicationID) from nbdocdeletedrecords

    group by towid

    having count(*)>1

    )

    ** ReplicationID is primary Key

    **deleting based on towid... min(replicationID)



    Pradeep Singh

  • Wait a minute! What are you calling a duplicate!? You mean to tell me that no other username can make an entry on that date for that sLevel?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ps, your solution will delete only part of candidates.

    If there are 3 identical rows it will leave 3rd row in the table.

    Try this:

    DELETE R

    FROM [tower].[NBDocUpdatedRecords] R

    LEFT JOIN ( select MIN(ReplicationID), {Unique key}

    from [tower].[NBDocUpdatedRecords]

    group by {Unique key}) U ON U. {Unique key} = R. {Unique key}

    WHERE U.{Unique key} IS NULL

    And don't forget to enforce unique constraint on the columns which are logically unique key.

    Then duplicates wil have no chance to appear.

    _____________
    Code for TallyGenerator

  • thanks!

    Tried this:

    DELETE R

    FROM [tower].[NBDocUpdatedRecords] R

    LEFT JOIN ( select MIN(ReplicationID), {Unique key}

    from [tower].[NBDocUpdatedRecords]

    group by {Unique key}) U ON U. {Unique key} = R. {Unique key}

    WHERE U.{Unique key} IS NULL

    and got this:

    [Microsoft][ODBC SQL Server Driver]Syntax error or access violation

  • You should replace "{Unique Key}" with column(s) which intend to hold unique values.

    In your case it's probably TowID, but you should know better.

    _____________
    Code for TallyGenerator

  • oh boy.. I must be half brain dead. Thank you! that worked.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply