deleting duplicates

  • Now that i have set the deleting envinronment i am in the processing of writing a code which can automatically select the duplicates then deletes.  Below is a code i am using to set the task. 

    -- declare all variables!

    DECLARE @iErrorVar int,

    @vchdssid varchar(12),

    @vchinterviewdate varchar(30),

    @vchselfproxy varchar(20),

    @iCount int,

    @chCount char(3),

    @nvchCommand nvarchar(4000)

     

    -- set initial environment

    SET ROWCOUNT 0

    SET NOCOUNT ON

    -- Build cursor to find duplicated information

    DECLARE DelDupe CURSOR FOR

    SELECT COUNT(*) AS Amount,

    dssid,

    interviewdate,

    selfproxy

    FROM testscreen

    GROUP BY dssid,interviewdate,selfproxy

    HAVING COUNT(*) > 1

     

    OPEN DelDupe

    FETCH NEXT FROM DelDupe INTO @iCount,

    @vchdssid,

    @vchinterviewdate,

    @vchselfproxy

    WHILE (@@fetch_status = 0)

    BEGIN

    -- Calculate number of rows to delete for each grouping by subtracting

    -- 1 from the total count for a given group.

    SELECT @iCount = @iCount - 1

    SELECT @chCount = CONVERT(char(3),@iCount)

    -- now build the rowcount and delete statements.

    SELECT @nvchCommand = N'SET ROWCOUNT ' + @chCount +

    'DELETE indtest' +

    ' WHERE dssid = ' + CHAR(34) + @vchdssid + CHAR(34) +

    ' AND interviewdate = ' + CHAR (34) + @vchinterviewdate + CHAR(34) +

    ' AND selfproxy = ' + CHAR(34) + @vchselfproxy + CHAR(34)

    -- print the statement. For your viewing pleasure only.

    PRINT @nvchCommand

    -- execute the statement.

    EXEC sp_executesql @nvchCommand

    SELECT @iErrorVar = @@Error

    IF @iErrorVar <> 0

    BEGIN

    RETURN

    END

    FETCH NEXT FROM DelDupe INTO @iCount,

    @vchdssid,

    @vchinterviewdate,

    @vchselfproxy

    END

    CLOSE DelDupe

    DEALLOCATE DelDupe

    RETURN

     

    When i run above code the following error message appears

    SET ROWCOUNT 1  DELETE indtest WHERE dssid = "012063D002" AND interviewdate = "January 17, 2006" AND selfproxy = "Self"

    Server: Msg 208, Level 16, State 1, Line 1

    Invalid object name 'indtest'.

    what can be the cause this error message and the soln?

     

  • Use single quotes around literals in SQL Server. To debug, try running your statement in Query Analyser and see what happens.

    By the way, this looks like a horrendously complex routine for what you are trying to achieve. I see that you are identifying dups based on a match on (dssid, interviewdate, selfproxy). Have you thought of doing this using a set-based procedure? It will be faster and simpler. How do you decide which of the duplicates to keep and which to delete?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • with commenting of the following two lines the code runs very well

    --EXEC sp_executesql @nvchCommand

    --PRINT @nvchCommand

    Phil it is true i selecting dups then deleting the extra cases. i.e. if they are two cases then it just delete one case.  The three variables automatically determines the duplicates.  i already have an delete insert trigger which backs up the duplicates such that i dont need to worry with the deleted record.  The reason why i need to remove the dups is because my system is bound to create dups. i am uploading data from several PDAs so if by any chance the uploading system is run more than once from the same PDA then there will be dups.

    The only problem is that it does do what it was intended.  it says successful yet the records are not deleted, record number remains the same.

    is this section correct?... may be I dont need the CHAR(34)....

    SELECT @nvchCommand = N'SET ROWCOUNT ' + @chCount +

    'DELETE ' +

    ' WHERE dssid = ' + CHAR(34) + @vchdssid + CHAR(34) +

    ' AND interviewdate = ' + CHAR (34) + @vchinterviewdate + CHAR(34) +

    ' AND selfproxy = ' + CHAR(34) + @vchselfproxy + CHAR(34)

     

  • Phil Parkin: How do i do it in a set-based procedure?  take into account that testscreen table has other variable apart from the three set of multi-column used to select the dups.

  • As I said, you need to use single quotes.

    declare @quote char(1)

    set @quote = ''''      --4 single quote characters

    'DELETE ' +

    ' WHERE dssid = ' + @quote + @vchdssid + @quote + ...

    might get you closer. But what about something like

    delete a

    from testscreen a join testscreen b

    on a.dssid = b.dssid and

    a.interviewdate = b.interviewdate and

    a.selfproxy = b.selfproxy and

    a.id > b.id

    (the ID field corresponds to the primary key of table testscreen). This deletes all dups in one hit, leaving the one with the 'lowest' PK (untested).

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply