Avoiding Duplicates

  • At work we have an application that makes outbound dialing calls to our customers.  There are 2 servers configured to do the calling, 1 as primary and 1 as a backup.  All data pertaining to the call results - such as connected, busy, answering machine, etc. - are written to log files.  These files are then imported into a SQL Server 2000 DB that I created.

    My question is: how can I prevent duplicate data in the event that the backup server also starts making & logging calls? (The information from the "primary" log gets copied to the backup server's log, so any calls recorded in the primary log will be duplicated when the data from the backup server's log is imported.)

    Right now I have 2 tables I use for importing data - one for the primary log and one for the backup log.  I then take the data from these 2 tables and move it to the "master" call table that users can then report on.  That's all I have.  There's nothing that checks to make sure there's no duplicate data between the two import tables.  How can I check that duplicate data does not exist between these two tables before importing to the "master" table?

    Thanks,

    Brian

  • First question: How do you define a duplicate? (i.e. which column(s) do you consider the natural key?)

    /Kenneth

  • I'm defining duplicate as a record/row that is entirely the same between the two import tables.

    For example, if our primary server is doing all the calling it will record things like:

    1.  Name of person

    2.  Phone # of person

    3.  Result of call

    4.  Date of call

    5.  Time of call

    All of this information will be written to a text file called primary_log.txt.  If something happens to our primary server and the backup/stand-by server is used, all the information from the primary_log.txt file gets copied to the secondary log file - secondary_log.txt - along with the call results made from the secondary server.

    When the DTS package imports the data, it will import from both log files.  The problem is that the information from the primary_log.txt file would be duplicated since it would also be included on the import from the secondary_log.txt file.  That's what I'm considering duplicate data and want to avoid in the master table for end users.

    Any ideas or suggestions on how I can avoid this are greatly appreciated.

    Thanks,

    Brian

    In that case you can import the two logs into a staging table (which then may hold duplicate rows) and insert from the staging table to the master table like this:

    INSERT masterTable (col1, col2, col3.....)
    SELECT DISTINCT col1, col2, col3....
    FROM stagingTable

    I think that should do the trick..?

    A prerequisite is that the staging table be truncated between loads. It's also assumed that you don't re-run any files; each run of files contains only 'new' records.
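    Putting those pieces together, one full load cycle might look something like this (a sketch only - the table and column names are assumptions based on this thread, and the DTS import step is only indicated by a comment):

    ```sql
    -- Step 1: clear the staging table between loads (the prerequisite above)
    TRUNCATE TABLE stagingTable

    -- Step 2: DTS imports primary_log.txt and secondary_log.txt into
    -- stagingTable here; it may now contain duplicate rows.

    -- Step 3: move exactly one copy of each distinct row to the master table
    INSERT masterTable (Name, Phone, CallResult, CallDate, CallTime)
    SELECT DISTINCT Name, Phone, CallResult, CallDate, CallTime
    FROM stagingTable
    ```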

    /Kenneth


  • Try this (the inner SELECT builds a concatenated key from masterTable, so the staging rows are anti-joined against what has already been loaded and re-running a file won't insert the same rows twice; it assumes all five columns are character data):

    INSERT masterTable (Name, Phone, CallResult, CallDate, CallTime)
    SELECT s.Name, s.Phone, s.CallResult, s.CallDate, s.CallTime
    FROM stagingTable AS s
     LEFT JOIN (SELECT NULLIF(LTRIM(RTRIM(ISNULL(Name,'')+ISNULL(Phone,'')
       +ISNULL(CallResult,'')+ISNULL(CallDate,'')
       +ISNULL(CallTime,''))),'') AS PK
      FROM masterTable) AS Existing
     ON NULLIF(LTRIM(RTRIM(ISNULL(s.Name,'')+ISNULL(s.Phone,'')
       +ISNULL(s.CallResult,'')+ISNULL(s.CallDate,'')
       +ISNULL(s.CallTime,''))),'')
      = Existing.PK
    WHERE Existing.PK IS NULL
     AND NULLIF(LTRIM(RTRIM(ISNULL(s.Name,'')+ISNULL(s.Phone,'')
      +ISNULL(s.CallResult,'')+ISNULL(s.CallDate,'')
      +ISNULL(s.CallTime,''))),'') IS NOT NULL
    GROUP BY s.Name, s.Phone, s.CallResult, s.CallDate, s.CallTime

    Andy

  • Union will eliminate duplicates:

    INSERT mastertable (name, phone, res, calldate)
    SELECT name, phone, res, calldate
    FROM table1
    UNION
    SELECT name, phone, res, calldate
    FROM table2
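    Worth noting: UNION removes duplicates both between and within the two SELECTs, so a row repeated inside table1 itself also collapses to one. If you ever do want to keep every row, UNION ALL is the variant that preserves duplicates (shown here only for contrast, using the same assumed table names):

    ```sql
    -- keeps every row from both tables, duplicates included
    SELECT name, phone, res, calldate
    FROM table1
    UNION ALL
    SELECT name, phone, res, calldate
    FROM table2
    ```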
