Avoiding Duplicates

  • At work we have an application that makes outbound dialing calls to our customers.  There are 2 servers configured to do the calling, 1 as primary and 1 as a backup.  All data pertaining to the call results - such as connected, busy, answering machine, etc. - are written to log files.  These files are then imported into a SQL Server 2000 DB that I created.

    My question is: how can I prevent duplicate data in the event that the backup server also starts making & logging calls? (The information from the "primary" log gets copied to the backup server's log, so any calls recorded in the primary log will be duplicated when the data from the backup server's log is imported.)

    Right now I have 2 tables I use for importing data - one for the primary log and one for the backup log.  I then take the data from these 2 tables and move it to the "master" call table that users can then report on.  That's all I have.  There's nothing that checks to make sure there's no duplicate data between the two import tables.  How can I check that duplicate data does not exist between these two tables before importing to the "master" table?

    Thanks,

    Brian

  • First question: How do you define a duplicate? (i.e. which column(s) do you consider the natural key?)

    /Kenneth

  • I'm defining duplicate as a record/row that is entirely the same between the two import tables.

    For example, if our primary server is doing all the calling it will record things like:

    1.  Name of person

    2.  Phone # of person

    3.  Result of call

    4.  Date of call

    5.  Time of call

    All of this information will be written to a text file called primary_log.txt.  If something happens to our primary server and the backup/stand-by server is used, all the information from the primary_log.txt file gets copied to the secondary log file - secondary_log.txt - along with the call results made from the secondary server.

    When the DTS package imports the data, it will import from both log files.  The problem is that the information from the primary_log.txt file would be duplicated since it would also be included on the import from the secondary_log.txt file.  That's what I'm considering duplicate data and want to avoid in the master table for end users.

    Any ideas or suggestions on how I can avoid this are greatly appreciated.

    Thanks,

    Brian

    In that case you can import the two logs into a staging table (which then may hold duplicate rows) and insert from the staging table to the master table like this:

    INSERT masterTable (col1, col2, col3.....)
    SELECT DISTINCT col1, col2, col3....
    FROM stagingTable

    I think that should do the trick..?

    A prerequisite is that the staging table be truncated between loads. It's also assumed that you don't re-run any files; each run of files contains only 'new' records.
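    Putting those pieces together, one full load cycle might look something like this (a sketch only - the table and column names are assumptions based on this thread, and the DTS import step is only indicated by a comment):

    ```sql
    -- Step 1: clear the staging table between loads (the prerequisite above)
    TRUNCATE TABLE stagingTable

    -- Step 2: DTS imports primary_log.txt and secondary_log.txt into
    -- stagingTable here; it may now contain duplicate rows.

    -- Step 3: move exactly one copy of each distinct row to the master table
    INSERT masterTable (Name, Phone, CallResult, CallDate, CallTime)
    SELECT DISTINCT Name, Phone, CallResult, CallDate, CallTime
    FROM stagingTable
    ```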

    /Kenneth


  • Try this (the inner SELECT builds a concatenated key from masterTable, so the staging rows are anti-joined against what has already been loaded and re-running a file won't insert the same rows twice; it assumes all five columns are character data):

    INSERT masterTable (Name, Phone, CallResult, CallDate, CallTime)
    SELECT s.Name, s.Phone, s.CallResult, s.CallDate, s.CallTime
    FROM stagingTable AS s
     LEFT JOIN (SELECT NULLIF(LTRIM(RTRIM(ISNULL(Name,'')+ISNULL(Phone,'')
       +ISNULL(CallResult,'')+ISNULL(CallDate,'')
       +ISNULL(CallTime,''))),'') AS PK
      FROM masterTable) AS Existing
     ON NULLIF(LTRIM(RTRIM(ISNULL(s.Name,'')+ISNULL(s.Phone,'')
       +ISNULL(s.CallResult,'')+ISNULL(s.CallDate,'')
       +ISNULL(s.CallTime,''))),'')
      = Existing.PK
    WHERE Existing.PK IS NULL
     AND NULLIF(LTRIM(RTRIM(ISNULL(s.Name,'')+ISNULL(s.Phone,'')
      +ISNULL(s.CallResult,'')+ISNULL(s.CallDate,'')
      +ISNULL(s.CallTime,''))),'') IS NOT NULL
    GROUP BY s.Name, s.Phone, s.CallResult, s.CallDate, s.CallTime

    Andy

  • Union will eliminate duplicates:

    INSERT mastertable (name, phone, res, calldate)
    SELECT name, phone, res, calldate
    FROM table1
    UNION
    SELECT name, phone, res, calldate
    FROM table2
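    Worth noting: UNION removes duplicates both between and within the two SELECTs, so a row repeated inside table1 itself also collapses to one. If you ever do want to keep every row, UNION ALL is the variant that preserves duplicates (shown here only for contrast, using the same assumed table names):

    ```sql
    -- keeps every row from both tables, duplicates included
    SELECT name, phone, res, calldate
    FROM table1
    UNION ALL
    SELECT name, phone, res, calldate
    FROM table2
    ```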
