July 6, 2005 at 5:13 am
At work we have an application that makes outbound dialing calls to our customers. There are 2 servers configured to do the calling, 1 as the primary and 1 as a backup. All details pertaining to the call results - such as connected, busy, answering machine, etc. - are written to log files. These files are then imported into a SQL Server 2000 DB that I created.
My question is: how can I prevent getting duplicate data in the event that the backup server also starts making & logging calls? (The information from the "primary" log gets copied to the log for the backup server, so any calls recorded in the primary log will be duplicated when the data from the backup server's log is imported.)
Right now I have 2 tables I use for importing data - one for the primary log and one for the backup log. I then take the data from these 2 tables and move it to the "master" call table that users can report on. That's all I have - there's nothing that checks for duplicate data between the two import tables. How can I check that duplicate data does not exist between these two tables before importing to the "master" table?
Thanks,
Brian
July 6, 2005 at 5:26 am
First question: how do you define a duplicate? (i.e. which column(s) do you consider to be the natural key?)
/Kenneth
July 6, 2005 at 5:49 am
I'm defining a duplicate as a record/row that is entirely the same between the two import tables.
For example, if our primary server is doing all the calling it will record things like:
1. Name of person
2. Phone # of person
3. Result of call
4. Date of call
5. Time of call
All of this information is written to a text file called primary_log.txt. If something happens to our primary server and the backup/stand-by server is used, all the information from primary_log.txt gets copied to the backup server's log file - secondary_log.txt - along with the call results made from the secondary server.
When the DTS package imports the data, it imports from both log files. The problem is that the information from primary_log.txt gets duplicated, since it is also included in the import from secondary_log.txt. That's what I'm considering duplicate data and what I want to keep out of the master table for end users.
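Roughly, the import tables and the master call table all share one shape - here's a minimal sketch, where the column names and types are just placeholders based on the list above:

-- Hypothetical import table; secondary_log.txt loads into a second
-- table of the same shape, and the master call table matches as well.
CREATE TABLE primaryImport (
    Name       varchar(100) NULL,
    Phone      varchar(20)  NULL,
    CallResult varchar(50)  NULL,  -- connected, busy, answering machine, ...
    CallDate   varchar(10)  NULL,  -- kept as text, as read from the log file
    CallTime   varchar(8)   NULL
)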
Any ideas or suggestions on how I can avoid this duplicate data are greatly appreciated.
Thanks,
Brian
July 6, 2005 at 6:07 am
In that case you can import the two logs into a single staging table (which may then hold duplicate rows) and insert from the staging table into the master table like this:
INSERT masterTable (col1, col2, col3.....)
SELECT DISTINCT col1, col2, col3....
FROM stagingTable
I think that should do the trick..?
One prerequisite is that the staging table be truncated between loads. It's also assumed that you don't re-run any files - each run of files contains only 'new' records.
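In other words, each load would run something like this (a sketch only - the file paths are made up, and the BULK INSERT lines stand in for whatever the DTS package actually does):

-- Start each load from an empty staging table.
TRUNCATE TABLE stagingTable

-- Load both log files into the same staging table (illustrative paths;
-- in practice the DTS package performs this import step).
BULK INSERT stagingTable FROM 'C:\logs\primary_log.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
BULK INSERT stagingTable FROM 'C:\logs\secondary_log.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')

-- DISTINCT collapses the rows the secondary log copied from the primary.
INSERT masterTable (Name, Phone, CallResult, CallDate, CallTime)
SELECT DISTINCT Name, Phone, CallResult, CallDate, CallTime
FROM stagingTable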
/Kenneth
July 7, 2005 at 12:26 am
Try:
INSERT masterTable (Name, Phone, CallResult, CallDate, CallTime)
SELECT Name, Phone, CallResult, CallDate, CallTime
FROM stagingTable
-- Build the same concatenated key from the rows already in the master
-- table, so staging rows loaded on a previous run are skipped. (This
-- assumes all five columns are character types, as imported from the
-- text logs, so they can be concatenated.)
LEFT JOIN (SELECT NULLIF(LTRIM(RTRIM(ISNULL(Name,'')+ISNULL(Phone,'')
               +ISNULL(CallResult,'')+ISNULL(CallDate,'')
               +ISNULL(CallTime,''))),'') AS PK
           FROM masterTable) AS Existing
ON NULLIF(LTRIM(RTRIM(ISNULL(Name,'')+ISNULL(Phone,'')
   +ISNULL(CallResult,'')+ISNULL(CallDate,'')
   +ISNULL(CallTime,''))),'') = Existing.PK
WHERE Existing.PK IS NULL -- not already in the master table
AND NULLIF(LTRIM(RTRIM(ISNULL(Name,'')+ISNULL(Phone,'')
    +ISNULL(CallResult,'')+ISNULL(CallDate,'')
    +ISNULL(CallTime,''))),'') IS NOT NULL -- ignore completely blank rows
GROUP BY Name, Phone, CallResult, CallDate, CallTime -- collapse duplicates within staging
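A plainer way to write the same anti-join is with NOT EXISTS, avoiding the concatenated key (again just a sketch - it assumes the five columns together define a duplicate, and since = never matches NULLs, you'd wrap both sides in ISNULL as above if the columns are nullable):

INSERT masterTable (Name, Phone, CallResult, CallDate, CallTime)
SELECT s.Name, s.Phone, s.CallResult, s.CallDate, s.CallTime
FROM stagingTable AS s
-- Keep only staging rows with no matching row in the master table.
WHERE NOT EXISTS (SELECT 1
                  FROM masterTable AS m
                  WHERE m.Name = s.Name
                    AND m.Phone = s.Phone
                    AND m.CallResult = s.CallResult
                    AND m.CallDate = s.CallDate
                    AND m.CallTime = s.CallTime)
-- Collapse duplicates within the staging table itself.
GROUP BY s.Name, s.Phone, s.CallResult, s.CallDate, s.CallTime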
Andy
July 7, 2005 at 12:41 am
Union will eliminate duplicates:
Insert mastertable
(name, phone, res, calldate)
select name, phone, res, calldate
from table1
union
select name, phone, res, calldate
from table2
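Note that UNION (unlike UNION ALL) removes duplicates both across and within the two inputs, so rows repeated inside a single log file are collapsed as well. It won't guard against re-running the same files, though, since those rows are already in the master table by then - for that you'd still want a check against the master table like the ones above.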