October 22, 2009 at 1:38 pm
I have written the following T-SQL and it works fine... just slowly.
I am think its from the cursors in it.. but do not know how to do the same thing without it.
the ll_dup_Sender table is a table i populated that has all the duplicate and a new column called 'rn'
if a sender is duplicated (based on columns: local_part, domain_part, is_local)then the one with the 'rn' = 1 is the one to keep. the other ones need to be removed. hence, the trick part. any data in the Sender_data table that references the sender_id key in the ll_dup_sender table need to be updated to begin to reference the one that will remain. then finally all the duplicates can be deleted.
for example....
ll_dup_sender table
sender_id | local_part | Domain_part | is_local | rn
-----------------------------------------------------
1 | someone | domain.com | 0 | 2
2 | someone | domain.com | 0 | 1
3 | someone | domain.com | 0 | 3
44 | someone | domain.com | 0 | 4
Sender_data table
stat_id | sender_id | More columns ->
-----------------------------------------------------
1 | 1 |
2 | 44 |
3 | 44 |
44 | 3 |
so basically the Sender_data table needs to be updated so any rows with sender_ID of 3,44, 2 are updated to sender_id = 1 (the one to keep, FYI -- always the one with the lowest ID number) no info to be removed from the MT_monitor_fact_scanned table and no sender_ID can be orphaned from the ll_dup_sender table. and not all in the ll_dup_sender table are duplicated and some may be duplicated upto 8 times. what a mess!
i just cannot figure out how to do this without a cursor and it will just take to long with the way i am doing it
Thanks
here is the T-SQL i have been using:
----------------------------------------
declare @DupCount int, declare @Real_sender_id int, declare @sender_id int
declare @local_part varchar(255), declare @domain_Part varchar(255), declare @Is_Local bit
declare @strSQL varchar(255), declare @Query1 varchar(255), declare @Query2 varchar(255)
declare @fixedrows int, declare @rn int, declare @Cursor_rowcount int
SET NOCOUNT ON
select @DupCount = Count(sender_id) from ll_dup_sender
while @DupCount != 0
begin
set @strSQL = '('
select top(1) @local_part = Local_part, @domain_Part = domain_Part, @Is_Local = Is_Local from ll_dup_sender
DECLARE Bad_Sender_ID CURSOR LOCAL STATIC FOR
select sender_id, rn from ll_dup_sender where local_part = @local_part and domain_Part = @domain_Part and Is_Local = @Is_Local
open Bad_sender_ID
fetch next from Bad_sender_ID into @sender_id, @rn
set @Cursor_rowcount = @@cursor_rows
WHILE @@FETCH_STATUS = 0
BEGIN
if @Cursor_rowcount = 1
Begin
set @Real_sender_id = @sender_id
goto OnlyDelete
end
if @rn > 1 set @strSQL = @strSQL + cast(@sender_id as varchar) + ', '
if @rn = 1 set @Real_sender_id = @sender_id
FETCH NEXT FROM Bad_Sender_ID into @sender_id, @rn
END
set @strSQL = left(@strSQL, len(@strSQL) - 1) + ')'
set @Query1 = 'update Sender_data set sender_id = ' + cast(@Real_sender_id as varchar) + ' where sender_id IN ' + @strSQL
EXEC (@Query1)
OnlyDelete:
if @Cursor_rowcount = 1 set @Query2 = 'delete from ll_dup_sender where sender_id = ' + cast(@Real_sender_id as varchar)
if @Cursor_rowcount > 1 set @Query2 = 'delete from ll_dup_sender where sender_id IN ' + @strSQL + ' or sender_id = ' + cast(@Real_sender_id as varchar)
EXEC (@Query2)
set @DupCount = @DupCount -@@rowcount
CLOSE Bad_Sender_ID
DEALLOCATE Bad_Sender_ID
end
October 22, 2009 at 7:29 pm
Thanks for providing your code. However, if you could provide sample data and the results you would like to get (see the link in my signature on how to provide sample data for the correct way to do this), it would make it much easier for us to help you write some set based code.
October 23, 2009 at 12:40 pm
If I understand your cursor code correctly, then the following 2 statements will do the equivalent. The self join on the ll_dup_sender table is effectvely creating a map linking duplicate sender_id values (where column rn > 1) to their associated "actual" sender_id values (where column rn = 1). The TRUNCATE statement just removes all values from the ll_dup_sender table since you don't appear to need it after updating the Sender_data table.
UPDATE SD SET sender_id = ACTUAL.sender_id
FROM ll_dup_sender ACTUAL
INNER JOIN ll_dup_sender DUP ON (
ACTUAL.local_part = DUP.local_part
AND ACTUAL.Domain_part = DUP.Domain_part
AND ACTUAL.is_local = DUP.is_local
AND ACTUAL.rn = 1
AND DUP.rn > 1
)
INNER JOIN Sender_data SD ON (SD.sender_id = DUP.sender_id)
TRUNCATE TABLE ll_dup_sender
The following version will be identical in terms of result and will very likely perform similarly, but the mapping from duplicate to actual sender_id values is made explicit in the CTE.
;WITH cteMAP AS (
SELECT
ACTUAL.sender_id AS actual_sender_id,
DUP.sender_id AS dup_sender_id
FROM ll_dup_sender ACTUAL
INNER JOIN ll_dup_sender DUP ON (
ACTUAL.local_part = DUP.local_part
AND ACTUAL.Domain_part = DUP.Domain_part
AND ACTUAL.is_local = DUP.is_local
AND ACTUAL.rn = 1
AND DUP.rn > 1
)
)
UPDATE SD SET sender_id = MAP.actual_sender_id
FROM Sender_data SD
INNER JOIN cteMAP MAP ON (SD.sender_id = MAP.dup_sender_id)
TRUNCATE TABLE ll_dup_sender
October 23, 2009 at 1:11 pm
Thanks for the reply..
i will setup some test data and give the code a try, i will let you know how it goes.
Thanks
LL
October 23, 2009 at 8:25 pm
As a side bar, if you had setup some test data and provided the scripts to gen that test data, any solutions folks gave you would probably already be tested. Please see the first link in my signature line below for how and what the format for the data should be.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 26, 2009 at 11:22 am
Jeff,
Thanks for pointing out the link on posting etiquette.
i have read the link when Garadin posted the same link in a reply to this post.
i was preparing some data to post to this thread but got sidetracked with some other things, when i had time to get back on this task a possible solution seemed to be posted (preliminary tests look good, thanks again andrewed)
so right now at the moment since it looks as if the puzzle has been solved it will hold on putting together a test data and table structure info since it does not appear to be needed anymore.
in the future i will follow the info as in the link provided by you and Garadin.
Thanks
Leroy L
October 30, 2009 at 3:14 pm
andrewd,
I have tested your suggestion on sample data and it worked perfectly as is.
running the T-SQL on the actual data worked successfully, and I appreciate you for your time and thought.
Leroy L
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply