June 1, 2011 at 2:41 pm
So I know there must be a better way to do this, because I am most definitely tired of doing it manually. 🙂
I have been given the task of cleaning up a table full of duplicates where all column data is the same but the primary key is different. However, before I can delete the dups I must update several tables with one of the primary keys (doesn't matter which one, the TOP one is fine) and then delete the rest in the original table. The setup is like this:
@MainTable
ID - PK
Col1
Col2
Col3
@Table1
TID - PK
ID - FK
@Table2
T2ID - PK
ID - FK
@Table3
T3ID - PK
ID - FK
So essentially I need to update any duplicate ID in @Table1, @Table2, and @Table3 with just one ID from @MainTable. Then delete the duplicate ID from @MainTable. Sample Data:
declare @MainTable table(ID int, col1 int, col2 int, col3 int);
declare @Table1 table(TID int, ID int);
declare @Table2 table(T2ID int, ID int);
declare @Table3 table(T3ID int, ID int);
insert into @MainTable(ID,col1,col2,col3)
select 1,22,33,44
union all select 2,22,33,44
union all select 4,55,66,77
union all select 5,55,66,77
union all select 6,55,66,77
union all select 7,88,99,100
union all select 8,88,99,100
union all select 9,88,99,100
union all select 10,88,99,100;
insert into @Table1(TID,ID)
select 1,2
union all select 2,5
union all select 3,6
union all select 4,8
union all select 5,9;
insert into @Table2(T2ID,ID)
select 1,2
union all select 2,4
union all select 3,5
union all select 4,8
union all select 5,9;
insert into @Table3(T3ID,ID)
select 1,2
union all select 2,5
union all select 3,6
union all select 4,8
union all select 5,10;
--to review data
select * from @MainTable;
select * from @Table1;
select * from @Table2;
select * from @Table3;
--update tables
update @Table1
set ID = 1
where ID in (1,2);
update @Table2
set ID = 1
where ID in (1,2);
update @Table3
set ID = 1
where in (1,2);
Delete from @MainTable
where ID = 2;
--next group of IDs
update @Table1
set ID = 4
where ID in (4,5,6);
update @Table2
set ID = 4
where ID in (4,5,6);
update @Table3
set ID = 4
where ID in (4,5,6);
delete from @MainTable
where ID in (5,6);
...and so on and so forth. The result would be all tables updated with the IDs below and the @MainTable looking like this:
ID col1 col2 col3
1 22 33 44
4 55 66 77
7 88 99 100
I am hoping to not have to use a cursor :crying:
Thanks for any help in advance!
-stephen
June 1, 2011 at 3:13 pm
Before I post my solution: You did an EXCELLENT JOB describing the scenario and providing ready to use sample data! There's nothing better to ask for! GREAT!!
And here's what I came up with:
;WITH cte AS
(
SELECT col1,col2,col3,MIN(id) min_id -- find the first id per group of identical columns
FROM @MainTable
GROUP BY col1,col2,col3
)
SELECT m.*,cte.min_id
INTO tmp -- build a staging table to hold the current and the min id where current <> min
FROM @MainTable m
INNER JOIN cte ON m.col1=cte.col1 AND m.col2=cte.col2 AND m.col3=cte.col3
AND m.id<>cte.min_id
SELECT *
FROM tmp
-- update all values in all three tables
UPDATE @Table1
SET ID = tmp.min_id
FROM @Table1 t1 INNER JOIN tmp ON t1.id=tmp.id
UPDATE @Table2
SET ID = tmp.min_id
FROM @Table2 t2 INNER JOIN tmp ON t2.id=tmp.id
UPDATE @Table3
SET ID = tmp.min_id
FROM @Table3 t3 INNER JOIN tmp ON t3.id=tmp.id
-- delete from the original table
DELETE m FROM
@MainTable m
INNER JOIN tmp ON m.id=tmp.id
SELECT * FROM @MainTable;
SELECT * FROM @Table1;
SELECT * FROM @Table2;
SELECT * FROM @Table3;
-- cleanup
DROP TABLE tmp
Depending on the number of rows you need to deal with it might be a good idea to add an index to the temp table on the id column.
If you have any question regarding the code, please ask.
June 2, 2011 at 8:24 am
This was exactly what I needed! Your sytax and comments were very clear, and this will definitely help me in the future.
I have never used a CTE before, but after doing a little research I definitely see the benefits. At best, I could be considered a junior database developer, so my knowledge of SQL Server is in its infancy. I have been using SQL Server for about 10 months now, and am hoping to get into SSAS and the rest of the Business Intelligence package.
If you don't mind me asking, do you have any suggestions on furthering my SQL server knowledge? It has been nice learning on the job, but I really want to take it to the next level (SSAS 😀 of course). I have bought books, but I have a hard time grasping the information (go figure, technical books that are boring!).
Thanks again for the help,
-Stephen
June 2, 2011 at 8:48 am
The first thing would be to get a SQL Server Express edition (version similar to what you use at work) and install it at home or wherever you're planning to study together with the AdventureWorks database.
If the books you're reading include sample code, try to run it in your test environment and understand how it works. If you need sample codes to study on, there's one great resource: SQLServerCentral.com 😀
Follow some of the threads you're interested in and try to solve the "puzzles" by yourself (before looking at the solutions posted). Compare your solution and the ones posted and try to find why those behave like they do. If you have questions, just ask (include either the sample code you're dealing with or a link to a related thread/post, so we know what you're working on).
Regarding SSAS: haven't used it (yet). Maybe others will post some resources.
In General, you could go to the main page of SQLServerCentral and have a look at the Stairways series. There might be some helpful articles.
June 2, 2011 at 9:32 am
Thanks for the advice!
I am going to try some of the free webinars from Pragmatic Works. I learn more effectively through demos and hands-on.
Also, I will start to participate on the site more often. It has always been useful and seems to be very helpful and friendly.
-stephen
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply