December 15, 2009 at 9:03 am
Hi Guys, long time reader, first time poster π
I have a question regarding duplicate entries.
I have a basic table with 7 columns that I am having to more into a new table with a unique index on 4 of the 7 columns.
I have around 50,000 rows that due to the new index are considered 'duplicate'.
My question is what would be the easiest way to either move the unique rows to a new table?
Thanks
Dean
December 15, 2009 at 9:26 am
This should get you started, Dean:
DROP TABLE #MySampleTable
CREATE TABLE #MySampleTable (col1 INT, col2 INT, col3 INT, col4 INT, col5 INT, col6 INT, col7 INT)
INSERT INTO #MySampleTable (col1, col2, col3, col4, col5, col6, col7)
SELECT 1, 2, 3, 4, 1, 1, 1 UNION ALL
SELECT 1, 2, 3, 4, 2, 2, 2 UNION ALL
SELECT 1, 2, 3, 4, 3, 3, 3 UNION ALL
SELECT 1, 2, 3, 5, 1, 1, 1 UNION ALL
SELECT 1, 2, 3, 5, 2, 2, 2 UNION ALL
SELECT 1, 2, 3, 6, 1, 1, 1 UNION ALL
SELECT 1, 2, 3, 6, 2, 2, 2 UNION ALL
SELECT 1, 2, 3, 7, 1, 1, 1
SELECT *
INTO #MyNewTable
FROM (
SELECT *, MyRow = ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY col1, col2, col3, col4, col5, col6, col7)
FROM #MySampleTable
) d
WHERE MyRow = 1
SELECT * FROM #MyNewTable
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 15, 2009 at 9:29 am
It would be most helpful for you to post :
1. Table(s) schema.
2. Sample data.
Thanks.
December 15, 2009 at 1:22 pm
Thanks for the replies guys.
I will get the schema and sample data tomorrow and add to this forum.
Cheers.
Dean
December 15, 2009 at 1:39 pm
Something for you to think about.
If two rows have duplicate values for those 4 columns, but different values for the other three, which row would you want to keep and which would you want to discard?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 15, 2009 at 1:54 pm
Thats one of the issues π
One of the columns is a time stamp so I would want to keep the latest entry.
Basically this is a WorkflowStatus table that has status code, then four column relating to the data itself and finally a 'Time' and a 'UpdateBy' column which records the users name.
Dean.
December 15, 2009 at 2:18 pm
deanrjohnson (12/15/2009)
Thats one of the issues π
Actually doing it is not hard, if there's a simple rule as to which row to keep. I'm sure once the schema's posted there'll be a solution within the hour.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply