January 14, 2016 at 2:56 pm
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.
No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 14, 2016 at 3:28 pm
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.
Setting aside the fact that you don't need an Identity Column or a new table to identity dups, I am curious as to how having an Identity Column tells you what the dups are?
__________________________________________________________________________________________________________
How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/
January 14, 2016 at 4:23 pm
LinksUp (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.Setting aside the fact that you don't need an Identity Column or a new table to identity dups, I am curious as to how having an Identity Column tells you what the dups are?
Good point.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
January 14, 2016 at 4:24 pm
Thanks Drew. I tried a SELECT Distinct and it did not work.
Thanks again.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
January 14, 2016 at 4:30 pm
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
Yes there are 3 duplicates when I do a INNER JOIN but I exclude them in the Insert and I keep getting PK Violations. :unsure:An INNER JOIN won't give you the situation that I'm expecting. You're looking for situations where the code does not exist in the target table and there are duplicates in the source table. An inner join will exclude those, because the code doesn't exist in the target table.
Drew
Right. There are 3 records in the Postal that exist in the Import Table.
Thanks again.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
January 15, 2016 at 6:58 am
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
What?
<your order criteria here>???
What does that matter?
Would that be the ZipCode?
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
January 15, 2016 at 7:09 am
Welsh Corgi (1/15/2016)
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
What?
<your order criteria here>???
What does that matter?
Would that be the ZipCode?
Where you have several duplicates, how do you decide which one to keep and which to delete?
If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 15, 2016 at 7:15 am
The following inserts 157 records:
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY ZipCode) AS rn
FROM Postal_Codes_Import
)
SELECT *
INTO #tmp
FROM CTE
WHERE rn > 1
ORDER BY ZipCode
This returns 20 records:
SELECT ZipCode, COUNT(*) AS RecordNumber
FROM #tmp
GROUP BY ZipCode
HAVING COUNT (*) > 1
ZipCodeRecordNumber
B5A 5A82
B5A 5G32
B5A 5G53
B5A 5J13
B5A 5J72
B5A 5K52
B5A 5L13
B5A 5L62
B5A 5L72
B5A 5N33
B5A 5P22
B5A 5R32
B5A 5X92
C0A 1H14
C0A 1H34
C0A 1H64
C0A 1H73
C0A 1H92
N0N 1J42
T4B 2T32
51
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
January 15, 2016 at 7:36 am
Phil Parkin (1/15/2016)
Welsh Corgi (1/15/2016)
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
What?
<your order criteria here>???
What does that matter?
Would that be the ZipCode?
Where you have several duplicates, how do you decide which one to keep and which to delete?
If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.
I do not have a DateTime Stamp in this table. I did not create it.
I'm just looking for duplicate Zip Code fields.
Thanks.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
January 15, 2016 at 7:41 am
Welsh Corgi (1/15/2016)
Phil Parkin (1/15/2016)
Welsh Corgi (1/15/2016)
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
What?
<your order criteria here>???
What does that matter?
Would that be the ZipCode?
Where you have several duplicates, how do you decide which one to keep and which to delete?
If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.
I do not have a DateTime Stamp in this table. I did not create it.
I'm just looking for duplicate Zip Code fields.
Thanks.
The words "for example" in my post were there to indicate that I was providing an example.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 15, 2016 at 8:00 am
maybe maybe not....would be a whole lot easier with sample data.....:doze:
WITH cte
AS
(SELECT zipcode,
ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY ZipCode) AS rn
FROM Postal_Codes_Import
)
INSERT INTO Postal
SELECT cte.zipcode -- add additional columns as required
FROM cte
LEFT OUTER JOIN Postal AS P ON cte.zipcode = P.ID
WHERE(cte.rn = 1)
AND (P.ID IS NULL);
SELECT *
FROM postal;
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
January 15, 2016 at 6:57 pm
Phil Parkin (1/15/2016)
Welsh Corgi (1/15/2016)
Phil Parkin (1/15/2016)
Welsh Corgi (1/15/2016)
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
What?
<your order criteria here>???
What does that matter?
Would that be the ZipCode?
Where you have several duplicates, how do you decide which one to keep and which to delete?
If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.
I do not have a DateTime Stamp in this table. I did not create it.
I'm just looking for duplicate Zip Code fields.
Thanks.
The words "for example" in my post were there to indicate that I was providing an example.
If both of the rows have identical values in all the columns and you know this for a fact , it does not matter what you order by.
Else you need to see if there are any indications in other columns that would want you to choose to select that one over the other(s). Phil's example uses a create date in DESC order, that makes sense in that you probably want the latest version of the row.
Without more insight into the data , I cannot provide further guidance, but I am sure you can make this judgement.
Dont forget to go back to the file originators and provide the feed back that there are duplicates when (I assume) there should not be. The best solution really is to stop the import if duplicates are encountered in the import file and it remains unclear why and what version of the row is the correct one. Here you want a process to fail. If I am importing this using SSIS I would want a variable to hold duplicate counts in the import and use this as a constraint to move to the next task (dupValues = 0) .
----------------------------------------------------
January 16, 2016 at 7:56 am
Welsh Corgi (1/15/2016)
Phil Parkin (1/15/2016)
Welsh Corgi (1/15/2016)
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
What?
<your order criteria here>???
What does that matter?
Would that be the ZipCode?
Where you have several duplicates, how do you decide which one to keep and which to delete?
If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.
I do not have a DateTime Stamp in this table. I did not create it.
I'm just looking for duplicate Zip Code fields.
Thanks.
Maybe you should read up on ROW_NUMBER used with PARTITION BY and understand how it works before you try to use it. Take the time to learn it so you'll be able to use it here and in other scenarios. It's a powerful tool to have in your toolbox, but you will need to understand it to be able to use it. If you don't understand it, don't put it in production until you do.
When you identify duplicate zip codes, you need to decide which one to keep. It case be based on whatever column you want, and then you sort by that column. If you decide that it doesn't matter, at least you've made a decision, but you have to make a decision one way.
Having an identity column will not identify duplicates in a different column. You should understand this already.
January 18, 2016 at 1:51 pm
There was not a problem with the LEFT JOIN syntax.
There were duplicates in the Source file which I should have caught. :blush:
Thanks for all that provided help.:-)
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Viewing 14 posts - 31 through 43 (of 43 total)
You must be logged in to reply to this topic. Login to reply