LEFT JOIN

Question

LEFT JOIN

Viewing 14 posts - 31 through 43 (of 43 total)

You must be logged in to reply to this topic. Login to reply

drew.allen SSC Guru Points: 76985 More actions · Answer 1

Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.

No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

WITH CTE AS (

SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn

FROM Postal_Codes_Import

)

DELETE CTE

WHERE rn > 1

Depending on your data you may be able to get by with just using a DISTINCT.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

LinksUp SSCertifiable Points: 6526 More actions · Answer 2

Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.

Setting aside the fact that you don't need an Identity Column or a new table to identity dups, I am curious as to how having an Identity Column tells you what the dups are?

__________________________________________________________________________________________________________
How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 3

LinksUp (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.
Setting aside the fact that you don't need an Identity Column or a new table to identity dups, I am curious as to how having an Identity Column tells you what the dups are?

Good point.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 4

Thanks Drew. I tried a SELECT Distinct and it did not work.

Thanks again.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 5

drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
Yes there are 3 duplicates when I do a INNER JOIN but I exclude them in the Insert and I keep getting PK Violations. :unsure:
An INNER JOIN won't give you the situation that I'm expecting. You're looking for situations where the code does not exist in the target table and there are duplicates in the source table. An inner join will exclude those, because the code doesn't exist in the target table.
Drew

Right. There are 3 records in the Postal that exist in the Import Table.

Thanks again.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 6

drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.
No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew

What?

<your order criteria here>???

What does that matter?

Would that be the ZipCode?

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Phil Parkin SSC Guru Points: 246961 More actions · Answer 7

Welsh Corgi (1/15/2016)
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.
No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
What?
<your order criteria here>???
What does that matter?
Would that be the ZipCode?

Where you have several duplicates, how do you decide which one to keep and which to delete?

If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 8

The following inserts 157 records:

WITH CTE AS (

SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY ZipCode) AS rn

FROM Postal_Codes_Import

)

SELECT *

INTO #tmp

FROM CTE

WHERE rn > 1

ORDER BY ZipCode

This returns 20 records:

SELECT ZipCode, COUNT(*) AS RecordNumber

FROM #tmp

GROUP BY ZipCode

HAVING COUNT (*) > 1

ZipCodeRecordNumber

B5A 5A82

B5A 5G32

B5A 5G53

B5A 5J13

B5A 5J72

B5A 5K52

B5A 5L13

B5A 5L62

B5A 5L72

B5A 5N33

B5A 5P22

B5A 5R32

B5A 5X92

C0A 1H14

C0A 1H34

C0A 1H64

C0A 1H73

C0A 1H92

N0N 1J42

T4B 2T32

51

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 9

Phil Parkin (1/15/2016)
Welsh Corgi (1/15/2016)
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.
No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
What?
<your order criteria here>???
What does that matter?
Would that be the ZipCode?
Where you have several duplicates, how do you decide which one to keep and which to delete?
If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

I do not have a DateTime Stamp in this table. I did not create it.

I'm just looking for duplicate Zip Code fields.

Thanks.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Phil Parkin SSC Guru Points: 246961 More actions · Answer 10

Welsh Corgi (1/15/2016)
Phil Parkin (1/15/2016)
Welsh Corgi (1/15/2016)
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.
No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
What?
<your order criteria here>???
What does that matter?
Would that be the ZipCode?
Where you have several duplicates, how do you decide which one to keep and which to delete?
If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.
I do not have a DateTime Stamp in this table. I did not create it.
I'm just looking for duplicate Zip Code fields.
Thanks.

The words "for example" in my post were there to indicate that I was providing an example.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 11

maybe maybe not....would be a whole lot easier with sample data.....:doze:

WITH cte

AS

(SELECT zipcode,

ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY ZipCode) AS rn

FROM Postal_Codes_Import

)

INSERT INTO Postal

SELECT cte.zipcode -- add additional columns as required

FROM cte

LEFT OUTER JOIN Postal AS P ON cte.zipcode = P.ID

WHERE(cte.rn = 1)

AND (P.ID IS NULL);

SELECT *

FROM postal;

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

MMartin1 One Orange Chip Points: 27795 More actions · Answer 12

Phil Parkin (1/15/2016)
Welsh Corgi (1/15/2016)
Phil Parkin (1/15/2016)
Welsh Corgi (1/15/2016)
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.
No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
What?
<your order criteria here>???
What does that matter?
Would that be the ZipCode?
Where you have several duplicates, how do you decide which one to keep and which to delete?
If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.
I do not have a DateTime Stamp in this table. I did not create it.
I'm just looking for duplicate Zip Code fields.
Thanks.
The words "for example" in my post were there to indicate that I was providing an example.

If both of the rows have identical values in all the columns and you know this for a fact , it does not matter what you order by.

Else you need to see if there are any indications in other columns that would want you to choose to select that one over the other(s). Phil's example uses a create date in DESC order, that makes sense in that you probably want the latest version of the row.

Without more insight into the data , I cannot provide further guidance, but I am sure you can make this judgement.

Dont forget to go back to the file originators and provide the feed back that there are duplicates when (I assume) there should not be. The best solution really is to stop the import if duplicates are encountered in the import file and it remains unclear why and what version of the row is the correct one. Here you want a process to fail. If I am importing this using SSIS I would want a variable to hold duplicate counts in the import and use this as a constraint to move to the next task (dupValues = 0) .

----------------------------------------------------

Ed Wagner SSC Guru Points: 287026 More actions · Answer 13

Welsh Corgi (1/15/2016)
Phil Parkin (1/15/2016)
Welsh Corgi (1/15/2016)
drew.allen (1/14/2016)
Welsh Corgi (1/14/2016)
I need to create a new table that has an Identity Column so that I can delete the duplicates.
No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn
FROM Postal_Codes_Import
)
DELETE CTE
WHERE rn > 1
Depending on your data you may be able to get by with just using a DISTINCT.
Drew
What?
<your order criteria here>???
What does that matter?
Would that be the ZipCode?
Where you have several duplicates, how do you decide which one to keep and which to delete?
If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.
I do not have a DateTime Stamp in this table. I did not create it.
I'm just looking for duplicate Zip Code fields.
Thanks.

Maybe you should read up on ROW_NUMBER used with PARTITION BY and understand how it works before you try to use it. Take the time to learn it so you'll be able to use it here and in other scenarios. It's a powerful tool to have in your toolbox, but you will need to understand it to be able to use it. If you don't understand it, don't put it in production until you do.

When you identify duplicate zip codes, you need to decide which one to keep. It case be based on whatever column you want, and then you sort by that column. If you decide that it doesn't matter, at least you've made a decision, but you have to make a decision one way.

Having an identity column will not identify duplicates in a different column. You should understand this already.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 14

There was not a problem with the LEFT JOIN syntax.

There were duplicates in the Source file which I should have caught. :blush:

Thanks for all that provided help.:-)

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/