LEFT JOIN

  • Welsh Corgi (1/14/2016)


    I need to create a new table that has an Identity Column so that I can delete the duplicates.

    No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

    WITH CTE AS (

    SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn

    FROM Postal_Codes_Import

    )

    DELETE CTE

    WHERE rn > 1

    Depending on your data you may be able to get by with just using a DISTINCT.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Welsh Corgi (1/14/2016)


    I need to create a new table that has an Identity Column so that I can delete the duplicates.

    Setting aside the fact that you don't need an Identity Column or a new table to identity dups, I am curious as to how having an Identity Column tells you what the dups are?

    __________________________________________________________________________________________________________
    How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • LinksUp (1/14/2016)


    Welsh Corgi (1/14/2016)


    I need to create a new table that has an Identity Column so that I can delete the duplicates.

    Setting aside the fact that you don't need an Identity Column or a new table to identity dups, I am curious as to how having an Identity Column tells you what the dups are?

    Good point.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thanks Drew. I tried a SELECT Distinct and it did not work.

    Thanks again.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • drew.allen (1/14/2016)


    Welsh Corgi (1/14/2016)


    Yes there are 3 duplicates when I do a INNER JOIN but I exclude them in the Insert and I keep getting PK Violations. :unsure:

    An INNER JOIN won't give you the situation that I'm expecting. You're looking for situations where the code does not exist in the target table and there are duplicates in the source table. An inner join will exclude those, because the code doesn't exist in the target table.

    Drew

    Right. There are 3 records in the Postal that exist in the Import Table.

    Thanks again.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • drew.allen (1/14/2016)


    Welsh Corgi (1/14/2016)


    I need to create a new table that has an Identity Column so that I can delete the duplicates.

    No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

    WITH CTE AS (

    SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn

    FROM Postal_Codes_Import

    )

    DELETE CTE

    WHERE rn > 1

    Depending on your data you may be able to get by with just using a DISTINCT.

    Drew

    What?

    <your order criteria here>???

    What does that matter?

    Would that be the ZipCode?

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (1/15/2016)


    drew.allen (1/14/2016)


    Welsh Corgi (1/14/2016)


    I need to create a new table that has an Identity Column so that I can delete the duplicates.

    No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

    WITH CTE AS (

    SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn

    FROM Postal_Codes_Import

    )

    DELETE CTE

    WHERE rn > 1

    Depending on your data you may be able to get by with just using a DISTINCT.

    Drew

    What?

    <your order criteria here>???

    What does that matter?

    Would that be the ZipCode?

    Where you have several duplicates, how do you decide which one to keep and which to delete?

    If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • The following inserts 157 records:

    WITH CTE AS (

    SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY ZipCode) AS rn

    FROM Postal_Codes_Import

    )

    SELECT *

    INTO #tmp

    FROM CTE

    WHERE rn > 1

    ORDER BY ZipCode

    This returns 20 records:

    SELECT ZipCode, COUNT(*) AS RecordNumber

    FROM #tmp

    GROUP BY ZipCode

    HAVING COUNT (*) > 1

    ZipCodeRecordNumber

    B5A 5A82

    B5A 5G32

    B5A 5G53

    B5A 5J13

    B5A 5J72

    B5A 5K52

    B5A 5L13

    B5A 5L62

    B5A 5L72

    B5A 5N33

    B5A 5P22

    B5A 5R32

    B5A 5X92

    C0A 1H14

    C0A 1H34

    C0A 1H64

    C0A 1H73

    C0A 1H92

    N0N 1J42

    T4B 2T32

    51

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Phil Parkin (1/15/2016)


    Welsh Corgi (1/15/2016)


    drew.allen (1/14/2016)


    Welsh Corgi (1/14/2016)


    I need to create a new table that has an Identity Column so that I can delete the duplicates.

    No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

    WITH CTE AS (

    SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn

    FROM Postal_Codes_Import

    )

    DELETE CTE

    WHERE rn > 1

    Depending on your data you may be able to get by with just using a DISTINCT.

    Drew

    What?

    <your order criteria here>???

    What does that matter?

    Would that be the ZipCode?

    Where you have several duplicates, how do you decide which one to keep and which to delete?

    If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

    I do not have a DateTime Stamp in this table. I did not create it.

    I'm just looking for duplicate Zip Code fields.

    Thanks.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (1/15/2016)


    Phil Parkin (1/15/2016)


    Welsh Corgi (1/15/2016)


    drew.allen (1/14/2016)


    Welsh Corgi (1/14/2016)


    I need to create a new table that has an Identity Column so that I can delete the duplicates.

    No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

    WITH CTE AS (

    SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn

    FROM Postal_Codes_Import

    )

    DELETE CTE

    WHERE rn > 1

    Depending on your data you may be able to get by with just using a DISTINCT.

    Drew

    What?

    <your order criteria here>???

    What does that matter?

    Would that be the ZipCode?

    Where you have several duplicates, how do you decide which one to keep and which to delete?

    If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

    I do not have a DateTime Stamp in this table. I did not create it.

    I'm just looking for duplicate Zip Code fields.

    Thanks.

    The words "for example" in my post were there to indicate that I was providing an example.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • maybe maybe not....would be a whole lot easier with sample data.....:doze:

    WITH cte

    AS

    (SELECT zipcode,

    ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY ZipCode) AS rn

    FROM Postal_Codes_Import

    )

    INSERT INTO Postal

    SELECT cte.zipcode -- add additional columns as required

    FROM cte

    LEFT OUTER JOIN Postal AS P ON cte.zipcode = P.ID

    WHERE(cte.rn = 1)

    AND (P.ID IS NULL);

    SELECT *

    FROM postal;

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Phil Parkin (1/15/2016)


    Welsh Corgi (1/15/2016)


    Phil Parkin (1/15/2016)


    Welsh Corgi (1/15/2016)


    drew.allen (1/14/2016)


    Welsh Corgi (1/14/2016)


    I need to create a new table that has an Identity Column so that I can delete the duplicates.

    No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

    WITH CTE AS (

    SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn

    FROM Postal_Codes_Import

    )

    DELETE CTE

    WHERE rn > 1

    Depending on your data you may be able to get by with just using a DISTINCT.

    Drew

    What?

    <your order criteria here>???

    What does that matter?

    Would that be the ZipCode?

    Where you have several duplicates, how do you decide which one to keep and which to delete?

    If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

    I do not have a DateTime Stamp in this table. I did not create it.

    I'm just looking for duplicate Zip Code fields.

    Thanks.

    The words "for example" in my post were there to indicate that I was providing an example.

    If both of the rows have identical values in all the columns and you know this for a fact , it does not matter what you order by.

    Else you need to see if there are any indications in other columns that would want you to choose to select that one over the other(s). Phil's example uses a create date in DESC order, that makes sense in that you probably want the latest version of the row.

    Without more insight into the data , I cannot provide further guidance, but I am sure you can make this judgement.

    Dont forget to go back to the file originators and provide the feed back that there are duplicates when (I assume) there should not be. The best solution really is to stop the import if duplicates are encountered in the import file and it remains unclear why and what version of the row is the correct one. Here you want a process to fail. If I am importing this using SSIS I would want a variable to hold duplicate counts in the import and use this as a constraint to move to the next task (dupValues = 0) .

    ----------------------------------------------------

  • Welsh Corgi (1/15/2016)


    Phil Parkin (1/15/2016)


    Welsh Corgi (1/15/2016)


    drew.allen (1/14/2016)


    Welsh Corgi (1/14/2016)


    I need to create a new table that has an Identity Column so that I can delete the duplicates.

    No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

    WITH CTE AS (

    SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn

    FROM Postal_Codes_Import

    )

    DELETE CTE

    WHERE rn > 1

    Depending on your data you may be able to get by with just using a DISTINCT.

    Drew

    What?

    <your order criteria here>???

    What does that matter?

    Would that be the ZipCode?

    Where you have several duplicates, how do you decide which one to keep and which to delete?

    If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

    I do not have a DateTime Stamp in this table. I did not create it.

    I'm just looking for duplicate Zip Code fields.

    Thanks.

    Maybe you should read up on ROW_NUMBER used with PARTITION BY and understand how it works before you try to use it. Take the time to learn it so you'll be able to use it here and in other scenarios. It's a powerful tool to have in your toolbox, but you will need to understand it to be able to use it. If you don't understand it, don't put it in production until you do.

    When you identify duplicate zip codes, you need to decide which one to keep. It case be based on whatever column you want, and then you sort by that column. If you decide that it doesn't matter, at least you've made a decision, but you have to make a decision one way.

    Having an identity column will not identify duplicates in a different column. You should understand this already.

  • There was not a problem with the LEFT JOIN syntax.

    There were duplicates in the Source file which I should have caught. :blush:

    Thanks for all that provided help.:-)

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 14 posts - 31 through 43 (of 43 total)

You must be logged in to reply to this topic. Login to reply