Problems with duplicates

  • Hi

    I am having some problems coding with sql. Very new at this. I'm using sql server 2014.

    Anyway, I want to clean my table which have seveal duplicates in the name column.

    IDTrNamn SäsongLag date land

    144Aaron Downey2008-2009DET1974-08-27Shelburne

    244Aaron Downey2001-2002CHI1974-08-27Shelburne

    344Aaron Downey2006-2007MTL1974-08-27Shelburne

    49Adam Graves1998-1999NYR1968-04-12Tecumseh

    518Adam Hall 2001-2002NSH 1980-08-14Kalamazoo

    I want to get rid of those duplicates and save one, to have this:

    IDTrNamn SäsongLag date land

    144Aaron Downey2008-2009DET1974-08-27Shelburne

    49Adam Graves1998-1999NYR1968-04-12Tecumseh

    518Adam Hall 2001-2002NSH 1980-08-14Kalamazoo

    Can anyone help? I'd really appreciate it.

    /F

  • fernando.arias (10/15/2015)


    Hi

    I am having some problems coding with sql. Very new at this. I'm using sql server 2014.

    Anyway, I want to clean my table which have seveal duplicates in the name column.

    IDTrNamn SäsongLag date land

    144Aaron Downey2008-2009DET1974-08-27Shelburne

    244Aaron Downey2001-2002CHI1974-08-27Shelburne

    344Aaron Downey2006-2007MTL1974-08-27Shelburne

    49Adam Graves1998-1999NYR1968-04-12Tecumseh

    518Adam Hall 2001-2002NSH 1980-08-14Kalamazoo

    I want to get rid of those duplicates and save one, to have this:

    IDTrNamn SäsongLag date land

    144Aaron Downey2008-2009DET1974-08-27Shelburne

    49Adam Graves1998-1999NYR1968-04-12Tecumseh

    518Adam Hall 2001-2002NSH 1980-08-14Kalamazoo

    Can anyone help? I'd really appreciate it.

    /F

    What is the logic behind choosing which of the 'duplicates' should be retained?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • fernando.arias (10/15/2015)


    Hi

    I am having some problems coding with sql. Very new at this. I'm using sql server 2014.

    Anyway, I want to clean my table which have seveal duplicates in the name column.

    IDTrNamn SäsongLag date land

    144Aaron Downey2008-2009DET1974-08-27Shelburne

    244Aaron Downey2001-2002CHI1974-08-27Shelburne

    344Aaron Downey2006-2007MTL1974-08-27Shelburne

    49Adam Graves1998-1999NYR1968-04-12Tecumseh

    518Adam Hall 2001-2002NSH 1980-08-14Kalamazoo

    I want to get rid of those duplicates and save one, to have this:

    IDTrNamn SäsongLag date land

    144Aaron Downey2008-2009DET1974-08-27Shelburne

    49Adam Graves1998-1999NYR1968-04-12Tecumseh

    518Adam Hall 2001-2002NSH 1980-08-14Kalamazoo

    Can anyone help? I'd really appreciate it.

    /F

    You have duplicates on [Tr]/[Namn]/[date]/[land] e.g. "Aaron Downey". What are your rules for determining which rows to delete?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks for responding. The logic is the latest säsong (season) 2008-2009 in this case.

    /F

  • Thanks for responding. The logic is the latest säsong (season) 2008-2009 in this case, meaning that I want the results for Aaron Downing 2008-2009 but not the others.

    /F

  • SELECT *

    INTO #FernandosData

    FROM (VALUES

    (1, 44, 'Aaron Downey', '2008-2009', 'DET', '1974-08-27', 'Shelburne'),

    (2, 44, 'Aaron Downey', '2001-2002', 'CHI', '1974-08-27', 'Shelburne'),

    (3, 44, 'Aaron Downey', '2006-2007', 'MTL', '1974-08-27', 'Shelburne'),

    (4, 9, 'Adam Graves', '1998-1999', 'NYR', '1968-04-12', 'Tecumseh'),

    (5, 18, 'Adam Hall', '2001-2002', 'NSH', '1980-08-14', 'Kalamazoo')

    ) d (ID, Tr, Namn, Säsong, Lag, [date], land);

    -- Will this work? Have a look:

    WITH MarkedData AS (

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY Tr, Namn,[date], land ORDER BY Säsong DESC)

    FROM #FernandosData

    )

    SELECT *

    FROM MarkedData

    ORDER BY ID;

    -- Try it out

    WITH MarkedData AS (

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY Tr, Namn,[date], land ORDER BY Säsong DESC)

    FROM #FernandosData

    )

    DELETE FROM MarkedData WHERE rn > 1

    -- Check the results

    SELECT * FROM #FernandosData

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hmm, I can't make it work. Look, i might as well give you a sample of my sheet so you see all the correct column names and so forth (I made some of tehm up before to make it easier). I am sorry if this is confusing.

    This is what i tried:

    SELECT *

    INTO Newtable

    FROM [nhl].[dbo].[Blad1$]

    WITH Fernhl AS (

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY Namn, Lag, födelseland ORDER BY Säsong DESC)

    FROM Newtable

    )

    SELECT *

    FROM Fernhl

    ORDER BY ID;

    WITH Fernhl AS (

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY Namn, Lag, Födelseland ORDER BY Säsong DESC)

    FROM Newtable

    )

    DELETE FROM Fernhl WHERE rn > 1

    SELECT *

    FROM Newtable

    --- But it gave me this message:

    Msg 336, Level 15, State 1, Line 5

    Incorrect syntax near 'Fernhl'. If this is intended to be a common table expression, you need to explicitly terminate the previous statement with a semi-colon.

  • fernando.arias (10/15/2015)


    --- But it gave me this message:

    Msg 336, Level 15, State 1, Line 5

    Incorrect syntax near 'Fernhl'. If this is intended to be a common table expression, you need to explicitly terminate the previous statement with a semi-colon.

    Put ";" in front of all your "WITH"

    i.e.

    [highlight="#ffff11"]; [/highlight]WITH Fernhl AS

    (Or put ";" on the end of the preceding statement, if you prefer.)

  • Kristen-173977 (10/15/2015)


    fernando.arias (10/15/2015)


    --- But it gave me this message:

    Msg 336, Level 15, State 1, Line 5

    Incorrect syntax near 'Fernhl'. If this is intended to be a common table expression, you need to explicitly terminate the previous statement with a semi-colon.

    Put ";" in front of all your "WITH"

    i.e.

    [highlight="#ffff11"]; [/highlight]WITH Fernhl AS

    (Or put ";" on the end of the preceding statement, if you prefer.)

    ; is a terminator, not an initiator, and should appear at the end of the preceding statement.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Thanks old hand!

    Though, the code didn't work for me. The result I got had several duplicates still I am afraid.

    Another question: When I want to repeat the last command and press execute it will always tell me:

    ------

    Msg 2714, Level 16, State 6, Line 1

    There is already an object named 'Newtable' in the database.

    ---

    and I always have to rename the table. How do I avoid this?

  • SELECT *

    INTO Newtable

    creates a permanent table called Newtable. So if you run it again when the table already exists, you get an error because the table cannot be created again.

    If you want to be able to rerun the entire batch, you could do something like this:

    If Object_Id('dbo.NewTable', 'U') is not null

    drop table dbo.NewTable;

    select *

    into dbo.NewTable ...

    But please be careful not to DROP (delete) any tables which you need!

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Thanks!

    Though it doesnt delete the duplicates with the codes that you all have been so kind to give me.

    This (below) doesn't result in deleting all the dups. I only want one person to appear once with the row showing his latest "säsong", and then for it to return: Namn, Lag, födelseland and säsong.

    Thanks!

    SELECT *

    INTO Newtable

    FROM [nhl].[dbo].[Blad1$]

    ; WITH Fernhl AS (

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY Namn, Lag, födelseland ORDER BY Säsong DESC)

    FROM Newtable

    )

    SELECT *

    FROM Fernhl

    ORDER BY ID;

    WITH Fernhl AS (

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY Namn, Lag, Födelseland ORDER BY Säsong DESC)

    FROM Newtable

    )

    DELETE FROM Fernhl WHERE rn > 1

    SELECT *

    FROM Newtable

  • fernando.arias (10/15/2015)


    Thanks!

    Though it doesnt delete the duplicates with the codes that you all have been so kind to give me.

    This (below) doesn't result in deleting all the dups. I only want one person to appear once with the row showing his latest "säsong", and then for it to return: Namn, Lag, födelseland and säsong.

    Thanks!

    SELECT *

    INTO Newtable

    FROM [nhl].[dbo].[Blad1$]

    ; WITH Fernhl AS (

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY Namn, Lag, födelseland ORDER BY Säsong DESC)

    FROM Newtable

    )

    SELECT *

    FROM Fernhl

    ORDER BY ID;

    WITH Fernhl AS (

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY Namn, Lag, Födelseland ORDER BY Säsong DESC)

    FROM Newtable

    )

    DELETE FROM Fernhl WHERE rn > 1

    SELECT *

    FROM Newtable

    You haven't told us anything at all about the "remaining dupes". When you look at a pair of these dupes, what do you see? I'd expect them to have different values for lag or Födelseland.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I'm so sorry for that.

    Here comes a screenshot, as you can see Aaron Downey is there several times, the code took away the dups when the value for the "Lag" column was the same . But it didn't delete the rest of the dups.

    Please tell me if you need more info I am really a newbie here.

    Thank you

  • Based on that screen shot, if you change your PARTITION BY to this, it should work

    (PARTITION BY TROJNUMMER ORDER BY SASONG DESC)

    (obviously, the various accented characters need to be substituted into the column names)

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply