Deduplicating rows by choosing the row with the shortest string

  • Hello T-SQL experts

    I have a table containing team codes and descriptions. Unfortunately, many of the team codes are duplicated, often with different descriptions.

    I want to deduplicate the table (in T-SQL) by selecting the shortest description (in number of characters) for each team code. How do I do this in a non-manual fashion?

    I could obviously group in the team code, but there is no suitable aggregate function that I can think of for pulling out the shortest description in each group. MIN() won’t work as it sorts (e.g.) ‘aa’ before ‘b’. Also, T-SQL has no FIRST() or LAST() aggregate function.

    Any ideas anyone?

    Yours hopefully

    Mark Dalley

  • If your version of SQL has use of FIRST_VALUE, you could try this:

    SELECT DISTINCT team_code, /* , ...*/
    FIRST_VALUE(description) OVER(PARTITION BY team_code ORDER BY LEN(description)) AS description
    FROM dbo.table_name

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Maybe something like this?

    declare @Teams table (
    TeamCode varchar(10),
    Description varchar(30)
    )

    INSERT INTO @Teams (TeamCode, Description)
    VALUES
    ('AC', 'Action Committee'),
    ('AC', 'Active Cowards'),
    ('AC', 'Attentive Collegeboys'),
    ('AC', 'Act. Com.'),
    ('WC', 'World-class Citizens'),
    ('WC', 'World Citizens'),
    ('WC', 'Wretched Cowards'),
    ('WC', 'Wicked Consultants')
    ;

    WITH TeamData AS (
    SELECT
    TeamCode,
    Description,
    MIN(RIGHT(SPACE(10)+CAST(LEN(Description) AS VARCHAR(10)),10)+Description) OVER(PARTITION BY TeamCode) AS NewDescription
    FROM @Teams)
    select
    TeamCode,
    Description,
    substring(NewDescription,11,255) AS NewDescription
    from TeamData
    TeamCode   Description                    NewDescription
    ---------- ------------------------------ ----------------------------------------
    AC Action Committee Act. Com.
    AC Active Cowards Act. Com.
    AC Attentive Collegeboys Act. Com.
    AC Act. Com. Act. Com.
    WC World-class Citizens World Citizens
    WC World Citizens World Citizens
    WC Wretched Cowards World Citizens
    WC Wicked Consultants World Citizens

    (8 rows affected)

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply