November 7, 2024 at 6:06 pm
Hello T-SQL experts
I have a table containing team codes and descriptions. Unfortunately, many of the team codes are duplicated, often with different descriptions.
I want to deduplicate the table (in T-SQL) by selecting the shortest description (in number of characters) for each team code. How do I do this in a non-manual fashion?
I could obviously group in the team code, but there is no suitable aggregate function that I can think of for pulling out the shortest description in each group. MIN() won’t work as it sorts (e.g.) ‘aa’ before ‘b’. Also, T-SQL has no FIRST() or LAST() aggregate function.
Any ideas anyone?
Yours hopefully
Mark Dalley
November 7, 2024 at 6:33 pm
If your version of SQL has use of FIRST_VALUE, you could try this:
SELECT DISTINCT team_code, /* , ...*/
FIRST_VALUE(description) OVER(PARTITION BY team_code ORDER BY LEN(description)) AS description
FROM dbo.table_name
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 7, 2024 at 7:02 pm
Maybe something like this?
declare @Teams table (
TeamCode varchar(10),
Description varchar(30)
)
INSERT INTO @Teams (TeamCode, Description)
VALUES
('AC', 'Action Committee'),
('AC', 'Active Cowards'),
('AC', 'Attentive Collegeboys'),
('AC', 'Act. Com.'),
('WC', 'World-class Citizens'),
('WC', 'World Citizens'),
('WC', 'Wretched Cowards'),
('WC', 'Wicked Consultants')
;
WITH TeamData AS (
SELECT
TeamCode,
Description,
MIN(RIGHT(SPACE(10)+CAST(LEN(Description) AS VARCHAR(10)),10)+Description) OVER(PARTITION BY TeamCode) AS NewDescription
FROM @Teams)
select
TeamCode,
Description,
substring(NewDescription,11,255) AS NewDescription
from TeamData
TeamCode Description NewDescription
---------- ------------------------------ ----------------------------------------
AC Action Committee Act. Com.
AC Active Cowards Act. Com.
AC Attentive Collegeboys Act. Com.
AC Act. Com. Act. Com.
WC World-class Citizens World Citizens
WC World Citizens World Citizens
WC Wretched Cowards World Citizens
WC Wicked Consultants World Citizens
(8 rows affected)
November 7, 2024 at 9:19 pm
I love a good reason to use any of the widow functions. Thank you to Kaj for some sample data. I can't tell you how many times I've had to dedupe and have leveraged this.
DECLARE @Teams TABLE (
TeamCode VARCHAR(10),
Description VARCHAR(30)
)
INSERT INTO @Teams (TeamCode, Description)
VALUES
('AC', 'Action Committee'),
('AC', 'Active Cowards'),
('AC', 'Attentive Collegeboys'),
('AC', 'Act. Com.'),
('WC', 'World-class Citizens'),
('WC', 'World Citizens'),
('WC', 'Wretched Cowards'),
('WC', 'Wicked Consultants')
;
-- Show how the data is grouped and numbered
SELECT *,
RowNum = ROW_NUMBER() OVER (PARTITION BY TeamCode ORDER BY LEN([Description]))
FROM @Teams
-- Remove anything with RowNum > 1
; WITH Dupes AS
(SELECT *,
RowNum = ROW_NUMBER() OVER (PARTITION BY TeamCode ORDER BY LEN([Description]))
FROM @Teams
)
DELETE FROM Dupes
WHERE RowNum > 1
-- Final result
SELECT *
FROM @Teams
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply