Select group on multiple columns when at least one of the non grouped columns do not all match

Question

Select group on multiple columns when at least one of the non grouped columns do not all match

jeff.born

Ten Centuries

Points: 1136
More actions
August 27, 2014 at 7:50 pm

#397269

I'd like to first figure out the count of how many rows are not the Current Edition have the following:
Second I'd like to be able to select the primary key of all the rows involved
Third I'd like to select all the primary keys of just the rows not in the current edition
Not really sure how to describe this without making a dataset
[Code="sql"]
CREATE TABLE [Project].[TestTable1](
[TestTable1_pk] [int] IDENTITY(1,1) NOT NULL,
[Source_ID] [int] NOT NULL,
[Edition_fk] [int] NOT NULL,
[Key1_fk] [int] NOT NULL,
[Key2_fk] [int] NOT NULL,
[MatchTypeCode] [char](1) NOT NULL,
[MatchScore] [int] NOT NULL,
[MatchRank] [int] NOT NULL CONSTRAINT [DF_MidasMatchNormalized_MatchRank] DEFAULT ((0)),
CONSTRAINT [PK_TestTable1] PRIMARY KEY CLUSTERED
(
[TestTable1_pk] ASC
)
);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,1181206,'N',50,1);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,1181206,'N',50,1);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,1307967,'P',25,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,1307967,'P',25,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,1307967,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,1307967,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,1753914,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,1753914,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,3829330,'I',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,3829330,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,3829330,'P',25,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,3829330,'P',25,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,3829330,'I',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,3829330,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,3854067,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,3854067,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,133966931,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,133966931,'P',25,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,133966931,'I',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,133966931,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,133966931,'I',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,134402313,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,134402313,'N',50,0);
[/code]
So this is a small sample of the data that I randomly pulled that had the 403583201 Source_ID and I noticed that some have an Edition_fk of 58 but other than that they are the same.
Group by fails me because I only want the groups where the Edition_fk don't match
My head is spinning trying to figure these out!

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63523 More actions · Answer 1

Nice job posting DDL and sample data! How do you determine which rows are/are not the Current Edition? Once we have that, the rest of this should be trivial... So could you explain what the Edition column means? Without it I can't get from your question to your data to an answer, because I don't understand what I'm looking at.

Thanks!

Pieter

Eirikur Eiriksson SSC Guru Points: 182935 More actions · Answer 2

Hi and welcome to the forum, complements on the fine DDL and sample data!

Not too clear on what you are after but please bear with me as I'm only on the first morning coffee:-D

I put together a quick query bringing back some of the things I think you'll need using Window functions. Using the Window functions for this is quite neat as it allows for aggregation while keeping the details in the set.

Note: I changed the schema to dbo.

😎

;WITH BASE_DATA AS

(

SELECT

TT.TestTable1_pk

,TT.Source_ID

,TT.Edition_fk

,TT.Key1_fk

,TT.Key2_fk

,TT.MatchTypeCode

,TT.MatchScore

,TT.MatchRank

,CASE

WHEN DENSE_RANK() OVER (ORDER BY TT.Edition_fk DESC) = 1 THEN 0

ELSE 1

END AS OLD_EDITION

,ROW_NUMBER() OVER

(

PARTITION BY TT.Edition_fk

,TT.Source_ID

,TT.Key1_fk

,TT.Key2_fk

,TT.MatchTypeCode

,TT.MatchScore

,TT.MatchRank

ORDER BY (SELECT NULL)

) AS EDITION_DDRID

,ROW_NUMBER() OVER

(

ORDER BY TT.Edition_fk

,TT.Source_ID

,TT.Key1_fk

,TT.Key2_fk

,TT.MatchTypeCode

,TT.MatchScore

,TT.MatchRank

) AS EDITION_RID

FROM [dbo].[TestTable1] TT

)

SELECT

BD.TestTable1_pk

,BD.Source_ID

,BD.Edition_fk

,BD.Key1_fk

,BD.Key2_fk

,BD.MatchTypeCode

,BD.MatchScore

,BD.MatchRank

,BD.OLD_EDITION

,SIGN(BD.EDITION_DDRID -1) AS IS_DUPE

,BD.EDITION_RID

,SUM(BD.OLD_EDITION) OVER

(

PARTITION BY BD.Source_ID

) AS NUM_OLD_ENTRIES_BY_SRC_ID

,SUM(BD.EDITION_DDRID -1) OVER

(

PARTITION BY BD.Source_ID,OLD_EDITION

) AS NUM_DUP_ENT_BY_SRC_CURR

FROM BASE_DATA BD

Results

TestTable1_pk Source_ID Edition_fk Key1_fk Key2_fk MatchTypeCode MatchScore MatchRank OLD_EDITION IS_DUPE EDITION_RID NUM_OLD_ENTRIES_BY_SRC_ID NUM_DUP_ENT_BY_SRC_CURR

------------- ----------- ----------- ----------- ----------- ------------- ----------- ----------- ----------- -------------------- -------------------- ------------------------- -----------------------

2 403583201 58 249848 1181206 N 50 1 0 0 13 12 1

6 403583201 58 249848 1307967 N 50 0 0 0 14 12 1

3 403583201 58 249848 1307967 P 25 0 0 0 15 12 1

4 403583201 58 249848 1307967 P 25 0 0 1 16 12 1

7 403583201 58 249848 1753914 N 50 0 0 0 17 12 1

13 403583201 58 249848 3829330 I 50 0 0 0 18 12 1

10 403583201 58 249848 3829330 N 50 0 0 0 19 12 1

15 403583201 58 249848 3854067 N 50 0 0 0 20 12 1

21 403583201 58 249848 133966931 I 50 0 0 0 21 12 1

20 403583201 58 249848 133966931 N 50 0 0 0 22 12 1

22 403583201 58 249848 134402313 N 50 0 0 0 23 12 1

1 403583201 54 249848 1181206 N 50 1 1 0 1 12 1

5 403583201 54 249848 1307967 N 50 0 1 0 2 12 1

8 403583201 54 249848 1753914 N 50 0 1 0 3 12 1

9 403583201 54 249848 3829330 I 50 0 1 0 4 12 1

14 403583201 54 249848 3829330 N 50 0 1 0 5 12 1

11 403583201 54 249848 3829330 P 25 0 1 0 6 12 1

12 403583201 54 249848 3829330 P 25 0 1 1 7 12 1

16 403583201 54 249848 3854067 N 50 0 1 0 8 12 1

19 403583201 54 249848 133966931 I 50 0 1 0 9 12 1

17 403583201 54 249848 133966931 N 50 0 1 0 10 12 1

18 403583201 54 249848 133966931 P 25 0 1 0 11 12 1

23 403583201 54 249848 134402313 N 50 0 1 0 12 12 1

jeff.born Ten Centuries Points: 1136 More actions · Answer 3

pietlinden (8/27/2014)
Nice job posting DDL and sample data! How do you determine which rows are/are not the Current Edition? Once we have that, the rest of this should be trivial... So could you explain what the Edition column means? Without it I can't get from your question to your data to an answer, because I don't understand what I'm looking at.
Thanks!
Pieter

Realized that I didn't put what current Edition_fk should be until after I crawled into bed last night. The current Edition_fk is 58. It basically tells me with what data set the matches were created.

Jeff

jeff.born Ten Centuries Points: 1136 More actions · Answer 4

Eirikur Eiriksson (8/27/2014)
Hi and welcome to the forum, complements on the fine DDL and sample data!
Not too clear on what you are after but please bear with me as I'm only on the first morning coffee:-D
I put together a quick query bringing back some of the things I think you'll need using Window functions. Using the Window functions for this is quite neat as it allows for aggregation while keeping the details in the set.
Note: I changed the schema to dbo.

I'm also still in the process of waking up and am still trying to figure out what cool tricks are doing, so I'm not sure if this accomplishes what I need, but I intend to understand it because I like learning! After I test with some more data I'll reply back if this is what I need.