August 27, 2014 at 7:50 pm
I'd like to first figure out the count of how many rows are not the Current Edition have the following:
Second I'd like to be able to select the primary key of all the rows involved
Third I'd like to select all the primary keys of just the rows not in the current edition
Not really sure how to describe this without making a dataset
[Code="sql"]
CREATE TABLE [Project].[TestTable1](
[TestTable1_pk] [int] IDENTITY(1,1) NOT NULL,
[Source_ID] [int] NOT NULL,
[Edition_fk] [int] NOT NULL,
[Key1_fk] [int] NOT NULL,
[Key2_fk] [int] NOT NULL,
[MatchTypeCode] [char](1) NOT NULL,
[MatchScore] [int] NOT NULL,
[MatchRank] [int] NOT NULL CONSTRAINT [DF_MidasMatchNormalized_MatchRank] DEFAULT ((0)),
CONSTRAINT [PK_TestTable1] PRIMARY KEY CLUSTERED
(
[TestTable1_pk] ASC
)
);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,1181206,'N',50,1);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,1181206,'N',50,1);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,1307967,'P',25,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,1307967,'P',25,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,1307967,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,1307967,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,1753914,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,1753914,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,3829330,'I',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,3829330,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,3829330,'P',25,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,3829330,'P',25,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,3829330,'I',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,3829330,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,3854067,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,3854067,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,133966931,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,133966931,'P',25,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,133966931,'I',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,133966931,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,133966931,'I',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,58,249848,134402313,'N',50,0);
INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])
VALUES (403583201,54,249848,134402313,'N',50,0);
[/code]
So this is a small sample of the data that I randomly pulled that had the 403583201 Source_ID and I noticed that some have an Edition_fk of 58 but other than that they are the same.
Group by fails me because I only want the groups where the Edition_fk don't match
My head is spinning trying to figure these out!
August 27, 2014 at 9:13 pm
Nice job posting DDL and sample data! How do you determine which rows are/are not the Current Edition? Once we have that, the rest of this should be trivial... So could you explain what the Edition column means? Without it I can't get from your question to your data to an answer, because I don't understand what I'm looking at.
Thanks!
Pieter
August 27, 2014 at 10:53 pm
Hi and welcome to the forum, complements on the fine DDL and sample data!
Not too clear on what you are after but please bear with me as I'm only on the first morning coffee:-D
I put together a quick query bringing back some of the things I think you'll need using Window functions. Using the Window functions for this is quite neat as it allows for aggregation while keeping the details in the set.
Note: I changed the schema to dbo.
😎
;WITH BASE_DATA AS
(
SELECT
TT.TestTable1_pk
,TT.Source_ID
,TT.Edition_fk
,TT.Key1_fk
,TT.Key2_fk
,TT.MatchTypeCode
,TT.MatchScore
,TT.MatchRank
,CASE
WHEN DENSE_RANK() OVER (ORDER BY TT.Edition_fk DESC) = 1 THEN 0
ELSE 1
END AS OLD_EDITION
,ROW_NUMBER() OVER
(
PARTITION BY TT.Edition_fk
,TT.Source_ID
,TT.Key1_fk
,TT.Key2_fk
,TT.MatchTypeCode
,TT.MatchScore
,TT.MatchRank
ORDER BY (SELECT NULL)
) AS EDITION_DDRID
,ROW_NUMBER() OVER
(
ORDER BY TT.Edition_fk
,TT.Source_ID
,TT.Key1_fk
,TT.Key2_fk
,TT.MatchTypeCode
,TT.MatchScore
,TT.MatchRank
) AS EDITION_RID
FROM [dbo].[TestTable1] TT
)
SELECT
BD.TestTable1_pk
,BD.Source_ID
,BD.Edition_fk
,BD.Key1_fk
,BD.Key2_fk
,BD.MatchTypeCode
,BD.MatchScore
,BD.MatchRank
,BD.OLD_EDITION
,SIGN(BD.EDITION_DDRID -1) AS IS_DUPE
,BD.EDITION_RID
,SUM(BD.OLD_EDITION) OVER
(
PARTITION BY BD.Source_ID
) AS NUM_OLD_ENTRIES_BY_SRC_ID
,SUM(BD.EDITION_DDRID -1) OVER
(
PARTITION BY BD.Source_ID,OLD_EDITION
) AS NUM_DUP_ENT_BY_SRC_CURR
FROM BASE_DATA BD
Results
TestTable1_pk Source_ID Edition_fk Key1_fk Key2_fk MatchTypeCode MatchScore MatchRank OLD_EDITION IS_DUPE EDITION_RID NUM_OLD_ENTRIES_BY_SRC_ID NUM_DUP_ENT_BY_SRC_CURR
------------- ----------- ----------- ----------- ----------- ------------- ----------- ----------- ----------- -------------------- -------------------- ------------------------- -----------------------
2 403583201 58 249848 1181206 N 50 1 0 0 13 12 1
6 403583201 58 249848 1307967 N 50 0 0 0 14 12 1
3 403583201 58 249848 1307967 P 25 0 0 0 15 12 1
4 403583201 58 249848 1307967 P 25 0 0 1 16 12 1
7 403583201 58 249848 1753914 N 50 0 0 0 17 12 1
13 403583201 58 249848 3829330 I 50 0 0 0 18 12 1
10 403583201 58 249848 3829330 N 50 0 0 0 19 12 1
15 403583201 58 249848 3854067 N 50 0 0 0 20 12 1
21 403583201 58 249848 133966931 I 50 0 0 0 21 12 1
20 403583201 58 249848 133966931 N 50 0 0 0 22 12 1
22 403583201 58 249848 134402313 N 50 0 0 0 23 12 1
1 403583201 54 249848 1181206 N 50 1 1 0 1 12 1
5 403583201 54 249848 1307967 N 50 0 1 0 2 12 1
8 403583201 54 249848 1753914 N 50 0 1 0 3 12 1
9 403583201 54 249848 3829330 I 50 0 1 0 4 12 1
14 403583201 54 249848 3829330 N 50 0 1 0 5 12 1
11 403583201 54 249848 3829330 P 25 0 1 0 6 12 1
12 403583201 54 249848 3829330 P 25 0 1 1 7 12 1
16 403583201 54 249848 3854067 N 50 0 1 0 8 12 1
19 403583201 54 249848 133966931 I 50 0 1 0 9 12 1
17 403583201 54 249848 133966931 N 50 0 1 0 10 12 1
18 403583201 54 249848 133966931 P 25 0 1 0 11 12 1
23 403583201 54 249848 134402313 N 50 0 1 0 12 12 1
August 28, 2014 at 6:38 am
pietlinden (8/27/2014)
Nice job posting DDL and sample data! How do you determine which rows are/are not the Current Edition? Once we have that, the rest of this should be trivial... So could you explain what the Edition column means? Without it I can't get from your question to your data to an answer, because I don't understand what I'm looking at.Thanks!
Pieter
Realized that I didn't put what current Edition_fk should be until after I crawled into bed last night. The current Edition_fk is 58. It basically tells me with what data set the matches were created.
Jeff
August 28, 2014 at 6:41 am
Eirikur Eiriksson (8/27/2014)
Hi and welcome to the forum, complements on the fine DDL and sample data!Not too clear on what you are after but please bear with me as I'm only on the first morning coffee:-D
I put together a quick query bringing back some of the things I think you'll need using Window functions. Using the Window functions for this is quite neat as it allows for aggregation while keeping the details in the set.
Note: I changed the schema to dbo.
I'm also still in the process of waking up and am still trying to figure out what cool tricks are doing, so I'm not sure if this accomplishes what I need, but I intend to understand it because I like learning! After I test with some more data I'll reply back if this is what I need.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply