Thoughts on comparing two result sets

Question

Post reply

Thoughts on comparing two result sets

Mike Scalise

SSCrazy Eights

Points: 8145
More actions
September 11, 2015 at 2:19 pm

#304495

Hi,
I have extracted the metadata from most of the reports in our reporting system for the purpose of comparing their field usage to see if any are identical or (even better), to determine a % of matching fields for those reports that match on at least one field. Does anyone have any thoughts on an effective approach to this? I know there are a ton of ways to skin this cat, so I'm trying to see if anyone has done it before. Here's a sample of how my table is structured.
ReportID, FieldName, TotalNumberOfFields
1.ID,5
1,FirstName,5
1,MiddleName,5
1,LastName,5
1,BirthDate,5
2,ID,2
2,LastNAme,2
3,ID,1
4,HairColor,3
4,EyeColor,3
4,Ethnicity,3
You can see that 100% of the fields on reports 2 and 3 are encompassed by report 1. There is no overlap for report 4. Again, just trying to get some ideas on how I might approach this.
Thanks in advance,
Mike
Mike Scalise, PMP
https://www.michaelscalise.com

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182912 More actions · Answer 1

Quick suggestion

😎

/* Safe place */

USE tempdb;

GO

SET NOCOUNT ON;

GO

DECLARE @METADATA TABLE

(

ReportID INT NOT NULL

,FieldName NVARCHAR(128) NOT NULL

,TotalNumberOfFields INT NOT NULL

);

INSERT INTO @METADATA(ReportID,FieldName ,TotalNumberOfFields)

VALUES (1 ,N'ID' ,5 )

,(1 ,N'FirstName' ,5 )

,(1 ,N'MiddleName' ,5 )

,(1 ,N'LastName' ,5 )

,(1 ,N'BirthDate' ,5 )

,(2 ,N'ID' ,2 )

,(2 ,N'LastNAme' ,2 )

,(3 ,N'ID' ,1 )

,(4 ,N'HairColor' ,3 )

,(4 ,N'EyeColor' ,3 )

,(4 ,N'Ethnicity' ,3 )

;

;WITH BASE_DATA AS

(

SELECT

MF.FieldName

,MD.ReportID

,DENSE_RANK() OVER

(

ORDER BY MF.FieldName

) AS FN_DRNK

FROM @METADATA MD

CROSS APPLY

(

SELECT

DISTINCT M.FieldName

FROM @METADATA M

) AS MF(FieldName)

WHERE MD.FieldName = MF.FieldName

)

,REPORTS AS

(

SELECT

DISTINCT MDR.ReportID

,N'REPORT ' + CONVERT(NVARCHAR(10),MDR.ReportID,0) AS R_NAME

FROM @METADATA MDR

)

,REPORT_FIELD_LIST AS

(

SELECT DISTINCT

BD.ReportID

,RS.R_NAME

,(SELECT

NCHAR(124) + CONVERT(NVARCHAR(10),SBD.FN_DRNK,0)

FROM BASE_DATA SBD

WHERE SBD.ReportID = BD.ReportID

FOR XML PATH(''), TYPE).value('.[1]','NVARCHAR(1000)') AS FLD_LIST

FROM BASE_DATA BD

INNER JOIN REPORTS RS

ON BD.ReportID = RS.ReportID

)

SELECT

RF.R_NAME

,RF.FLD_LIST

,R2.R_NAME

,R2.FLD_LIST

FROM REPORT_FIELD_LIST RF

OUTER APPLY REPORT_FIELD_LIST R2

WHERE CHARINDEX(R2.FLD_LIST,RF.FLD_LIST,1) > 0

AND RF.ReportID <> R2.ReportID;

Results

R_NAME FLD_LIST R_NAME FLD_LIST

--------- ----------- --------- ---------

REPORT 1 |1|4|6|7|8 REPORT 2 |6|7

REPORT 1 |1|4|6|7|8 REPORT 3 |6

REPORT 2 |6|7 REPORT 3 |6

Mike Scalise SSCrazy Eights Points: 8145 More actions · Answer 2

Thanks, Eirikur!

I see what you're doing but I'm not sure how I can use that last column to determine the overlap of fields between reports. Perhaps I'm missing something. Can you explain?

Thanks,

Mike

Mike Scalise, PMP
https://www.michaelscalise.com

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 3

Here's an alternate approach...

IF OBJECT_ID('tempdb..#ReportFields') IS NOT NULL

DROP TABLE #ReportFields;

CREATE TABLE #ReportFields (

ReportID INT,

FieldName VARCHAR(20),

TotalNumberOfFields INT

);

INSERT #ReportFields (ReportID, FieldName, TotalNumberOfFields) VALUES

(1,'ID',5),

(1,'FirstName',5),

(1,'MiddleName',5),

(1,'LastName',5),

(1,'BirthDate',5),

(2,'ID',2),

(2,'LastNAme',2),

(3,'ID',1),

(4,'HairColor',3),

(4,'EyeColor',3),

(4,'Ethnicity',3);

SELECT

rf1.ReportID AS Report_1,

rf2.ReportID AS Report_2,

COUNT(*) AS CountOfCommonFields,

COUNT(*) * 1.0 / MIN(rf1.TotalNumberOfFields) * 100 AS Report_1_Percent,

COUNT(*) * 1.0 / MIN(rf2.TotalNumberOfFields) * 100 AS Report_2_Percent

FROM

#ReportFields rf1

JOIN #ReportFields rf2

ON rf1.ReportID < rf2.ReportID

AND rf1.FieldName = rf2.FieldName

GROUP BY

rf1.ReportID,

rf2.ReportID

Results...

Report_1 Report_2 CountOfCommonFields Report_1_Percent Report_2_Percent

----------- ----------- ------------------- --------------------------------------- ---------------------------------------

1 2 2 40.000000000000 100.000000000000

1 3 1 20.000000000000 100.000000000000

2 3 1 50.000000000000 100.000000000000

Mike Scalise SSCrazy Eights Points: 8145 More actions · Answer 4

This is incredibly helpful. Thank you for your input!!

Mike Scalise, PMP
https://www.michaelscalise.com

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 5

Jason A. Long

SSC-Insane

Points: 23786

September 14, 2015 at 1:31 pm

#1826523

Grad to help. 🙂