September 11, 2015 at 2:19 pm
Hi,
I have extracted the metadata from most of the reports in our reporting system for the purpose of comparing their field usage to see if any are identical or (even better), to determine a % of matching fields for those reports that match on at least one field. Does anyone have any thoughts on an effective approach to this? I know there are a ton of ways to skin this cat, so I'm trying to see if anyone has done it before. Here's a sample of how my table is structured.
ReportID, FieldName, TotalNumberOfFields
1.ID,5
1,FirstName,5
1,MiddleName,5
1,LastName,5
1,BirthDate,5
2,ID,2
2,LastNAme,2
3,ID,1
4,HairColor,3
4,EyeColor,3
4,Ethnicity,3
You can see that 100% of the fields on reports 2 and 3 are encompassed by report 1. There is no overlap for report 4. Again, just trying to get some ideas on how I might approach this.
Thanks in advance,
Mike
Mike Scalise, PMP
https://www.michaelscalise.com
September 12, 2015 at 2:45 am
Quick suggestion
😎
/* Safe place */
USE tempdb;
GO
SET NOCOUNT ON;
GO
DECLARE @METADATA TABLE
(
ReportID INT NOT NULL
,FieldName NVARCHAR(128) NOT NULL
,TotalNumberOfFields INT NOT NULL
);
INSERT INTO @METADATA(ReportID,FieldName ,TotalNumberOfFields)
VALUES (1 ,N'ID' ,5 )
,(1 ,N'FirstName' ,5 )
,(1 ,N'MiddleName' ,5 )
,(1 ,N'LastName' ,5 )
,(1 ,N'BirthDate' ,5 )
,(2 ,N'ID' ,2 )
,(2 ,N'LastNAme' ,2 )
,(3 ,N'ID' ,1 )
,(4 ,N'HairColor' ,3 )
,(4 ,N'EyeColor' ,3 )
,(4 ,N'Ethnicity' ,3 )
;
;WITH BASE_DATA AS
(
SELECT
MF.FieldName
,MD.ReportID
,DENSE_RANK() OVER
(
ORDER BY MF.FieldName
) AS FN_DRNK
FROM @METADATA MD
CROSS APPLY
(
SELECT
DISTINCT M.FieldName
FROM @METADATA M
) AS MF(FieldName)
WHERE MD.FieldName = MF.FieldName
)
,REPORTS AS
(
SELECT
DISTINCT MDR.ReportID
,N'REPORT ' + CONVERT(NVARCHAR(10),MDR.ReportID,0) AS R_NAME
FROM @METADATA MDR
)
,REPORT_FIELD_LIST AS
(
SELECT DISTINCT
BD.ReportID
,RS.R_NAME
,(SELECT
NCHAR(124) + CONVERT(NVARCHAR(10),SBD.FN_DRNK,0)
FROM BASE_DATA SBD
WHERE SBD.ReportID = BD.ReportID
FOR XML PATH(''), TYPE).value('.[1]','NVARCHAR(1000)') AS FLD_LIST
FROM BASE_DATA BD
INNER JOIN REPORTS RS
ON BD.ReportID = RS.ReportID
)
SELECT
RF.R_NAME
,RF.FLD_LIST
,R2.R_NAME
,R2.FLD_LIST
FROM REPORT_FIELD_LIST RF
OUTER APPLY REPORT_FIELD_LIST R2
WHERE CHARINDEX(R2.FLD_LIST,RF.FLD_LIST,1) > 0
AND RF.ReportID <> R2.ReportID;
Results
R_NAME FLD_LIST R_NAME FLD_LIST
--------- ----------- --------- ---------
REPORT 1 |1|4|6|7|8 REPORT 2 |6|7
REPORT 1 |1|4|6|7|8 REPORT 3 |6
REPORT 2 |6|7 REPORT 3 |6
September 12, 2015 at 7:34 am
Thanks, Eirikur!
I see what you're doing but I'm not sure how I can use that last column to determine the overlap of fields between reports. Perhaps I'm missing something. Can you explain?
Thanks,
Mike
Mike Scalise, PMP
https://www.michaelscalise.com
September 12, 2015 at 6:53 pm
Here's an alternate approach...
IF OBJECT_ID('tempdb..#ReportFields') IS NOT NULL
DROP TABLE #ReportFields;
CREATE TABLE #ReportFields (
ReportID INT,
FieldName VARCHAR(20),
TotalNumberOfFields INT
);
INSERT #ReportFields (ReportID, FieldName, TotalNumberOfFields) VALUES
(1,'ID',5),
(1,'FirstName',5),
(1,'MiddleName',5),
(1,'LastName',5),
(1,'BirthDate',5),
(2,'ID',2),
(2,'LastNAme',2),
(3,'ID',1),
(4,'HairColor',3),
(4,'EyeColor',3),
(4,'Ethnicity',3);
SELECT
rf1.ReportID AS Report_1,
rf2.ReportID AS Report_2,
COUNT(*) AS CountOfCommonFields,
COUNT(*) * 1.0 / MIN(rf1.TotalNumberOfFields) * 100 AS Report_1_Percent,
COUNT(*) * 1.0 / MIN(rf2.TotalNumberOfFields) * 100 AS Report_2_Percent
FROM
#ReportFields rf1
JOIN #ReportFields rf2
ON rf1.ReportID < rf2.ReportID
AND rf1.FieldName = rf2.FieldName
GROUP BY
rf1.ReportID,
rf2.ReportID
Results...
Report_1 Report_2 CountOfCommonFields Report_1_Percent Report_2_Percent
----------- ----------- ------------------- --------------------------------------- ---------------------------------------
1 2 2 40.000000000000 100.000000000000
1 3 1 20.000000000000 100.000000000000
2 3 1 50.000000000000 100.000000000000
September 14, 2015 at 12:33 pm
This is incredibly helpful. Thank you for your input!!
Mike Scalise, PMP
https://www.michaelscalise.com
September 14, 2015 at 1:31 pm
Grad to help. 🙂
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply