July 26, 2018 at 1:35 pm
First off thanks to all those who share their expertise on this forum. Apology #1: I am really in SQL Server 2005. An upgrade is in the works so a more preferred solution in 2016 would be a bonus. Apology #2: I have to obfuscate my data so much that it becomes very generic but hopefully I have fully demonstrated the problem.
This is really a two-part question. I have a vendor who provides data on a daily basis about certain entities. I take this data and follow business rules to generate my own calculated value.
1) If the vendor fails to provide a data point on the most recent day I want to use the most recent non-null value for that entity. Any of my datapoints can be null and this most recent non-null value logic needs to apply to any of them(DataPoint1,DataPoint2, and DataPoint3). This article seems to be state of the art but I don't believe it will not work in 2005. https://www.itprotoday.com/software-development/last-non-null-puzzle
2) Given that the value of my calculation can be sourced from any of the datapoints on any of the dates I want to be able to track which data point provided the value and from which day.
Thanks if you can help.IF OBJECT_ID('dbo.SourceData', 'U') IS NOT NULL DROP TABLE dbo.SourceData;
CREATE TABLE dbo.SourceData
(
EntityID INT,
ReportingDate DATETIME,
EntityType CHAR(1),
DataPoint1 DECIMAL(5,1),
DataPoint2 DECIMAL(5,1),
DataPoint3 DECIMAL(5,1)
)
INSERT INTO dbo.SourceData VALUES (1,'2018-07-01','A',1.3,2.0,3.0)
INSERT INTO dbo.SourceData VALUES (2,'2018-07-01','B',3,3,3)
INSERT INTO dbo.SourceData VALUES (3,'2018-07-01','C',2.1,2.3,2.7)
INSERT INTO dbo.SourceData VALUES (4,'2018-07-01','C',2.6,2.3,2.7)
INSERT INTO dbo.SourceData VALUES (5,'2018-07-01','C',2.6,2.6,2.7)
INSERT INTO dbo.SourceData VALUES (1,'2018-07-02','A',NULL,2.0,3.0)
INSERT INTO dbo.SourceData VALUES (2,'2018-07-02','B',3,3,3)
INSERT INTO dbo.SourceData VALUES (3,'2018-07-02','C',2.1,2.3,2.7)
INSERT INTO dbo.SourceData VALUES (4,'2018-07-02','C',2.6,2.3,2.7)
INSERT INTO dbo.SourceData VALUES (5,'2018-07-02','C',2.6,2.6,NULL)
--This demonstrates my calculation but the expected results should have one record per EntityID
SELECT
CASE EntityType
WHEN 'A' THEN DataPoint1
WHEN 'B' THEN COALESCE(DataPoint1,DataPoint2)
ELSE
CASE WHEN DataPoint1 > DataPoint2 THEN DataPoint1 ELSE DataPoint3 END
END
AS MyCalculation,
*
FROM dbo.SourceData
--Expected results
SELECT '1.3' AS MyCalculation,'1' AS EntityID, '2018-07-01' AS SourceDate,'DataPoint1' AS SourceDatapoint UNION ALL
SELECT '3.0'AS MyCalculation,'2' AS EntityID,'2018-07-02' AS SourceDate,'DataPoint1' AS SourceDatapoint UNION ALL
SELECT '2.7'AS MyCalculation,'3' AS EntityID,'2018-07-02' AS SourceDate,'DataPoint3' AS SourceDatapoint UNION ALL
SELECT '2.6'AS MyCalculation,'4' AS EntityID,'2018-07-02' AS SourceDate,'DataPoint1' AS SourceDatapoint UNION ALL
SELECT '2.7'AS MyCalculation,'5' AS EntityID,'2018-07-01' AS SourceDate,'DataPoint3' AS SourceDatapoint
July 26, 2018 at 2:04 pm
Probably something like this?
WITH CTE AS(
SELECT
CASE
WHEN EntityType = 'A' THEN DataPoint1
WHEN EntityType = 'B' THEN COALESCE(DataPoint1,DataPoint2)
WHEN DataPoint1 > DataPoint2 THEN DataPoint1
ELSE DataPoint3
END
AS MyCalculation,
CASE
WHEN EntityType = 'A' THEN 'DataPoint1'
WHEN EntityType = 'B' AND Datapoint1 IS NOT NULL THEN 'DataPoint1'
WHEN EntityType = 'B' AND Datapoint1 IS NULL THEN 'DataPoint2'
WHEN DataPoint1 > DataPoint2 THEN 'DataPoint1'
ELSE 'DataPoint3'
END AS SourceDataPoint,
ROW_NUMBER() OVER(PARTITION BY EntityID ORDER BY ReportingDate DESC) AS rn,
EntityID,
ReportingDate AS SourceDate
FROM dbo.SourceData
WHERE CASE
WHEN EntityType = 'A' THEN DataPoint1
WHEN EntityType = 'B' THEN COALESCE(DataPoint1,DataPoint2)
WHEN DataPoint1 > DataPoint2 THEN DataPoint1
ELSE DataPoint3
END IS NOT NULL
)
SELECT CTE.MyCalculation,
CTE.EntityID,
CTE.SourceDate,
CTE.SourceDataPoint
FROM CTE
WHERE CTE.rn = 1;
July 26, 2018 at 4:20 pm
Very nice, thank you. Putting aside the most recent non-null record issue I started toying with the below. The advantage being that I don't have too have a second expanded calculation which may simplify matters if the logic gets deep. Definitely kludgey since I don't have a ways to distribute unique identifiers across these 3 fields: DataPoint1ID, DataPoint2ID and DataPoint3ID.
I expect I'll go with your solution but I thought I'd share my thought experiment. Thanks again.
IF OBJECT_ID('dbo.SourceData', 'U') IS NOT NULL DROP TABLE dbo.SourceData;
CREATE TABLE dbo.SourceData
(
EntityID INT,
ReportingDate DATETIME,
EntityType CHAR(1),
DataPoint1 DECIMAL(5,1),
DataPoint1ID INT,
DataPoint2 DECIMAL(5,1),
DataPoint2ID INT,
DataPoint3 DECIMAL(5,1) ,
DataPoint3ID INT
)
INSERT INTO dbo.SourceData VALUES (1,'2018-07-01','A',1.3,1,2.0,11,3.0,21)
INSERT INTO dbo.SourceData VALUES (2,'2018-07-01','B',3,2,3,12,3,22)
INSERT INTO dbo.SourceData VALUES (3,'2018-07-01','C',2.1,3,2.3,13,2.7,23)
INSERT INTO dbo.SourceData VALUES (4,'2018-07-01','C',2.6,4,2.3,14,2.7,24)
INSERT INTO dbo.SourceData VALUES (5,'2018-07-01','C',2.6,5,2.6,15,2.7,25)
--Luis's solution
SELECT
CASE
WHEN EntityType = 'A' THEN DataPoint1
WHEN EntityType = 'B' THEN COALESCE(DataPoint1,DataPoint2)
WHEN DataPoint1 > DataPoint2 THEN DataPoint1
ELSE DataPoint3
END
AS MyCalculation,
CASE
WHEN EntityType = 'A' THEN 'DataPoint1'
WHEN EntityType = 'B' AND Datapoint1 IS NOT NULL THEN 'DataPoint1'
WHEN EntityType = 'B' AND Datapoint1 IS NULL THEN 'DataPoint2'
WHEN DataPoint1 > DataPoint2 THEN 'DataPoint1'
ELSE 'DataPoint3'
END AS SourceDataPoint,
EntityID,
ReportingDate AS SourceDate
FROM dbo.SourceData
WHERE CASE
WHEN EntityType = 'A' THEN DataPoint1
WHEN EntityType = 'B' THEN COALESCE(DataPoint1,DataPoint2)
WHEN DataPoint1 > DataPoint2 THEN DataPoint1
ELSE DataPoint3
END IS NOT NULL
IF OBJECT_ID('tempdb..#DataPointSourceIDS') IS NOT NULL DROP TABLE #DataPointSourceIDS
IF OBJECT_ID('tempdb..#SourceIDS') IS NOT NULL DROP TABLE #SourceIDS
SELECT
*
INTO #DataPointSourceIDS
FROM
(
SELECT
Datapoint1 AS DataPoint,
DataPoint1ID AS DataPointID,
ReportingDate AS SourceDate,
'Datapoint1' AS SourceDatePoint
FROM dbo.SourceData
UNION ALL
SELECT
Datapoint2,
DataPoint2ID,
ReportingDate AS SourceDate,
'Datapoint2' AS SourceDatePoint
FROM dbo.SourceData
UNION ALL
SELECT
Datapoint3,
DataPoint3ID,
ReportingDate AS SourceDate,
'Datapoint3' AS SourceDatePoint
FROM dbo.SourceData
) DPSIDS
SELECT
CASE EntityType
WHEN 'A' THEN DataPoint1ID
WHEN 'B' THEN COALESCE(DataPoint1ID,DataPoint2ID)
ELSE
CASE WHEN DataPoint1 > DataPoint2 THEN DataPoint1ID ELSE DataPoint3ID END
END
AS MyCalculationID,
*
INTO #SourceIDS
FROM dbo.SourceData
SELECT
--SID.MyCalculationID,
(SELECT DataPoint FROM #DataPointSourceIDS WHERE DataPointID = SID.MyCalculationID) AS MyCalculation,
(SELECT SourceDatePoint FROM #DataPointSourceIDS WHERE DataPointID = SID.MyCalculationID) AS SourceDataPoint,
EntityID,
(SELECT SourceDate FROM #DataPointSourceIDS WHERE DataPointID = SID.MyCalculationID) AS SourceDate
FROM #SourceIDS SID
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply