January 9, 2018 at 4:21 pm
Hello everybody,
trying to figure what is the best way to figure out change between snapshots of project records. Snapshots are taken weekly. Project name is unique, dates are always 7 days apart.
We have a process that generates snapshots of products and product values and I need to detect all types of changes:
1. When project was added from previous week - e.g. it did not exist on 12/24 and it exists on 12/31 plus calculate the value change (net addition)
2. Project was removed - e.g. it existed on 12/24 and does not exist on 12/31 plus calculate the value change (net subtraction)
3. Project value has changes - existed on 12/24 and exists on 12/31 but value has changed plus calculate the value change.
The snapshots get generated every week. Attached is an example of the table and some sample data.
What I need to detect from the sample would be:
1. Project B was removed for 12/31/2017
2. Project C value has changed on 01/07/2018
3. Project D was added on 01/07/2018.
Please, suggest solutions as you see fit.
CREATE TABLE [dbo].[Snapshots](
[ID] [bigint] NULL,
[SnapshotDate] [date] NULL,
[ProjectName] [varchar](50) NULL,
[ProjectValue] [decimal](18, 2) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[Snapshots] ([ID], [SnapshotDate], [ProjectName], [ProjectValue]) VALUES (1, CAST(N'2017-12-24' AS Date), N'Project A', CAST(10000.00 AS Decimal(18, 2)))
INSERT [dbo].[Snapshots] ([ID], [SnapshotDate], [ProjectName], [ProjectValue]) VALUES (2, CAST(N'2017-12-24' AS Date), N'Project B', CAST(15000.00 AS Decimal(18, 2)))
INSERT [dbo].[Snapshots] ([ID], [SnapshotDate], [ProjectName], [ProjectValue]) VALUES (3, CAST(N'2017-12-24' AS Date), N'Project C', CAST(20000.00 AS Decimal(18, 2)))
INSERT [dbo].[Snapshots] ([ID], [SnapshotDate], [ProjectName], [ProjectValue]) VALUES (4, CAST(N'2017-12-31' AS Date), N'Project A', CAST(10000.00 AS Decimal(18, 2)))
INSERT [dbo].[Snapshots] ([ID], [SnapshotDate], [ProjectName], [ProjectValue]) VALUES (5, CAST(N'2017-12-31' AS Date), N'Project C', CAST(20000.00 AS Decimal(18, 2)))
INSERT [dbo].[Snapshots] ([ID], [SnapshotDate], [ProjectName], [ProjectValue]) VALUES (6, CAST(N'2018-01-07' AS Date), N'Project A', CAST(10000.00 AS Decimal(18, 2)))
INSERT [dbo].[Snapshots] ([ID], [SnapshotDate], [ProjectName], [ProjectValue]) VALUES (7, CAST(N'2018-01-07' AS Date), N'Project C', CAST(15000.00 AS Decimal(18, 2)))
INSERT [dbo].[Snapshots] ([ID], [SnapshotDate], [ProjectName], [ProjectValue]) VALUES (8, CAST(N'2018-01-07' AS Date), N'Project D', CAST(50000.00 AS Decimal(18, 2)))
January 9, 2018 at 4:31 pm
And what have you tried so far to solve your problem? Remember, we are just volunteers here and you are the one that will have to support any solution that may be provided.
January 9, 2018 at 4:37 pm
Hi Lynn,
excellent question. I tried two different solutions one using standard left outer join (self join) and a second one with two CTEs and joins using current and previous date.
Can't figure out how to do this in a way where it can scale as well.
Totally understand that everybody is a volunteer and greatly appreciate everybody's time and input.
thanks,
vecerda
January 9, 2018 at 4:44 pm
vecerda - Tuesday, January 9, 2018 4:37 PMHi Lynn,excellent question. I tried two different solutions one using standard left outer join (self join) and a second one with two CTEs and joins using current and previous date.
Can't figure out how to do this in a way where it can scale as well.
Totally understand that everybody is a volunteer and greatly appreciate everybody's time and input.
thanks,
vecerda
How about posting the code, not just a verbal summary of what you did.
January 9, 2018 at 5:04 pm
Where I started is this:
,
select s1.Id, s1.snapshotdate, s1.ProjectName, ISNULL(s1.ProjectValue,0)-ISNULL(s2.ProjectValue,0) as Change from snapshots s1 LEFT OUTER JOIN snapshots s2
on s1.SnapshotDate=dateadd(d,7,s2.snapshotdate) and s1.ProjectName=s2.ProjectName
but issue here is that this way I am not detecting the removal of Project B on 12/31.
Thanks for all of your help,
petr
January 9, 2018 at 5:27 pm
The fact that this table keeps a full history of all projects makes it a bit challenging. But hopefully this snippet will get you started. Let's say you want to check this table once a day to detect changes in project value. Of course, you will need to find a way to detect situation where a project (B) disappears after a certain date.
;WITH [f] AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY [ProjectName] ORDER BY SnapshotDate ASC) AS [RowNumber],
[SnapshotDate],
[ProjectName],
[ProjectValue]
FROM [Snapshots] )
SELECT
[f].[ProjectName] AS [ProjectName],
[f].[SnapshotDate] AS [SnapshotDate],
[n].[SnapshotDate] AS [Date Changed],
[f].[ProjectValue] AS [Previous Value],
[n].[ProjectValue] AS [New Value]
FROM [f]
LEFT OUTER JOIN [f] AS [n]
ON [f].[ProjectName] = [n].[ProjectName]
AND [f].[RowNumber] + 1 = [n].[RowNumber]
WHERE ([f].[ProjectValue] <> [n].[ProjectValue] )
.
January 9, 2018 at 5:33 pm
Okay, I have to leave but I am hoping that this code may help you figure out what you need to do, and if not, maybe someone else can pick up from here.
SELECT * FROM [dbo].[Snapshots]
DECLARE @CurrentSnapshot DATE
, @PreviousSnapshot DATE;
SET @CurrentSnapshot = '2017-12-31';
SET @PreviousSnapshot = DATEADD(DAY,-7,@CurrentSnapshot);
SELECT @CurrentSnapshot, @PreviousSnapshot;
(SELECT .[ProjectName], .[ProjectValue] FROM [dbo].[Snapshots] AS WHERE .[SnapshotDate] = @PreviousSnapshot
EXCEPT
SELECT.[ProjectName], .[ProjectValue] FROM [dbo].[Snapshots] AS WHERE .[SnapshotDate] = @CurrentSnapshot)
UNION
(SELECT .[ProjectName], .[ProjectValue] FROM [dbo].[Snapshots] AS WHERE .[SnapshotDate] = @CurrentSnapshot
EXCEPT
SELECT.[ProjectName], .[ProjectValue] FROM [dbo].[Snapshots] AS WHERE .[SnapshotDate] = @PreviousSnapshot);
SET @CurrentSnapshot = '2018-01-07';
SET @PreviousSnapshot = DATEADD(DAY,-7,@CurrentSnapshot);
SELECT @CurrentSnapshot, @PreviousSnapshot;
(SELECT .[ProjectName], .[ProjectValue] FROM [dbo].[Snapshots] AS WHERE .[SnapshotDate] = @PreviousSnapshot
EXCEPT
SELECT.[ProjectName], .[ProjectValue] FROM [dbo].[Snapshots] AS WHERE .[SnapshotDate] = @CurrentSnapshot)
UNION
(SELECT .[ProjectName], .[ProjectValue] FROM [dbo].[Snapshots] AS WHERE .[SnapshotDate] = @CurrentSnapshot
EXCEPT
SELECT.[ProjectName], .[ProjectValue] FROM [dbo].[Snapshots] AS WHERE .[SnapshotDate] = @PreviousSnapshot);
January 9, 2018 at 5:56 pm
Something like this is one way, with a FULL OUTER JOIN.
SELECT ChangeDate =ISNULL(post.SnapshotDate,DATEADD(DAY,7,pre.SnapshotDate)),
ProjectName=COALESCE(pre.ProjectName,post.ProjectName),
ChangeType =CASE WHEN post.SnapshotDate IS NULL THEN 'Removed'
WHEN pre.SnapshotDate IS NULL THEN 'Added'
ELSE 'Value Change'
END,
ValueChange=ISNULL(post.ProjectValue,0)-ISNULL(pre.ProjectValue,0)
FROM Snapshots pre
FULL OUTER JOIN
Snapshots post ON post.ProjectName=pre.ProjectName
AND
post.SnapshotDate=DATEADD(DAY,7,pre.SnapshotDate)
WHERE pre.ProjectName+post.ProjectName IS NULL OR pre.ProjectValue<>post.ProjectValue
ORDER BY ChangeDate ASC, ProjectName ASC;
Depending on your needs you may want to tweak it so that it doesn't show all the projects on the first recorded week being added and all projects being removed after the last recorded week, but this should at least give you the core of one approach.
Cheers!
January 9, 2018 at 5:58 pm
This is pretty close but does not identify the ones that have been removed. I feel like this should be relatively simple but I can't figure out the right kind of logic.
Lynn, appreciate the help, but this does not solve the problem.
Quite challenging even though it feels simple 🙂
Vecerda
January 9, 2018 at 6:00 pm
Jacob, this looks very very promising. I am going to play with it and see if I can adjust it to the real life scenario, but conceptually this looks like the right kind of solution.
Thanks you so very much,
petr
January 9, 2018 at 6:03 pm
vecerda - Tuesday, January 9, 2018 5:58 PMThis is pretty close but does not identify the ones that have been removed. I feel like this should be relatively simple but I can't figure out the right kind of logic.Lynn, appreciate the help, but this does not solve the problem.
Quite challenging even though it feels simple 🙂
Vecerda
Ideally (in my simple mind), the comparison should be between a before-image and after-image. If you could somehow create a before-image and after-image from the same table (like using the snapshotdate), then it shouldn't be hard to identify something that is missing from the after-image.
January 9, 2018 at 7:27 pm
Can you add a table? Our system has audit tables that capture the before/after picture. Basically exact same columns as your actual project table, with Changetype and TimeOfChange columns added (table already has who created/updated and when, so it captures that every time something changes anyway). Changetype is Inserted/changeBefore/changeAfter/Deleted, and triggers update the audit table on all INSERT/UPDATE/DELETE actions. you'll still have to do the comparison to figure out what changed, but it's easy to see it when it's stepped out.
-------------------------------------------------------------------------------------------------------------------------------------
Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses
January 9, 2018 at 9:35 pm
Here's another approach that only requires reading the table once, but it's not quite as straightforward as doing the FULL OUTER JOIN; whether such an approach will work better for your needs will come down to the age-old "It depends..." 🙂 :
WITH CTE AS
(
SELECT
ChangeDate=calc_date
,ProjectName
,ChangeType=CASE WHEN COUNT(*)=2 THEN 'Value Change'
WHEN COUNT(CASE WHEN calc_date>SnapshotDate THEN 1 END)=1 THEN 'Removed'
ELSE 'Added'
END
,ValueChange=CASE WHEN COUNT(*)=2 THEN MAX(CASE WHEN calc_date=SnapshotDate THEN ProjectValue END)-MAX(CASE WHEN calc_date>SnapshotDate THEN ProjectValue END)
WHEN COUNT(CASE WHEN calc_date>SnapshotDate THEN 1 END)=1 THEN -MAX(ProjectValue)
ELSE MAX(ProjectValue)
END
FROM Snapshots
CROSS APPLY
(VALUES(SnapshotDate, ProjectValue),(DATEADD(DAY,7,SnapshotDate),ProjectValue))x(calc_date,calc_value)
GROUP BY ProjectName,calc_date
)
SELECT *
FROM CTE
WHERE ValueChange<>0
ORDER BY ChangeDAte ASC, ProjectName ASC;
Cheers!
January 9, 2018 at 10:54 pm
Using LEAD will allow you to check the value of the next occurrence
WITH cteLead AS (
SELECT [ID], [SnapshotDate], [ProjectName], [ProjectValue]
, [NextVal] = LEAD([ProjectValue]) OVER (PARTITION BY [ProjectName] ORDER BY [SnapshotDate])
, [MaxDate] = MAX([SnapshotDate]) OVER ()
FROM dbo.Snapshots
)
SELECT *
FROM cteLead
WHERE ([ProjectValue] != [NextVal] OR [NextVal] IS NULL)
AND [SnapshotDate] < [MaxDate]
January 9, 2018 at 11:21 pm
vecerda - Tuesday, January 9, 2018 5:58 PMThis is pretty close but does not identify the ones that have been removed. I feel like this should be relatively simple but I can't figure out the right kind of logic.Lynn, appreciate the help, but this does not solve the problem.
Quite challenging even though it feels simple 🙂
Vecerda
What I posted was not a solution. It was something for you to look at and work with to determine if it could be the basis for a solution. It looks to me that others are offering other solutions that are better. Just remember, you have to support what ever solution you find that meets your requirements. No one on this site is responsible for providing you support. My simple recommendation is this; if you don't understand it, if you can't explain how it works to your coworkers or supervisor, then don't use it in a production environment. I have no problem with people using solutions or modifying solutions from the internet, but I do have a problem if the can't support it themselves.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply