May 8, 2014 at 9:05 am
I'm working with a Daily Input Table and I'm keeping track of Changes in a 'Change Table'.
For example, on Day 1, the Change Table will have no records as it 'awaits' data from the Input Table.
The Input Table sends the following 3 records:
Key1 Key2 Key3
-------- -------- ------------
11111111 12345678 00001234
11111111 12345680 00001234
11111111 12345678 00002234
Note that Key1 has the same value for all 3 records and that Key3 has 1 value with 2 Key2 values, and a 2nd Key3 Value has 1 Key2 value.
Since these Input Records are New to Change, All 3 Will be Inserted and I need them Flagged as 'Enabled'.
The Change_Table goes from Empty to:
Chg_Key1 Chg_Key2 Chg_Key3 Chg_Action
-------- -------- ------------ ----------
11111111 12345678 00001234 E
11111111 12345680 00001234 E
11111111 12345678 00002234 E
Now to a scenario that I'm getting stuck with... On Day 2, the Input Table no longer provides 2 of the 3 records....
Key1 Key2 Key3
-------- -------- ------------
11111111 12345678 00001234
The Change_Table needs to 1) Set 11111111 12345678 00002234 to D from E since it is no longer being supplied. 2) Keep the combination of 11111111 and 00001234 Enabled because 1 of the records still Exists in the Input Table. The reason for this is that ultimately, the data pulled from Change will NOT INCLUDE any reference to Key2.
The Extracted Data from Change will only have 1 Record for each Key1/Key3 combination. It 'does not care' if Key1/Key3 has 4 different Key2 values or just 1 Key2 value. As long as 1 Key1/Key3 value is present from Input, Change Needs to reflect an E for eventual ouput. Conversely, as I've pointed out, if Change has multiple Key1/Key3 combinations and Input doesn't provide 1 of them, No Update to Change is the Result. The Action stays 'E'. The only way to change to 'D' in Change is if ALL Key1/Key3 combinations are no longer provided from Input.
Desire Extracted Data from this example:
11111111 00001234 Enabled
11111111 00002234 Disabled
Any help or Ideas? I'm using Nested Queries to Isolate the data that Input is no longer providing and I call that result set a 'candidate' for Disable, but it requires further inspection to ensure that all the records are no longer present in Change. The further inspection part has me stumped.
I'm trying to avoid using Variables and a Cursor as there are 100,000's of records per day in Input, but if that is the easiest route, I will surely do that.
As an aside, I have a 2nd Change Table that is more granular for troubleshooting purposes, and it is NOT used for output. Within that table, 2 of the 3 records would be marked for D, 1 for E, and that is actually quite easy to do, lol!
Thanks for looking!
May 9, 2014 at 10:16 am
Thinking about this a little more, I may need to incorporate a Group By and record the number of records for a given combination, I'm not currently doing that.
May 12, 2014 at 6:24 am
ranmazure (5/8/2014)
I'm working with a Daily Input Table and I'm keeping track of Changes in a 'Change Table'.For example, on Day 1, the Change Table will have no records as it 'awaits' data from the Input Table.
The Input Table sends the following 3 records:
Key1 Key2 Key3
-------- -------- ------------
11111111 12345678 00001234
11111111 12345680 00001234
11111111 12345678 00002234
Note that Key1 has the same value for all 3 records and that Key3 has 1 value with 2 Key2 values, and a 2nd Key3 Value has 1 Key2 value.
Since these Input Records are New to Change, All 3 Will be Inserted and I need them Flagged as 'Enabled'.
The Change_Table goes from Empty to:
Chg_Key1 Chg_Key2 Chg_Key3 Chg_Action
-------- -------- ------------ ----------
11111111 12345678 00001234 E
11111111 12345680 00001234 E
11111111 12345678 00002234 E
Now to a scenario that I'm getting stuck with... On Day 2, the Input Table no longer provides 2 of the 3 records....
Key1 Key2 Key3
-------- -------- ------------
11111111 12345678 00001234
The Change_Table needs to 1) Set 11111111 12345678 00002234 to D from E since it is no longer being supplied. 2) Keep the combination of 11111111 and 00001234 Enabled because 1 of the records still Exists in the Input Table. The reason for this is that ultimately, the data pulled from Change will NOT INCLUDE any reference to Key2.
The Extracted Data from Change will only have 1 Record for each Key1/Key3 combination. It 'does not care' if Key1/Key3 has 4 different Key2 values or just 1 Key2 value. As long as 1 Key1/Key3 value is present from Input, Change Needs to reflect an E for eventual ouput. Conversely, as I've pointed out, if Change has multiple Key1/Key3 combinations and Input doesn't provide 1 of them, No Update to Change is the Result. The Action stays 'E'. The only way to change to 'D' in Change is if ALL Key1/Key3 combinations are no longer provided from Input.
Desire Extracted Data from this example:
11111111 00001234 Enabled
11111111 00002234 Disabled
Any help or Ideas? I'm using Nested Queries to Isolate the data that Input is no longer providing and I call that result set a 'candidate' for Disable, but it requires further inspection to ensure that all the records are no longer present in Change. The further inspection part has me stumped.
I'm trying to avoid using Variables and a Cursor as there are 100,000's of records per day in Input, but if that is the easiest route, I will surely do that.
As an aside, I have a 2nd Change Table that is more granular for troubleshooting purposes, and it is NOT used for output. Within that table, 2 of the 3 records would be marked for D, 1 for E, and that is actually quite easy to do, lol!
Thanks for looking!
I would definitely consider a MERGE statement to handle this task. Just to make sure I am understanding this correctly: An input table will be presented daily that will have data in it with Key1, Key2, and Key3. Comparing Key1 and Key3 only:
1. When there is a match, the row will be flagged as enabled.
2. When there is a row in the input that is not in the change table, it will be inserted and flagged as enabled.
3. When there is a row in the change table, but no corresponding row in the input table, it will be flagged as disabled.
Since I cannot see all of the schema, I will put together a basic MERGE statement that does essentially what you are looking for, and you can customize from there:
MERGE INTO change_table AS target
USING
(SELECT key1, key2, key3 FROM input_table) AS source
ON target.key1 = source.key1
AND tarket.key3 = source.key3
WHEN MATCHED AND target.is_enabled = 0 THEN
UPDATE SET is_enabled = 1
WHEN NOT MATCHED BY TARGET THEN
INSERT (key1, key2, key3, is_enabled) VALUES (source.key1, source.key2, source.key3, 1)
WHEN NOT MATCHED BY SOURCE THEN
UPDATE SET is_enabled = 0;
Note that MERGE only works in SQL Server 2008 and later---if you are on an older version, we can consider a different solution.
May 12, 2014 at 12:01 pm
Ed, thanks for the Reply.
I've got the code you provided working with my tables and I'm testing it out :-)!
Thanks, I will update you with my results soon!
May 12, 2014 at 12:03 pm
ranmazure (5/12/2014)
Ed, thanks for the Reply.I've got the code you provided working with my tables and I'm testing it out :-)!
Thanks, I will update you with my results soon!
Good to hear! MERGE was one of the best additions that came with 2008, and I find more uses for it all the time : )
May 12, 2014 at 3:43 pm
Ed, I ran into a snag on the 3rd Step of my testing...
Here is the Merge I'm using:
Merge Change_Table as Target
Using (Select Key1, Key2, Key3 From Input_Table) as Source (Key1, Key2, Key3)
on (Target.Key1 = Source.Key1 And Target.Key3 = Source.Key3)
When Matched And Target.Action_Type = 'D' Then
Update Set Target.Action_Type = 'E', Target.Action_Date = GetDate()
When Not Matched by Target Then
Insert (Key1, Key2, Key3, Action_Type, Action_Date) Values (Source.Key1, Source.Key2, Source.Key3, 'E', GetDate())
When Not Matched by Source Then
Update Set Action_Type = 'D', Action_Date = GetDate();
Step 1: Source Sends the following 3 records:
Key1 Key2 Key3
-------- -------- ------------
11111111 12345678 00001234
11111111 12345680 00001234
11111111 12345678 00002234
Target Inserts All 3 Records as Enabled:
Chg_Key1 Chg_Key2 Chg_Key3 Chg_Action
-------- -------- ------------ ----------
11111111 12345678 00001234 E
11111111 12345680 00001234 E
11111111 12345678 00002234 E
Step 2: Source Sends NO RECORDS:
Key1 Key2 Key3
-------- -------- ------------
(0) records
Target correctly Updates All 3 Records as Disabled:
Chg_Key1 Chg_Key2 Chg_Key3 Chg_Action
-------- -------- ------------ ----------
11111111 12345678 00001234 D
11111111 12345680 00001234 D
11111111 12345678 00002234 D
Step 3: Source Re-Sends the same 3 records:
Key1 Key2 Key3
-------- -------- ------------
11111111 12345678 00001234
11111111 12345680 00001234
11111111 12345678 00002234
Target should set All records back to E, but Errors out:
"The MERGE statement attempted to UPDATE or DELETE the same row more than once."
This is a 'Valid' error as record 11111111 (Key1) 00001234 (Key3) does occur twice with different Key2 values.
-- ****************************************************************************
The easy solution would seem to be just disregard Key2, that would alleviate this test case.
However I cannot do that because when I lose Key2 granularity, I would run into this issue:
Step 1: Source Sends the following 2 records:
Key1 Key2 Key3
-------- -------- ------------
11111111 12345678 00001234
11111111 12345680 00001234
Target Inserts Both Records as Enabled:
Chg_Key1 Chg_Key2 Chg_Key3 Chg_Action
-------- -------- ------------ ----------
11111111 12345678 00001234 E
11111111 12345680 00001234 E
Step 2: Source does not send Key2 value 12345680 for 11111111 (Key1) 00001234 (Key3)
Key1 Key2 Key3
-------- -------- ------------
11111111 12345678 00001234
Note: I'm going to refer back to having 2 Target Tables, 1 is more granular but will not be used for Extract, I will call that Target_G. Extract is Target_E. So far, we've only been working with Target_E...
Target_E, in a perfect world would take NO ACTION (rather than disabling 1 record via a group by or Select Distinct on just Key1 and Key3) because a least 1 Source combination of Key1 and Key3 Still exists.
Chg_Key1 Chg_Key2 Chg_Key3 Chg_Action
-------- -------- ------------ ----------
11111111 12345678 00001234 E
11111111 12345680 00001234 E
Target_G, in a perfect world would Disable the Source combination of Key1 and Key3 that no longer exists:
Chg_Key1 Chg_Key2 Chg_Key3 Chg_Action
-------- -------- ------------ ----------
11111111 12345678 00001234 E
11111111 12345680 00001234 D
I currently can achieve the Correct Test Case 3 Updates for both Target_E and Target_G using these monster, but the Update for Target_E breaks down on Test Case 5 (which was the original Case that spawned this post):
-- Update Test Case #3 Target table to be used For Extract
Update Change_Table
Set Action_Type = 'E', Action_Date = GetDate()
From Change_Table as Target_E (nolock)
Left Join Input_Table as Source (nolock) on (Target_E.Key1 = Source.Key1) And (Target_E.Key2 = Source.Key2) And (Target_E.Key3 = Source.Key3)
Where Exists
(Select Source.Key1, Source.Key2, Source.Key3, Target_E.Key1, Target_E.Key2, Target_E.Key3, Target_E.Action_Type, Target_E.Action_Date
From Change_Table as Target_E (nolock)
Left Join Input_Table as Source (nolock) on (Target_E.Key1 = Source.Key1) And (Target_E.Key2 = Source.Key2) And (Target_E.Key3 = Source.Key3)
Where Target_E.Action_Type = 'D'
) And (Target_E.Action_Type = 'D')
-- Update Test Case #3 Target table to be used For Granularity and Troubleshooting
Update Change_Table
Set Action_Type = 'E', Action_Date = GetDate()
From Change_Table as Target_G (nolock)
Left Join Input_Table as Source (nolock) on (Target_G.Key1 = Source.Key1) And (Target_G.Key2 = Source.Key2) And (Target_G.Key3 = Source.Key3)
Where Exists
(Select Source.Key1, Source.Key2, Source.Key3, Target_G.Key1, Target_G.Key2, Target_G.Key3, Target_G.Action_Type, Target_G.Action_Date
From Change_Table as Target_G (nolock)
Left Join Input_Table as Source (nolock) on (Target_G.Key1 = Source.Key1) And (Target_G.Key2 = Source.Key2) And (Target_G.Key3 = Source.Key3)
Where (Source.Key1 Is Not Null) And Target_G.Action_Type = 'D'
) And Source.Key1 Is Not Null And (Target_G.Action_Type = 'D')
As for case #5, I have a working prototype for this issue:
"Now to a scenario that I'm getting stuck with... On Day 2, the Input Table no longer provides 2 of the 3 records....
Key1 Key2 Key3
-------- -------- ------------
11111111 12345678 00001234
The Change_Table needs to 1) Set 11111111 12345678 00002234 to D from E since it is no longer being supplied. 2) Keep the combination of 11111111 and 00001234 Enabled because 1 of the records still Exists in the Input Table. The reason for this is that ultimately, the data pulled from Change will NOT INCLUDE any reference to Key2.
The Extracted Data from Change will only have 1 Record for each Key1/Key3 combination. It 'does not care' if Key1/Key3 has 4 different Key2 values or just 1 Key2 value. As long as 1 Key1/Key3 value is present from Input, Change Needs to reflect an E for eventual ouput. Conversely, as I've pointed out, if Change has multiple Key1/Key3 combinations and Input doesn't provide 1 of them, No Update to Change is the Result. The Action stays 'E'. The only way to change to 'D' in Change is if ALL Key1/Key3 combinations are no longer provided from Input.
Desire Extracted Data from this example:
11111111 00001234 Enabled
11111111 00002234 Disabled
Any help or Ideas?"
Thru a series of Subselect Joins and Group Bys, I'm able to isolate the record combination in no longer provided by Source where Target_E still has 1 instance:
Update Change_Table
Set Action_Type = 'D', Action_Date = GetDate()
From
(
Select Target_E__Key1, Target_E__Key3
From
(
Select Key1 as Target_E__Key1, Key3 as Target_E__Key3
From Change_Table
Where Action_Type = 'E'
Group By Key1, Key3
) ss1 Left Join (Select Key1 as Source_Key1, Key3 as Source_Key3
From Input_Table
Group By Key1, Key3
) ss2 On ss1.Target_E__Key1 = ss2.Source_Key1 And ss1.Target_E__Key3 = ss2.Source_Key3
Where Source_Key1 Is Null) ss3 Inner Join Change_Table as Target_E on ss3.Target_E__Key1 = Target_E.Key1 and ss3.Target_E__Key3 = Target_E.Key3
At this point, I thank you if you are still reading and I feel like it might make sense to offer a real world example for what needs to be accomplished.
Cosider Key1 to be a Class at a school, Key2 is the Number that differentiates a Class's 'homeroom' vs any applicable Labs, and Key3 is a Student Number.
Therefore, if a Student is in a Class that has a lab (or 2 or 3 labs), they will have Source records for Key1 that are the same, different Key2 values, and the same Key3 value.
The Extract does NOT CARE about the labs. Grrr, wish it did, but that is out of my control. The extract doesn't even include Key2 (lab) data.
Therefore, if a student is taking a class with a lab(s) and then they leave a lab, they obviously need to still be enabled for the Class. I would like to know that this happened, hence the more granular change table that seems to work just fine, mainly because I can use all 3 keys all the time. I will easily have situations where a given Key1 has multiple records in the granular table that have both an E and a D record based on Key2, and that is fine. The Extract data could not use the Granular table because it wouldn't know which E or D value was valid, there would be no way for it to 'tell'.
Keeping a Student Enabled has proven much more difficult than the opposite case. If a student has all Disabled records in the Target_E, then ANY record in Source Enables all. Obviously in the real world, that 1 record would NOT BE just the Lab, it would be the Class. I have no way of telling that either, but it is an acceptable assumption.
Thanks again!
May 13, 2014 at 8:42 am
That was certainly a use case that we did not take into account before. The trouble was that there were non-unique combinations of Key1 and Key3 (even though Key1+Key2+Key3 is unique). I'm going to switch away from the MERGE as this would be done most simply by separate statements rather than trying to cram it all into one. We can adjust how each statement checks for rows (whether it cares about Key2 or not), and that will make this a lot easier to digest.
From your description, it doesn't matter what Key2 is---if Key1 and Key3 are present at the input, then the whole record should be inserted or all records with that Key1/Key3 combination should be enabled. I'll start from scratch again and see if this gets us where we want to go here, including all steps along the way:
-- Create tables for example
CREATE TABLE [dbo].[Input_Table](
Key1 int,
Key2 int,
Key3 int
)
GO
CREATE TABLE [dbo].[Change_Table](
Key1 int,
Key2 int,
Key3 int,
Action_Type varchar(1),
Action_Date datetime
)
GO
----- This is the SQL that handles input data -----
CREATE PROCEDURE Process_Data AS
BEGIN
-- Update existing rows to be enabled, where not already enabled
UPDATE CHANGE_TABLE
set Action_Type = 'E',
Action_Date = GETDATE()
FROM Change_Table CHANGE_TABLE
INNER JOIN Input_Table INPUT_TABLE
ON CHANGE_TABLE.Key1 = INPUT_TABLE.Key1
AND CHANGE_TABLE.Key3 = INPUT_TABLE.Key3
-- Update existing rows to be NULL, where not in the input table
UPDATE CHANGE_TABLE
set Action_Type = 'D',
Action_Date = GETDATE()
FROM Change_Table CHANGE_TABLE
LEFT JOIN Input_Table INPUT_TABLE
ON CHANGE_TABLE.Key1 = INPUT_TABLE.Key1
AND CHANGE_TABLE.Key3 = INPUT_TABLE.Key3
WHERE INPUT_TABLE.Key1 IS NULL
-- Insert new rows from the input table
INSERT INTO Change_Table
(Key1, Key2, Key3, Action_Type, Action_Date)
SELECT
INPUT_TABLE.Key1,
INPUT_TABLE.Key2,
INPUT_TABLE.Key3,
'E' AS Action_Type,
GETDATE() AS Action_Date
FROM Input_Table INPUT_TABLE
WHERE NOT EXISTS
(SELECT * FROM Change_Table CHANGE_TABLE WHERE CHANGE_TABLE.Key1 = INPUT_TABLE.Key1 AND CHANGE_TABLE.Key2 = INPUT_TABLE.Key2 AND CHANGE_TABLE.Key3 = INPUT_TABLE.Key3)
END
GO
-- STEP 1: Populate Input_Table with initial data
INSERT INTO Input_Table
VALUES
(11111111, 12345678, 00001234),
(11111111, 12345680, 00001234),
(11111111, 12345678, 00002234)
EXEC Process_Data
SELECT * FROM Change_Table -- Verify that all new rows are there
-- STEP 2: Remove all data from input table to simulate empty input
DELETE FROM Input_Table
EXEC Process_Data
SELECT * FROM Change_Table -- Verify that all new rows are deactivated
-- STEP 3: Populate Input_Table with new data (same as in the first step)
INSERT INTO Input_Table
VALUES
(11111111, 12345678, 00001234),
(11111111, 12345680, 00001234),
(11111111, 12345678, 00002234)
EXEC Process_Data
SELECT * FROM Change_Table -- Verify that existing rows are all activated
I tried to keep this as simple as possible, breaking the necessary processing into 3 statements and putting them into a stored proc. Let me know if this is closer to what you are looking for.
May 13, 2014 at 8:31 pm
Thanks for the Reply Ed.
I too use 3 'Tasks'. I'm not sure if the Order of the Tasks is ultimately important, but if it is, I have chosen to Insert New Records that don't already exist in Change, First. In Task #2, I 'Re-Enable' record combinations that are being provided from Source 'again'. In Task #3, I Disable records no longer being provided. All work for both the Extract and the Granular.
I've been testing my 'prototype' (Task #3) all day, and it appears to work (11 Test Cases!!).
Although they look similar to some I have tried recently, I don't believe the queries you provided will work as desired in all cases, but I don't know the specific examples that will fail. You've been so gracious with your time that I'm going to do my best to carve out time tomorrow to find specific examples that may not work as well as providing what I'm currently using, again, similar.
Cheers!
May 14, 2014 at 5:24 am
Glad to hear everything is working! The order does matter somewhat, but I don't believe it will affect the results. Ie if you insert first, then a future update will enable those new rows as they are now existing matches. This won't create any bad data, but it would cause some extra writes, that's all.
May 14, 2014 at 1:42 pm
Ed, your Queries have passed my Test! They are so much simpler too.
I have to incorporate the same logic with an Instructor Field (actually there are 3 Instructor Fields possible per Key1 and Key2, but I think I will do a Pivot so that it mimics an enrollment cobmination of Key1 and Key2 with 3 different Students.) Sounds easy, lol.
More testing, I know have 2 working versions but yours is better!
Thanks again, I will be in touch.
May 15, 2014 at 3:38 pm
Everything is still working good, I've re-arranged my Task order to match yours as well.
One question, on our Insert Statement... The Select Portion...
SELECT
INPUT_TABLE.Key1,
INPUT_TABLE.Key2,
INPUT_TABLE.Key3,
'E' AS Action_Type,
GETDATE() AS Action_Date
FROM Input_Table INPUT_TABLE
WHERE NOT EXISTS
(SELECT * FROM Change_Table CHANGE_TABLE WHERE CHANGE_TABLE.Key1 = INPUT_TABLE.Key1 AND CHANGE_TABLE.Key2 = INPUT_TABLE.Key2 AND CHANGE_TABLE.Key3 = INPUT_TABLE.Key3)
I wouldn't normally write a where not exists quite like this, so I re-wrote what I thought would be the equivalent but it doesn't always work, so I'm scrapping it of course.
Is there a 'different way' this could have been written involving an IJ or LJ?
I don't quite understand how the Query can select fields from Input_Table that did't even have equivalent fields, but is that why you 'had' to supply: 'E' AS Action_Type,
GETDATE() AS Action_Date, using the exact same Aliases as Change?
If so, this is pretty slick!
May 15, 2014 at 7:39 pm
ranmazure (5/15/2014)
Everything is still working good, I've re-arranged my Task order to match yours as well.One question, on our Insert Statement... The Select Portion...
SELECT
INPUT_TABLE.Key1,
INPUT_TABLE.Key2,
INPUT_TABLE.Key3,
'E' AS Action_Type,
GETDATE() AS Action_Date
FROM Input_Table INPUT_TABLE
WHERE NOT EXISTS
(SELECT * FROM Change_Table CHANGE_TABLE WHERE CHANGE_TABLE.Key1 = INPUT_TABLE.Key1 AND CHANGE_TABLE.Key2 = INPUT_TABLE.Key2 AND CHANGE_TABLE.Key3 = INPUT_TABLE.Key3)
I wouldn't normally write a where not exists quite like this, so I re-wrote what I thought would be the equivalent but it doesn't always work, so I'm scrapping it of course.
Is there a 'different way' this could have been written involving an IJ or LJ?
I don't quite understand how the Query can select fields from Input_Table that did't even have equivalent fields, but is that why you 'had' to supply: 'E' AS Action_Type,
GETDATE() AS Action_Date, using the exact same Aliases as Change?
If so, this is pretty slick!
Well, for the aliases---they are completely arbitrary:
'E' AS Action_Type,
GETDATE() AS Action_Date
I could have called them anything and it would still have worked. I just chose matching aliases with the column names so that the SQL made a bit more sense. 'E' and GETDATE() are constants, so they are being inserted independently of the tables that are selected from below. I could have alos used no aliases for these two constants, and that would also be OK by SQL Server. This is just coding style and is up to the author.
Back to your first question:
You can rewrite any IF NOT EXISTS query as a LEFT JOIN that will look something like this:
SELECT
INPUT_TABLE.Key1,
INPUT_TABLE.Key2,
INPUT_TABLE.Key3,
'E' AS Action_Type,
GETDATE() AS Action_Date
FROM Input_Table INPUT_TABLE
LEFT JOIN Change_Table CT
ON CT.Key1 = INPUT_TABLE.Key1
AND CT.Key2 = INPUT_TABLE.Key2
AND CT.Key3 = INPUT_TABLE.Key3
WHERE CT.Key1 IS NULL
This SQL will perform a left join on all of our three key columns. If there's a match then our results will not be NULL, otherwise they will and it is OK to insert. I personally prefer a NOT EXISTS as it seems (to me at least) to sound & look more like what it is trying to do: ie, check and see if a row exists before inserting. Performance-wise they will typically be similar---but for larger queries, they might not be - so it's worth testing if there are ever performance concerns.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply