December 19, 2019 at 2:07 pm
Dear Experts
I am trying to merge two Dimensions (from a DWH) into a new one. I think I've figured it out basically, but there's a weird special case, which my code won't catch right.
Below is an excerpt from the real data. Never mind the illogical city names, I just replaced some company terms.
The "DeptCode" that's causing trouble is 7210 "London"
The timeline I want to build is
2016-07-06 00:00:00 - 2017-01-16 23:59:59 7600 -- notice substraction of 1 day for ValidTo
2017-01-17 00:00:00 - 9999-12-31 00:00:00 7300
Note sure, if that's even possible or the data's too wrecked.
As mentioned above, I can handle it for the other patterns by using this code (perhaps there's a more elegant/efficient way to do it):
Thanks a lot, Roger 🙂
CREATE TABLE [dbo].[TestData](
[struct_ValidFrom] [datetime2](7) NOT NULL,
[struct_ValidTo] [datetime2](7) NOT NULL,
[object_ValidFrom] [datetime2](7) NOT NULL,
[object_ValidTo] [datetime2](7) NOT NULL,
[DeptCode] [int] NOT NULL,
[DeptName] [varchar](80) NULL,
[DeptCodeParent] [int] NULL
) ON [PRIMARY]
GO
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '2019-01-14 23:59:59', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 7312, 'Tokyo', 7310);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '2019-01-14 23:59:59', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 1012, 'Paris', 1010);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-01-15 00:00:00', '9999-12-31 00:00:00', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 7210, 'London', 7600);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-01-15 00:00:00', '9999-12-31 00:00:00', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 1012, 'Paris', 1010);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-01-15 00:00:00', '9999-12-31 00:00:00', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 7312, 'Tokyo', 7310);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '9999-12-31 00:00:00', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 1200, 'Berlin', 720);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '2019-01-14 23:59:59', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 7210, 'London', 7300);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '2019-01-14 23:59:59', '2017-01-16 00:00:00', '2019-01-14 23:59:59', 1130, 'ZĂĽrich', 1100);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '2019-01-14 23:59:59', '2017-01-16 00:00:00', '2019-01-14 23:59:59', 1114, 'Madrid', 1130);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-01-15 00:00:00', '2019-10-03 23:59:59', '2019-01-15 00:00:00', '9999-12-31 00:00:00', 1130, 'New York', 4600);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-01-15 00:00:00', '2019-10-03 23:59:59', '2019-01-15 00:00:00', '9999-12-31 00:00:00', 1114, 'Beijing', 1130);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-10-04 00:00:00', '9999-12-31 00:00:00', '2019-01-15 00:00:00', '9999-12-31 00:00:00', 1130, 'New York', 4600);
insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-10-04 00:00:00', '9999-12-31 00:00:00', '2019-01-15 00:00:00', '9999-12-31 00:00:00', 1114, 'Beijing', 1130);
with Departments as
(
-- merge periods
select
struct_ValidFrom as ValidFrom,
struct_ValidTo as ValidTo,
DeptCode,
DeptName,
DeptCodeParent
from dbo.TestData
union
select
object_ValidFrom as ValidFrom,
object_ValidTo as ValidTo,
DeptCode,
DeptName,
DeptCodeParent
from dbo.TestData
),
DepartmentsMerged as
(
-- merge/reduce timelines when a change is found
select
min(ValidFrom) as ValidFrom,
max(ValidTo) as ValidTo,
DeptCode,
DeptName,
DeptCodeParent
from Departments
group by
DeptCode,
DeptName,
DeptCodeParent
)
select *
from DepartmentsMerged
-- problematic DeptCode:
-- where DeptCode = 7210
order by
DeptCode,
ValidFrom
December 19, 2019 at 3:16 pm
Is it intentional to have two DeptCode Parent values for London? That seems to be why you're getting both values back. In your sample data, there isn't a date value that would fall into the question parameters you mentioned, so both come back as 9999-12-31.
Alan H
MCSE - Data Management and Analytics
Senior SQL Server DBA
Best way to ask a question: http://www.sqlservercentral.com/articles/Best+Practices/61537/
December 19, 2019 at 3:23 pm
yep, it's the crap i am receiving :-/
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply