January 26, 2018 at 8:48 am
Hello Experts,
I am trying to merge records with the same contents of different periods.
drop table if exists timeSeries;
create table timeSeries (
businessKey int,
startDate date,
endDate date,
value1 char(1),
value2 char(2)
)
;
insert into timeSeries
(businessKey, startDate, endDate, value1, value2)
values
(100, '2017-01-01', '2017-01-10', 'A', '0')
;
insert into timeSeries
(businessKey, startDate, endDate, value1, value2)
values
(100, '2017-01-11', '2017-01-20', 'A', '0')
;
insert into timeSeries
(businessKey, startDate, endDate, value1, value2)
values
(100, '2017-01-21', '2017-01-31', 'A', '0')
;
insert into timeSeries
(businessKey, startDate, endDate, value1, value2)
values
(100, '2017-02-01', '2017-02-28', 'B', '0')
;
insert into timeSeries
(businessKey, startDate, endDate, value1, value2)
values
(100, '2017-03-01', '2017-03-31', 'B', '0')
;
insert into timeSeries
(businessKey, startDate, endDate, value1, value2)
values
(100, '2017-04-01', '2017-04-30', 'B', '1')
;
select *
from timeSeries
order by
businessKey,
startDate
The sample table set up by the above code, contains one business object with the key 100 and its history in the form of start/end periods.
As you can see, there are 6 "slices" but only 3 actual changes in the data.
I need to merge records with the same data to one period, because my real data may contain hundreds of slices/records with the same data, which inflates my tables.
Expected result:
100 2017-01-01 2017-01-31 A 0
100 2017-02-01 2017-03-31 B 0
100 2017-04-01 2017-04-30 B 1
By "changes in the data" I mean a new value in any of the payload-attributes, which is "vaue1" or "value2" in the sample.
Thanks,
Roger
January 26, 2018 at 9:47 am
This looks like a classic Island and Gaps question.
There are plenty of solutions out there, this is just one example:
WITH Groups AS(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY value1 ORDER BY startDate) -
ROW_NUMBER() OVER (PARTITION BY value1, value2 ORDER BY startDate) AS Grp
FROM timeSeries tS)
SELECT businessKey,
MIN(startDate) AS startDate,
MAX(endDate) AS endDate,
value1,
value2
FROM Groups
GROUP BY businessKey,
value1, value2,
Grp
ORDER BY value1, startDate;
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
January 26, 2018 at 9:52 am
INSERT INTO #timeSeries (businessKey, startDate, endDate, value1, value2)
VALUES
(100, '2017-01-01', '2017-01-10', 'A', '0'),
(100, '2017-01-11', '2017-01-20', 'A', '0'),
(100, '2017-01-21', '2017-01-31', 'A', '0'),
(100, '2017-02-01', '2017-02-28', 'B', '0'),
(100, '2017-03-01', '2017-03-31', 'B', '0'),
(100, '2017-04-01', '2017-04-30', 'B', '1'),
(100, '2017-05-01', '2017-05-10', 'A', '0'),
(100, '2017-05-11', '2017-05-20', 'A', '0');
END;
WITH OrderedData AS (
SELECT *,
rn1 = ROW_NUMBER() OVER (PARTITION BY businessKey ORDER BY startDate),
rn2 = ROW_NUMBER() OVER (PARTITION BY businessKey, value1, value2 ORDER BY startDate)
FROM #timeSeries
)
SELECT businessKey, value1, value2, startDate = MIN(startDate), endDate = MAX(endDate)
FROM OrderedData
GROUP BY businessKey, value1, value2, rn1-rn2
ORDER BY businessKey, startDate
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply