June 24, 2022 at 9:55 pm
Hello,
Consider the following table that gives the validity dates of an item :
create table #table1(
DD_S int,
DF_S int,
id_prod int,
is_valid bit
)
insert into #table1
values
(20190101,20190601,10,0),
(20190601,20190901,10,0),
(20190901,20200701,10,0),
(20200701,20211001,10,0),
(20211001,20211101,10,0),
(20211101,20220401,10,1),
(20220401,20291231,10,0),
(20190101,20200201,15,1),
(20200201,20301101,15,0)
Now consider the following table that gives the identifier of a item throughout time:
create table #table2(
DD int,
DF int,
identifier int,
id_prod int
)
insert into #table2
VALUES
(20190101,20211001,5,10),
(20211001,20211101,4,10),
(20211101,20291231,5,10),
(20190101,20190501,6,15),
(20190501,20400101,7,15)
The goal is to 'slice' the first table to associate to each Item , while taking into account the time periods. The result should look like this:
DD_S DF_S id_Prod identifier DD DF IS_Valid
20190101 20190601 10 5 20190101 20211001 0
20190601 20190901 10 5 20190101 20211001 0
20190901 20200701 10 5 20190101 20211001 0
20200701 20211001 10 5 20190101 20211001 0
20211001 20211101 10 4 20211001 20211101 0
20211101 20220401 10 5 20211101 20291231 1
20220401 20291231 10 5 20211101 20291231 0
20190101 20190501 15 6 20190101 20190501 1
20190501 20200201 15 6 20190501 20400101 1
20200201 20301101 15 7 20190501 20400101 0
The idea is Slicing time periods by related time
Any idea or best solution to do that ?
Thanks
June 25, 2022 at 11:57 am
Step1: convert the dates from INT to DATE
In #table1 the date ranges overlap. How to get rid of the overlaps? My code updates t1 (my guess is the ranges are inclusive of the DF_S date)
Next: confirm the basic triage of the rows in #table1. Some rows in t1 are completely covered by a row t2. Some rows are undefined by t2. Some rows need to be split
drop table if exists #table1;
go
create table #table1(
DD_S int,
DF_S int,
id_prod int,
is_valid bit)
insert into #table1
values
(20190101,20190601,10,0),
(20190601,20190901,10,0),
(20190901,20200701,10,0),
(20200701,20211001,10,0),
(20211001,20211101,10,0),
(20211101,20220401,10,1),
(20220401,20291231,10,0),
(20190101,20200201,15,1),
(20200201,20301101,15,0);
drop table if exists #table2;
go
create table #table2(
DD int,
DF int,
identifier int,
id_prod int)
insert into #table2
VALUES
(20190101,20211001,5,10),
(20211001,20211101,4,10),
(20190101,20190501,6,15),
(20190501,20400101,7,15);
/* use sql date type */
drop table if exists #table1_dt;
go
create table #table1_dt(
DD_S date,
DF_S date,
id_prod int,
is_valid bit)
drop table if exists #table2_dt;
go
create table #table2_dt(
DD date,
DF date,
identifier int,
id_prod int)
insert #table1_dt
select datefromparts(DD_S/10000, DD_S/100%100, DD_S%100),
datefromparts(DF_S/10000, DF_S/100%100, DF_S%100),
id_prod, is_valid
from #table1;
insert #table2_dt
select datefromparts(DD/10000, DD/100%100, DD%100),
datefromparts(DF/10000, DF/100%100, DF%100),
identifier, id_prod
from #table2;
/* update t1 to eliminate overlaps per id_prod */
update t
set DD_S=dateadd(month, 1, t.DD_S)
from #table1_dt t
join #table1_dt tt on t.id_prod=tt.id_prod
and t.DD_S=tt.DF_S
/* triage rows in t1 */
with
covered_cte as (
select o.*, t.identifier
from #table1_dt o
join #table2_dt t on o.id_prod=t.id_prod
and o.DD_S between t.DD and t.DF
and o.DF_S between t.DD and t.DF),
undef_cte as (
select o.*, 0 identifier
from #table1_dt o
where not exists (select 1
from covered_cte sc
where sc.id_prod=o.id_prod
and sc.DD_S=o.DD_S
and sc.DF_S=o.DF_S)
and not exists(select 1
from #table2_dt t
where t.id_prod=o.id_prod
and (t.DD between o.DD_S and o.DF_S
or t.DF between o.DD_S and o.DF_S))),
to_be_split_cte as (
select DD_S, DF_S, id_prod, is_valid
from #table1_dt
except
select DD_S, DF_S, id_prod, is_valid
from undef_cte
except
select DD_S, DF_S, id_prod, is_valid
from covered_cte)
select 'covered', * from covered_cte
union all
select 'undef', * from undef_cte
union all
select 'to_be_split', *, 0 from to_be_split_cte
order by 4, 2;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
June 25, 2022 at 9:52 pm
Hi @steve-2 Collins
Thanks for the help.
Your query doesn't resolve my problem i need to cover the date of overlapping for example the id_pord=15 as bellow that i want
20190101 20190501 15 6 20190101 20190501 1
20190501 20200201 15 7 20190501 20400101 1
20200201 20301101 15 7 20190501 20400101 0
This a query that it give me the right result but i want something dynamically i have a lot of identifier ..
with t2 as
(
select DD, DF, identifier, id_prod
from #table2
union all
select max(DF), 99991231, 5, id_prod
from #table2
group by id_prod
),
C as
(
select t1.*, t2.identifier,
iif(t1.DD_S >= t2.DD, t1.DD_S, t2.DD) as DD,
iif(t1.DF_S <= t2.DF, t1.DF_S, t2.DF) as DF
from #table1 t1
inner join t2 on t2.id_prod = t1.id_prod
)
select DD_S, DF_S, id_prod, identifier, DD, DF, is_valid
from C
where DD < DF
order by id_prod, DD
Thanks
June 26, 2022 at 12:07 am
You're saying "split my data" but you've provided 9 rows of input data of which 8 rows do not (seemingly) require splitting. Does my query correctly separate rows which DO NOT need to be split from rows which DO need to be split? Why in the CTE of the query you just posted would it apply identifier=5 to id_prod=15 when in the sample data provided initially it's associated with id_prod=10?
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
June 26, 2022 at 12:25 pm
Sorry i forgot a line in sample data (just updated)
the query that i shared doesn't work perfectly but just i want to giving an idea i am trying to solve that problem .
if you see the output that i want to problem is caused with id_prod , i want to cover the gap between table 1 and table 2
DD_S DF_S id_Prod identifier DD DF IS_Valid
20190101 20190501 15 6 20190101 20190501 1
20190501 20200201 15 7 20190501 20400101 1
20200201 20301101 15 7 20190501 20400101 0
Thanks
June 26, 2022 at 5:17 pm
One of the fastest methods ever created for doing this type of thing is documented in the following article.
https://blogs.solidq.com/en/sqlserver/packing-intervals/
It's an older article so solution #3 can be made work in SS 2017 but have never tested that solution. I HAVE tested solution #2 in the past and it's nasty fast. It's also comparatively simple. I just don't have the time today to write code to prove it. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
June 26, 2022 at 9:49 pm
Hi,
I did the query as bellow i don't know if will perfectly work for another data but with my sample data is work fine just one problem when i do the union the first query doesn't work i don't know why but when i execute the query one by one i have the result.
Can you please tell me why the union doesn't work and if you have any elegant solution it will be great
;with cte as(
SELECT a.[DD_S]
,a.[DF_S]
,a.[id_Prod]
,b.[identifier]
,b.[DD]
, b.[DF]
,LAG([DD_S]) over (partition by a.id_prod order by DD_S) as LagDD_S
,LAG([DF_S]) over (partition by a.id_prod order by DD_S) as LagDF_S
,LEAD([DD_S]) over (partition by a.id_prod order by DD_S) as LeadDD_S
,LEAD([DD]) over (partition by a.id_prod order by DD_S) as LeadDD
FROM table1 a
join table2 b
on a.[id_Prod]=b.[id_Prod]
)
select DF as DD_S,LagDD_S as DF_S,id_prod,identifier from cte
where
1=1
and DF<LagDF_S
and LeadDD_S is null /*Doesn't work but when i execute the query alone i have a result*/
union
select DD_S,DF_S,id_prod,identifier from cte
where ([DD_S]>=[DD] and [DF_S]<=[DF])
union
select DD_S,LeadDD as DF_S,id_prod,identifier from cte
where LagDD_S is null
and ([DD_S]<=[DD] and [DF_S]>=[DF])
order by id_prod,dd_s
Thanks
June 26, 2022 at 11:28 pm
... and if you have any elegant solution it will be great
I posted a link in my previous response as to what I think "the" elegant solution would be. You just have to adapt it to your needs.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 27, 2022 at 12:57 pm
Hi,
I did the query as bellow i don't know if will perfectly work for another data but with my sample data is work fine just one problem when i do the union the first query doesn't work i don't know why but when i execute the query one by one i have the result.
Can you please tell me why the union doesn't work and if you have any elegant solution it will be great
Thanks
The updated code still contains the INTEGER dates. Imo it's mandatory to use SQL dates. Consider the UPDATE statements to make the date ranges unique (DD to DF - wise). Just to add 1 month to an INTEGER date would be a gigantic waste of needless complexity. Imo you should switch ALL dates stored as INTEGER to SQL dates everywhere in the database and never go backwards again
The approach of getting it to work with 1 single case and then expanding the number of inputs seems likely to be frustrating and unlikely to work out well. It's better imo to use fully representative input from the outset. It's a challenging query or so it seems. The article Jeff listed is definitely applicable. On top of the packing issue it appears the 'identifier' value gets "carried through" from the adjacent #table2 range(s)
When I re-run my code with the additional t2 input row it now seems to correctly classify 8/9ths of the #table1 rows. The remaining row needs splitting
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
June 27, 2022 at 2:10 pm
To make more realistic (I guess) I added an additional row to t2 so the split requires 3 rows instead of 2. Also, I added a UNIQUE CLUSTERED INDEX on the DD/DF columns in both t1 and t2 tables.
drop table if exists #table1;
go
create table #table1(
DD_S int,
DF_S int,
id_prod int,
is_valid bit)
insert into #table1
values
(20190101,20190601,10,0),
(20190601,20190901,10,0),
(20190901,20200701,10,0),
(20200701,20211001,10,0),
(20211001,20211101,10,0),
(20211101,20220401,10,1),
(20220401,20291231,10,0),
(20190101,20200201,15,1),
(20200201,20301101,15,0);
drop table if exists #table2;
go
create table #table2(
DD int,
DF int,
identifier int,
id_prod int)
insert into #table2
VALUES
(20190101,20211001,5,10),
(20211001,20211101,4,10),
(20211101,20291231,5,10),
(20190101,20190501,6,15),
(20190501,20190601,7,15), /* added row */ (20190601,20400101,8,15)
--(20190101,20190501,6,15),
--(20190501,20400101,7,15);
/* use sql date type */drop table if exists #table1_dt;
go
create table #table1_dt(
DD_S date,
DF_S date,
id_prod int,
is_valid bit)
drop table if exists #table2_dt;
go
create table #table2_dt(
DD date,
DF date,
identifier int,
id_prod int)
insert #table1_dt
select datefromparts(DD_S/10000, DD_S/100%100, DD_S%100),
datefromparts(DF_S/10000, DF_S/100%100, DF_S%100),
id_prod, is_valid
from #table1;
insert #table2_dt
select datefromparts(DD/10000, DD/100%100, DD%100),
datefromparts(DF/10000, DF/100%100, DF%100),
identifier, id_prod
from #table2;
/* update t1 to eliminate overlaps per id_prod */update t
set DD_S=dateadd(month, 1, t.DD_S)
from #table1_dt t
join #table1_dt tt on t.id_prod=tt.id_prod
and t.DD_S=tt.DF_S;
create unique clustered index ndx_dd_df on #table1_dt(DD_S, DF_S);
/* update t2 to eliminate overlaps per id_prod */update t
set DD=dateadd(month, 1, t.DD)
from #table2_dt t
join #table2_dt tt on t.id_prod=tt.id_prod
and t.DD=tt.DF;
create unique clustered index ndx_dd_df on #table2_dt(DD, DF);
This query expands the non-covered DD_S to DF_S range from t1 and OUTER APPLY's a correlated subquery which expands the DD to DS range from t2.
with
covered_cte as (
select o.*, t.identifier
from #table1_dt o
join #table2_dt t on o.id_prod=t.id_prod
and o.DD_S between t.DD and t.DF
and o.DF_S between t.DD and t.DF),
undef_cte as (
select o.*, 0 identifier
from #table1_dt o
where not exists (select 1
from covered_cte sc
where sc.id_prod=o.id_prod
and sc.DD_S=o.DD_S
and sc.DF_S=o.DF_S)
and not exists(select 1
from #table2_dt t
where t.id_prod=o.id_prod
and (t.DD between o.DD_S and o.DF_S
or t.DF between o.DD_S and o.DF_S))),
to_be_split_cte as (
select DD_S, DF_S, id_prod, is_valid
from #table1_dt
except
select DD_S, DF_S, id_prod, is_valid
from undef_cte
except
select DD_S, DF_S, id_prod, is_valid
from covered_cte)
select 'to_be_split', s.*, 0, calc.dt,
min_range.start_dd min_start_dd, min_range.end_df min_end_df,
max_range.start_dd max_start_dd, max_range.end_df max_end_df,
ott.identifier ott_id
from to_be_split_cte s
/* locate t2 range starting point */ outer apply (select top(1) tt.DD, tt.DF
from #table2_dt tt
where tt.id_prod=s.id_prod
and tt.DF>=s.DD_S
order by tt.DF) min_range(start_dd, end_df)
/* locate t2 range ending point */ outer apply (select top(1) tt.DD, tt.DF
from #table2_dt tt
where tt.id_prod=s.id_prod
and tt.DD<=s.DF_S
order by tt.DF desc) max_range(start_dd, end_df)
/* expand t1 date range */ cross apply dbo.fnTally(0, datediff(month, s.DD_S, s.DF_S)) fn
cross apply (values (dateadd(month, fn.n, s.DD_S))) calc(dt)
/* outer join/apply expanded t2 range(s) */
outer apply (select tt.*, tt_calc.dt
from #table2_dt tt
/* expand t2 date range(s) */
cross apply dbo.fnTally(0, datediff(month, tt.DD, tt.DF)) fn
cross apply (values (dateadd(month, fn.n, tt.DD))) tt_calc(dt)
where tt.id_prod=s.id_prod
and tt_calc.dt=calc.dt) ott;
This query uses GROUP BY to summarize the results from above. UNION ALL with the "covered" rows
with
covered_cte as (
select o.*, t.identifier
from #table1_dt o
join #table2_dt t on o.id_prod=t.id_prod
and o.DD_S between t.DD and t.DF
and o.DF_S between t.DD and t.DF),
undef_cte as (
select o.*, 0 identifier
from #table1_dt o
where not exists (select 1
from covered_cte sc
where sc.id_prod=o.id_prod
and sc.DD_S=o.DD_S
and sc.DF_S=o.DF_S)
and not exists(select 1
from #table2_dt t
where t.id_prod=o.id_prod
and (t.DD between o.DD_S and o.DF_S
or t.DF between o.DD_S and o.DF_S))),
to_be_split_cte as (
select DD_S, DF_S, id_prod, is_valid
from #table1_dt
except
select DD_S, DF_S, id_prod, is_valid
from undef_cte
except
select DD_S, DF_S, id_prod, is_valid
from covered_cte)
select min(calc.dt) DD_S,
max(calc.dt) DF_F,
s.id_prod, s.is_valid, ott.identifier
from to_be_split_cte s
/* locate t2 range starting point */ outer apply (select top(1) tt.DD, tt.DF
from #table2_dt tt
where tt.id_prod=s.id_prod
and tt.DF>=s.DD_S
order by tt.DF) min_range(start_dd, end_df)
/* locate t2 range ending point */ outer apply (select top(1) tt.DD, tt.DF
from #table2_dt tt
where tt.id_prod=s.id_prod
and tt.DD<=s.DF_S
order by tt.DF desc) max_range(start_dd, end_df)
/* expand t1 date range */ cross apply dbo.fnTally(0, datediff(month, s.DD_S, s.DF_S)) fn
cross apply (values (dateadd(month, fn.n, s.DD_S))) calc(dt)
/* outer join/apply expanded t2 range(s) */
outer apply (select tt.*, tt_calc.dt
from #table2_dt tt
/* expand t2 date range(s) */
cross apply dbo.fnTally(0, datediff(month, tt.DD, tt.DF)) fn
cross apply (values (dateadd(month, fn.n, tt.DD))) tt_calc(dt)
where tt.id_prod=s.id_prod
and tt_calc.dt=calc.dt) ott
group by s.id_prod, s.is_valid, ott.identifier
union all
select * from covered_cte
order by id_prod, DD_S;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
June 27, 2022 at 2:25 pm
The function dbo.fnTally can be found and explained here
CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
Jeff Moden Script on SSC: https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
**********************************************************************************************************************/
(@ZeroOrOne BIT, @MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN WITH
H2(N) AS ( SELECT 1
FROM (VALUES
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
)V(N)) --16^2 or 256 rows
, H4(N) AS (SELECT 1 FROM H2 a, H2 b) --16^4 or 65,536 rows
, H8(N) AS (SELECT 1 FROM H4 a, H4 b) --16^8 or 4,294,967,296 rows
SELECT N = 0 WHERE @ZeroOrOne = 0 UNION ALL
SELECT TOP(@MaxN)
N = ROW_NUMBER() OVER (ORDER BY N)
FROM H8
;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
June 27, 2022 at 3:02 pm
Hi,
Just to answer to your question about why i have int date because it's a fact table ,
Thanks for new code. I will test that in my sample data then in my fact table, hope that query doesn't take me a lot of time
Well,i let you know .
Thanks
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply