April 27, 2016 at 2:39 am
Hi ,
There is data from upstream of different buses usage with StartDateKey and EndDateKey . There is need to condense data as shown in scenario 1 and 2 .
In scenario 1 , Datekey sequence is followed , where EndDatekey of current Row is previous day of StartDateKey in next row of same bus .
In Scenario 2 , DateKey sequence is followed but there is gap of n days (n>1) (Highlighted in Colors)
Following Query is working fine for scenario 1 .
DROP table #temp
CREATE table #temp (SysKey int identity(1,1) ,BusName varchar(9) ,BusType varchar(6) ,StartDateKey int ,EndDatekey int )
Insert into #temp (BusName,BusType,StartDateKey,EndDatekey)
Select 'A' ,'AC',20130101,20130112 union ALL
Select 'A' ,'AC', 20130113 ,20130115 union ALL
Select 'A' ,'AC', 20130116 ,20130118 Union ALL
Select 'B' , 'NON-AC', 20130119 ,20130121 union ALL
Select 'B' , 'NON-AC', 20130122 ,20130124 Union ALL
Select 'A' ,'AC', 20130125, NULL
;WITH SequencedData AS (
SELECT
SysKey, BusName, BusType, StartDateKey, EndDatekey,
seq = SysKey - ROW_NUMBER() OVER(PARTITION BY BusName ORDER BY StartDateKey)
FROM #temp
)
SELECT
SysKey = MAX(SysKey),
BusName, BusType,
StartDate = MIN(StartDateKey),
EndDate = MAX(EndDatekey)
FROM SequencedData
GROUP BY BusName, BusType, seq
ORDER BY MAX(SysKey)
How to tweak above query to handle both scenario 1 and scenario 2
Thanks
Surya Sunil
April 27, 2016 at 7:44 am
do a web search for: itzik ben-gan gaps and islands
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
April 27, 2016 at 7:45 am
do you need SysKey?
I'm testing out this scenario where the later date is earlier in the insert and syskey for the later date will have a lower value.
The max(syskey) requirement is making me spin in my chair.
Can only seem to make something close with a recursive CTE.
CREATE table #temp2 (SysKey int identity(1,1) ,BusName varchar(9) ,BusType varchar(6) ,StartDateKey Datetime ,EndDatekey Datetime)
SET DATEFORMAT YMD
Insert into #temp2 (BusName,BusType,StartDateKey,EndDatekey)
Select 'A' ,'AC','20130108','20130109' union ALL
Select 'A' ,'AC','20130101','20130105' union ALL
Select 'A' ,'AC','20130106','20130107' union ALL
Select 'A' ,'AC', '20130117' ,'20130118' Union ALL
Select 'A' ,'AC', '20130113' ,'20130115' union ALL
Select 'B' , 'NON-AC', '20130119' ,'20130121' union ALL
Select 'B' , 'NON-AC', '20130122' ,'20130124' Union ALL
Select 'A' ,'AC', '20130125', NULL
April 27, 2016 at 7:46 am
is the sample data you have provided already been processed to create a "pivot/crosstab"?
why are you storing dates as integer?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 27, 2016 at 11:54 am
maybe.....
EDIT ::: maybe not.......further testing required me thinks !
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 27, 2016 at 10:33 pm
Hi ,
Thanks for looking into post .
Dates are integer in the upstream , which I need to consolidate for Datawarehouse purpose .
April 27, 2016 at 10:41 pm
Hi ,
MAX(Syskey) is not required . I made it , when i was doing hit and trail .
Thanks
Surya Sunil
April 28, 2016 at 2:43 am
sunil.mvs (4/27/2016)
Hi ,MAX(Syskey) is not required . I made it , when i was doing hit and trail .
Thanks
Surya Sunil
Tried this, seems to work, but I made all the dates into datetime.
If you want it to work with int, you may need to hack this to make it convert on the fly since using an int, it wont work when going from one month into another month cos it does a data add of one day and joins.
DROP table #temp
CREATE table #temp (SysKey int identity(1,1) ,BusName varchar(9) ,BusType varchar(6) ,StartDateKey Datetime ,EndDatekey Datetime)
SET DATEFORMAT YMD
Insert into #temp (BusName,BusType,StartDateKey,EndDatekey)
Select 'A' ,'AC','20130101','20130112' union ALL
Select 'A' ,'AC', '20130113' ,'20130115' union ALL
Select 'A' ,'AC', '20130117' ,'20130118' Union ALL
Select 'B' , 'NON-AC', '20130119' ,'20130121' union ALL
Select 'B' , 'NON-AC', '20130122' ,'20130124' Union ALL
Select 'A' ,'AC', '20130125', NULL
;WITH recCTE AS (
SELECT BusName,BusType,StartDateKey,EndDatekey,
Convert(varchar(9), '') BusName2, Convert(varchar(6), '') BusType2, CONVERT(DateTime, Null) StartDateKey2, CONVERT(DateTime, Null) EndDatekey2,
0 as [Level], CONVERT(DateTime, StartDateKey) startd, CONVERT(DateTime, EndDatekey) endd --, CONVERT(DateTime, StartDateKey) minDate
FROM #temp
UNION ALL
SELECT
A.BusName, A.BusType,A.StartDateKey,A.EndDatekey,
B.BusName, B.BusType, B.StartDateKey, B.EndDatekey,
[Level] + 1, ISNULL(B.StartDateKey2,B.StartDateKey), A.EndDatekey --,CASE WHEN B.startd < minDate THEN B.startd ELSE A.StartDateKey END
FROM #temp AS A
INNER JOIN recCTE AS B ON A.StartDateKey = B.EndDatekey + 1 AND A.Busname = B.busName AND a.bustype = b.bustype
)
select BusName, BusType,startd,endd from (
SELECT startd,endd,BusName, BusType,RANK() OVER (PARTITION BY startd, BusName, BusType ORDER BY Level DESC) rid1,
RANK() OVER (PARTITION BY endD, BusName, BusType ORDER BY Level DESC) RID2
FROM recCTE AA )window
Where RID1*RID2=1
order by Startd,Busname,BusType
April 28, 2016 at 4:39 am
would you ever have a situation where enddate of one row is the same as startdate of another row?
eg
DROP table #temp
CREATE table #temp (SysKey int identity(1,1) ,BusName varchar(9) ,BusType varchar(6) ,StartDateKey Datetime ,EndDatekey Datetime)
SET DATEFORMAT YMD
Insert into #temp (BusName,BusType,StartDateKey,EndDatekey)
Select 'A' ,'AC','20130101','20130105' union ALL
Select 'A' ,'AC', '20130105' ,'20130110' union ALL
Select 'A' ,'AC', '20130110' ,'20130115' union ALL
Select 'A' ,'AC', '20130117' ,'20130118' Union ALL
Select 'B' , 'NON-AC', '20130119' ,'20130121' union ALL
Select 'B' , 'NON-AC', '20130122' ,'20130124' Union ALL
Select 'A' ,'AC', '20130125', NULL
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 28, 2016 at 6:21 am
MadAdmin (4/28/2016)
sunil.mvs (4/27/2016)
Hi ,MAX(Syskey) is not required . I made it , when i was doing hit and trail .
Thanks
Surya Sunil
Tried this, seems to work, but I made all the dates into datetime.
If you want it to work with int, you may need to hack this to make it convert on the fly since using an int, it wont work when going from one month into another month cos it does a data add of one day and joins.
DROP table #temp
CREATE table #temp (SysKey int identity(1,1) ,BusName varchar(9) ,BusType varchar(6) ,StartDateKey Datetime ,EndDatekey Datetime)
SET DATEFORMAT YMD
Insert into #temp (BusName,BusType,StartDateKey,EndDatekey)
Select 'A' ,'AC','20130101','20130112' union ALL
Select 'A' ,'AC', '20130113' ,'20130115' union ALL
Select 'A' ,'AC', '20130117' ,'20130118' Union ALL
Select 'B' , 'NON-AC', '20130119' ,'20130121' union ALL
Select 'B' , 'NON-AC', '20130122' ,'20130124' Union ALL
Select 'A' ,'AC', '20130125', NULL
;WITH recCTE AS (
SELECT BusName,BusType,StartDateKey,EndDatekey,
Convert(varchar(9), '') BusName2, Convert(varchar(6), '') BusType2, CONVERT(DateTime, Null) StartDateKey2, CONVERT(DateTime, Null) EndDatekey2,
0 as [Level], CONVERT(DateTime, StartDateKey) startd, CONVERT(DateTime, EndDatekey) endd --, CONVERT(DateTime, StartDateKey) minDate
FROM #temp
UNION ALL
SELECT
A.BusName, A.BusType,A.StartDateKey,A.EndDatekey,
B.BusName, B.BusType, B.StartDateKey, B.EndDatekey,
[Level] + 1, ISNULL(B.StartDateKey2,B.StartDateKey), A.EndDatekey --,CASE WHEN B.startd < minDate THEN B.startd ELSE A.StartDateKey END
FROM #temp AS A
INNER JOIN recCTE AS B ON A.StartDateKey = B.EndDatekey + 1 AND A.Busname = B.busName AND a.bustype = b.bustype
)
select BusName, BusType,startd,endd from (
SELECT startd,endd,BusName, BusType,RANK() OVER (PARTITION BY startd, BusName, BusType ORDER BY Level DESC) rid1,
RANK() OVER (PARTITION BY endD, BusName, BusType ORDER BY Level DESC) RID2
FROM recCTE AA )window
Where RID1*RID2=1
order by Startd,Busname,BusType
not getting the expected results for the following
Insert into #temp (BusName,BusType,StartDateKey,EndDatekey)
Select 'A' ,'AC', '20130401', '20130402' union ALL
Select 'A' ,'AC', '20130403' ,'20130404' union ALL
Select 'A' ,'AC', '20130405' ,'20130406' union ALL
Select 'A' ,'AC', '20130407' ,'20130408' Union ALL
Select 'A' ,'AC', '20130409' ,'20130410' Union ALL
Select 'A' ,'AC', '20130411' ,'20130430' ;
+-----------------------------------------------------------------------+
¦ BusName ¦ BusType ¦ startd ¦ endd ¦
¦---------+---------+-------------------------+-------------------------¦
¦ A ¦ AC ¦ 2013-04-01 00:00:00.000 ¦ 2013-04-06 00:00:00.000 ¦
¦ A ¦ AC ¦ 2013-04-03 00:00:00.000 ¦ 2013-04-08 00:00:00.000 ¦
¦ A ¦ AC ¦ 2013-04-05 00:00:00.000 ¦ 2013-04-10 00:00:00.000 ¦
¦ A ¦ AC ¦ 2013-04-07 00:00:00.000 ¦ 2013-04-30 00:00:00.000 ¦
+-----------------------------------------------------------------------+
--EDIT : assume it should be like this
+-----------------------------------------------------------------------+
¦ BusName ¦ BusType ¦ sdate ¦ edate ¦
¦---------+---------+-------------------------+-------------------------¦
¦ A ¦ AC ¦ 2013-04-01 00:00:00.000 ¦ 2013-04-30 00:00:00.000 ¦
+-----------------------------------------------------------------------+
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 28, 2016 at 2:07 pm
J Livingston SQL (4/28/2016)
not getting the expected results for the following
[/code]
Hi
Yes seems to not work in this case, strangely. I am guessing cos the CTE is making multiple roots to the trees in the joins.
Is there any other non recrusive CTE method you can think of to solve this or maybe add an easy change onto the query?
I did this purely to avoid using a cursor .(and hopefully the mention of a cursor could make other readers look into solving this type of problem as well.)
It is a fun problem.
April 28, 2016 at 2:17 pm
maybe.....
CREATE table #temp (SysKey int identity(1,1) ,BusName varchar(9) ,BusType varchar(6) ,StartDateKey Datetime ,EndDatekey Datetime)
Insert into #temp (BusName,BusType,StartDateKey,EndDatekey)
Select 'A' ,'AC', '20130401', '20130402' union ALL
Select 'A' ,'AC', '20130403' ,'20130404' union ALL
Select 'A' ,'AC', '20130405' ,'20130406' union ALL
Select 'A' ,'AC', '20130407' ,'20130408' Union ALL
Select 'A' ,'AC', '20130409' ,'20130410' Union ALL
Select 'A' ,'AC', '20130411' ,'20130430' Union ALL
Select 'B' ,'BC', '20130401', '20130401' union ALL
Select 'B' ,'BC', '20130401' ,'20130404' union ALL
Select 'B' ,'BC', '20130404' ,'20130406' union ALL
Select 'B' ,'BC', '20130410' ,'20130420' ;
WITH cte as (
SELECT BusName,
BusType,
StartDateKey,
EndDatekey,
CASE
WHEN DATEDIFF(day, LAG(enddatekey, 1, StartDateKey) OVER(PARTITION BY BusName, BusType ORDER BY Startdatekey ), StartDateKey) < 2
THEN NULL
ELSE ROW_NUMBER() OVER(PARTITION BY BusName, BusType ORDER BY Startdatekey )
END sortkey,
ROW_NUMBER() OVER(PARTITION BY BusName, BusType ORDER BY Startdatekey ) rn
FROM #temp
)
,
cte2 as (
SELECT BusName,
BusType,
StartDateKey,
EndDatekey,
CAST(
SUBSTRING(
MAX( CAST(rn AS BINARY(4)) + CAST(sortkey AS BINARY(4)) )
OVER( PARTITION BY BusName, BusType ORDER BY rn ROWS UNBOUNDED PRECEDING ),
5, 4)
AS INT) AS lastval
FROM cte
)
SELECT BusName,
BusType,
MIN(StartDateKey) AS sdate,
MAX(EndDatekey) AS edate
FROM cte2
GROUP BY BusName, BusType, lastval
ORDER BY BusName, BusType, sdate
DROP TABLE #temp
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 29, 2016 at 12:02 pm
Here's another way, starting with the dates as INT as in the OP's situation.
Sample Data:
CREATE table #temp (SysKey int identity(1,1) ,BusName varchar(9) ,BusType varchar(6) ,StartDateKey int ,EndDatekey int)
Insert into #temp (BusName,BusType,StartDateKey,EndDatekey)
Select 'A' ,'AC', 20130401, 20130402 union ALL
Select 'A' ,'AC', 20130403 ,20130404 union ALL
Select 'A' ,'AC', 20130405, 20130406 union ALL
Select 'A' ,'AC', 20130407 ,20130408 Union ALL
Select 'A' ,'AC', 20130409 ,20130410 Union ALL
Select 'A' ,'AC', 20130411 ,20130430 Union ALL
Select 'B' ,'BC', 20130401, 20130401 union ALL
Select 'B' ,'BC', 20130401 ,20130404 union ALL
Select 'B' ,'BC', 20130404 ,20130406 union ALL
Select 'B' ,'BC', 20130410 ,20130420 ;
Solution:
WITH convert_to_date AS
(
SELECT BusName,
BusType,
StartDateKey=CAST(CAST(StartDateKey AS CHAR(8)) AS DATE),
EndDateKey=CAST(CAST(EndDateKey AS CHAR(8)) AS DATE)
FROM #temp
),
covered_dates AS
(
SELECT *
FROM convert_to_date
CROSS APPLY
(
SELECT TOP(DATEDIFF(dd,StartDateKey,EndDateKey)+1)
covered_date=DATEADD(dd,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,StartDateKey)
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n1(n) --An on-the-fly numbers table
CROSS JOIN
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n2(n)
CROSS JOIN
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n3(n)
CROSS JOIN
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n4(n)
)x
),
distinct_numbered_covered_dates AS
(
SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY BusName ORDER BY covered_Date ASC)
FROM (
SELECT DISTINCT BusName,BusType,covered_date
FROM covered_dates
)x
)
SELECT BusName,
BusType,
StartDateKey=MIN(covered_date),
EndDateKey=MAX(covered_date)
FROM distinct_numbered_covered_dates
GROUP BY BusName,BusType,DATEADD(dd,-RN,covered_date);
The approach is this:
1) Convert integers to dates
2) For each row in the data, use CROSS APPLY to generate a list of the dates covered by that interval
3) Since some ranges might overlap and cover the same dates, get distinct combinations of bus and covered dates
4) Number the covered dates in ascending order, partitioned by bus
5) Islands of consecutive dates will have the feature that subtracting ROW_NUMBER days from the covered_date will result in the same date, so just get MIN and MAX grouped by that expression.
Cheers!
April 29, 2016 at 12:41 pm
Hi Jacob.....
on a larger data set I am getting
A TOP or FETCH clause contains an invalid value.
any ideas please?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 29, 2016 at 12:49 pm
J Livingston SQL (4/29/2016)
Hi Jacob.....on a larger data set I am getting
A TOP or FETCH clause contains an invalid value.
any ideas please?
Ah, most likely you have a NULL for the start or end date. Just have to modify to handle NULLs appropriately.
Cheers!
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply