April 29, 2016 at 1:09 pm
Jacob Wilkins (4/29/2016)
J Livingston SQL (4/29/2016)
Hi Jacob.....on a larger data set I am getting
A TOP or FETCH clause contains an invalid value.
any ideas please?
Ah, most likely you have a NULL for the start or end date. Just have to modify to handle NULLs appropriately.
Cheers!
bugger..test data script had NULLs in...thanks muchly.
moving on for testing purposes....am I correct in thinking if my test data has "date/time" and not integers...I can use your code as follows:
---ignore previous conversion
WITH
covered_dates AS
(
SELECT *
FROM tempdata
CROSS APPLY
(
SELECT TOP(DATEDIFF(dd,StartDateKey,EndDateKey)+1)
covered_date=DATEADD(dd,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,StartDateKey)
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n1(n) --An on-the-fly numbers table
CROSS JOIN
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n2(n)
CROSS JOIN
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n3(n)
CROSS JOIN
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n4(n)
)x
),
distinct_numbered_covered_dates AS
(
SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY BusName ORDER BY covered_Date ASC)
FROM (
SELECT DISTINCT BusName,BusType,covered_date
FROM covered_dates
)x
)
SELECT BusName,
BusType,
StartDateKey=MIN(covered_date),
EndDateKey=MAX(covered_date)
FROM distinct_numbered_covered_dates
GROUP BY BusName,BusType,DATEADD(dd,-RN,covered_date)
ORDER BY BusName,
BusType,
StartDateKey
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 29, 2016 at 1:19 pm
my test script......responses appreciated !
-- create some test data
USE tempdb
GO
IF OBJECT_ID('tempdb..tempdata', 'U') IS NOT NULL
DROP TABLE tempdb..tempdata ;
WITH ctetrans as (
SELECT TOP 5000000
BusName = CHAR(Abs(Checksum(Newid())) % 26 + 65),
BusType = CHAR(Abs(Checksum(Newid())) % 10 + 65)
+ CHAR(Abs(Checksum(Newid())) % 10 + 65),
TransDate = Dateadd(dd, Abs(Checksum(Newid())) % Datediff(dd, '2012', '2016'), '2012')
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
CROSS JOIN sys.all_columns ac3
)
,
ctetempdata as (
SELECT BusName,
BusType,
TransDate AS StartDateKey,
lead(transdate, 1, transdate ) OVER(PARTITION BY BusName, BusType ORDER BY TransDate) AS EndDatekey,
ROW_NUMBER() OVER(PARTITION BY BusName, BusType ORDER BY TransDate) % 2 rnselect
FROM ctetrans
)
SELECT BusName,
BusType,
StartDateKey,
EndDatekey
INTO tempdata
FROM ctetempdata
WHERE rnselect = 0
-- see some testdata
SELECT COUNT(*) FROM tempdata;
SELECT BusName, BusType, StartDateKey, EndDatekey
FROM tempdata
WHERE (StartDateKey IS NULL) OR (EndDatekey IS NULL)
SELECT BusName,
BusType,
StartDateKey,
EndDatekey
FROM tempdata
WHERE(BusName = 'A') AND (BusType = 'AA')
ORDER BY BusName, BusType, StartDateKey;
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 29, 2016 at 1:21 pm
If the time matters (say, you have multiple rows per day), then I don't think this approach will work well.
If you only care about date, then it works pretty well, but it does rely on only the date mattering.
Cheers!
April 29, 2016 at 2:52 pm
Jacob...dont seem to get the results expected from your code in comparison to mine.....have I misread your code:
-- create some test data
USE tempdb
GO
IF OBJECT_ID('tempdb..tempdata', 'U') IS NOT NULL
DROP TABLE tempdb..tempdata ;
WITH ctetrans as (
SELECT TOP 500
BusName = CHAR(Abs(Checksum(Newid())) % 2 + 65),
BusType = CHAR(Abs(Checksum(Newid())) % 10 + 65)
+ CHAR(Abs(Checksum(Newid())) % 10 + 65),
TransDate = CAST(Dateadd(dd, Abs(Checksum(Newid())) % Datediff(dd, '2012', '2013'), '2012') AS DATE)
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
CROSS JOIN sys.all_columns ac3
)
,
ctetempdata as (
SELECT BusName,
BusType,
TransDate AS StartDateKey,
lead(transdate, 1, TransDate) OVER(PARTITION BY BusName, BusType ORDER BY TransDate) AS EndDatekey,
ROW_NUMBER() OVER(PARTITION BY BusName, BusType ORDER BY TransDate) % 2 rnselect
FROM ctetrans
)
SELECT BusName,
BusType,
StartDateKey,
EndDatekey
INTO tempdata
FROM ctetempdata
WHERE rnselect = 0
-- see some testdata
SELECT COUNT(*) FROM tempdata;
--====================================================================================
-- Jacob solution
WITH
covered_dates AS
(
SELECT *
FROM tempdata
CROSS APPLY
(
SELECT TOP(DATEDIFF(dd,StartDateKey,EndDateKey)+1)
covered_date=DATEADD(dd,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,StartDateKey)
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n1(n) --An on-the-fly numbers table
CROSS JOIN
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n2(n)
CROSS JOIN
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n3(n)
CROSS JOIN
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n4(n)
)x
),
distinct_numbered_covered_dates AS
(
SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY BusName ORDER BY covered_Date ASC)
FROM (
SELECT DISTINCT BusName,BusType,covered_date
FROM covered_dates
)x
)
SELECT BusName,
BusType,
StartDateKey=MIN(covered_date),
EndDateKey=MAX(covered_date)
FROM distinct_numbered_covered_dates
GROUP BY BusName,BusType,DATEADD(dd,-RN,covered_date)
ORDER BY BusName,
BusType,
StartDateKey;
-- JLS solution
WITH cte as (
SELECT BusName,
BusType,
StartDateKey,
EndDatekey,
CASE
WHEN DATEDIFF(day, LAG(enddatekey, 1, StartDateKey) OVER(PARTITION BY BusName, BusType ORDER BY Startdatekey ), StartDateKey) < 2
THEN NULL
ELSE ROW_NUMBER() OVER(PARTITION BY BusName, BusType ORDER BY Startdatekey )
END sortkey,
ROW_NUMBER() OVER(PARTITION BY BusName, BusType ORDER BY Startdatekey ) rn
FROM tempdata
)
,
cte2 as (
SELECT BusName,
BusType,
StartDateKey,
EndDatekey,
CAST(
SUBSTRING(
MAX( CAST(rn AS BINARY(4)) + CAST(sortkey AS BINARY(4)) )
OVER( PARTITION BY BusName, BusType ORDER BY rn ROWS UNBOUNDED PRECEDING ),
5, 4)
AS INT) AS lastval
FROM cte
)
SELECT BusName,
BusType,
MIN(StartDateKey) AS sdate,
MAX(EndDatekey) AS edate
FROM cte2
GROUP BY BusName,
BusType,
lastval
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 29, 2016 at 3:03 pm
J Livingston SQL (4/29/2016)
Jacob...dont seem to get the results expected from your code in comparison to mine.....have I misread your code:
You didn't misread it, but the sample data doesn't conform to a rule that the OP's data did (and I assumed in my solution), which is that a given bus can be of just one type. Just partition the ROW_NUMBER by bus type in addition to bus name and they should match.
I'll have to confirm later.
Cheers!
April 29, 2016 at 3:19 pm
Jacob Wilkins (4/29/2016)
J Livingston SQL (4/29/2016)
Jacob...dont seem to get the results expected from your code in comparison to mine.....have I misread your code:You didn't misread it, but the sample data doesn't conform to a rule that the OP's data did (and I assumed in my solution), which is that a given bus can be of just one type. Just partition the ROW_NUMBER by bus type in addition to bus name and they should match.
I'll have to confirm later.
Cheers!
got it !...nice solution
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 29, 2016 at 3:35 pm
J Livingston SQL (4/29/2016)
Jacob Wilkins (4/29/2016)
J Livingston SQL (4/29/2016)
Jacob...dont seem to get the results expected from your code in comparison to mine.....have I misread your code:You didn't misread it, but the sample data doesn't conform to a rule that the OP's data did (and I assumed in my solution), which is that a given bus can be of just one type. Just partition the ROW_NUMBER by bus type in addition to bus name and they should match.
I'll have to confirm later.
Cheers!
got it !...nice solution
Thanks! Yours works pretty nicely as well. It tends to perform better than my initial one in terms of CPU and duration.
I have a feeling they would be much closer if instead of doing the lazy DISTINCT subquery I either 1) did a ROW_NUMBER partitioned by all columns in the subquery and added a WHERE row_num=1 to get rid of duplicates, or 2) skipped the subquery altogether and just replaced the ROW_NUMBER with DENSE_RANK, since then duplicates wouldn't matter.
I'll have to test that out in a bit.
Cheers!
May 1, 2016 at 10:32 am
to the OP
you have marked my code as correct but havent posted any comments?
would be interested to know how you are getting on. ??
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
May 2, 2016 at 5:03 am
Tons of Thanks to you , in fixing and finding solution . But I am not sure , how it works on production data . As of now , I hold small data in development.
Work around to improve performance :
Run the logic over 100 Buses in a batch ,as logic is to condense records for each bus .
But I will inform, how I deployed in production .
Thanks
Surya Sunil
May 2, 2016 at 10:13 am
sunil.mvs (5/2/2016)
Tons of Thanks to you , in fixing and finding solution . But I am not sure , how it works on production data . As of now , I hold small data in development.Work around to improve performance :
Run the logic over 100 Buses in a batch ,as logic is to condense records for each bus .
But I will inform, how I deployed in production .
Thanks
Surya Sunil
I did post some test scripts earlier on in thus thread....have you tried them?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
Viewing 10 posts - 16 through 24 (of 24 total)
You must be logged in to reply to this topic. Login to reply