July 7, 2016 at 4:49 am
You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:
;WITH DateSequence AS (
SELECT m.ID, m.Code, d.*, OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM #testtable m
CROSS APPLY (VALUES ('S', StartDate, 1), ('E', ISNULL(EndDate,'21000101'), -1)) d (Dir, [Date], flag)
),
MarkedBoundaries AS (
SELECT *, LastOnOff = ISNULL(LAG(OnOff,1) OVER(PARTITION BY Code ORDER BY [Date]),0)
FROM DateSequence
),
PairedDates AS (
SELECT *, Grp = (ROW_NUMBER() OVER(PARTITION BY Code ORDER BY [Date])-1)/2
FROM MarkedBoundaries
WHERE OnOff = 0 OR LastOnOff = 0
)
SELECT Code,
StartDate = MAX(CASE WHEN Dir = 'S' THEN [Date] ELSE NULL END),
EndDate = NULLIF(MAX(CASE WHEN Dir = 'E' THEN [Date] ELSE NULL END),'21000101')
FROM PairedDates
GROUP BY Code, Grp
ORDER BY Code, Grp
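If you want to run this as-is, here's a minimal throwaway #testtable to play with (a sketch only - column names are assumed from the query above and the sample rows are made up):
IF OBJECT_ID('tempdb..#testtable') IS NOT NULL DROP TABLE #testtable;
CREATE TABLE #testtable
(
ID INT IDENTITY(1,1) PRIMARY KEY,
Code VARCHAR(14) NOT NULL,
StartDate DATETIME NOT NULL,
EndDate DATETIME NULL -- NULL = open-ended, mapped to '21000101' in the query
);
INSERT INTO #testtable (Code, StartDate, EndDate)
VALUES ('Code1', '20160101', '20160110'),
('Code1', '20160105', '20160115'), -- overlaps the first row, so the two pack into one interval
('Code1', '20160201', NULL), -- open-ended interval
('Code2', '20160103', '20160104');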
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 7, 2016 at 5:54 pm
ChrisM@Work (7/7/2016)
You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:
;WITH DateSequence AS (
SELECT m.ID, m.Code, d.*, OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM #testtable m
CROSS APPLY (VALUES ('S', StartDate, 1), ('E', ISNULL(EndDate,'21000101'), -1)) d (Dir, [Date], flag)
),
MarkedBoundaries AS (
SELECT *, LastOnOff = ISNULL(LAG(OnOff,1) OVER(PARTITION BY Code ORDER BY [Date]),0)
FROM DateSequence
),
PairedDates AS (
SELECT *, Grp = (ROW_NUMBER() OVER(PARTITION BY Code ORDER BY [Date])-1)/2
FROM MarkedBoundaries
WHERE OnOff = 0 OR LastOnOff = 0
)
SELECT Code,
StartDate = MAX(CASE WHEN Dir = 'S' THEN [Date] ELSE NULL END),
EndDate = NULLIF(MAX(CASE WHEN Dir = 'E' THEN [Date] ELSE NULL END),'21000101')
FROM PairedDates
GROUP BY Code, Grp
ORDER BY Code, Grp
Haven't tried this yet but looks awesome. It's one of the things that Itzik referred to at the very end of his article because, as he pointed out, MS hadn't come up with proper windowing functions that would do the Preceding Rows thing, yet. He was pretty instrumental in pushing on MS to make it a priority and, man, I'm thankful for that.
Now that we've finally upgraded from 2005 to 2012 at work, I can see me working with it more and more. Thanks for the great example, Chris.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 8, 2016 at 7:49 am
ChrisM@Work (7/7/2016)
You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:
edit...the original OP post related to SQL 2008....so what follows with windowing functions will not work for the OP
Hi Chris
had a look at your solution and decided to have a "play" <grin>
here is some code that builds a test harness
--test harness adapted from here http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle
SET NOCOUNT ON;
USE tempdb;
---- helper function GetNums.....
-- jls: alternatively, use your own tally table/function
--IF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL
-- DROP FUNCTION dbo.GetNums;
--GO
--CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
--AS
--RETURN
-- WITH
-- L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
-- L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
-- L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
-- L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
-- L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
-- L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
-- Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
-- SELECT TOP (@n) n FROM Nums ORDER BY n;
--GO
IF OBJECT_ID('dbo.testtable') IS NOT NULL DROP TABLE dbo.testtable;
CREATE TABLE dbo.testtable
(
id INT NOT NULL IDENTITY(1, 1),
Code VARCHAR(14) NOT NULL,
startdate DATETIME NOT NULL,
enddate DATETIME NOT NULL,
CONSTRAINT PK_Sessions PRIMARY KEY(id),
CONSTRAINT CHK_endtime_gteq_starttime CHECK (enddate >= startdate)
);
-- code to create and populate the table testtable with 5,000,000 rows
DECLARE
@num_code AS INT = 1000,
@intervals_per_code AS INT = 5000,
@start_period AS DATETIME = '20110101',
@end_period AS DATETIME = '20110114',
@max_duration_in_ms AS INT = 3600000; -- 60 min
--TRUNCATE TABLE dbo.testtable;
WITH C AS
(
SELECT 'Code' + RIGHT('00' + CAST(U.n AS VARCHAR(10)), 10) AS code,
DATEADD(ms, ABS(CHECKSUM(NEWID())) % 86400000,
DATEADD(day, ABS(CHECKSUM(NEWID())) % DATEDIFF(day, @start_period, @end_period), @start_period)) AS startdate
FROM dbo.GetNums(@num_code) AS U
CROSS JOIN dbo.GetNums(@intervals_per_code) AS I
)
INSERT INTO dbo.testtable WITH (TABLOCK) (code, startdate, enddate)
SELECT code,
startdate,
DATEADD(ms, ABS(CHECKSUM(NEWID())) % (@max_duration_in_ms + 1), startdate) AS enddate
FROM C;
-- indexes
CREATE UNIQUE INDEX idx_user_start_id ON dbo.testtable(code, startdate, id);
CREATE UNIQUE INDEX idx_user_end_id ON dbo.testtable(code, enddate, id);
If I compare your code to an alternative that IBG suggested initially, prior to the necessary windowing functions being available in SQL Server, then in my tests the IBG code wins hands down.
Maybe you will see differences...I will be interested to see what you think?
(I have removed the requirement for timediff calc and the OP's issue with NULL for sanity)
-- IBG http://blogs.solidq.com/en/sqlserver/packing-intervals/ Solution 3, Using a Window Aggregate
WITH C1 AS
(
SELECT code, startdate AS ts, +1 AS type, 1 AS sub
FROM dbo.testtable
UNION ALL
SELECT code, enddate AS ts, -1 AS type, 0 AS sub
FROM dbo.testtable
),
C2 AS
(
SELECT C1.*,
SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) - sub AS cnt
FROM C1
),
C3 AS
(
SELECT code, ts,
((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE cnt = 0
)
SELECT code, MIN(ts) AS startdate, max(ts) AS enddate
FROM C3
GROUP BY code, grpnum
ORDER BY code
-- ChrisM solution
WITH DateSequence AS (
SELECT m.ID, m.Code, d.*, OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM testtable m
CROSS APPLY (VALUES ('S', StartDate, 1), ('E', EndDate, -1)) d (Dir, [Date], flag)
),
MarkedBoundaries AS (
SELECT *, LastOnOff = ISNULL(LAG(OnOff,1) OVER(PARTITION BY Code ORDER BY [Date]),0)
FROM DateSequence
),
PairedDates AS (
SELECT *, Grp = (ROW_NUMBER() OVER(PARTITION BY Code ORDER BY [Date])-1)/2
FROM MarkedBoundaries
WHERE OnOff = 0 OR LastOnOff = 0
)
SELECT Code,
StartDate = MAX(CASE WHEN Dir = 'S' THEN [Date] ELSE NULL END),
EndDate = MAX(CASE WHEN Dir = 'E' THEN [Date] ELSE NULL END)
FROM PairedDates
GROUP BY Code, Grp
ORDER BY Code, Grp
--- IBG for SQL 2008
WITH C1
AS (
SELECT
code,
startdate AS dt,
1 AS type,
NULL AS e,
ROW_NUMBER() OVER(PARTITION BY code ORDER BY startdate) AS s
FROM testtable
UNION ALL
SELECT
code,
Enddate AS dt,
-1 AS type,
ROW_NUMBER() OVER(PARTITION BY code ORDER BY enddate) AS e,
NULL AS s
FROM testtable),
C2
AS (SELECT *,
ROW_NUMBER() OVER(PARTITION BY code ORDER BY dt, type DESC) AS se
FROM C1),
C3
AS (SELECT *,
FLOOR((ROW_NUMBER() OVER(PARTITION BY code ORDER BY dt) - 1) / 2) + 1 AS grpnum
FROM C2
WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0)
SELECT code,
MIN(dt) AS startdate,
MAX(dt) AS enddate
FROM C3
GROUP BY code,grpnum
ORDER BY code,grpnum
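One simple way to time each of the three variants (a sketch; SET STATISTICS IO, TIME gives more detail per statement):
DECLARE @t0 DATETIME2 = SYSDATETIME();
-- ...run one of the three queries above here...
SELECT DATEDIFF(ms, @t0, SYSDATETIME()) AS elapsed_ms;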
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 8, 2016 at 9:18 am
That's awesome G - I've been in and out of meetings all day so haven't had a chance to play yet, but I bet there's some tweaking potential!
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 11, 2016 at 4:31 am
Following on from this, I had a few hours to spare on Saturday morning to spend on JLS' test harness.
IBG's first query looks like this:
SELECT code, startdate AS ts, +1 AS type, 1 AS sub
FROM dbo.testtable
UNION ALL
SELECT code, enddate AS ts, -1 AS type, 0 AS sub
FROM dbo.testtable
Two table-length reads, one from each index. The next operator requires that the data is in a specific order to avoid a sort:
SELECT C1.*,
SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) - sub AS cnt
FROM C1
and it achieves this by using a merge join between the two streams from the first query. The plan is serial in SQL2012, even with a 5-million row table. I'd guess that a parallel plan can't guarantee that streams arrive at the merge join operator in the correct order.
The alternative I posted earlier looks like this:
SELECT m.ID, m.Code, d.*, OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM testtable m
CROSS APPLY (VALUES ('S', StartDate, 1), ('E', EndDate, -1)) d (Dir, [Date], flag)
It uses a compute scalar to generate the "extra" rows and a nested loops join to union them. The data is out-of-order for the aggregate so a sort is invoked: join hints cannot change the join type. However, the plan is parallel. On my two-core playpen the IBG query was about four times faster than my mod, mostly down to the sort.
That's on two cores, however. I wonder how that parallel plan will behave with 20 cores? Lunchtime = playtime 🙂
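If anyone wants to check the plan shapes described above without re-running everything, the cached plans can be pulled out with the plan DMVs (a sketch - look for the Sort, Merge Join and Parallelism operators in the XML or the graphical plan; requires VIEW SERVER STATE):
SELECT cp.usecounts, st.text, qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE st.text LIKE '%DateSequence%' -- adjust the filter for the version you're after
AND st.text NOT LIKE '%dm_exec_cached_plans%';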
Thanks G for posting this up.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 11, 2016 at 5:03 am
Jeff Moden (7/7/2016)
ChrisM@Work (7/7/2016)
You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:
;WITH DateSequence AS (
SELECT m.ID, m.Code, d.*, OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM #testtable m
CROSS APPLY (VALUES ('S', StartDate, 1), ('E', ISNULL(EndDate,'21000101'), -1)) d (Dir, [Date], flag)
),
MarkedBoundaries AS (
SELECT *, LastOnOff = ISNULL(LAG(OnOff,1) OVER(PARTITION BY Code ORDER BY [Date]),0)
FROM DateSequence
),
PairedDates AS (
SELECT *, Grp = (ROW_NUMBER() OVER(PARTITION BY Code ORDER BY [Date])-1)/2
FROM MarkedBoundaries
WHERE OnOff = 0 OR LastOnOff = 0
)
SELECT Code,
StartDate = MAX(CASE WHEN Dir = 'S' THEN [Date] ELSE NULL END),
EndDate = NULLIF(MAX(CASE WHEN Dir = 'E' THEN [Date] ELSE NULL END),'21000101')
FROM PairedDates
GROUP BY Code, Grp
ORDER BY Code, Grp
Haven't tried this yet but looks awesome. It's one of the things that Itzik referred to at the very end of his article because, as he pointed out, MS hadn't come up with proper windowing functions that would do the Preceding Rows thing, yet. He was pretty instrumental in pushing on MS to make it a priority and, man, I'm thankful for that.
Now that we've finally upgraded from 2005 to 2012 at work, I can see me working with it more and more. Thanks for the great example, Chris.
You're welcome ol' friend - but note the operational differences between the two versions. I was testing them on a beefy instance where parallelism paid off.
That sort is expensive because it spills. I wonder if there's a sneaky way around that?
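One rough way to confirm the spill (a sketch): watch the internal object allocations in tempdb for the session while the query runs - they climb when a sort or hash spills:
SELECT internal_objects_alloc_page_count,
internal_objects_dealloc_page_count
FROM sys.dm_db_session_space_usage
WHERE session_id = @@SPID;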
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 11, 2016 at 6:36 am
ChrisM@Work (7/11/2016)
Following on from this, I had a few hours to spare on Saturday morning to spend on JLS' test harness. IBG's first query looks like this:
SELECT code, startdate AS ts, +1 AS type, 1 AS sub
FROM dbo.testtable
UNION ALL
SELECT code, enddate AS ts, -1 AS type, 0 AS sub
FROM dbo.testtable
Two table-length reads, one from each index. The next operator requires that the data is in a specific order to avoid a sort:
SELECT C1.*,
SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) - sub AS cnt
FROM C1
and it achieves this by using a merge join between the two streams from the first query. The plan is serial in SQL2012, even with a 5-million row table. I'd guess that a parallel plan can't guarantee that streams arrive at the merge join operator in the correct order.
The alternative I posted earlier looks like this:
SELECT m.ID, m.Code, d.*, OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM testtable m
CROSS APPLY (VALUES ('S', StartDate, 1), ('E', EndDate, -1)) d (Dir, [Date], flag)
It uses a compute scalar to generate the "extra" rows and a nested loops join to union them. The data is out-of-order for the aggregate so a sort is invoked: join hints cannot change the join type. However, the plan is parallel. On my two-core playpen the IBG query was about four times faster than my mod, mostly down to the sort.
That's on two cores, however. I wonder how that parallel plan will behave with 20 cores? Lunchtime = playtime 🙂
Thanks G for posting this up.
Hey Chris
try this
WITH C1 AS
(
SELECT code, startdate AS ts, +1 AS type, 1 AS sub
FROM dbo.testtable
UNION ALL
SELECT code, enddate AS ts, -1 AS type, 0 AS sub
FROM dbo.testtable
),
C2 AS
(
SELECT C1.*,
SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) - sub AS cnt
FROM C1
),
C3 AS
(
SELECT code, ts,
((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE cnt = 0
)
SELECT code, MIN(ts) AS startdate, max(ts) AS enddate
--into #ibgdump2
FROM C3
GROUP BY code, grpnum
ORDER BY code
OPTION (RECOMPILE, QUERYTRACEON 8649) -- http://sqlblog.com/blogs/paul_white/archive/2011/12/23/forcing-a-parallel-query-execution-plan.aspx
as IBG says in the post.....
http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle
What makes this solution so fast is the efficient use of parallelism. Note though that when testing it in different environments and with different arguments for number of users and intervals, I didn’t always get a parallel plan. If you’re not getting a parallel plan, it could be because the machine you’re using has fewer logical CPUs than 8. Just for test purposes, you can use the SQL Server service startup option -P8, which will cause SQL Server to use 8 schedulers like in an environment with 8 logical CPUs. The -P startup parameter is not an officially documented one, so use it just for test purposes to mimic a machine with a desired number of CPUs, not for production purposes. Also, I noticed that in some machines where I tested this code and didn’t get parallel plans, when changing the sample data to 2,000 users each with 2,500 intervals, instead of 1,000 by 5,000, I got parallel plans in more cases. Either way, this solution is still very fast even when using a serial plan.
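If you do use the -P startup switch for testing, you can confirm how many schedulers the instance actually ended up with (a sketch):
SELECT COUNT(*) AS visible_online_schedulers
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';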
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 11, 2016 at 6:45 am
J Livingston SQL (7/11/2016)
ChrisM@Work (7/11/2016)
Following on from this, I had a few hours to spare on Saturday morning to spend on JLS' test harness. IBG's first query looks like this:
SELECT code, startdate AS ts, +1 AS type, 1 AS sub
FROM dbo.testtable
UNION ALL
SELECT code, enddate AS ts, -1 AS type, 0 AS sub
FROM dbo.testtable
Two table-length reads, one from each index. The next operator requires that the data is in a specific order to avoid a sort:
SELECT C1.*,
SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) - sub AS cnt
FROM C1
and it achieves this by using a merge join between the two streams from the first query. The plan is serial in SQL2012, even with a 5-million row table. I'd guess that a parallel plan can't guarantee that streams arrive at the merge join operator in the correct order.
The alternative I posted earlier looks like this:
SELECT m.ID, m.Code, d.*, OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM testtable m
CROSS APPLY (VALUES ('S', StartDate, 1), ('E', EndDate, -1)) d (Dir, [Date], flag)
It uses a compute scalar to generate the "extra" rows and a nested loops join to union them. The data is out-of-order for the aggregate so a sort is invoked: join hints cannot change the join type. However, the plan is parallel. On my two-core playpen the IBG query was about four times faster than my mod, mostly down to the sort.
That's on two cores, however. I wonder how that parallel plan will behave with 20 cores? Lunchtime = playtime 🙂
Thanks G for posting this up.
Hey Chris
try this
WITH C1 AS
(
SELECT code, startdate AS ts, +1 AS type, 1 AS sub
FROM dbo.testtable
UNION ALL
SELECT code, enddate AS ts, -1 AS type, 0 AS sub
FROM dbo.testtable
),
C2 AS
(
SELECT C1.*,
SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) - sub AS cnt
FROM C1
),
C3 AS
(
SELECT code, ts,
((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE cnt = 0
)
SELECT code, MIN(ts) AS startdate, max(ts) AS enddate
--into #ibgdump2
FROM C3
GROUP BY code, grpnum
ORDER BY code
OPTION (RECOMPILE, QUERYTRACEON 8649) -- http://sqlblog.com/blogs/paul_white/archive/2011/12/23/forcing-a-parallel-query-execution-plan.aspx
as IBG says in the post.....
http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle
What makes this solution so fast is the efficient use of parallelism. Note though that when testing it in different environments and with different arguments for number of users and intervals, I didn’t always get a parallel plan. If you’re not getting a parallel plan, it could be because the machine you’re using has fewer logical CPUs than 8. Just for test purposes, you can use the SQL Server service startup option -P8, which will cause SQL Server to use 8 schedulers like in an environment with 8 logical CPUs. The -P startup parameter is not an officially documented one, so use it just for test purposes to mimic a machine with a desired number of CPUs, not for production purposes. Also, I noticed that in some machines where I tested this code and didn’t get parallel plans, when changing the sample data to 2,000 users each with 2,500 intervals, instead of 1,000 by 5,000, I got parallel plans in more cases. Either way, this solution is still very fast even when using a serial plan.
It's quick G but not quite as quick as this, which uses no Black Magic at all:
PRINT '== CJM2 =========================================================================='
SET STATISTICS IO, TIME ON
;WITH C2 AS
(
SELECT d.*, cnt = CAST(SUM(type) OVER(PARTITION BY d.code ORDER BY ts, type DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - sub AS TINYINT) -- 853760
FROM testtable m
CROSS APPLY (
VALUES
(CAST(m.Code AS VARCHAR(10)), CAST(StartDate AS DATETIME), CAST(1 AS SMALLINT), CAST(1 AS TINYINT)),
(CAST(m.Code AS VARCHAR(10)), CAST(EndDate AS DATETIME), CAST(-1 AS SMALLINT), CAST(0 AS TINYINT))
) d (Code, ts, type, sub)
),
C3 AS
(
SELECT code, ts,
((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE cnt = 0
)
SELECT code, MIN(ts) AS startdate, max(ts) AS enddate
FROM C3
GROUP BY code, grpnum
ORDER BY code;
SET STATISTICS IO, TIME OFF
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 11, 2016 at 7:24 am
It's quick G but not quite as quick as this, which uses no Black Magic at all:
I don't get the same results, Chris....any thoughts??
PRINT '== IBG =========================================================================='
SET STATISTICS IO, TIME ON;
WITH C1 AS
(
SELECT code, startdate AS ts, +1 AS type, 1 AS sub
FROM dbo.testtable
UNION ALL
SELECT code, enddate AS ts, -1 AS type, 0 AS sub
FROM dbo.testtable
),
C2 AS
(
SELECT C1.*,
SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) - sub AS cnt
FROM C1
),
C3 AS
(
SELECT code, ts,
((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE cnt = 0
)
SELECT code, MIN(ts) AS startdate, max(ts) AS enddate
into #ibgdump
FROM C3
GROUP BY code, grpnum
ORDER BY code
OPTION (RECOMPILE, QUERYTRACEON 8649)
SET STATISTICS IO, TIME OFF
DROP TABLE #ibgdump
PRINT '== CJM2 =========================================================================='
SET STATISTICS IO, TIME ON
;WITH C2 AS
(
SELECT d.*, cnt = CAST(SUM(type) OVER(PARTITION BY d.code ORDER BY ts, type DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - sub AS TINYINT) -- 853760
FROM testtable m
CROSS APPLY (
VALUES
(CAST(m.Code AS VARCHAR(10)), CAST(StartDate AS DATETIME), CAST(1 AS SMALLINT), CAST(1 AS TINYINT)),
(CAST(m.Code AS VARCHAR(10)), CAST(EndDate AS DATETIME), CAST(-1 AS SMALLINT), CAST(0 AS TINYINT))
) d (Code, ts, type, sub)
),
C3 AS
(
SELECT code, ts,
((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE cnt = 0
)
SELECT code, MIN(ts) AS startdate, max(ts) AS enddate
into #cmdump
FROM C3
GROUP BY code, grpnum
ORDER BY code;
SET STATISTICS IO, TIME OFF
DROP TABLE #cmdump
== IBG ==========================================================================
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 3 ms.
Table 'testtable'. Scan count 18, logical reads 38914, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 26627 ms, elapsed time = 7075 ms.
(2693 row(s) affected)
== CJM2 ==========================================================================
Table 'testtable'. Scan count 9, logical reads 26134, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 34484 ms, elapsed time = 17962 ms.
(2693 row(s) affected)
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 11, 2016 at 7:31 am
J Livingston SQL (7/11/2016)
It's quick G but not quite as quick as this, which uses no Black Magic at all:
I don't get the same results, Chris....any thoughts??
PRINT '== IBG =========================================================================='
SET STATISTICS IO, TIME ON;
WITH C1 AS
(
SELECT code, startdate AS ts, +1 AS type, 1 AS sub
FROM dbo.testtable
UNION ALL
SELECT code, enddate AS ts, -1 AS type, 0 AS sub
FROM dbo.testtable
),
C2 AS
(
SELECT C1.*,
SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) - sub AS cnt
FROM C1
),
C3 AS
(
SELECT code, ts,
((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE cnt = 0
)
SELECT code, MIN(ts) AS startdate, max(ts) AS enddate
into #ibgdump
FROM C3
GROUP BY code, grpnum
ORDER BY code
OPTION (RECOMPILE, QUERYTRACEON 8649)
SET STATISTICS IO, TIME OFF
DROP TABLE #ibgdump
PRINT '== CJM2 =========================================================================='
SET STATISTICS IO, TIME ON
;WITH C2 AS
(
SELECT d.*, cnt = CAST(SUM(type) OVER(PARTITION BY d.code ORDER BY ts, type DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - sub AS TINYINT) -- 853760
FROM testtable m
CROSS APPLY (
VALUES
(CAST(m.Code AS VARCHAR(10)), CAST(StartDate AS DATETIME), CAST(1 AS SMALLINT), CAST(1 AS TINYINT)),
(CAST(m.Code AS VARCHAR(10)), CAST(EndDate AS DATETIME), CAST(-1 AS SMALLINT), CAST(0 AS TINYINT))
) d (Code, ts, type, sub)
),
C3 AS
(
SELECT code, ts,
((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE cnt = 0
)
SELECT code, MIN(ts) AS startdate, max(ts) AS enddate
into #cmdump
FROM C3
GROUP BY code, grpnum
ORDER BY code;
SET STATISTICS IO, TIME OFF
DROP TABLE #cmdump
== IBG ==========================================================================
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 3 ms.
Table 'testtable'. Scan count 18, logical reads 38914, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 26627 ms, elapsed time = 7075 ms.
(2693 row(s) affected)
== CJM2 ==========================================================================
Table 'testtable'. Scan count 9, logical reads 26134, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 34484 ms, elapsed time = 17962 ms.
(2693 row(s) affected)
Guessing - you're on 8 cores with less than the 200 GB of RAM I'm playing with?
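(If it helps to compare boxes, here's a quick sketch using sys.dm_os_sys_info - the memory column name assumes 2012 or later:)
SELECT cpu_count,
hyperthread_ratio,
physical_memory_kb / 1024 AS physical_memory_mb
FROM sys.dm_os_sys_info;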
== CJM2 ==========================================================================
Table 'testtable'. Scan count 17, logical reads 27332, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 34617, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 82182 ms, elapsed time = 7076 ms.
== JLS1 ==========================================================================
SQL Server parse and compile time:
CPU time = 9 ms, elapsed time = 9 ms.
Table 'testtable'. Scan count 34, logical reads 40869, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 48161 ms, elapsed time = 7217 ms.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 11, 2016 at 7:37 am
guess I am !!
only 12 GB here
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 11, 2016 at 9:00 am
J Livingston SQL (7/11/2016)
guess I am !! only 12 GB here
Try with this index G, I'm getting about 10% improvement.
-- CREATE UNIQUE NONCLUSTERED INDEX [idx_Code_Dates] ON [dbo].[testtable] ([Code] ASC,[enddate] ASC,[startdate] ASC)
PRINT '== CJM2 =========================================================================='
SET STATISTICS IO, TIME ON
;WITH C2 AS (
SELECT
d.code ,
d.ts,
cnt = SUM(type) OVER(PARTITION BY d.code ORDER BY d.ts, d.type DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - d.sub
FROM testtable m
CROSS APPLY (
VALUES
(CAST(m.Code AS CHAR(10)), CAST(m.StartDate AS DATETIME), CAST(1 AS SMALLINT), CAST(1 AS TINYINT)),
(CAST(m.Code AS CHAR(10)), CAST(m.EndDate AS DATETIME), CAST(-1 AS SMALLINT), CAST(0 AS TINYINT))
) d (code, ts, type, sub)
),
C3 AS (
SELECT code, ts,
grpnum = ((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2)
FROM C2
WHERE cnt = 0
)
SELECT code, MIN(ts) AS startdate, max(ts) AS enddate
FROM C3
GROUP BY code, grpnum
ORDER BY code;
SET STATISTICS IO, TIME OFF -- 2782
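A quick sanity check (a sketch) that the optimizer is actually picking up the new index rather than the existing ones:
SELECT i.name, us.user_seeks, us.user_scans
FROM sys.dm_db_index_usage_stats us
JOIN sys.indexes i
ON i.object_id = us.object_id
AND i.index_id = us.index_id
WHERE us.database_id = DB_ID()
AND us.object_id = OBJECT_ID('dbo.testtable');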
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 11, 2016 at 9:13 am
== CJM2 ==========================================================================
Table 'testtable'. Scan count 9, logical reads 24440, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31218 ms, elapsed time = 16031 ms.
(2693 row(s) affected)
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 11, 2016 at 9:20 am
J Livingston SQL (7/11/2016)
== CJM2 ==========================================================================
Table 'testtable'. Scan count 9, logical reads 24440, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31218 ms, elapsed time = 16031 ms.
(2693 row(s) affected)
On your phone? 😛 You've got nothing in cache!
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 11, 2016 at 9:53 am
ChrisM@Work (7/11/2016)
J Livingston SQL (7/11/2016)
== CJM2 ==========================================================================
Table 'testtable'. Scan count 9, logical reads 24440, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31218 ms, elapsed time = 16031 ms.
(2693 row(s) affected)
On your phone? 😛 You've got nothing in cache!
nope...not on phone
ran DBCC FREEPROCCACHE then
ran your code 10 times.....that was the best I got.
Microsoft SQL Server 2014 - 12.0.4213.0 (X64) Enterprise Evaluation Edition (64-bit) on Windows NT 6.3 <X64> (Build 10586: )
Windows 10
Intel i7 6700
16 GB RAM (12 GB to SQL)
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day