August 31, 2014 at 10:14 pm
Hi all,
I have a system log with NULL gaps between a sequence of numbers...see "BEFORE" sample below.
The number of gaps between the Sequence_ID's are arbitrary, but generally less then 50 records.
I'd like enumerate the gaps to produce the "AFTER" result, but do it with a single query or view, not through procedures.
I've been playing with windowed functions and groupings with no success. I'm guessing it'll need some recursive CTE logic, but I haven't been able to figure it out the correct loop.
Anyone have any ideas?
Thanks!
BEFORE:
PK_IDSequence_ID
1035586935587
3035586234 NULL
8355585 NULL
1235584 NULL
4675583 35583
4035582 NULL
6035382 NULL
1435581 NULL
2035580 NULL
3435553 35563
6603589 NULL
9475559 35552
AFTER:
PK_IDSequence_ID
1035586935587
3035586234 3
8355585 2
1235584 1
4675583 35583
4035582 4
6035382 3
1435581 2
2035580 1
3435553 35563
6603589 1
9475559 35552
August 31, 2014 at 10:57 pm
Quick thought, no need to use recursion or loops, rather use the LAG window function. Here is a quick sample code with some test data.
😎
USE tempdb;
GO
DECLARE @TEST_DATA TABLE
(
TD_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED
,TD_NUM INT NOT NULL
);
DECLARE @SAMPLE_SIZE INT = 10;
;WITH T(N) AS (SELECT N FROM ( VALUES (NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
,NUMS(N) AS(SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
FROM T T1, T T2, T T3, T T4, T T5, T T6, T T7)
INSERT INTO @TEST_DATA (TD_NUM)
SELECT
ABS(FLOOR(CHECKSUM(NEWID()) / (RAND() * 10)))
FROM NUMS NM;
SELECT
TD.TD_ID
,TD.TD_NUM
,TD.TD_NUM - LAG(TD.TD_NUM) OVER (ORDER BY TD.TD_ID) AS GAP
FROM @TEST_DATA TD;
Results
TD_ID TD_NUM GAP
----------- ----------- ----------
1 467441025 NULL
2 336642282 -130798743
3 336717627 75345
4 355014646 18297019
5 52905649 -302108997
6 303559379 250653730
7 316539004 12979625
8 181767044 -134771960
9 521113123 339346079
10 392579077 -128534046
August 31, 2014 at 11:11 pm
Thanks, I'll take a look.
Realized my sample data was bad. The PK will be in increasing order, which might help, but it'll have same gaps, like the Sequence_ID.
Here's better sample code:
DECLARE @system_log TABLE(
PK_ID int PRIMARY KEY
,Sequence_ID int null
)
INSERT @system_log(
PK_ID
,Sequence_ID
)VALUES
(1035590, 35587),
(1035589, NULL),
(1035586, NULL),
(1035585, NULL),
(1035584, NULL),
(1035583, 35583),
(1035582, NULL),
(1035581, NULL),
(1035579, NULL),
(1035578, 35553),
(1035554, NULL),
(1035550, 35550)
SELECT
*
FROM @system_log
ORDER by PK_ID desc
Result:
PK_IDSequence_ID
103559035587
1035589NULL
1035586NULL
1035585NULL
1035584NULL
103558335583
1035582NULL
1035581NULL
1035579NULL
103557835553
1035554NULL
103555035550
Expected Result:
PK_IDSequence_ID
103559035587
10355894
10355863
10355852
10355841
103558335583
10355823
10355812
10355791
103557835553
10355541
103555035550
September 1, 2014 at 1:04 am
Here is another code example using your data sample
😎
USE tempdb;
GO
;WITH system_log(PK_ID,Sequence_ID) AS
( SELECT PK_ID,Sequence_ID FROM
(VALUES
(1035590, 35587),
(1035589, NULL),
(1035586, NULL),
(1035585, NULL),
(1035584, NULL),
(1035583, 35583),
(1035582, NULL),
(1035581, NULL),
(1035579, NULL),
(1035578, 35553),
(1035554, NULL),
(1035550, 35550)) AS X(PK_ID,Sequence_ID)
)
SELECT
SL.PK_ID
,LEAD(SL.PK_ID,1,SL.PK_ID) OVER
(
ORDER BY SL.PK_ID
) AS LEAD_PK_ID
,LAG(SL.PK_ID,1,SL.PK_ID) OVER
(
ORDER BY SL.PK_ID
) AS LAG_PK_ID
,ISNULL(SL.Sequence_ID,LEAD(SL.PK_ID,1,SL.PK_ID) OVER
(
ORDER BY SL.PK_ID
) - SL.PK_ID) AS FORWARD_GAP
,ISNULL(SL.Sequence_ID,SL.PK_ID - LAG(SL.PK_ID,1,SL.PK_ID) OVER
(
ORDER BY SL.PK_ID
) ) AS REVERSE_GAP
FROM system_log SL;
Results
PK_ID LEAD_PK_ID LAG_PK_ID FORWARD_GAP REVERSE_GAP
----------- ----------- ----------- ----------- -----------
1035550 1035554 1035550 35550 35550
1035554 1035578 1035550 24 4
1035578 1035579 1035554 35553 35553
1035579 1035581 1035578 2 1
1035581 1035582 1035579 1 2
1035582 1035583 1035581 1 1
1035583 1035584 1035582 35583 35583
1035584 1035585 1035583 1 1
1035585 1035586 1035584 1 1
1035586 1035589 1035585 3 1
1035589 1035590 1035586 1 3
1035590 1035590 1035589 35587 35587
September 1, 2014 at 5:05 am
If you Binoogle Itzik Ben-Gan gaps and islands TSQL all of the top hits are relevant to your need.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
September 1, 2014 at 10:54 am
Thanks for the tip. I've actually recently purchased "Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions" written by Itzik Ben-Gan.
I think the main problem is the data set doesn't have a predictable Window to partition against, so I can't use Window Functions w/o have a way to traverse the data.
I'm working on a query using recursive CTE and LEAD/LAG. Looking at using a CROSS APPLY or FOR XML subqueries to generate the "Window"
I might break down and write a procedural solution. Still really want this as a single query.
September 1, 2014 at 5:29 pm
Here is a potential solution. It works with the sample data provided.
DECLARE @system_log TABLE(
PK_ID int PRIMARY KEY
,Sequence_ID int null
)
INSERT @system_log(
PK_ID
,Sequence_ID
)VALUES
(1035590, 35587),
(1035589, NULL),
(1035586, NULL),
(1035585, NULL),
(1035584, NULL),
(1035583, 35583),
(1035582, NULL),
(1035581, NULL),
(1035579, NULL),
(1035578, 35553),
(1035554, NULL),
(1035550, 35550);
WITH BaseData as (
SELECT
PK_ID,
Sequence_ID,
rn3 = row_number() over (order by PK_ID) - row_number() over (partition by case when Sequence_ID is null then 0 else 1 end order by PK_ID)
FROM
@system_log
), InertimData as (
select
bd1.PK_ID,
bd1.Sequence_ID,
cn = row_number() over (partition by case when bd1.Sequence_ID is null then 0 else 1 end, bd1.rn3 order by bd1.PK_ID)
from
BaseData bd1
)
select
PK_ID,
coalesce(Sequence_ID, cn) Sequence_ID
from
InertimData
order by
PK_ID desc;
September 1, 2014 at 6:41 pm
Not exactly a traditional gaps and islands problem but fun nevertheless. Here's a relatively simple alternative.
DECLARE @system_log TABLE(
PK_ID int PRIMARY KEY
,Sequence_ID int null
)
INSERT @system_log(
PK_ID
,Sequence_ID
)VALUES
(1035590, 35587),
(1035589, NULL),
(1035586, NULL),
(1035585, NULL),
(1035584, NULL),
(1035583, 35583),
(1035582, NULL),
(1035581, NULL),
(1035579, NULL),
(1035578, 35553),
(1035554, NULL),
(1035552, 35551),
(1035550, 35550)
SELECT PK_ID, Sequence_ID=CASE WHEN Sequence_ID IS NOT NULL THEN Sequence_ID ELSE
ROW_NUMBER() OVER (PARTITION BY rn2 ORDER BY PK_ID)-1 END
FROM
(
SELECT PK_ID, Sequence_ID
,rn2=MAX(Sequence_ID) OVER (ORDER BY PK_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM @system_log
) a
ORDER by PK_ID desc;
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 1, 2014 at 7:02 pm
Awesome, thanks dwain! Very elegant solution. I should be able to transfer this logic to the actual data quite easily.
September 1, 2014 at 7:04 pm
Thanks Lynn as well. Love having alternatives in the toolbelt.
September 1, 2014 at 7:10 pm
dwain.c (9/1/2014)
Not exactly a traditional gaps and islands problem but fun nevertheless. Here's a relatively simple alternative.
DECLARE @system_log TABLE(
PK_ID int PRIMARY KEY
,Sequence_ID int null
)
INSERT @system_log(
PK_ID
,Sequence_ID
)VALUES
(1035590, 35587),
(1035589, NULL),
(1035586, NULL),
(1035585, NULL),
(1035584, NULL),
(1035583, 35583),
(1035582, NULL),
(1035581, NULL),
(1035579, NULL),
(1035578, 35553),
(1035554, NULL),
(1035552, 35551),
(1035550, 35550)
SELECT PK_ID, Sequence_ID=CASE WHEN Sequence_ID IS NOT NULL THEN Sequence_ID ELSE
ROW_NUMBER() OVER (PARTITION BY rn2 ORDER BY PK_ID)-1 END
FROM
(
SELECT PK_ID, Sequence_ID
,rn2=MAX(Sequence_ID) OVER (ORDER BY PK_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM @system_log
) a
ORDER by PK_ID desc;
Dwain, I like yours as it appears to be more efficient than mine. I rewrote it slightly to make it cleaner in my opinion.
DECLARE @system_log TABLE(
PK_ID int PRIMARY KEY
,Sequence_ID int null
)
INSERT @system_log(
PK_ID
,Sequence_ID
)VALUES
(1035590, 35587),
(1035589, NULL),
(1035586, NULL),
(1035585, NULL),
(1035584, NULL),
(1035583, 35583),
(1035582, NULL),
(1035581, NULL),
(1035579, NULL),
(1035578, 35553),
(1035554, NULL),
(1035550, 35550);
WITH BaseData as (
SELECT
PK_ID,
Sequence_ID,
rn2 = MAX(Sequence_ID) OVER (ORDER BY PK_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM
@system_log
)
SELECT
PK_ID,
Sequence_ID = CASE WHEN Sequence_ID IS NOT NULL
THEN Sequence_ID
ELSE ROW_NUMBER() OVER (PARTITION BY rn2 ORDER BY PK_ID) - 1
END
FROM
BaseData
ORDER BY
PK_ID DESC;
September 1, 2014 at 7:17 pm
Lynn Pettis (9/1/2014)
dwain.c (9/1/2014)
Not exactly a traditional gaps and islands problem but fun nevertheless. Here's a relatively simple alternative.
DECLARE @system_log TABLE(
PK_ID int PRIMARY KEY
,Sequence_ID int null
)
INSERT @system_log(
PK_ID
,Sequence_ID
)VALUES
(1035590, 35587),
(1035589, NULL),
(1035586, NULL),
(1035585, NULL),
(1035584, NULL),
(1035583, 35583),
(1035582, NULL),
(1035581, NULL),
(1035579, NULL),
(1035578, 35553),
(1035554, NULL),
(1035552, 35551),
(1035550, 35550)
SELECT PK_ID, Sequence_ID=CASE WHEN Sequence_ID IS NOT NULL THEN Sequence_ID ELSE
ROW_NUMBER() OVER (PARTITION BY rn2 ORDER BY PK_ID)-1 END
FROM
(
SELECT PK_ID, Sequence_ID
,rn2=MAX(Sequence_ID) OVER (ORDER BY PK_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM @system_log
) a
ORDER by PK_ID desc;
Dwain, I like yours as it appears to be more efficient than mine. I rewrote it slightly to make it cleaner in my opinion.
DECLARE @system_log TABLE(
PK_ID int PRIMARY KEY
,Sequence_ID int null
)
INSERT @system_log(
PK_ID
,Sequence_ID
)VALUES
(1035590, 35587),
(1035589, NULL),
(1035586, NULL),
(1035585, NULL),
(1035584, NULL),
(1035583, 35583),
(1035582, NULL),
(1035581, NULL),
(1035579, NULL),
(1035578, 35553),
(1035554, NULL),
(1035550, 35550);
WITH BaseData as (
SELECT
PK_ID,
Sequence_ID,
rn2 = MAX(Sequence_ID) OVER (ORDER BY PK_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM
@system_log
)
SELECT
PK_ID,
Sequence_ID = CASE WHEN Sequence_ID IS NOT NULL
THEN Sequence_ID
ELSE ROW_NUMBER() OVER (PARTITION BY rn2 ORDER BY PK_ID) - 1
END
FROM
BaseData
ORDER BY
PK_ID DESC;
Thanks Lynn. Guess I skipped the last part of Make it Work, Make it Fast, Make it Pretty[/url]
🙂
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 1, 2014 at 7:34 pm
Dwain,
Actually, I just wanted to understand the pieces. I really need to learn the extensions to the windowing functions in 2012.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply