October 1, 2020 at 1:37 pm
Hi
In the below input test data, ID is unique and in ascending order (not incremental values).
I want to find runs of at least three 1s, and return the result below. The 0s would be any values that are not 1.
I'm thinking there must be some 'standard' way to do this, such as treating them as islands?
Thanks for any help.
--INPUT TEST DATA
declare @T table(ID int not null primary key, Val int not null)
insert @T(ID, Val)
select 1, 0 union all
select 10, 1 union all
select 17, 0 union all
select 18, 0 union all
select 27, 1 union all
select 34, 1 union all
select 42, 1 union all
select 52, 0 union all
select 62, 1 union all
select 63, 1 union all
select 65, 0 union all
select 73, 0 union all
select 79, 1 union all
select 85, 1 union all
select 89, 1 union all
select 90, 1
select * from @T
--EXPECTED RESULT
declare @R table(PosDesc varchar(10) not null, ID int null, Val int null)
insert @R(PosDesc, ID, Val)
select 'Before', 18, 0 union all
select 'First', 27, 1 union all
select 'Last',42, 1 union all
select 'After', 52, 0 union all
select 'Before', 73, 0 union all
select 'First', 79, 1 union all
select 'Last',90, 1 union all
select 'After', NULL, NULL
select * from @R
October 1, 2020 at 2:15 pm
select 'After', NULL, NULL
is not required in the result.
October 1, 2020 at 6:51 pm
It's not clear what you mean. I'm not clear what the @r means here.
First, without ordering, there's no way to tell what the runs are. You always, always need to have some ordering column.
Next, you could like do some running count, and when this reaches 3+, you could then use that to determine you've had a group. This is an islands type problem, but I don't think it's well defined.
October 1, 2020 at 7:40 pm
This is kind of messy but it should work, there's one more case case that you didn't have in your sample data as well.
;WITH TEMP_CTE AS(
select *
, LEAD(VAL, 1, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) NEXT_VAL_ONE
, LEAD(VAL, 2, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) NEXT_VAL_TWO
, LAG(VAL, 1, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) PREV_VAL_ONE
, LAG(VAL, 2, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) PREV_VAL_TWO
from @T
), TEMP_CTE_TWO AS(
SELECT *, CASE WHEN VAL = 1 AND ((NEXT_VAL_ONE = 1 AND NEXT_VAL_TWO = 1) OR (PREV_VAL_ONE = 1 AND PREV_VAL_TWO = 1) OR (PREV_VAL_ONE = 1 AND NEXT_VAL_ONE = 1)) THEN 1 ELSE 0 END AS IS_RUN FROM TEMP_CTE
), TEMP_CTE_THREE AS(
SELECT *
, LEAD(IS_RUN, 1, 0) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) NEXT_RUN
, LAG(IS_RUN, 1, 0) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) PREV_RUN FROM TEMP_CTE_TWO
), TEMP_CTE_FOUR AS(
SELECT *
, CASE WHEN IS_RUN = 0 AND NEXT_RUN = 1 AND PREV_RUN = 1 THEN 'Before/After'
WHEN IS_RUN = 0 AND NEXT_RUN = 1 AND PREV_RUN = 0 THEN 'Before'
WHEN IS_RUN = 0 AND NEXT_RUN = 0 AND PREV_RUN = 1 THEN 'After'
WHEN IS_RUN = 1 AND NEXT_RUN = 1 AND PREV_RUN = 0 THEN 'First'
WHEN IS_RUN = 1 AND NEXT_RUN = 1 AND PREV_RUN = 1 THEN 'Middle'
WHEN IS_RUN = 1 AND NEXT_RUN = 0 AND PREV_RUN = 1 THEN 'Last'
ELSE 'None'
END AS RUN_TYPE
FROM TEMP_CTE_THREE
)
SELECT * FROM TEMP_CTE_FOUR
WHERE RUN_TYPE NOT IN('None', 'Middle')
ORDER BY ID ASC
October 1, 2020 at 7:57 pm
ZZartin thanks, that works. What if it was 'at least 4 1s' (would need new test data). Ideally the number would be variable, could that be achieved.
October 1, 2020 at 8:45 pm
Yes, you could try something like this,
declare @T table(ID int not null primary key, Val int not null)
declare @row_level int = 3
insert @T(ID, Val)
select 1, 0 union all
select 10, 1 union all
select 17, 0 union all
select 18, 0 union all
select 27, 1 union all
select 34, 1 union all
select 42, 1 union all
select 52, 0 union all
select 62, 1 union all
select 63, 1 union all
select 65, 0 union all
select 73, 0 union all
select 79, 1 union all
select 85, 1 union all
select 89, 1 union all
select 90, 1 union all
select 91, 0 union all
select 92, 1 union all
select 93, 1 union all
select 94, 1
;WITH TEMP_CTE AS(
SELECT *
, LAG(VAL, 1, 0) OVER(ORDER BY ID ASC) AS PREV_VAL
, LEAD(VAL, 1, 0) OVER(ORDER BY ID ASC) AS NEXT_VAL
, LAG(ID, 1, NULL) OVER(ORDER BY ID ASC) AS PREV_ID_VAL
, LEAD(ID, 1, NULL) OVER(ORDER BY ID ASC) AS NEXT_ID_VAL
FROM @T
), TEMP_CTE_TWO AS(
SELECT *, 1 AS ROW_LEVEL, ID AS START_ID, PREV_ID_VAL AS ORIG_PREV_ID FROM TEMP_CTE
WHERE VAL = 1 AND PREV_VAL = 0
UNION ALL
SELECT TEMP_CTE.*, TEMP_CTE_TWO.ROW_LEVEL + 1 AS ROW_LEVEL, TEMP_CTE_TWO.START_ID, TEMP_CTE_TWO.ORIG_PREV_ID FROM TEMP_CTE_TWO
INNER JOIN TEMP_CTE ON TEMP_CTE.ID = TEMP_CTE_TWO.NEXT_ID_VAL AND TEMP_CTE.VAL = 1
), TEMP_CTE_THREE AS(
SELECT TEMP_CTE.ID, TEMP_CTE.VAL, 'Before' AS RUN_TYPE FROM TEMP_CTE INNER JOIN TEMP_CTE_TWO ON TEMP_CTE.ID = TEMP_CTE_TWO.ORIG_PREV_ID AND TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level
UNION ALL
SELECT TEMP_CTE.ID, TEMP_CTE.VAL, 'First' AS RUN_TYPE FROM TEMP_CTE INNER JOIN TEMP_CTE_TWO ON TEMP_CTE.ID = TEMP_CTE_TWO.START_ID AND TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level
UNION ALL
SELECT TEMP_CTE_TWO.ID, TEMP_CTE_TWO.VAL, 'Last' AS RUN_TYPE FROM TEMP_CTE_TWO WHERE TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level
UNION ALL
SELECT TEMP_CTE.ID, TEMP_CTE.VAL, 'After' AS RUN_TYPE FROM TEMP_CTE INNER JOIN TEMP_CTE_TWO ON TEMP_CTE.ID = TEMP_CTE_TWO.NEXT_ID_VAL AND TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level
)
SELECT * FROM TEMP_CTE_THREE
ORDER BY ID
October 1, 2020 at 9:03 pm
One way:
WITH
numbered AS (SELECT *, rn=ROW_NUMBER() OVER (ORDER BY ID ASC) FROM @T),
grouped AS (SELECT *,first_zero=MAX(CASE WHEN Val<>1 THEN rn END) OVER (ORDER BY ID ASC ROWS UNBOUNDED PRECEDING) FROM numbered),
first_and_such AS (SELECT before_rn=MIN(CASE WHEN Val<>1 THEN rn END),first_rn=MIN(CASE WHEN Val=1 THEN rn END),last_rn=MAX(CASE WHEN Val=1 THEN rn END) FROM grouped GROUP BY first_zero),
split AS (SELECT x.* FROM first_and_such CROSS APPLY (VALUES('Before',before_rn),('First',first_rn),('Last',last_rn),('After',last_rn+1))x(PosDesc,rn) WHERE last_rn-first_rn>1)
SELECT s.PosDesc,n.ID,n.Val
FROM split s LEFT JOIN numbered n ON s.rn=n.rn
ORDER BY s.rn ASC;
Cheers!
October 2, 2020 at 4:36 am
This will give you ID's for records with zeros wrapping the set of sequential records with 1's in between:
SELECT Last0.ID Last0_ID, next0.ID Next0.ID
FROM @T Last0
OUTER APPLY (select top 1 Id from @T F0 WHERE F0.ID > Last0.ID and F0.Val = 0) next0
WHERE last0.Val = 0
AND exists (select * from @T T1
where T1.ID > Last0.ID
and (T1.ID < next0.ID or next0.ID is null)
group by Val
having COUNT(id) >=3
)
You should be able to get required outcome simply by using it as a CTE.
Just cater for NULL in next0.ID.
_____________
Code for TallyGenerator
October 2, 2020 at 8:28 am
Yet another version.
Non-serious note to Jeff Moden: See, I actually am willing to use an identity as the clustering key.
DECLARE @min_vals_in_sequence int
SET @min_vals_in_sequence = 3
DROP TABLE IF EXISTS #T_sequential
CREATE TABLE #T_sequential ( seq_id int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
ID int NULL, Val int NULL );
INSERT INTO #T_sequential ( ID, Val )
SELECT ID, Val
FROM @T
ORDER BY ID
INSERT INTO #T_sequential
VALUES(2000000000, 0)
SELECT
CASE WHEN T3.seq_id = T1.seq_id THEN 'Before'
WHEN T3.seq_id = T1.seq_id + 1 THEN 'First'
WHEN T3.seq_id = T2.seq_id - 1 THEN 'Last'
WHEN T3.seq_id = T2.seq_id THEN 'After'
END AS PosDesc,
CASE WHEN T3.ID = 2000000000 THEN NULL ELSE T3.seq_ID END AS ID,
CASE WHEN T3.ID = 2000000000 THEN NULL ELSE T3.Val END AS Val
FROM #T_sequential T1
CROSS APPLY (
SELECT TOP (1) *
FROM #T_sequential T2
WHERE T2.ID > T1.ID AND T2.Val = 0
ORDER BY T2.ID
) AS T2
INNER JOIN #T_sequential T3 ON T3.seq_id IN ( T1.seq_id, T1.seq_id + 1, T2.seq_id - 1, T2.seq_id )
WHERE T1.Val = 0 AND T2.seq_id - T1.seq_id - 1 >= @min_vals_in_sequence
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
October 2, 2020 at 2:14 pm
Non-serious note to Jeff Moden: See, I actually am willing to use an identity as the clustering key.
😀
--Jeff Moden
Change is inevitable... Change for the better is not.
October 2, 2020 at 4:21 pm
I played around with this for a while last night, trying to find the most concise complete solution (found one maybe 20% shorter than my previous solution, but its performance gets rather dreadful on larger datasets)
I also tried to improve on the performance of my previous solution, and here's another way.
It's not nearly as concise as my previous version, but performs better under all the circumstances I tested on my system (SQL Server 2019 on a high-ish end laptop), even if we help out my previous version by specifying a ROWS frame for the windowed aggregate (which allows it to make use of an in-memory worktable instead of the disk-based worktable used by the default RANGE).
WITH numbered AS (SELECT *, rn =ROW_NUMBER() OVER (ORDER BY ID ASC) FROM @T),
lag_lead AS (SELECT *, ID1 =LAG(ID,1) OVER (ORDER BY ID ASC), ID2 =LEAD(ID,1) OVER (ORDER BY ID ASC),
Val1=LAG(Val,1) OVER (ORDER BY ID ASC), Val2=LEAD(Val,1) OVER (ORDER BY ID ASC),
rn1 =LAG(rn,1) OVER (ORDER BY ID ASC), rn2 =LEAD(rn,1) OVER (ORDER BY ID ASC)
FROM numbered),
paired AS (SELECT x.* FROM lag_lead l CROSS APPLY (VALUES(0, ID1, Val1, rn1, ID, Val, rn),
(1, ID2, Val2, rn2, ID, Val,rn)
)x(pos, border_ID, border_Val, border_rn, ID, Val, rn)
WHERE l.Val=1 AND ISNULL(border_Val,0)<>1),
combined AS (SELECT Pos,before_ID=border_ID, before_Val=border_Val, first_ID=ID, first_Val=Val,
last_ID =LEAD(ID,1) OVER (ORDER BY ID ASC, pos ASC), last_val =LEAD(VAL,1) OVER (ORDER BY ID ASC, pos ASC),
after_ID=LEAD(border_ID,1) OVER (ORDER BY ID ASC, pos ASC), after_Val=LEAD(border_Val,1) OVER (ORDER BY ID ASC, pos ASC),
cnt =LEAD(rn,1) OVER (ORDER BY ID ASC, pos ASC)-rn
FROM paired)
SELECT x.*
FROM combined c
CROSS APPLY
(VALUES('Before',before_ID,before_Val),('First',first_ID,first_Val),('Last',last_ID,last_Val),('After',after_ID,after_Val))x(PosDesc,ID,Val)
WHERE c.pos=0 AND c.cnt>1
ORDER BY first_ID ASC;
Cheers!
October 2, 2020 at 5:08 pm
Thanks to all for the solutions!
On a quick look, sparked enough ideas for me to try and (hopefully) write a solution - don't want to be beaten by it. I will then come back and compare to the solutions provided here.
October 2, 2020 at 6:19 pm
>> I want to find runs of at least three 1s, and return the result below. The 0s would be any test_values that are not 1. <<
This is been a classic freshman computer science course problem for decades. But it's usually done with a raise in a procedural language. The reasons done in a procedural language is that the concept of an ordering that you're trying to sneak in here doesn't apply in SQL or RDBMS.
CREATE TABLE Tests
(test_id INTEGER NOT NULL PRIMARY KEY,
test_val INTEGER NOT NULL
CHECK (test_val I (0,1));
INSERT INTO Tests
VALUES
(10, 1),
(17, 0),
(18, 0),
(27, 1), (34, 1), (42, 1),
(52, 0),
(62, 1),(63, 1),
(65, 0),
(73, 0),
(79, 1),(85, 1),(89, 1),(90, 1);
I'm curious as to why you used a 50-year-old Sybase syntax to do a table constructor. That "UNION ALL" simply guaranteed your code won't port. Why did you want to create a result Table and materialize it? Wouldn't a view or query result be better? Are you still thinking about filesystems and having to physically write results to magnetic tapes?
I also don't understand what your "pos_description" means. I would have thought that you would return an identifier of some sort for each run, and perhaps its length. Since SQL does not use ordering, words like before and after are frowned upon. Another thing we don't like in SQL is creating unnecessary NULLs. And yes, it does look like something to do with islands.
CREATE VIEW TestsRuns
AS
SELECT test_id, test_val,
(ROW_NUMBER() OVER (ORDER BY test_id)
- ROW_NUMBER() OVER (PARTITION BY test_val
ORDER BY test_id)) AS test_grp,
FROM Tests;
(10, 1, 1 - 1 = 0),
(17, 0, 2 - 1 = 1),
(18, 0, 3 - 2 = 1),
(27, 1, 4 - 1 = 3),
(34, 1, 5 - 2 = 3),
(42, 1, 6 - 3 = 3),
(52, 0, 7 - 1 = 6),
(62, 1, 8 - 1 = 7),
(63, 1, 9 - 2 = 7),
(65, 0, 10 - 1 = 9),
(73, 0, 11 - 2 = 9),
(79, 1, 12 - 1 =11),
(85,1, 13 - 2 = 11),
(89, 1, 14 - 3 = 11),
(90, 1 ,15 - 4 = 11);
Given this view, you can now inspect the runs with either windowed functions or just a simple group by.
Please post DDL and follow ANSI/ISO standards when asking for help.
October 2, 2020 at 7:22 pm
>> I want to find runs of at least three 1s, and return the result below. The 0s would be any test_values that are not 1. <<
Since SQL does not use ordering, words like before and after are frowned upon. ...
Given this view, you can now inspect the runs with either windowed functions or just a simple group by.
That comment on ordering has to be unintended parody, since the code you posted uses two ORDER BYs to specify, well, the order of rows. And the real world orders (sequences) things all the time, including noting before and after. Such as, did you make a payment on/before it was due rather than after.
As to the final view, please show us how to get from that to the desired result. After already sorting twice and segmenting once the entire input data, you still haven't determined the specific rows that need included in the output.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 15 posts - 1 through 15 (of 23 total)
You must be logged in to reply to this topic. Login to reply