Return parents for children with certain condition

Question

Return parents for children with certain condition

Charlottecb

Hall of Fame

Points: 3932
More actions
September 20, 2017 at 11:25 am

#340172

Ok, so here is my sample data I am testing against...
declare @t table (
Id  int ,
Section int,
DateOccured DATE,
    Missed int
);
insert into @t values
( 1 , 1 , '2014-01-01', 0),
( 2 , 1 , '2014-01-04', 1),
( 3 , 1 , '2014-01-06', 1),
( 4 , 1 , '2014-01-12', 0),
( 5 , 1 , '2014-02-05', 0),
( 6 , 1 , '2014-02-06', 1),
( 7 , 1 , '2014-02-20', 1),
( 8 , 1 , '2014-03-01', 1),
( 9 , 1 , '2014-03-05', 0),
( 10 , 1 , '2014-03-22', 0),
( 11 , 2 , '2014-02-06', 0),
( 12 , 2 , '2014-02-07', 1),
( 13 , 2 , '2014-02-08', 1),
( 14 , 2 , '2014-02-22', 0),
( 15 , 2 , '2014-02-24', 0),
( 16 , 2 , '2014-02-27', 0),
( 17 , 2 , '2014-02-28', 1),
( 18 , 2 , '2014-03-09', 0),
( 19 , 3 , '2014-03-11', 0),
( 20 , 3 , '2014-04-02', 0),
( 21 , 3 , '2014-04-13', 1),
( 22 , 3 , '2014-04-14', 0),
( 23 , 3 , '2014-05-01', 0),
( 24 , 3 , '2014-05-12', 1),
( 25 , 3 , '2014-05-14', 1),
( 26 , 3 , '2014-06-01', 1);
What I want to return is a list of sections that have 3 consecutive dates with the missed flags set to 1. Note consecutive dates are not 1st, 2nd, 3rd... It is 3 dates when ordered by date so for section 3, it should be returned as it had 3 records missed in a row occurring on 2014-05-12, 2014-05-14 and 2014-06-01. Section 2 wouldn't be returned as it only had 2 in a row, but section 1 would be as it had 3 consecutive missed slots.
Can anyone help? I'm thinking I might need to resort or a cursor and step through the data row by row.
This is as far as I had got...
SELECT TOP 1 WITH TIES
Id,
    Section,
    DateOccured
FROM @t
ORDER BY case when row_number() over(partition by Section order by DateOccured) <= 3 then 0 else 1 end;

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247202 More actions · Answer 1

This should do it - no CURSOR required.
WITH Missed AS ( SELECT t.Section , t.Missed , PrevMissed = LAG(t.Missed, 1) OVER (PARTITION BY t.Section ORDER BY t.Section , t.DateOccured ) , NextMissed = LEAD(t.Missed, 1) OVER (PARTITION BY t.Section ORDER BY t.Section , t.DateOccured ) FROM @t t ) SELECT m.Section FROM Missed m WHERE m.PrevMissed = 1 AND m.Missed = 1 AND m.NextMissed = 1 GROUP BY m.Section;

Charlottecb Hall of Fame Points: 3932 More actions · Answer 2

Phil, you are a legend!

One further small request for help... If I wanted to make it so that if the last occurence was not missed, the record was not returned, (in the sample data, section 1 would be excluded since (even though there were 3 missed in a row, the last one was not missed), how would I achieve that?
Many Thanks for taking the time to help me.

Phil Parkin SSC Guru Points: 247202 More actions · Answer 3

Charlottecb - Wednesday, September 20, 2017 12:01 PM
Phil, you are a legend!
One further small request for help... If I wanted to make it so that if the last occurence was not missed, the record was not returned, (in the sample data, section 1 would be excluded since (even though there were 3 missed in a row, the last one was not missed), how would I achieve that?
Many Thanks for taking the time to help me.

Here's a quick solution. It can almost certainly be improved on, if performance is not so good.
WITH Missed AS ( SELECT t.Section , t.Missed , PrevMissed = LAG(t.Missed, 1) OVER (PARTITION BY t.Section ORDER BY t.Section , t.DateOccured ) , NextMissed = LEAD(t.Missed, 1) OVER (PARTITION BY t.Section ORDER BY t.Section , t.DateOccured ) , LastMissed = lm.LastMissed FROM @t t CROSS APPLY ( SELECT TOP 1 LastMissed = t2.Missed FROM @t t2 WHERE t.Section = t2.Section ORDER BY t2.DateOccured DESC ) lm ) SELECT m.Section FROM Missed m WHERE m.PrevMissed = 1 AND m.Missed = 1 AND m.NextMissed = 1 AND m.LastMissed <> 0 GROUP BY m.Section;

Charlottecb Hall of Fame Points: 3932 More actions · Answer 4

Charlottecb

Hall of Fame

Points: 3932

September 20, 2017 at 12:19 pm

#1960244

Perfect Phil. Many, many thanks for your help.

drew.allen SSC Guru Points: 77014 More actions · Answer 5

Phil Parkin - Wednesday, September 20, 2017 12:11 PM
Charlottecb - Wednesday, September 20, 2017 12:01 PM
Phil, you are a legend!
One further small request for help... If I wanted to make it so that if the last occurence was not missed, the record was not returned, (in the sample data, section 1 would be excluded since (even though there were 3 missed in a row, the last one was not missed), how would I achieve that?
Many Thanks for taking the time to help me.
Here's a quick solution. It can almost certainly be improved on, if performance is not so good.
WITH Missed AS ( SELECT t.Section , t.Missed , PrevMissed = LAG(t.Missed, 1) OVER (PARTITION BY t.Section ORDER BY t.Section , t.DateOccured ) , NextMissed = LEAD(t.Missed, 1) OVER (PARTITION BY t.Section ORDER BY t.Section , t.DateOccured ) , LastMissed = lm.LastMissed FROM @t t CROSS APPLY ( SELECT TOP 1 LastMissed = t2.Missed FROM @t t2 WHERE t.Section = t2.Section ORDER BY t2.DateOccured DESC ) lm ) SELECT m.Section FROM Missed m WHERE m.PrevMissed = 1 AND m.Missed = 1 AND m.NextMissed = 1 AND m.LastMissed <> 0 GROUP BY m.Section;

This method takes 27 scans of the table. The following only requires one.

; WITH Missed AS ( SELECT *, SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MissedDays, ROW_NUMBER() OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS rn FROM @t ) SELECT Section FROM Missed WHERE rn = 1 AND MissedDays = 3;

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

drew.allen SSC Guru Points: 77014 More actions · Answer 6

I updated this to remove an expensive sort. The SUM was sorting ascending by the date, but the ROW_NUMBER was sorting descending, so two different sorts were required. I changed it so that both were sorting descending, and only one sort was required.

; WITH Missed AS ( SELECT *, SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS MissedDays, ROW_NUMBER() OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS rn FROM @t ) SELECT Section FROM Missed WHERE rn = 1 AND MissedDays >= 3;

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Charlottecb Hall of Fame Points: 3932 More actions · Answer 7

Charlottecb

Hall of Fame

Points: 3932

September 20, 2017 at 2:34 pm

#1960273

Drew, that's a great solution. Many thanks.

jcelko212 32090 SSCrazy Eights Points: 9305 More actions · Answer 8

Charlottecb - Wednesday, September 20, 2017 11:25 AM

>> Ok, so here is my sample data I am testing against... <<

No, what you posted is crap. It is not a table because it has no key and all of the names of your columns are nullable so it can never have a key.

And then you wrote with a BIT flag! That's assembly language! In RDBMS. We discovered the state of a single being by not setting a flag, but by using predicates. Basically your approach to RDBMS and SQL is wrong not little wrong, but fundamentally wrong. Where is key in your postings??

CREATE TABLE Foobar
(something_section INTEGER NOT NULL,
occurence_date DATE NOT NULL,
PRIMARY KEY (something_section, occurence_date),
stupid_flg CHAR(1) NOT NULL
CHECK (stupid_flg IN ('0', '1')
);

Everything else you posted is total garbage, and makes me wonder why I spent writing books and all those decades of my life setting up SQL standards. Your table declaration has an extra column which, should not be there course, it looks like a sequential number for the physical row counts.

INSERT INTO Foobar VALUES
(1 , '2014-01-01', '0'),
(1 , '2014-01-04', '1')
(1 , '2014-01-06', '1'),
(1 , '2014-01-12', '0'),
(1 , '2014-02-05', '0'),
(1 , '2014-02-06', '1'),
(1 , '2014-02-20', '1'),
(1 , '2014-03-01', '1'),
(1 , '2014-03-05', '0'),
(1 , '2014-03-22', '0'),
(2 , '2014-02-06', '0'),
(2 , '2014-02-07', '1'),
(2 , '2014-02-08', '1'),
(2 , '2014-02-22', '0'),
(2 , '2014-02-24', '0'),
(2 , '2014-02-27', '0'),
(2 , '2014-02-28', '1'),
(2 , '2014-03-09', '0'),
(3 , '2014-03-11', '0'),
(3 , '2014-04-02', '0'),
(3 , '2014-04-13', '1'),
(3 , '2014-04-14', '0'),
(3 , '2014-05-01', '0'),
(3 , '2014-05-12', '1'),
(3 , '2014-05-14', '1'),
(3 , '2014-06-01', '1');

>> what I want to return is a list of something_sections that have 3 consecutive dates with the missed flags [sic] set to 1. Note consecutive dates are not 1st, 2nd, 3rd... It is 3 dates when ordered by date so for something_section 3, it should be returned as it had 3 records [sic: rows are not records] missed in a row occurring on 2014-05-12, 2014-05-14 and 2014-06-01. something_section 2 wouldn't be returned as it only had 2 in a row, but something_section 1 would be as it had 3 consecutive missed slots. <<

Why are you thinking of a consecutive sequential solution? Switch your mindset to sets and groups.. First we need a calendar table, then ordinal date number and it. Next, filter out the dates the don't have the nonrelational flag. If the minimum and the maximum dates are not three days, then they are out of your set. Suddenly your question is very, very easy; all that you to go ahead and finish writing it with the assumption of

Please post DDL and follow ANSI/ISO standards when asking for help.

Charlottecb Hall of Fame Points: 3932 More actions · Answer 9

Joe, chill out man - no one died, it's just a bit of code... It's not even a real table or data - I only created it to highlight my issue and provide something which is quick and easy to test against.

But thanks for your "constructive criticism"... I actually have one of your books so I'll be sure to re-read it again to make sure I'm doing things right 🙂

TheCTEGuy Ten Centuries Points: 1032 More actions · Answer 10

drew.allen - Wednesday, September 20, 2017 1:31 PM
I updated this to remove an expensive sort. The SUM was sorting ascending by the date, but the ROW_NUMBER was sorting descending, so two different sorts were required. I changed it so that both were sorting descending, and only one sort was required.
; WITH Missed AS ( SELECT *, SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS MissedDays, ROW_NUMBER() OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS rn FROM @t ) SELECT Section FROM Missed WHERE rn = 1 AND MissedDays >= 3;
Drew

Just wanted to confirm that this might also work ?
; WITH Missed AS ( SELECT *, SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS MissedDays, first_value(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS frstvalue FROM @t ) SELECT Section FROM Missed WHERE frstvalue = 1 AND MissedDays >= 3;

First solve the problem then write the code !

drew.allen SSC Guru Points: 77014 More actions · Answer 11

TheCTEGuy - Saturday, September 30, 2017 5:28 AM
drew.allen - Wednesday, September 20, 2017 1:31 PM
I updated this to remove an expensive sort. The SUM was sorting ascending by the date, but the ROW_NUMBER was sorting descending, so two different sorts were required. I changed it so that both were sorting descending, and only one sort was required.
; WITH Missed AS ( SELECT *, SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS MissedDays, ROW_NUMBER() OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS rn FROM @t ) SELECT Section FROM Missed WHERE rn = 1 AND MissedDays >= 3;
Drew
Just wanted to confirm that this might also work ?
; WITH Missed AS ( SELECT *, SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS MissedDays, first_value(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS frstvalue FROM @t ) SELECT Section FROM Missed WHERE frstvalue = 1 AND MissedDays >= 3;

I haven't tested, but I believe it would, but it would also be much less efficient, because FIRST_VALUE() requires a frame, and will use the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW if one is not specified whereas ROW_NUMBER() cannot use a frame. I know that specifying frames using ROWS rather than RANGE is much more efficient, because RANGE always writes out to a temp table, but ROWS will only write out to a temp table when the number of rows is over some amount (10000?) and for many aggregate functions there are only ever two rows (the current and the previous values).

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

drew.allen SSC Guru Points: 77014 More actions · Answer 12

Okay. I did a quick test, and my intuition bore out:

Here are the results:
Method duration ROW_NUMBER 14002 FIRST_VALUE/ROWS 24000 FIRST_VALUE/RANGE(Unspecified) 54004

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA