Return parents for children with certain condition

  • Ok, so here is my sample data I am testing against...

    declare @t table (
      Id  int ,
      Section int,
      DateOccured DATE,
        Missed int
    );

    insert into @t values
    ( 1 , 1 , '2014-01-01', 0),
    ( 2 , 1 , '2014-01-04', 1),
    ( 3 , 1 , '2014-01-06', 1),
    ( 4 , 1 , '2014-01-12', 0),
    ( 5 , 1 , '2014-02-05', 0),
    ( 6 , 1 , '2014-02-06', 1),
    ( 7 , 1 , '2014-02-20', 1),
    ( 8 , 1 , '2014-03-01', 1),
    ( 9 , 1 , '2014-03-05', 0),
    ( 10 , 1 , '2014-03-22', 0),

    ( 11 , 2 , '2014-02-06', 0),
    ( 12 , 2 , '2014-02-07', 1),
    ( 13 , 2 , '2014-02-08', 1),
    ( 14 , 2 , '2014-02-22', 0),
    ( 15 , 2 , '2014-02-24', 0),
    ( 16 , 2 , '2014-02-27', 0),
    ( 17 , 2 , '2014-02-28', 1),
    ( 18 , 2 , '2014-03-09', 0),

    ( 19 , 3 , '2014-03-11', 0),
    ( 20 , 3 , '2014-04-02', 0),
    ( 21 , 3 , '2014-04-13', 1),
    ( 22 , 3 , '2014-04-14', 0),
    ( 23 , 3 , '2014-05-01', 0),
    ( 24 , 3 , '2014-05-12', 1),
    ( 25 , 3 , '2014-05-14', 1),
    ( 26 , 3 , '2014-06-01', 1);

    What I want to return is a list of sections that have 3 consecutive dates with the missed flags set to 1. Note consecutive dates are not 1st, 2nd, 3rd... It is 3 dates when ordered by date so for section 3, it should be returned as it had 3 records missed in a row occurring on 2014-05-12, 2014-05-14 and 2014-06-01. Section 2 wouldn't be returned as it only had 2 in a row, but section 1 would be as it had 3 consecutive missed slots.
    Can anyone help? I'm thinking I might need to resort or a cursor and step through the data row by row.
    This is as far as I had got...

    SELECT TOP 1 WITH TIES
      Id,
        Section,
        DateOccured
    FROM @t
    ORDER BY case when row_number() over(partition by Section order by DateOccured) <= 3 then 0 else 1 end;

  • This should do it - no CURSOR required.
    WITH Missed
    AS
    (
      SELECT
       t.Section
      ,  t.Missed
      ,  PrevMissed = LAG(t.Missed, 1) OVER (PARTITION BY t.Section
                   ORDER BY
                     t.Section
                   , t.DateOccured
                 )
      ,  NextMissed = LEAD(t.Missed, 1) OVER (PARTITION BY t.Section
                    ORDER BY
                     t.Section
                    , t.DateOccured
                  )
      FROM @t t
    )
    SELECT m.Section
    FROM  Missed m
    WHERE
       m.PrevMissed  = 1
       AND m.Missed  = 1
       AND m.NextMissed = 1
    GROUP BY m.Section;

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil, you are a legend!

    One further small request for help... If I wanted to make it so that if the last occurence was not missed, the record was not returned, (in the sample data, section 1 would be excluded since (even though there were 3 missed in a row, the last one was not missed), how would I achieve that?
    Many Thanks for taking the time to help me.

  • Charlottecb - Wednesday, September 20, 2017 12:01 PM

    Phil, you are a legend!

    One further small request for help... If I wanted to make it so that if the last occurence was not missed, the record was not returned, (in the sample data, section 1 would be excluded since (even though there were 3 missed in a row, the last one was not missed), how would I achieve that?
    Many Thanks for taking the time to help me.

    Here's a quick solution. It can almost certainly be improved on, if performance is not so good.
    WITH Missed
    AS
    (
      SELECT
       t.Section
      , t.Missed
      , PrevMissed = LAG(t.Missed, 1) OVER (PARTITION BY t.Section
                   ORDER BY
                    t.Section
                   , t.DateOccured
                 )
      , NextMissed = LEAD(t.Missed, 1) OVER (PARTITION BY t.Section
                   ORDER BY
                     t.Section
                   , t.DateOccured
                 )
      , LastMissed = lm.LastMissed
      FROM
       @t t
      CROSS APPLY
      (
       SELECT TOP 1
          LastMissed = t2.Missed
       FROM  @t t2
       WHERE  t.Section = t2.Section
       ORDER BY t2.DateOccured DESC
      )  lm
    )
    SELECT m.Section
    FROM  Missed m
    WHERE
       m.PrevMissed  = 1
       AND m.Missed  = 1
       AND m.NextMissed = 1
       AND m.LastMissed <> 0
    GROUP BY m.Section;

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Perfect Phil. Many, many thanks for your help.

  • Phil Parkin - Wednesday, September 20, 2017 12:11 PM

    Charlottecb - Wednesday, September 20, 2017 12:01 PM

    Phil, you are a legend!

    One further small request for help... If I wanted to make it so that if the last occurence was not missed, the record was not returned, (in the sample data, section 1 would be excluded since (even though there were 3 missed in a row, the last one was not missed), how would I achieve that?
    Many Thanks for taking the time to help me.

    Here's a quick solution. It can almost certainly be improved on, if performance is not so good.
    WITH Missed
    AS
    (
      SELECT
       t.Section
      , t.Missed
      , PrevMissed = LAG(t.Missed, 1) OVER (PARTITION BY t.Section
                   ORDER BY
                    t.Section
                   , t.DateOccured
                 )
      , NextMissed = LEAD(t.Missed, 1) OVER (PARTITION BY t.Section
                   ORDER BY
                     t.Section
                   , t.DateOccured
                 )
      , LastMissed = lm.LastMissed
      FROM
       @t t
      CROSS APPLY
      (
       SELECT TOP 1
          LastMissed = t2.Missed
       FROM  @t t2
       WHERE  t.Section = t2.Section
       ORDER BY t2.DateOccured DESC
      )  lm
    )
    SELECT m.Section
    FROM  Missed m
    WHERE
       m.PrevMissed  = 1
       AND m.Missed  = 1
       AND m.NextMissed = 1
       AND m.LastMissed <> 0
    GROUP BY m.Section;

    This method takes 27 scans of the table.  The following only requires one.

    ;
    WITH Missed AS
    (
        SELECT *,
            SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MissedDays,
            ROW_NUMBER() OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS rn
        FROM @t
    )
    SELECT Section
    FROM Missed
    WHERE rn = 1
        AND MissedDays = 3;

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I updated this to remove an expensive sort.  The SUM was sorting ascending by the date, but the ROW_NUMBER was sorting descending, so two different sorts were required.  I changed it so that both were sorting descending, and only one sort was required.

    ;
    WITH Missed AS
    (
        SELECT *,
            SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS MissedDays,
            ROW_NUMBER() OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS rn
        FROM @t
    )
    SELECT Section
    FROM Missed
    WHERE rn = 1
        AND MissedDays >= 3;

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Drew, that's a great solution. Many thanks.

  • Charlottecb - Wednesday, September 20, 2017 11:25 AM

    >> Ok, so here is my sample data I am testing against... <<

    No, what you posted is crap. It is not a table because it has no key and all of the names of your columns are nullable so it can never have a key.

    And then you wrote with a BIT flag! That's assembly language! In RDBMS. We discovered the state of a single being by not setting a flag, but by using predicates. Basically your approach to RDBMS and SQL is wrong not little wrong, but fundamentally wrong. Where is key in your postings??

    CREATE TABLE Foobar
    (something_section INTEGER NOT NULL,
    occurence_date DATE NOT NULL,
    PRIMARY KEY (something_section, occurence_date),
    stupid_flg CHAR(1) NOT NULL
    CHECK (stupid_flg IN ('0', '1')
    );

    Everything else you posted is total garbage, and makes me wonder why I spent writing books and all those decades of my life setting up SQL standards. Your table declaration has an extra column which, should not be there course, it looks like a sequential number for the physical row counts.

    INSERT INTO Foobar VALUES
    (1 , '2014-01-01', '0'),
    (1 , '2014-01-04', '1')
    (1 , '2014-01-06', '1'),
    (1 , '2014-01-12', '0'),
    (1 , '2014-02-05', '0'),
    (1 , '2014-02-06', '1'),
    (1 , '2014-02-20', '1'),
    (1 , '2014-03-01', '1'),
    (1 , '2014-03-05', '0'),
    (1 , '2014-03-22', '0'),
    (2 , '2014-02-06', '0'),
    (2 , '2014-02-07', '1'),
    (2 , '2014-02-08', '1'),
    (2 , '2014-02-22', '0'),
    (2 , '2014-02-24', '0'),
    (2 , '2014-02-27', '0'),
    (2 , '2014-02-28', '1'),
    (2 , '2014-03-09', '0'),
    (3 , '2014-03-11', '0'),
    (3 , '2014-04-02', '0'),
    (3 , '2014-04-13', '1'),
    (3 , '2014-04-14', '0'),
    (3 , '2014-05-01', '0'),
    (3 , '2014-05-12', '1'),
    (3 , '2014-05-14', '1'),
    (3 , '2014-06-01', '1');

    >> what I want to return is a list of something_sections that have 3 consecutive dates with the missed flags [sic] set to 1. Note consecutive dates are not 1st, 2nd, 3rd... It is 3 dates when ordered by date so for something_section 3, it should be returned as it had 3 records [sic: rows are not records] missed in a row occurring on 2014-05-12, 2014-05-14 and 2014-06-01. something_section 2 wouldn't be returned as it only had 2 in a row, but something_section 1 would be as it had 3 consecutive missed slots. <<

    Why are you thinking of a consecutive sequential solution? Switch your mindset to sets and groups.. First we need a calendar table, then ordinal date number and it. Next, filter out the dates the don't have the nonrelational flag. If the minimum and the maximum dates are not three days, then they are out of your set. Suddenly your question is very, very easy; all that you to go ahead and finish writing it with the assumption of

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Joe, chill out man - no one died, it's just a bit of code... It's not even a real table or data - I only created it to highlight my issue and provide something which is quick and easy to test against.

    But thanks for your "constructive criticism"... I actually have one of your books so I'll be sure to re-read it again to make sure I'm doing things right 🙂

  • drew.allen - Wednesday, September 20, 2017 1:31 PM

    I updated this to remove an expensive sort.  The SUM was sorting ascending by the date, but the ROW_NUMBER was sorting descending, so two different sorts were required.  I changed it so that both were sorting descending, and only one sort was required.

    ;
    WITH Missed AS
    (
        SELECT *,
            SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS MissedDays,
            ROW_NUMBER() OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS rn
        FROM @t
    )
    SELECT Section
    FROM Missed
    WHERE rn = 1
        AND MissedDays >= 3;

    Drew

    Just wanted to confirm that this might also work ?

    ;
    WITH Missed AS
    (
      SELECT *,
       SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS MissedDays,
      
            first_value(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS frstvalue
      FROM @t
    )
    SELECT Section
    FROM Missed
    WHERE frstvalue = 1
      AND MissedDays >= 3;

    First solve the problem then write the code !

  • TheCTEGuy - Saturday, September 30, 2017 5:28 AM

    drew.allen - Wednesday, September 20, 2017 1:31 PM

    I updated this to remove an expensive sort.  The SUM was sorting ascending by the date, but the ROW_NUMBER was sorting descending, so two different sorts were required.  I changed it so that both were sorting descending, and only one sort was required.

    ;
    WITH Missed AS
    (
        SELECT *,
            SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS MissedDays,
            ROW_NUMBER() OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS rn
        FROM @t
    )
    SELECT Section
    FROM Missed
    WHERE rn = 1
        AND MissedDays >= 3;

    Drew

    Just wanted to confirm that this might also work ?

    ;
    WITH Missed AS
    (
      SELECT *,
       SUM(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS MissedDays,
      
            first_value(Missed) OVER(PARTITION BY Section ORDER BY DateOccured DESC) AS frstvalue
      FROM @t
    )
    SELECT Section
    FROM Missed
    WHERE frstvalue = 1
      AND MissedDays >= 3;

    I haven't tested, but I believe it would, but it would also be much less efficient, because FIRST_VALUE() requires a frame, and will use the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW if one is not specified whereas ROW_NUMBER() cannot use a frame.  I know that specifying frames using ROWS rather than RANGE is much more efficient, because RANGE always writes out to a temp table, but ROWS will only write out to a temp table when the number of rows is over some amount (10000?) and for many aggregate functions there are only ever two rows (the current and the previous values).

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Okay.  I did a quick test, and my intuition bore out:

    Here are the results:
    Method                         duration
    ROW_NUMBER                        14002
    FIRST_VALUE/ROWS                  24000
    FIRST_VALUE/RANGE(Unspecified)    54004

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply