tune a table valued function that is using recursive CTE

  • We have a function that calculates student truancy records by counting their absence in a moving 30 days window. It uses a recursive CTE to count any 30 days passed.

    Then we use other table to join this table valued function, and it was extremely slow as records increased.

    I tuned it to select the TVF into temp table, then other table can join the temp table, it increased some performance.

    But still this is the most cost query in my program, any other things I can do to tune the performance?

    CREATE FUNCTION [dbo].[MaxTruantCount_Any30Day] ()

    RETURNS TABLE

    AS

    RETURN

    WITH FirstLastAbsentDates AS

    (

    SELECT a.StudentId

    ,MIN(a.Att_Date) AS FirstDay

    ,MAX(a.Att_Date) AS LastDay

    FROM TruantDay a

    JOIN SchoolYear b

    ON a.SchoolYearId = b.Id

    AND GETDATE() BETWEEN b.StartDate AND b.EndDate

    WHERE a.IsAbsent = 1

    AND a.IsCleared = 0

    GROUP BY a.StudentId

    HAVING MAX(a.Att_Date) > MIN(a.Att_Date)

    )

    , DateRangeEndDays (StudentId, EndDay) AS

    (

    SELECT a.StudentId

    ,a.LastDay

    FROM FirstLastAbsentDates a

    UNION ALL

    SELECT a.StudentId

    ,DATEADD(d, -1, c.EndDay) AS EndDay

    FROM FirstLastAbsentDates a

    JOIN DateRangeEndDays c

    ON c.StudentId = a.StudentId

    WHERE DATEADD(d, -1, c.EndDay) > a.FirstDay

    )

    , AbsenceCounts AS

    (

    SELECT a.StudentId

    , a.EndDay

    , COUNT(*) AS AbsenceCount

    FROM DateRangeEndDays a

    JOIN TruantDay b

    ON b.StudentId = a.StudentId

    WHERE b.IsAbsent = 1

    AND b.IsCleared = 0

    AND b.Att_Date BETWEEN DATEADD(d, -30, a.EndDay)

    AND a.EndDay

    GROUP BY a.StudentId, a.EndDay

    )

    , MaxAbsenceCounts AS

    (

    SELECT a.StudentId

    , MAX(a.AbsenceCount) AS MaxAbsenceCount

    FROM AbsenceCounts a

    GROUP BY a.StudentID

    )

    , MaxAbsenceDate AS

    (

    SELECT a.StudentId

    , MIN(b.EndDay) AS MaxAbsenceDate

    , a.MaxAbsenceCount

    FROM MaxAbsenceCounts a

    JOIN AbsenceCounts b

    ON a.StudentId = b.StudentId

    AND a.MaxAbsenceCount = b.AbsenceCount

    GROUP BY a.StudentId, a.MaxAbsenceCount

    )

    SELECT a.StudentId AS StudentId

    ,a.MaxAbsenceCount AS MaxAbsenceCount

    ,a.MaxAbsenceDate AS MaxAbsenceDate

    FROM MaxAbsenceDate a

    --order by a.StudentId

    ;

    • This topic was modified 3 years, 9 months ago by  sqlfriend.
  • That function can be rewritten to perform a lot better - but to do that we need sample data and tables to test, with expected results based on the sample data.

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I attache the schema of sample data scrambled for testing

  • try to attach file again

    Attachments:
    You must be logged in to view attached files.
  • I would replace the recursive CTE 'DateRangeEndDays' with a look up against a calendar table

    DateRangeEndDays (StudentId, EndDay) AS
    SELECT a.StudentId,c.CalendarDate
    FROM FirstLastAbsentDates a
    INNER JOIN MyCalendar c ON c.CalendarDate BETWEEN a.FirstDay AND a.LastDay
    )

    See here for details about a calendar table

    https://www.sqlservercentral.com/steps/bones-of-sql-the-calendar-table

     

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • It seems the query is performing a partial cross join.  The cross join could be replaced by a tally function and window (rows between 30 preceding and current row) function.  As far as I can tell this query produces the same output

    with
    FirstLastAbsentDates(StudentId, EndDate, FirstDay, LastDay) AS (
    select a.StudentId, b.EndDate
    ,min(a.Att_Date)
    ,max(a.Att_Date)
    from dbo.testTruantDay a
    join dbo.testSchoolYear b ON a.SchoolYearId = b.Id
    and getdate() between b.StartDate AND b.EndDate
    where a.IsAbsent = 1
    and a.IsCleared = 0
    group by a.StudentId, b.EndDate
    having max(a.Att_Date) > min(a.Att_Date))
    select fl.StudentId, oa.cum_cnt as MaxAbsenceCount,
    isnull(oa.Att_Date, dateadd(day, 1, fl.FirstDay)) as MaxAbsenceDate
    from FirstLastAbsentDates fl
    outer apply (select top(1) td.Att_Date,
    sum(case when IsAbsent is null then 0 else 1 end)
    over (order by dt rows between 30 preceding and current row) cum_cnt
    from dbo.fnTally(0, datediff(day, fl.FirstDay, fl.LastDay)) fn
    cross apply (values (dateadd(day, fn.n, fl.FirstDay))) v(dt)
    left join dbo.testTruantDay td on fl.StudentId=td.StudentId
    and v.dt=td.Att_Date
    and td.IsAbsent=1
    and td.IsCleared=0
    order by cum_cnt desc) oa
    order by fl.StudentId;
    StudentIdMaxAbsenceCountMaxAbsenceDate
    700049912021-01-27 00:00:00.000
    700052472021-01-29 00:00:00.000
    700054022020-12-18 00:00:00.000
    7000559212020-12-16 00:00:00.000
    7000907222020-11-06 00:00:00.000
    7002389112020-11-06 00:00:00.000
    700384592020-12-18 00:00:00.000
    700444332020-11-16 00:00:00.000
    ...
    (246 rows returned)

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • sqlfriend wrote:

    I attache the schema of sample data scrambled for testing

    What is the expected output from this data?

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thanks, what is the benefit to use a calendar table, all we need is to know the first day and end day, If we create a calendar table,  that has each day in the table, every year we need then to recreate the calendar table.

  • I will give it a try, so the fnTally is the one I can copy here? thanks

    https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally

  • sqlfriend wrote:

    I will give it a try, so the fnTally is the one I can copy here? thanks

    https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally%5B/quote%5D

    That's the one I use.  There are others out there and if you're interested:

    Number series generator challenge solutions – Part 3

    Number series generator challenge solutions – Part 2

    Number series generator challenge solutions – Part 1

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Wow, that is so much faster. Thank you Steve. Now let me take a look to understand the code, is it the fnTally make it so much faster?

     

     

  • Jeffrey Williams wrote:

    sqlfriend wrote:

    I attache the schema of sample data scrambled for testing

    What is the expected output from this data?

    The result is just like above Steve posted.

  • sqlfriend wrote:

    Wow, that is so much faster. Thank you Steve. Now let me take a look to understand the code, is it the fnTally make it so much faster? 

    It's true fnTally is blazingly fast compared to other methods of generating series.  These 2 lines of code serve the purpose of a calendar table:

    dbo.fnTally(0, datediff(day, fl.FirstDay, fl.LastDay)) fn
    cross apply (values (dateadd(day, fn.n, fl.FirstDay))) v(dt)

    An actual calendar table would be just as fast or perhaps marginally faster.  The original query counts 30 day totals by generating the series over and over.  That was the slow part.  Switching to a moving window is more efficient

    sum(case when IsAbsent is null then 0 else 1 end) 
    over (order by dt rows between 30 preceding and current row) cum_cnt

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Another point:

    Original code is in a table-valued function, then I use other table to join it.

    For your code above, should I do same thing, put it in a table valued function?

     

    Thanks

  • Yes it could be put in the same function

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply