We have a function that calculates student truancy records by counting their absence in a moving 30 days window. It uses a recursive CTE to count any 30 days passed.
Then we use other table to join this table valued function, and it was extremely slow as records increased.
I tuned it to select the TVF into temp table, then other table can join the temp table, it increased some performance.
But still this is the most cost query in my program, any other things I can do to tune the performance?
CREATE FUNCTION [dbo].[MaxTruantCount_Any30Day] ()
RETURNS TABLE
AS
RETURN
WITH FirstLastAbsentDates AS
(
SELECT a.StudentId
,MIN(a.Att_Date) AS FirstDay
,MAX(a.Att_Date) AS LastDay
FROM TruantDay a
JOIN SchoolYear b
ON a.SchoolYearId = b.Id
AND GETDATE() BETWEEN b.StartDate AND b.EndDate
WHERE a.IsAbsent = 1
AND a.IsCleared = 0
GROUP BY a.StudentId
HAVING MAX(a.Att_Date) > MIN(a.Att_Date)
)
, DateRangeEndDays (StudentId, EndDay) AS
(
SELECT a.StudentId
,a.LastDay
FROM FirstLastAbsentDates a
UNION ALL
SELECT a.StudentId
,DATEADD(d, -1, c.EndDay) AS EndDay
FROM FirstLastAbsentDates a
JOIN DateRangeEndDays c
ON c.StudentId = a.StudentId
WHERE DATEADD(d, -1, c.EndDay) > a.FirstDay
)
, AbsenceCounts AS
(
SELECT a.StudentId
, a.EndDay
, COUNT(*) AS AbsenceCount
FROM DateRangeEndDays a
JOIN TruantDay b
ON b.StudentId = a.StudentId
WHERE b.IsAbsent = 1
AND b.IsCleared = 0
AND b.Att_Date BETWEEN DATEADD(d, -30, a.EndDay)
AND a.EndDay
GROUP BY a.StudentId, a.EndDay
)
, MaxAbsenceCounts AS
(
SELECT a.StudentId
, MAX(a.AbsenceCount) AS MaxAbsenceCount
FROM AbsenceCounts a
GROUP BY a.StudentID
)
, MaxAbsenceDate AS
(
SELECT a.StudentId
, MIN(b.EndDay) AS MaxAbsenceDate
, a.MaxAbsenceCount
FROM MaxAbsenceCounts a
JOIN AbsenceCounts b
ON a.StudentId = b.StudentId
AND a.MaxAbsenceCount = b.AbsenceCount
GROUP BY a.StudentId, a.MaxAbsenceCount
)
SELECT a.StudentId AS StudentId
,a.MaxAbsenceCount AS MaxAbsenceCount
,a.MaxAbsenceDate AS MaxAbsenceDate
FROM MaxAbsenceDate a
--order by a.StudentId
;
March 17, 2021 at 1:32 am
That function can be rewritten to perform a lot better - but to do that we need sample data and tables to test, with expected results based on the sample data.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
March 17, 2021 at 3:55 am
I attache the schema of sample data scrambled for testing
March 17, 2021 at 11:13 am
I would replace the recursive CTE 'DateRangeEndDays' with a look up against a calendar table
DateRangeEndDays (StudentId, EndDay) AS
SELECT a.StudentId,c.CalendarDate
FROM FirstLastAbsentDates a
INNER JOIN MyCalendar c ON c.CalendarDate BETWEEN a.FirstDay AND a.LastDay
)
See here for details about a calendar table
https://www.sqlservercentral.com/steps/bones-of-sql-the-calendar-table
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 17, 2021 at 2:21 pm
It seems the query is performing a partial cross join. The cross join could be replaced by a tally function and window (rows between 30 preceding and current row) function. As far as I can tell this query produces the same output
with
FirstLastAbsentDates(StudentId, EndDate, FirstDay, LastDay) AS (
select a.StudentId, b.EndDate
,min(a.Att_Date)
,max(a.Att_Date)
from dbo.testTruantDay a
join dbo.testSchoolYear b ON a.SchoolYearId = b.Id
and getdate() between b.StartDate AND b.EndDate
where a.IsAbsent = 1
and a.IsCleared = 0
group by a.StudentId, b.EndDate
having max(a.Att_Date) > min(a.Att_Date))
select fl.StudentId, oa.cum_cnt as MaxAbsenceCount,
isnull(oa.Att_Date, dateadd(day, 1, fl.FirstDay)) as MaxAbsenceDate
from FirstLastAbsentDates fl
outer apply (select top(1) td.Att_Date,
sum(case when IsAbsent is null then 0 else 1 end)
over (order by dt rows between 30 preceding and current row) cum_cnt
from dbo.fnTally(0, datediff(day, fl.FirstDay, fl.LastDay)) fn
cross apply (values (dateadd(day, fn.n, fl.FirstDay))) v(dt)
left join dbo.testTruantDay td on fl.StudentId=td.StudentId
and v.dt=td.Att_Date
and td.IsAbsent=1
and td.IsCleared=0
order by cum_cnt desc) oa
order by fl.StudentId;
StudentIdMaxAbsenceCountMaxAbsenceDate
700049912021-01-27 00:00:00.000
700052472021-01-29 00:00:00.000
700054022020-12-18 00:00:00.000
7000559212020-12-16 00:00:00.000
7000907222020-11-06 00:00:00.000
7002389112020-11-06 00:00:00.000
700384592020-12-18 00:00:00.000
700444332020-11-16 00:00:00.000
...
(246 rows returned)
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
March 17, 2021 at 4:04 pm
I attache the schema of sample data scrambled for testing
What is the expected output from this data?
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
March 17, 2021 at 4:19 pm
Thanks, what is the benefit to use a calendar table, all we need is to know the first day and end day, If we create a calendar table, that has each day in the table, every year we need then to recreate the calendar table.
March 17, 2021 at 4:21 pm
I will give it a try, so the fnTally is the one I can copy here? thanks
https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
March 17, 2021 at 4:42 pm
I will give it a try, so the fnTally is the one I can copy here? thanks
https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally%5B/quote%5D
That's the one I use. There are others out there and if you're interested:
Number series generator challenge solutions – Part 3
Number series generator challenge solutions – Part 2
Number series generator challenge solutions – Part 1
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
March 17, 2021 at 5:03 pm
Wow, that is so much faster. Thank you Steve. Now let me take a look to understand the code, is it the fnTally make it so much faster?
Wow, that is so much faster. Thank you Steve. Now let me take a look to understand the code, is it the fnTally make it so much faster?
It's true fnTally is blazingly fast compared to other methods of generating series. These 2 lines of code serve the purpose of a calendar table:
dbo.fnTally(0, datediff(day, fl.FirstDay, fl.LastDay)) fn
cross apply (values (dateadd(day, fn.n, fl.FirstDay))) v(dt)
An actual calendar table would be just as fast or perhaps marginally faster. The original query counts 30 day totals by generating the series over and over. That was the slow part. Switching to a moving window is more efficient
sum(case when IsAbsent is null then 0 else 1 end)
over (order by dt rows between 30 preceding and current row) cum_cnt
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
March 17, 2021 at 11:40 pm
Another point:
Original code is in a table-valued function, then I use other table to join it.
For your code above, should I do same thing, put it in a table valued function?
Thanks
March 18, 2021 at 11:16 am
Yes it could be put in the same function
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply