Trying to speed up this 'GetWorkingDays' function.

  • lars.hesselberg 49028 - Tuesday, October 24, 2017 3:35 AM

    Hi,

    How about this approach:

    DECLARE @BegDate DATETIME = '20000701',

    @EndDate DATETIME = '20000707';

    Declare @tHolidays TABLE (Holiday DATETIME PRIMARY KEY);

    INSERT INTO @tHolidays (Holiday)

    SELECT Holiday FROM (VALUES (CAST('20000529' AS DATETIME)),

    ('20000704'), ('20000904'), ('20001123'), ('20001124'), ('20001225'), ('20010101'), ('20010528'), ('20010704'), ('20010903'), ('20011122'),

    ('20011123'), ('20011225'), ('20020101'), ('20020527'), ('20020704'), ('20020902'), ('20021128'), ('20021129'), ('20021225'), ('20030101'),

    ('20030526'), ('20030704'), ('20030901'), ('20031127'), ('20031128'), ('20031225'), ('20040101'), ('20040531'), ('20040705'), ('20040906'),

    ('20041125'), ('20041126'), ('20041224'), ('20041231'), ('20050530'), ('20050704'), ('20050905'), ('20051124'), ('20051125'), ('20051226'),

    ('20060102'), ('20060529'), ('20060704'), ('20060904'), ('20061123'), ('20061124'), ('20061225'), ('20070101'), ('20070528'), ('20070704'),

    ('20070903'), ('20071122'), ('20071123'), ('20071225'), ('20080101'), ('20080526'), ('20080704'), ('20080901'), ('20081127'), ('20081128'),

    ('20081225'), ('20090101'), ('20090525'), ('20090703'), ('20090907'), ('20091126'), ('20091127'), ('20091225'), ('20100101'), ('20100531'),

    ('20100705'), ('20100906'), ('20101125'), ('20101126'), ('20101224'), ('20101231'), ('20110530'), ('20110704'), ('20110905'), ('20111124'),

    ('20111125'), ('20111226'), ('20120102'), ('20120528'), ('20120704'), ('20120903'), ('20121122'), ('20121123'), ('20121225'), ('20130101'),

    ('20130527'), ('20130704'), ('20130902'), ('20131128'), ('20131129'), ('20131225'), ('20140101'), ('20140526'), ('20140704'), ('20140901'),

    ('20141127'), ('20141128'), ('20141225'), ('20150101'), ('20150525'), ('20150703'), ('20150907'), ('20151126'), ('20151127'), ('20151225'),

    ('20160101'), ('20160530'), ('20160704'), ('20160905'), ('20161124'), ('20161125'), ('20161226'), ('20170102'), ('20170529'), ('20170704'),

    ('20170904'), ('20171123'), ('20171124'), ('20171225'), ('20180101'), ('20180528'), ('20180704'), ('20180903'), ('20181122'), ('20181123'),

    ('20181225'), ('20190101'), ('20190527'), ('20190704'), ('20190902'), ('20191128'), ('20191129'), ('20191225'), ('20200101'), ('20200525'),

    ('20200703'), ('20200907'), ('20201126'), ('20201127'), ('20201225'), ('20210101'), ('20210531'), ('20210705'), ('20210906'), ('20211125'),

    ('20211126'), ('20211224'), ('20211231'), ('20220530'), ('20220704'), ('20220905'), ('20221124'), ('20221125'), ('20221226'), ('20230102'),

    ('20230529'), ('20230704'), ('20230904'), ('20231123'), ('20231124'), ('20231225'), ('20240101'), ('20240527'), ('20240704'), ('20240902'),

    ('20241128'), ('20241129'), ('20241225'), ('20250101'), ('20250526'), ('20250704'), ('20250901'), ('20251127'), ('20251128'), ('20251225'),

    ('20260101'), ('20260525'), ('20260703'), ('20260907'), ('20261126'), ('20261127'), ('20261225'), ('20270101'), ('20270531'), ('20270705'),

    ('20270906'), ('20271125'), ('20271126'), ('20271224'), ('20271231'), ('20280529'), ('20280704'), ('20280904'), ('20281123'), ('20281124'),

    ('20281225'), ('20290101'), ('20290528'), ('20290704'), ('20290903'), ('20291122'), ('20291123'), ('20291225')

    ) h (Holiday);

    SELECT

    (DATEDIFF(dd, @BegDate, @EndDate) + 1)

    -(DATEDIFF(wk, @BegDate, @EndDate) * 2)

    -(CASE WHEN DATENAME(dw, @BegDate) = 'Sunday' THEN 1 ELSE 0 END)

    -(CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END)

    - (SELECT SUM(CASE WHEN DATENAME(dw, Holiday) = 'Saturday' Or DATENAME(dw, Holiday) = 'Sunday' THEN 0 ELSE 1 END)

    FROM @tHolidays WHERE Holiday >= @BegDate AND Holiday <= @EndDate);

    Lars

    Good idea but try your code with the following dates, Lars.


    DECLARE  @BegDate DATETIME = '20000101',
             @EndDate DATETIME = '20000221'
    ;

    It returns a NULL even though there are several weekends involved that should be counted as "holidays"/non-working days.  The reason is because  the code only returns a value if the happens to be a holiday.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,
    You got me! The reason is obvious: number - NULL = NULL.
    Here is the fix:
    -- Comment in/out as needed for testing
    --DECLARE @BegDate DATETIME = '20000701', @EndDate DATETIME = '20000707';  -- 5 days, 1 Holiday => 4 Workdays
    DECLARE @BegDate DATETIME = '20000101', @EndDate DATETIME = '20000221'; -- 36 days, No holidays => 36 workdays
    --DECLARE @BegDate DATETIME = '20021225', @EndDate DATETIME = '20030101'; -- 6 days, 2 holidays => 4 workdays

    select (DATEDIFF(dd, @BegDate, @EndDate) + 1)

    Declare @tHolidays TABLE (Holiday DATETIME PRIMARY KEY);

    INSERT INTO @tHolidays (Holiday)
    SELECT Holiday FROM (VALUES (CAST('20000529' AS DATETIME)),
     ('20000704'), ('20000904'), ('20001123'), ('20001124'), ('20001225'), ('20010101'), ('20010528'), ('20010704'), ('20010903'), ('20011122'),
     ('20011123'), ('20011225'), ('20020101'), ('20020527'), ('20020704'), ('20020902'), ('20021128'), ('20021129'), ('20021225'), ('20030101'),
     ('20030526'), ('20030704'), ('20030901'), ('20031127'), ('20031128'), ('20031225'), ('20040101'), ('20040531'), ('20040705'), ('20040906'),
     ('20041125'), ('20041126'), ('20041224'), ('20041231'), ('20050530'), ('20050704'), ('20050905'), ('20051124'), ('20051125'), ('20051226'),
     ('20060102'), ('20060529'), ('20060704'), ('20060904'), ('20061123'), ('20061124'), ('20061225'), ('20070101'), ('20070528'), ('20070704'),
     ('20070903'), ('20071122'), ('20071123'), ('20071225'), ('20080101'), ('20080526'), ('20080704'), ('20080901'), ('20081127'), ('20081128'),
     ('20081225'), ('20090101'), ('20090525'), ('20090703'), ('20090907'), ('20091126'), ('20091127'), ('20091225'), ('20100101'), ('20100531'),
     ('20100705'), ('20100906'), ('20101125'), ('20101126'), ('20101224'), ('20101231'), ('20110530'), ('20110704'), ('20110905'), ('20111124'),
     ('20111125'), ('20111226'), ('20120102'), ('20120528'), ('20120704'), ('20120903'), ('20121122'), ('20121123'), ('20121225'), ('20130101'),
     ('20130527'), ('20130704'), ('20130902'), ('20131128'), ('20131129'), ('20131225'), ('20140101'), ('20140526'), ('20140704'), ('20140901'),
     ('20141127'), ('20141128'), ('20141225'), ('20150101'), ('20150525'), ('20150703'), ('20150907'), ('20151126'), ('20151127'), ('20151225'),
     ('20160101'), ('20160530'), ('20160704'), ('20160905'), ('20161124'), ('20161125'), ('20161226'), ('20170102'), ('20170529'), ('20170704'),
     ('20170904'), ('20171123'), ('20171124'), ('20171225'), ('20180101'), ('20180528'), ('20180704'), ('20180903'), ('20181122'), ('20181123'),
     ('20181225'), ('20190101'), ('20190527'), ('20190704'), ('20190902'), ('20191128'), ('20191129'), ('20191225'), ('20200101'), ('20200525'),
     ('20200703'), ('20200907'), ('20201126'), ('20201127'), ('20201225'), ('20210101'), ('20210531'), ('20210705'), ('20210906'), ('20211125'),
     ('20211126'), ('20211224'), ('20211231'), ('20220530'), ('20220704'), ('20220905'), ('20221124'), ('20221125'), ('20221226'), ('20230102'),
     ('20230529'), ('20230704'), ('20230904'), ('20231123'), ('20231124'), ('20231225'), ('20240101'), ('20240527'), ('20240704'), ('20240902'),
     ('20241128'), ('20241129'), ('20241225'), ('20250101'), ('20250526'), ('20250704'), ('20250901'), ('20251127'), ('20251128'), ('20251225'),
     ('20260101'), ('20260525'), ('20260703'), ('20260907'), ('20261126'), ('20261127'), ('20261225'), ('20270101'), ('20270531'), ('20270705'),
     ('20270906'), ('20271125'), ('20271126'), ('20271224'), ('20271231'), ('20280529'), ('20280704'), ('20280904'), ('20281123'), ('20281124'),
     ('20281225'), ('20290101'), ('20290528'), ('20290704'), ('20290903'), ('20291122'), ('20291123'), ('20291225')
     ) h (Holiday);

    SELECT

     (DATEDIFF(dd, @BegDate, @EndDate) + 1)
     -(DATEDIFF(wk, @BegDate, @EndDate) * 2)
     -(CASE WHEN DATENAME(dw, @BegDate) = 'Sunday' THEN 1 ELSE 0 END)
     -(CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END)

     - COALESCE((SELECT SUM(CASE WHEN DATENAME(dw, Holiday) = 'Saturday' Or DATENAME(dw, Holiday) = 'Sunday' THEN 0 ELSE 1 END)), 0)

     FROM @tHolidays WHERE Holiday >= @BegDate AND Holiday <= @EndDate;

  • Lars - or this:

    SELECT

    (DATEDIFF(dd, @BegDate, @EndDate) + 1)

    - (DATEDIFF(wk, @BegDate, @EndDate) * 2)

    - (CASE WHEN DATENAME(dw, @BegDate) = 'Sunday' THEN 1 ELSE 0 END)

    - (CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END)

    - ISNULL(SUM(CASE WHEN DATENAME(dw, Holiday) IN ('Saturday','Sunday') THEN 0 ELSE 1 END),0)

    FROM @tHolidays

    WHERE Holiday >= @BegDate

    AND Holiday <= @EndDate;

    However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
    Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Jeff,

    However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
    Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.

    Well, yes. CROSS APPLY yells for cartesian product, so why not change the table value function tfn_GetWorkingDays_X to a
    scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
    The table variable is only for the sake of the approach, I guess in real life it would be a database table.

  • lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM

    Hi Jeff,

    However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
    Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.

    Well, yes. CROSS APPLY yells for cartesian product, so why not change the table value function tfn_GetWorkingDays_X to a
    scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
    The table variable is only for the sake of the approach, I guess in real life it would be a database table.

    For the same reason I don't set MAXDOP = 1 at the instance level.
    The reasons for not using scalar functions are well know and well documented. I try not to have too many ALWAYS & NEVER rules... 
    but until MS makes inline scalars a reality, they will remain on my never list.

  • Jeff Moden - Sunday, October 22, 2017 11:44 AM

    Here's my shot at this problem. Just to make life easy, I'll keep all the code together for this post.  The code is attached as .txt files, which is a bit annoying... why wouldn't a forum for SQL Server allow someone to save .SQL files, I'll never know. :blink:

    Build the Calendar Table
    First, see the attached "Calender Rev 02 Small.txt" file.  It's the old code I resurrected to make a Calendar table with some "special" columns along with some of the more traditional columns.  It does not have any "holidays" associated with Easter and doesn't contain any ISO Week/Year.  If you need any of that, post back.  The "special" columns are...

    1.  WorkdayNumber - An ascending count of workdays in the calendar table.  If a day is a weekend or holiday, the previous workday number is "smeared" down into the weekend or holiday.  This allows for some incredibly simple code (simple subtraction between two integers)  to determine the number of work days between two dates.  It also allows for other easy and fast calculations such as determining lead times and what things like "in 5 business days actually means all using a simple lookup and integer addition or subtraction.  This column is the reason why I wanted to resurrect this skinny and quite old table.  It's magic.  This is the same kind of column that Joe Celko spoke of in his post except the code I've included actually does such a thing and, using the proprietary 3 part UPDATE (also known as a "Quirky Update"), is nasty fast and even runs in SQL Server 2000 (haven't tested it in 6.5 or 7) because it doesn't use RBAR and it doesn't use Lead/Lag/Preceeding, etc.

    2.  DTSerial - The PK of this table is the DT column, which actually stands for "DateTime" and not just "Date".  DTSerial is the underlying integer value of the number of whole days that have passed since the 1st of January, 1900.  It doesn't have a huge impact for performance but, as you'll see, it can make code simpler and so I recently added it for this problem.

    3. DWOccurance -  Although not used for the problem on this thread, this column is very useful for greatly simplifying questions/queries like "What is the date of the 3rd Monday of June" or "Create a list of the dates for every Tuesday of the month for the next two years".  And, yes, "DW" stands for "Day of the Week" and is the same notation as what is offered in the SQL Server date/time functions.  Please get used to that or change it to "Weekday" if you can't memorize the simple and obvious 2 character names for the common date parts.  In any case, quitchurbitchin about it because I'll never use the long names for the date parts.  "Keep your eye upon the donut and not upon the hole".  😉

    Those columns and all the others are described in the header of the code that makes the table.  I didn't take the time to add those descriptions to each column as extended properties but will in the future.  This might make a good article.

    To use the code, open it in SSMS, look for "TODO" to find where you need to change the range of dates to be included and make any changes you need (preset for the requirements Jason posted earlier).  Then , after doing your safety review of the code, run the code.

    IF you ever need to Move, Add, or Delete (MAD) a holiday, do your thing to the calendar table making sure to set the IsHoliday column in the table and adding the description.  This can be done using a "bulk" bit of code to do the whole table (as in the original code for a given holiday) or just a one-off for a given year (I have to do this a lot of years at work) and then run the last two "paragraphs" of this code to renumber the WorkdayNumber column according to the changes in holidays.

    The code that builds the Calendar table also includes ONLY those indexes necessary to solve this problem

    Jeff, My apologies for the slow reply. I haven't been sandbagging on this I'm simply trying to move at a more deliberate pace this time around.

    Yesterday afternoon and well into the evening, was spent just on the Calendar table alone…

    First of all… you clearly put a huge amount of time & thought into this, going well beyond what any normal forum post could ever warrant. Seriously, if your goal was to dispel my belief that you’re one the very best SQL practitioner/educators in the world… you really missed the mark this time. I don’t mean to gush, but this really is exceptional work!

    Celko was the first one, on this thread, to introduce the idea of using “ordinal_business_nbr†[#1903571], not long after I had implemented the idea of dumping the table constructor in favor of the long CASE expressions. Different implementations of the same principal… simple addition and subtraction operations are orders of magnitude faster than counting/aggregation operations… In fact, that was final nudge that prompted me to post the “X2†version early, in the first place.  No need to rehash those gory details…

    In short, I’ve actually had a “Celko†version kicking around the ol’ noggin for a few days now…

    So, since I didn’t say it on Friday…  Thank you Joe!... You’re you and you take a lot of heat for that… while I’m not saying you don’t deserve most of it… I will say that you don’t necessarily get the recognition you deserve for the good stuff.  If I’m working on anything that involves hierarchies, Nested Sets is pretty much automatic at this point… And, thanks to you, I will NEVER make the mistake of conflating “records†for rows or “fields†for columns. 😉

    I’m actually taken aback that by the fact that all 3 solutions were developed independently and yet they all essentially work off of the same principal. Great minds really do think alike!  Either that, or this has always been common knowledge and I just showed up really late to the party…

    I was definitely WAY late to the party on the noticing that you’d used “SET DATEFIRST 1†to make Monday the 1st day of the week… I had altered you original code to append “_JM†to the end of your Calendar table because I wanted to live in the same test database as a copy of our existing Calendar table (thank you for the DROP warning by the way)… I probably spent close to an hour and a half trying to figure out how and where I’d screwed up a simple name alteration so badly that Friday’s are weekends and Sundays are workdays…

    Now that know what the cause is, and I can see how it makes setting DWOccurance easier and DW & DWOccurance are working on tandem to help calculate holidays. Is there a reason (that I haven’t found yet) to keep it like that? 1) It threw me for a big loop and I’m sure it’ll do the same to others and 2) I can’t make the same change to the incumbent DW column on the existing Calendar table.  I can, however create that column with a different name if it is indeed an ongoing part of the overall solution.

    The DTInt and DTSerial columns… Also ideas that have been rolling around, specifically, it occurred to me that 1) I wasn’t making good use of the CASE expressions ability to short circuit and 2) I could be getting killed by implicit conversions (CHAR à DATETIME) resulting from the hardcoded holiday dates. The DTInt version was what occurred to me first but basic tests showed that getting the input dates was expensive compared to the DTSerial  and had no advantage. I’m not going to lie, I was patting myself on the back a little when I saw it on the table. The fact that you’ve had it completely fleshed out, packaged up and waiting on me for a few days, clearly demonstrates I’m no pretender to the thrown… but I did take it as confirmation that I wasn’t just chasing squirrels.

    The conceptual details aside… Very cool t-sql… I wouldn’t have thought about that approach so it cool to see it done from a using a different approach (for me anyway). I’ll see if I can find the code I was using prior to resigning myself to using hard coded values. It worked and was fast, just not “do it a billion times in a function fastâ€â€¦ and finally, QUIRKY! I do have a soft spot for Quiry updates… even if I do have to go back to double check the “must do / must not do†checklist…

    And on that note, I figured I would share this…

    --=========================================================================
    -- start by making a fresh copy of Jeff's Calendar table.
    --=========================================================================
    IF OBJECT_ID('CodeTest.dbo.Calendar_Copy', 'U') IS NULL
    BEGIN    -- DROP TABLE dbo.Calendar_Copy;
        SELECT
            cj.DT, cj.DTNext, cj.DTInt, cj.DTSerial, cj.YY, cj.MM, cj.DD, cj.DW,
            cj.DWOccurance, cj.IsWorkDay, cj.IsHoliday,
            WorkDayNumber = CAST(NULL AS INT),    --<<== this method relys on NULLs as opposed to 0's.
            cj.HolidayName
            INTO CodeTest.dbo.Calendar_Copy
        FROM
            dbo.Calendar_JM cj;

        -- With the exception of NULL's inplave of zeros,
        -- bring it to a state just prior to the "Observed" code.
        UPDATE cc SET
            cc.IsWorkDay = 1,
            cc.IsHoliday = 0,
            cc.HolidayName = REPLACE(cc.HolidayName, ' (Observed)', '')
        FROM
            dbo.Calendar_Copy cc
        WHERE
            cc.HolidayName LIKE '% (Observed)';        

        ALTER TABLE dbo.Calendar_Copy
            ADD CONSTRAINT PK_Calendar_Copy PRIMARY KEY CLUSTERED (DT) WITH FILLFACTOR = 100;
    END;

    --=========================================================================
    -- use the LAG & LEAD windowing functions to update the "Observed" values
    -- in a single refference to the base table. Using a covering index key
    -- prevents a sort operation in the execution plan.
    --=========================================================================
    WITH
        cte_Observed AS (
            SELECT
                cc.DW,
                cc.IsHoliday,
                cc.IsWorkDay,
                cc.HolidayName,
                oIsHoliday = CASE                    -- this is where I was getting beat up by the DATEFIRST = 1 ... I was using 2 & 6 origionally.
                                WHEN cc.DW = 1 AND LAG(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 1
                                WHEN cc.DW = 5 AND LEAD(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 1
                                ELSE cc.IsHoliday
                            END,
                oIsWorkDay = CASE
                                WHEN cc.DW = 1 AND LAG(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 0
                                WHEN cc.DW = 5 AND LEAD(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 0
                                ELSE cc.IsWorkDay
                            END,
                oHolidayName = CASE
                                WHEN cc.DW = 1 THEN ISNULL(NULLIF(LAG(cc.HolidayName, 1, cc.HolidayName) OVER (ORDER BY cc.DT), '') + ' (Observed)', cc.HolidayName)
                                WHEN cc.DW = 5 THEN ISNULL(NULLIF(LEAD(cc.HolidayName, 1, cc.HolidayName) OVER (ORDER BY cc.DT), '') + ' (Observed)', cc.HolidayName)
                                ELSE cc.HolidayName
                            END
            FROM
                dbo.Calendar_Copy cc
            )
    UPDATE o SET
        o.IsHoliday = o.oIsHoliday,
        o.IsWorkDay = o.oIsWorkDay,
        o.HolidayName = o.oHolidayName
    FROM
        cte_Observed o
    WHERE
        o.oIsHoliday = 1
        AND o.DW IN (1, 5);

    --=========================================================================
    -- start by filling the WorkDayNumber w/ ROW_NUMBER() where IsWorkDay = 1
    -- and leaving non-workdays NULL.
    --=========================================================================
    WITH
        cte_WorkDayNumberFill AS (
            SELECT
                cc.WorkDayNumber,
                RN = ROW_NUMBER() OVER (ORDER BY cc.DT)
            FROM
                dbo.Calendar_Copy cc
            WHERE
                cc.IsWorkDay = 1
            )
    UPDATE wf SET
        wf.WorkDayNumber = wf.RN
    FROM
        cte_WorkDayNumberFill wf;

    --=========================================================================
    -- Smear the over the NULL values using MAX() in a window frame.
    --=========================================================================
    WITH
        cte_WorkDayNumberSmear AS (
            SELECT
                cc.WorkDayNumber,
                wdns = MAX(cc.WorkDayNumber) OVER (ORDER BY cc.DT ROWS UNBOUNDED PRECEDING)
            FROM
                dbo.Calendar_Copy cc
            )
    UPDATE ws SET
        ws.WorkDayNumber = ISNULL(ws.wdns, 0)
    FROM
        cte_WorkDayNumberSmear ws
    WHERE
        ws.WorkDayNumber IS NULL;

    --=========================================================================
    -- The net result...
    --=========================================================================
    SELECT
        cc.DT, cc.DTNext, cc.DTInt, cc.DTSerial, cc.YY, cc.MM, cc.DD, cc.DW,
        cc.DWOccurance, cc.IsWorkDay, cc.IsHoliday, cc.WorkDayNumber, cc.HolidayName
    FROM
        dbo.Calendar_Copy cc;

  • lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM

    Hi Jeff,

    However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
    Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.

    Well, yes. CROSS APPLY yells for cartesian product, so why not change the table value function tfn_GetWorkingDays_X to a
    scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
    The table variable is only for the sake of the approach, I guess in real life it would be a database table.

    It's not "Jeff" you're responding to.  It should be "Chris".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM

    Hi Jeff,

    However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
    Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.

    Well, yes. CROSS APPLY yells for cartesian product, so why not change the table value function tfn_GetWorkingDays_X to a
    scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
    The table variable is only for the sake of the approach, I guess in real life it would be a database table.

    The Cartesian product is what costs dearly in performance.  That's why the I wrote the special column in the Calendar table I built.  I traded off a Cartesian product for each row for doing 2 lookups from memory for each row.

    If you'd like to encapsulate your good code as an iTVF, I'd be happy to test it using the million row test table that I posted.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Thursday, October 26, 2017 8:53 AM

    lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM

    Hi Jeff,

    However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
    Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.

    Well, yes. CROSS APPLY yells for cartesian product, so why not change the table value function tfn_GetWorkingDays_X to a
    scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
    The table variable is only for the sake of the approach, I guess in real life it would be a database table.

    The Cartesian product is what costs dearly in performance.  That's why the I wrote the special column in the Calendar table I built.  I traded off a Cartesian product for each row for doing 2 lookups from memory for each row.

    If you'd like to encapsulate your good code as an iTVF, I'd be happy to test it using the million row test table that I posted.

    That's what makes Joe's contribution so attractive. All of the work is performed up front when you create the calendar table, or adjust the holidays. I can't see how any other solution is going to come close to two seeks.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Something about keeping the running 'work day' count in the table just feels wrong.  I know, get over it because it works and is faster.  So I decided to test out Jeff's code versus mine with the same 'TestData' table., Jeff's was faster.  I guess I would need to know what the purpose for this function would be to know which one I would implement.  Is this a function that is being called by 'millions of users' at one time?  Or is this part of some nightly batch processing to calculate the working days?  Or some thing else?  If it was a nightly batch I may go with mine. Needing instant feed back, then probably Jeff's.
    Here is the results of my testing.
    Using Jeff's code
    ========== DirectCode uses Calendar Table ===============================================
    Table 'Workfile'.Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'.Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'TestData'.Scan count 1, logical reads 3599, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Calendar'.Scan count 2, logical reads 64, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.

     SQL Server Execution Times:
       CPU time = 671 ms,  elapsed time = 7370 ms.

    My code
    Table 'Worktable'.Scan count 994398, logical reads 2105638, physical reads 0, read-ahead reads 0,lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'.Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'TestData'.Scan count 5, logical reads 3645, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table'WorkingDayTable'. Scan count 1, logical reads 22, physical reads 0, read-aheadreads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times:
       CPU time = 14327 ms,  elapsed time = 38613 ms.Just looking at one record in the TestData table.
    Jeff's code
    ========== DirectCode uses Calendar Table ===============================================
    Table 'Calendar'.Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'TestData'.Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.

     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.

    My Code
    Table'WorkingDayTable'. Scan count 1, logical reads 22, physical reads 0, read-aheadreads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'TestData'.Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.

     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 6 ms.

    Dates used for the test above.
    beg_dt        end_dt        Span
    2017-10-2602:50:09.963        2017-12-0414:22:10.740        39

    My Code:
    DECLARE  @beg_dt       DATETIME
            ,@end_dt        DATETIME
            ,@BusinessDays  INT
    ;SET STATISTICSTIME,IO ON;
     SELECT @beg_dt        = td.beg_dt
            ,@end_dt        = td.end_dt
            ,@BusinessDays  =
                (SELECT SUM(WorkDay) AS NumberOfWorkingDays
    FROM dbo.WorkingDayTable
    WHERE DateField >= td.beg_dt
    AND DateField <= td.end_dt )
       FROM dbo.TestData td
       --WHERE td.beg_dt = '2017-10-2602:50:09.963'
       ;
        SET STATISTICS TIME,IO OFF
    ;

    I just realized I forgot to strip the time off of beg_dt and end_dt, so that may add more time to mine.  Oh well, not going to retest it.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • below86 - Thursday, October 26, 2017 3:25 PM

    I just realized I forgot to strip the time off of beg_dt and end_dt, so that may add more time to mine.  Oh well, not going to retest it.

    It could also provide the wrong answer because the Holidays don't have times and could fall outside the range if beg_dt is a non-workday.

    Also, I'm getting 1.2 seconds on the return for my function code and 0.6 seconds on the direct code.  Are you returning to the screen or to variables to take the display time out of the picture?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • below86 - Thursday, October 26, 2017 3:25 PM

    I guess I would need to know what the purpose for this function would be to know which one I would implement.  Is this a function that is being called by 'millions of users' at one time?  Or is this part of some nightly batch processing to calculate the working days?  Or some thing else?  

    That's the kind of question that would be relevant when trade-offs are being made... For example, trying to determine weather or not the increased read performance of a new index is worth the cost of slower writes and larger backups.
    I don't see how that applies to user defined functions, because I don't see a trade-off... What is the down side of have faster, more efficient functions?

    But, to answer your question... Everything. This is only one of many functions and they are used almost everywhere. As a business we are the go-between for insurance companies and service providers. One of out primary "value-adds" is that our contracts with various vendors mandate various time tolerances for various services. As a result, EVERYTHING gets measured on a temporal scale.
    If you jump back to the 2nd page, I posted a screen shot of index usage stats for the main application database ordered by total user reads in descending order... https://www.sqlservercentral.com/Forums/FindPost1903411.aspx
    The 1st two tables are calendar tables...

  • Jeff Moden - Thursday, October 26, 2017 5:19 PM

    below86 - Thursday, October 26, 2017 3:25 PM

    I just realized I forgot to strip the time off of beg_dt and end_dt, so that may add more time to mine.  Oh well, not going to retest it.

    It could also provide the wrong answer because the Holidays don't have times and could fall outside the range if beg_dt is a non-workday.

    Also, I'm getting 1.2 seconds on the return for my function code and 0.6 seconds on the direct code.  Are you returning to the screen or to variables to take the display time out of the picture?

    I ran it like yours, to variables to take the display out of it.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • Jason A. Long - Thursday, October 26, 2017 5:22 PM

    below86 - Thursday, October 26, 2017 3:25 PM

    I guess I would need to know what the purpose for this function would be to know which one I would implement.  Is this a function that is being called by 'millions of users' at one time?  Or is this part of some nightly batch processing to calculate the working days?  Or some thing else?  

    That's the kind of question that would be relevant when trade-offs are being made... For example, trying to determine weather or not the increased read performance of a new index is worth the cost of slower writes and larger backups.
    I don't see how that applies to user defined functions, because I don't see a trade-off... What is the down side of have faster, more efficient functions?

    But, to answer your question... Everything. This is only one of many functions and they are used almost everywhere. As a business we are the go-between for insurance companies and service providers. One of out primary "value-adds" is that our contracts with various vendors mandate various time tolerances for various services. As a result, EVERYTHING gets measured on a temporal scale.
    If you jump back to the 2nd page, I posted a screen shot of index usage stats for the main application database ordered by total user reads in descending order... https://www.sqlservercentral.com/Forums/FindPost1903411.aspx
    The 1st two tables are calendar tables...

    I'm all for having the code run as fast as possible.  Like I said it is just my hang up, not liking the column with the running count.  And what I was getting at was that the time saved on this function if it was running as part of a batch processing was not significant enough(IMHO) for me to want to use the calendar with the running count.  As someone who also works in the insurance industry I understand the need for calculating the 'working days'.  I guess in my case we don't have a need, at this time, to calculate this on the fly, it would be more of a nightly batch process.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • Okay... where to begin???

    First... I haven't been putting this off... the exact opposite actually. I've been sucked in... It the posting updates, getting a full nights sleep and asking for help, that have been putt off for too long... 

    Jeff,
    If the roles were reversed, and I was the one who stepped up and delivered a package of that caliber, to someone else... At the very least, I'd want SOME feedback and some test results. No excuses, I should have been back here with my preliminary results on Wednesday. Long story short, I saw something I've never seen before, and went down the rabbit hole after it... I can only hope that you can accept my apology, I really am sorry.

    An explanation is in order... 
    It's 4:30 AM so it won't a great explanation, but here goes...
    #1... The function you supplied is fast... It's really fast... When I put it against the last few that I'd been working with, it blew their doors off. anywhere from 0.9 - 1.3 secs vs 4.5 -5.5 secs... and that's the part that's been throwing me for a loop.
    I've seen my fair share of bad estimates. Sometimes it's takes longer than I'd like to figure it out, but I get there and usually learn something new along the way... 
    What sucked me in, wasn't the fact that, your function was crushing mine... It's the fact that it shouldn't have. 
    I can't tell you how much time I've killed staring at execution plan node values... Looking for bad estimates, mismatches between estimated and actual rows... any thing...
    And that's what's weird... the actuals and estimates were consistently a smack on match... 
    Let me start by saying that, I make no claims about having mastered the art of reading execution plans... but I've never seen the optimized make a 100% accurate prediction about row count and an be so off when it comes to duration.
    So, I figured it should be easy enough figure out a 3 node trivial plan... No clues... I wanted to write it off as yous going parallel and mine not... but nothing was adding up.
    Fast forward... alot... I've got    DBCC SETCPUWEIGHT(1000); &  DBCC SETIOWEIGHT(1); the damned thing are going parallel and I see "something"... 
    All of the actual numbers add up to roughly 1/4 second, until it hits the gather streams and then just stops hangs out, drinks a beer.
    So, where I'm at now... I started dumping sys.dm_os_wait_stats into a table with a time stamp and comparing the changes... 
    capture_dt    wait_type    c_waiting_tasks_count    c_wait_time_sec
    2017-10-29 00:04:47.4415917    CXPACKET    NULL    NULL
    2017-10-29 00:51:42.3217744    CXPACKET    58    116.832000
    2017-10-29 01:05:48.6912441    CXPACKET    55    122.180000
    2017-10-29 01:21:27.8460532    CXPACKET    55    116.422000
    2017-10-29 01:30:08.0868170    CXPACKET    54    117.491000

    So... 2 mins per test round of CXPACKET waits...
    The problem is, the more research I do, the more this sounds like a symptom, than the actual problem... Meaning I'm still in the dark about the actual problem.
    I'll provide more details, execution plans and test scripts tomorrow... errr... after I wake up later today...

Viewing 15 posts - 61 through 75 (of 92 total)

You must be logged in to reply to this topic. Login to reply