Calculating Work Days

  • Another thing that helped the performance of this table is having the id column as the integer representation of that particular day.

    Eg. With today being 11/04/2007 (dd/mm/yyyy)

    SELECT 'TodayAsInt' = convert(int, convert( float, getdate()))

    Result = 39181

    With this as your id column, it makes a good candidate as a clustered primary key index. I don't know why Microsoft did not include a date only datatype but hey, what can you do.

    I had to convert to float first to prevent any rounding.

    -- JP

  • Jeff,

    Have you (or anyone!) come across the issue of linking to a date table on date when the date table has dates with times 00:00:00 and the dates you are attempting to link to the table could contain any time (but you just want to join by date)?

  • Sure... it's a very common request... and most folks do it the wrong way, in my opinion... they'll write some nasty (or nice) index killing formula in the WHERE clause of their query and things pretty much go down hill from there...

    When I make a date table, I include, of course, a "TheDate" column, but I also include a column (calculated or real depending on the client's wishes) called "NextDay" and it's basis is simply TheDate+1.  Then, the queries get pretty simple...

      SELECT yada-yada

       FROM sometable s,

            datetable d

      WHERE s.somedatecol >= d.TheDate

        AND s.somedatecol <  d.NextDay

    Of course, even if you don't have a "NextDay" column, the calculations to do as you ask aren't real difficult... you just have to be careful not to put a formula on the s.somedatecol or you stand a real good chance of not being able to use an index.  Something like the following usually works pretty good...

     SELECT yada-yada

       FROM sometable s,

            datetable d

      WHERE s.somedatecol >= d.TheDate

        AND s.somedatecol <  d.TheDate+1

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheDate column would also make a good clustered key candidate...  see my last post immediately above for how to get around the "joining with time" problem... it's a very common and effective solution because dates with no times actually have a time of midnight and are represented by underlying numeric values having no decimal places.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff... I'd never considered that approach.  So simple it's brilliant.

  • Thanks, Paul... I appreciate the feedback.

    Rambling on a bit... If you consider that dates are nothing but underlying numbers that represent the number of whole and fractional days since midnight, 01 Jan 1900, dates get real easy for most things if you also know that calculations on column data in the WHERE clause are usually a bad thing.  For example, if most folks where given the following data...

    1.000

    1.250

    1.357

    2.000

    2.125

    2.625

    3.179

    3.875

    ....

    ... and the question was how to return all the rows that, when stripped of the decimal places, have a value of 2, some folks would do rounding, casting to INT, etc, etc... The solution is a lot simpler than that...

      SELECT *

       FROM thedata

      WHERE thevalue >= 2 --The desired integer value

        AND thevalue <  2+1 --The next highest integer value (next highest date)

    What's intriguing, is that some folks will very quickly solve the problem, as above, when they are working with numbers... and then lose their minds when it comes to dates.  Dates are just numbers underneath and, in most cases, just don't need to be treated any differently than numbers.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I found in a situation where you want to join by date only, convert your two date columns to its integer representation in the JOIN or WHERE clause when comparing the two. I found this to be very fast in the past.

    This will give you your date as an integer - convert(int, convert( float, getdate()))

    -- JP

  • That's exactly what I was talking about... if you "convert the two columns", you destroy most opportunities to use an index if one is available.  The method I displayed allows indexes to come into play.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • How are the chances of using indexes diminished? I ran this script to see how conversions deal with clustered indexes. For this test it still used the clustered indexes in the execution plan.

    -- Dates 1

    CREATE TABLE dbo.DatesToCheck1

    (Today datetime,

    NextDay datetime

    )

    CREATE CLUSTERED INDEX idxDatesToCheck1_Date1

    ON dbo.DatesToCheck1(Today)

    INSERT INTO dbo.DatesToCheck1 (Today, NextDay)

    VALUES (getdate(), dateadd(day, 1, getdate()))

    -- Dates 2

    CREATE TABLE dbo.DatesToCheck2

    (Today datetime,

    NextDay datetime

    )

    CREATE CLUSTERED INDEX idxDatesToCheck2_Date1

    ON dbo.DatesToCheck2(Today)

    INSERT INTO dbo.DatesToCheck2 (Today, NextDay)

    VALUES (dateadd(hour, 5, getdate()), dateadd(day, 1, dateadd(hour, 5, getdate())))

    -- join the tables on their clusterd indexes

    SELECT *

    FROM dbo.DatesToCheck1 AS DatesToCheck1

    JOIN dbo.DatesToCheck2 AS DatesToCheck2

    ON convert(int, convert(float, DatesToCheck1.Today)) = convert(int, convert(float, DatesToCheck2.Today))

    -- Drop the tables

    DROP TABLE dbo.DatesToCheck1

    DROP TABLE dbo.DatesToCheck2

    -- JP

  • True... it appears to still have used an index... but notice it did NOT use an INDEX SEEK anywhere... instead, it used an INDEX SCAN which is quite slow compared to an INDEX SEEK.  And, that's what I'm talking about...

    I modified your example a bit... I added the extra column I was talking about (TodayNoTime) and I put a non-clustered index on it leaving your original indexes alone.  Run it and then do an execution plan on the two SELECTs (at the same time) at the bottom of the code.... the first one is your code and the second is mine... notice that mine uses an INDEX SEEK on one of the legs and that, even with the bookmark lookup, it should be faster (according to the Queary Cost Relative to Batch).

    --DROP TABLE DatesToCheck1,DatesToCheck2

    -- Dates 1

    CREATE TABLE dbo.DatesToCheck1

    (Today datetime,

    NextDay datetime,

    TodayNoTime DATETIME

    )

    CREATE CLUSTERED INDEX idxDatesToCheck1_Date1

    ON dbo.DatesToCheck1(Today)

    CREATE INDEX idxDatesToCheck1_Date2

    ON dbo.DatesToCheck1(TodayNoTime)

    INSERT INTO dbo.DatesToCheck1 (Today, NextDay,TodayNoTime)

    VALUES (getdate(), dateadd(day, 1, getdate()), DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0))

    -- Dates 2

    CREATE TABLE dbo.DatesToCheck2

    (Today datetime,

    NextDay datetime,

    TodayNoTime DATETIME

    )

    CREATE CLUSTERED INDEX idxDatesToCheck2_Date1

    ON dbo.DatesToCheck2(Today)

    CREATE INDEX idxDatesToCheck2_Date2

    ON dbo.DatesToCheck2(TodayNoTime)

    INSERT INTO dbo.DatesToCheck2 (Today, NextDay,TodayNoTime)

    VALUES (getdate(), dateadd(day, 1, getdate()), DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0))

    -- join the tables on their clusterd indexes

    SELECT *

    FROM dbo.DatesToCheck1 AS DatesToCheck1

    JOIN dbo.DatesToCheck2 AS DatesToCheck2

    ON convert(int, convert(float, DatesToCheck1.Today)) = convert(int, convert(float, DatesToCheck2.Today))

    SELECT *

    FROM dbo.DatesToCheck1 AS DatesToCheck1

    JOIN dbo.DatesToCheck2 AS DatesToCheck2

    ON DatesToCheck1.TodayNoTime = DatesToCheck2.TodayNoTime

    So, let me rephrase for those semantisists in the crowd... doing a calculation on a join column prevents the PROPER use of an index for the highest performance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Can't argue with the facts

    -- JP

  • A small problem with the function is that it depends on the language setting of the users session.  The change below will allow it to check if StartDate is Sunday and EndDate is Saturday, no matter what the setting of DATEFIRST or LANGUAGE.

    RETURN (
                    SELECT
                  --Start with total number of days including weekends
                    (DATEDIFF(dd,@StartDate,@EndDate)+1)
                  --Subtact 2 days for each full weekend
                   -(DATEDIFF(wk,@StartDate,@EndDate)*2)
                  --If StartDate is a Sunday, Subtract 1
                   -(CASE WHEN DATEDIFF(DD,-53690,@StartDate)%7 = 6
                          THEN 1
                          ELSE 0
                      END)
                  --If EndDate is a Saturday, Subtract 1
                   -(CASE WHEN DATEDIFF(DD,-53690,@EndDate)%7 = 5
                          THEN 1
                          ELSE 0
                      END)
                    )
    
     
  • That's cool... Thanks Micheal...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've read through many of the posts in this discussion and find myself sorely out of my league.  I came across the discussion because I need to figure out a way to calculate only work days in a query that counts the number of days between the receipt of a new effort and the 1st response to that effort.  I'm using the following Flag in my query, which works well but counts weekends and holidays.

    NewTurnAroundFlag: DateDiff("y",[Date Received],[1st Response])

    It is clear to me, from the many posts I read, that I have a lot to learn about database programming; but I fear the learning process will be slow.  The comments about using SELECT statements, creating Calendar tables, etc. go way over my head, as I do not know how to get these into my query.

    Do you have any suggestions for a simpleton like me?

  • A "simpleton" would not come to the conclusions you have nor ask the intelligent question you have.

    Try something like this to eliminate weekends...

    SELECT   (DATEDIFF(dd, [Date Received], [1st Response]) + 1)  -(DATEDIFF(wk, [Date Received], [1st Response]) * 2)  -(CASE WHEN DATENAME(dw, [Date Received]) = 'Sunday' THEN 1 ELSE 0 END)  -(CASE WHEN DATENAME(dw, [1st Response]) = 'Saturday' THEN 1 ELSE 0 END)
      AS NewTurnAroundFlag,
      othercolumnsfromyourtable
    
      FROM yourtable

    In order to eliminate holidays, you would need to build a table with Holidays in it.  Post the schema for your holiday table and we'll take it to the next step with you...  the holiday table really only needs to be a list of dates that you consider to be holidays.  Might want to include descriptions of the holidays in the table, but we won't be using those to solve this...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 31 through 45 (of 156 total)

You must be logged in to reply to this topic. Login to reply