Averages over 28 days instead of a month

  • Hi all,

    I have a requirement to calculate an average attendance figure based on 28 day periods instead of a month and I'm struggling to think of a way to achieve this.

    Here is some sample data, I've only included a really small set:

    IF OBJECT_ID('tempdb..#SignIn') IS NOT NULL

    DROP TABLE #SignIn;

    CREATE TABLE #SignIn(

    PersonId INT NOT NULL

    ,SignInDate DATE NOT NULL);

    INSERT INTO #SignIn(PersonId, SignInDate)

    SELECT

    PersonId

    ,SignInDate

    FROM (VALUES(1, '2015-01-01')

    ,(1, '2015-01-02')

    ,(1, '2015-01-04')

    ,(1, '2015-01-07')

    ,(1, '2015-01-08')

    ,(1, '2015-01-10')

    ,(1, '2015-01-12')

    ,(1, '2015-01-15')

    ,(1, '2015-01-19')

    ,(1, '2015-01-22')

    ,(1, '2015-01-26')

    ,(1, '2015-02-02')

    ,(1, '2015-02-04')

    ,(1, '2015-02-07')

    ,(1, '2015-02-08')

    ,(1, '2015-02-10')

    ,(1, '2015-02-15')

    ,(1, '2015-02-19')

    ,(1, '2015-02-26')

    ,(1, '2015-03-01')

    ,(1, '2015-03-03')

    ,(1, '2015-03-05')

    ,(1, '2015-03-08')

    ,(1, '2015-03-11')

    ,(1, '2015-03-13')

    ,(1, '2015-03-16')

    ,(1, '2015-03-18')

    ,(1, '2015-03-20')

    ,(1, '2015-03-21')

    ,(1, '2015-03-26')

    ,(1, '2015-03-27')

    ,(2, '2015-10-02')

    ,(2, '2015-10-04')

    ,(2, '2015-10-07')

    ,(2, '2015-10-08')

    ,(2, '2015-10-10')

    ,(2, '2015-10-12')

    ,(2, '2015-10-15')

    ,(2, '2015-10-19')

    ,(2, '2015-10-22')

    ,(2, '2015-10-26')

    ,(2, '2015-11-02')

    ,(2, '2015-11-04')

    ,(2, '2015-11-07')

    ,(2, '2015-11-08')

    ,(2, '2015-11-10')

    ,(2, '2015-11-15')

    ,(2, '2015-11-19')

    ,(2, '2015-11-26')

    ,(2, '2015-12-01')

    ,(2, '2015-12-03')

    ,(2, '2015-12-05')

    ,(2, '2015-12-08')

    ,(2, '2015-12-11')

    ,(2, '2015-12-13')

    ,(2, '2015-12-16')

    ,(2, '2015-12-18')

    ,(2, '2015-12-20')

    ,(2, '2015-12-21')

    ,(2, '2015-12-27')

    ,(2, '2015-12-28')

    ,(2, '2016-01-02')

    ,(2, '2016-01-03')

    ) V(PersonId, SignInDate);

    Now usually when calculating this sort of average based on months I would use a query similar to the one shown below (I would normally use a calendar table and group on YYYYMM as opposed to MONTH):

    WITH Counts

    AS (

    SELECT

    PersonId

    ,MONTH(SignInDate) AS MonthSignInDate

    ,COUNT(*) AS CountSignIns

    FROM #SignIn

    GROUP BY

    PersonId

    ,MONTH(SignInDate)

    )

    SELECT

    PersonId

    ,AVG(CountSignIns) AS AvgSignIns

    FROM Counts

    GROUP BY

    PersonId

    ORDER BY

    PersonId

    For each person the first 28 day window will start from their first SignIn date and continue from there. Ideally only completed (full 28 days) will be included.

    I'm struggling to think of a way to calculate the 28 day windows and assign something to group on.

    Thanks in advance for your suggestions. If you need anything clarifying please let me know.

  • when does each 28-day period start?

    you could calculate from the first of the year using DATEDIFF and then integer divide by 28 to get the 4-week period, and then group. Lynn Pettis has a nice article somewhere on date functions in SQL Server that would probably help.

  • pietlinden (1/3/2016)


    when does each 28-day period start?

    you could calculate from the first of the year using DATEDIFF and then integer divide by 28 to get the 4-week period, and then group. Lynn Pettis has a nice article somewhere on date functions in SQL Server that would probably help.

    Lynn's article is at http://www.sqlservercentral.com/blogs/lynnpettis/2009/03/25/some-common-date-routines/.

    The question about the starting date is right on. You could define a new 28-day window starting with every day if you really want to, but I don't think that's what you're after. If you want it based on sequential, 4-week blocks that'll give you 13 blocks per year, but none of them will be based on a calendar month. The definition of the date window you want to use is key in determining how to write the query.

  • As an alternative to the already mentioned ideas, how about adding a Calendar table to your database?

    A Calendar table is a table that holds all days for a period that's way longer than you'll ever need, with a Date column as the Primary Key, and a bunch of other columns holding all the information relevant for your company - like weekday in your own language, whether it's a workday or a holiday, fiscal year, and in your case which 28-day period you're in.

    In your query you can then join the table to the Calendar table based on date (careful if your underlying data is datetime, need to convert to date first, or use a BETWEEN in the join criteria) and then group on 28-day period.

    The Calendar table is pretty generic and can have lots of other uses. If you would need it for just this query, you can alternatively make a table with one row per 28-day period, with period number as the primary key and two columns StartDate and EndDate (where I suggest using a half-open interval, so end date would actually be the first date of the next period, and all time stamps up to microseconds before EndDate are in the current period). In this case, join your table to the Periods table based on YourTable.DateTime >= Periods.StartDate AND YourTable.DateTime < Periods.EndDate, then group by period number.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Good job with the DDL and the sample data Raastarr, made this very quick and straight forward question to answer;-)

    😎

    There are few ways of doing this, here is the first one I thought of.

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID('tempdb..#SignIn') IS NOT NULL

    DROP TABLE #SignIn;

    CREATE TABLE #SignIn(

    PersonId INT NOT NULL

    ,SignInDate DATE NOT NULL);

    INSERT INTO #SignIn(PersonId, SignInDate)

    SELECT

    PersonId

    ,SignInDate

    FROM (VALUES(1, '2015-01-01')

    ,(1, '2015-01-02')

    ,(1, '2015-01-04')

    ,(1, '2015-01-07')

    ,(1, '2015-01-08')

    ,(1, '2015-01-10')

    ,(1, '2015-01-12')

    ,(1, '2015-01-15')

    ,(1, '2015-01-19')

    ,(1, '2015-01-22')

    ,(1, '2015-01-26')

    ,(1, '2015-02-02')

    ,(1, '2015-02-04')

    ,(1, '2015-02-07')

    ,(1, '2015-02-08')

    ,(1, '2015-02-10')

    ,(1, '2015-02-15')

    ,(1, '2015-02-19')

    ,(1, '2015-02-26')

    ,(1, '2015-03-01')

    ,(1, '2015-03-03')

    ,(1, '2015-03-05')

    ,(1, '2015-03-08')

    ,(1, '2015-03-11')

    ,(1, '2015-03-13')

    ,(1, '2015-03-16')

    ,(1, '2015-03-18')

    ,(1, '2015-03-20')

    ,(1, '2015-03-21')

    ,(1, '2015-03-26')

    ,(1, '2015-03-27')

    ,(2, '2015-10-02')

    ,(2, '2015-10-04')

    ,(2, '2015-10-07')

    ,(2, '2015-10-08')

    ,(2, '2015-10-10')

    ,(2, '2015-10-12')

    ,(2, '2015-10-15')

    ,(2, '2015-10-19')

    ,(2, '2015-10-22')

    ,(2, '2015-10-26')

    ,(2, '2015-11-02')

    ,(2, '2015-11-04')

    ,(2, '2015-11-07')

    ,(2, '2015-11-08')

    ,(2, '2015-11-10')

    ,(2, '2015-11-15')

    ,(2, '2015-11-19')

    ,(2, '2015-11-26')

    ,(2, '2015-12-01')

    ,(2, '2015-12-03')

    ,(2, '2015-12-05')

    ,(2, '2015-12-08')

    ,(2, '2015-12-11')

    ,(2, '2015-12-13')

    ,(2, '2015-12-16')

    ,(2, '2015-12-18')

    ,(2, '2015-12-20')

    ,(2, '2015-12-21')

    ,(2, '2015-12-27')

    ,(2, '2015-12-28')

    ,(2, '2016-01-02')

    ,(2, '2016-01-03')

    ) V(PersonId, SignInDate);

    /* GATHER THE NUMBER OF DAYS FOR EACH PERSON */

    ;WITH DAY_COUNTS AS

    (

    SELECT

    SI.PersonId

    ,MIN(SI.SignInDate) AS FIRST_DATE

    ,MAX(SI.SignInDate) AS LAST_DATE

    ,DATEDIFF(DAY,MIN(SI.SignInDate),MAX(SI.SignInDate)) AS NUM_DAYS

    FROM #SignIn SI

    GROUP BY SI.PersonId

    )

    /* SEED FOR THE TALLY TABLE TO CREATE THE CALENDAR */

    ,T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))

    /* CALENDAR TABLE BASE */

    ,CALENDAR_BASE AS

    (

    SELECT

    DC.PersonId

    ,DATEADD(DAY,NM.N,DC.FIRST_DATE) AS DATE_VALUE

    ,DC.FIRST_DATE

    ,DC.LAST_DATE

    ,((NM.N -1) / 28) + 1 AS PERIOD

    ,NM.N

    FROM DAY_COUNTS DC

    CROSS APPLY (SELECT TOP((DC.NUM_DAYS)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4) AS NM(N)

    )

    /* CALENDAR TABLE WITH 28 DAYS PERIODS */

    ,INLINE_CALENDAR AS

    (

    SELECT

    CAL.PersonId

    ,CAL.DATE_VALUE

    ,CAL.PERIOD

    ,DATEDIFF(DAY,FIRST_VALUE(CAL.DATE_VALUE) OVER

    (

    PARTITION BY CAL.PersonId

    ,CAL.PERIOD

    ORDER BY CAL.DATE_VALUE

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND UNBOUNDED FOLLOWING

    )

    ,LAST_VALUE(CAL.DATE_VALUE) OVER

    (

    PARTITION BY CAL.PersonId

    ,CAL.PERIOD

    ORDER BY CAL.DATE_VALUE

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND UNBOUNDED FOLLOWING

    )) + 1 AS DAY_IN_PERIOD

    FROM CALENDAR_BASE CAL

    )

    /* SELECT ONLY THE COMPLETED PERIODS */

    SELECT

    SI.PersonId

    ,IC.PERIOD

    ,COUNT(SI.PersonId) AS COUNT_SIGN_IN

    FROM #SignIn SI

    INNER JOIN INLINE_CALENDAR IC

    ON SI.PersonId = IC.PersonId

    AND SI.SignInDate = IC.DATE_VALUE

    WHERE IC.DAY_IN_PERIOD = 28

    GROUP BY SI.PersonId

    ,IC.PERIOD;

    Results

    PersonId PERIOD COUNT_SIGN_IN

    ----------- -------------------- -------------

    1 1 10

    1 2 8

    1 3 11

    2 1 9

    2 2 8

    2 3 10

  • Now would be a good time to shift the train of thought to ISO Weeks, which always start the year in the week that Jan 4th occurs. Since 2012 has a DATEPART for ISOWEEK, an (ISOWeek-1)/4 would provide a fairly easy and predictable grouping that wouldn't actually require the use of a calendar table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/3/2016)


    Now would be a good time to shift the train of thought to ISO Weeks, which always start the year in the week that Jan 4th occurs. Since 2012 has a DATEPART for ISOWEEK, an (ISOWeek-1)/4 would provide a fairly easy and predictable grouping that wouldn't actually require the use of a calendar table.

    The ISO_WEEK method could be along these lines

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID('tempdb..#SignIn') IS NOT NULL

    DROP TABLE #SignIn;

    CREATE TABLE #SignIn(

    SI_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED

    ,PersonId INT NOT NULL

    ,SignInDate DATE NOT NULL);

    CREATE NONCLUSTERED INDEX NCLIDX_TMP_SIGNIN_PERSON_SIGNINDATE ON #SignIn (PersonId ASC, SignInDate ASC);

    INSERT INTO #SignIn(PersonId, SignInDate)

    SELECT

    PersonId

    ,SignInDate

    FROM (VALUES(1, '2015-01-01')

    ,(1, '2015-01-02')

    ,(1, '2015-01-04')

    ,(1, '2015-01-07')

    ,(1, '2015-01-08')

    ,(1, '2015-01-10')

    ,(1, '2015-01-12')

    ,(1, '2015-01-15')

    ,(1, '2015-01-19')

    ,(1, '2015-01-22')

    ,(1, '2015-01-26')

    ,(1, '2015-02-02')

    ,(1, '2015-02-04')

    ,(1, '2015-02-07')

    ,(1, '2015-02-08')

    ,(1, '2015-02-10')

    ,(1, '2015-02-15')

    ,(1, '2015-02-19')

    ,(1, '2015-02-26')

    ,(1, '2015-03-01')

    ,(1, '2015-03-03')

    ,(1, '2015-03-05')

    ,(1, '2015-03-08')

    ,(1, '2015-03-11')

    ,(1, '2015-03-13')

    ,(1, '2015-03-16')

    ,(1, '2015-03-18')

    ,(1, '2015-03-20')

    ,(1, '2015-03-21')

    ,(1, '2015-03-26')

    ,(1, '2015-03-27')

    ,(2, '2015-10-02')

    ,(2, '2015-10-04')

    ,(2, '2015-10-07')

    ,(2, '2015-10-08')

    ,(2, '2015-10-10')

    ,(2, '2015-10-12')

    ,(2, '2015-10-15')

    ,(2, '2015-10-19')

    ,(2, '2015-10-22')

    ,(2, '2015-10-26')

    ,(2, '2015-11-02')

    ,(2, '2015-11-04')

    ,(2, '2015-11-07')

    ,(2, '2015-11-08')

    ,(2, '2015-11-10')

    ,(2, '2015-11-15')

    ,(2, '2015-11-19')

    ,(2, '2015-11-26')

    ,(2, '2015-12-01')

    ,(2, '2015-12-03')

    ,(2, '2015-12-05')

    ,(2, '2015-12-08')

    ,(2, '2015-12-11')

    ,(2, '2015-12-13')

    ,(2, '2015-12-16')

    ,(2, '2015-12-18')

    ,(2, '2015-12-20')

    ,(2, '2015-12-21')

    ,(2, '2015-12-27')

    ,(2, '2015-12-28')

    ,(2, '2016-01-02')

    ,(2, '2016-01-03')

    ) V(PersonId, SignInDate);

    /* MARK THE FIRST ENTRY AND COUNT FROM THERE */

    ;WITH BASE_DATA AS

    (

    SELECT

    SI.PersonId

    ,YEAR(SI.SignInDate) AS SI_YEAR

    ,(1 + DATEPART(ISO_WEEK,SI.SignInDate) -

    MIN(DATEPART(ISO_WEEK,SI.SignInDate)) OVER

    (

    PARTITION BY SI.PersonId

    )) / 4.0 AS GROUP_ISOWEEK

    ,(SI.SignInDate)

    FROM #SignIn SI

    )

    /* ADD GROUP IDENTIFIER AND LAST ENTRY IN EACH GROUP */

    ,GROUPED_AND_MAX_ENTRIES AS

    (

    SELECT

    BD.PersonId

    ,BD.SI_YEAR

    ,CEILING(BD.GROUP_ISOWEEK) AS GROUP_ISOWEEK

    ,MAX(BD.GROUP_ISOWEEK) OVER

    (

    PARTITION BY BD.PersonId

    ,BD.SI_YEAR

    ,CEILING(BD.GROUP_ISOWEEK)

    ) AS MAX_ISO_WEEK_IN_GROUP

    ,BD.SignInDate

    FROM BASE_DATA BD

    )

    /* EXCLUDE INCOMPLETE GROUPS */

    SELECT

    GME.PersonId

    ,GME.SI_YEAR

    ,GME.GROUP_ISOWEEK

    ,COUNT(GME.SignInDate) AS SIGN_IN_COUNT

    FROM GROUPED_AND_MAX_ENTRIES GME

    WHERE GME.MAX_ISO_WEEK_IN_GROUP <= FLOOR(GME.MAX_ISO_WEEK_IN_GROUP)

    GROUP BY GME.PersonId

    ,GME.SI_YEAR

    ,GME.GROUP_ISOWEEK

    ;

    Results

    PersonId SI_YEAR GROUP_ISOWEEK SIGN_IN_COUNT

    ----------- ----------- --------------- -------------

    1 2015 1 10

    1 2015 2 8

    1 2015 3 11

    2 2015 1 9

    2 2015 2 8

    2 2015 3 10

  • Thanks for all of the responses.

    I've gone with the first solution from Eirikur as it did exactly what was required :-).

  • Raastarr (1/4/2016)


    Thanks for all of the responses.

    I've gone with the first solution from Eirikur as it did exactly what was required :-).

    You are very welcome and thanks for the feedback.

    😎

  • Hugo Kornelis (1/3/2016)


    As an alternative to the already mentioned ideas, how about adding a Calendar table to your database?

    A Calendar table is a table that holds all days for a period that's way longer than you'll ever need, with a Date column as the Primary Key, and a bunch of other columns holding all the information relevant for your company - like weekday in your own language, whether it's a workday or a holiday, fiscal year, and in your case which 28-day period you're in.

    In your query you can then join the table to the Calendar table based on date (careful if your underlying data is datetime, need to convert to date first, or use a BETWEEN in the join criteria) and then group on 28-day period.

    The Calendar table is pretty generic and can have lots of other uses. If you would need it for just this query, you can alternatively make a table with one row per 28-day period, with period number as the primary key and two columns StartDate and EndDate (where I suggest using a half-open interval, so end date would actually be the first date of the next period, and all time stamps up to microseconds before EndDate are in the current period). In this case, join your table to the Periods table based on YourTable.DateTime >= Periods.StartDate AND YourTable.DateTime < Periods.EndDate, then group by period number.

    +1

    Don Simpson



    I'm not sure about Heisenberg.

  • DonlSimpson (1/5/2016)


    Hugo Kornelis (1/3/2016)


    As an alternative to the already mentioned ideas, how about adding a Calendar table to your database?

    A Calendar table is a table that holds all days for a period that's way longer than you'll ever need, with a Date column as the Primary Key, and a bunch of other columns holding all the information relevant for your company - like weekday in your own language, whether it's a workday or a holiday, fiscal year, and in your case which 28-day period you're in.

    In your query you can then join the table to the Calendar table based on date (careful if your underlying data is datetime, need to convert to date first, or use a BETWEEN in the join criteria) and then group on 28-day period.

    The Calendar table is pretty generic and can have lots of other uses. If you would need it for just this query, you can alternatively make a table with one row per 28-day period, with period number as the primary key and two columns StartDate and EndDate (where I suggest using a half-open interval, so end date would actually be the first date of the next period, and all time stamps up to microseconds before EndDate are in the current period). In this case, join your table to the Periods table based on YourTable.DateTime >= Periods.StartDate AND YourTable.DateTime < Periods.EndDate, then group by period number.

    +1

    I know that everyone thinks that a Calendar table is a good idea but, just like making the choice between using cascading CTEs (cCTEs, not to be confused with Recursive CTEs, which suck) to build a sequence of numbers or using a Tally you really need to make sure. In the case of Tally Tables, they usually edge out the cCTEs for performance but come with a price tag in the form of Logical Reads. Even though those are VERY fast (it's memory I/O), Logical Reads still aren't close to being free.

    The same holds true with a Calendar Table but, depending on how wide the table is and how properly indexed it is for the related queries, the number of Logical Reads can really sky rocket. And, unlike a Tally Table or cCTE that does the same, I've found that a whole lot of people use a Calendar table as a crutch for anything and everything of a temporal nature. For example, people lookup things like the name of a month or day for reporting purposes instead of using the simple, very fast, very efficient, very low resource usage DATENAME function. Seriously? :blink: I can certainly see having a very narrow Calendar table to help you find things like 5 business days ahead or behind of a date for SLAs/etc, with respect to weekends and holidays, but not things like looking up even 1 date to return the character based name of a Month or a Day. That's a bit like writing an SQLCLR to return a remainder for a modulus (True story, there. Actually had someone try to get that by me).

    For this 28 day thing? A Calendar table might actually slow things down.

    So, yes, "+1" to Calendar tables but include the advice that they should only be created for certain things and they should never be used as a substitute for knowledge of T-SQL and the available intrinsic functionality. And, for goodness sake, NEVER EVER build a surrogate key in the form of an IDENTITY or SEQENCE column in a Calendar table. Some idiot might actually FK to it from another table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Raastarr (1/4/2016)


    Thanks for all of the responses.

    I've gone with the first solution from Eirikur as [font="Arial Black"]it did exactly what was required [/font]:-).

    Careful now... Does it? With some serious respect for a usually faultless colleague, I'm thinking it doesn't.

    Let's see some proof... How many entries does your test data contain for persons 1 and 2? The answer is easy to find out.

    --===== Find the number of unique date entries for each PersonID

    SELECT PersonID, UniqueSignInDates = COUNT(DISTINCT SignInDate)

    FROM #SignIn

    GROUP BY PersonID

    ORDER BY PersonID

    ;

    GO

    Here are the count results from that.

    PersonID UniqueSignInDates

    ----------- -----------------

    1 31

    2 32

    Let's see the previous results from the code you've adopted along with a couple of observational notes...

    PersonId SI_YEAR GROUP_ISOWEEK SIGN_IN_COUNT

    ----------- ----------- --------------- -------------

    1 2015 1 10 [font="Arial Black"]--Note: 10+8+11 is only 29, not 31[/font]

    1 2015 2 8

    1 2015 3 11

    2 2015 1 9 [font="Arial Black"]--Note: 9+8+10 is only 27, not 32[/font]

    2 2015 2 8

    2 2015 3 10

    Like I said, unless I just don't understand what the heck a "period" actually is, there seems to be a couple of pieces of the pie missing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/5/2016)


    Raastarr (1/4/2016)


    Thanks for all of the responses.

    I've gone with the first solution from Eirikur as [font="Arial Black"]it did exactly what was required [/font]:-).

    Careful now... Does it? With some serious respect for a usually faultless colleague, I'm thinking it doesn't.

    Let's see some proof... How many entries does your test data contain for persons 1 and 2? The answer is easy to find out.

    --===== Find the number of unique date entries for each PersonID

    SELECT PersonID, UniqueSignInDates = COUNT(DISTINCT SignInDate)

    FROM #SignIn

    GROUP BY PersonID

    ORDER BY PersonID

    ;

    GO

    Here are the count results from that.

    PersonID UniqueSignInDates

    ----------- -----------------

    1 31

    2 32

    Let's see the previous results from the code you've adopted along with a couple of observational notes...

    PersonId SI_YEAR GROUP_ISOWEEK SIGN_IN_COUNT

    ----------- ----------- --------------- -------------

    1 2015 1 10 [font="Arial Black"]--Note: 10+8+11 is only 29, not 31[/font]

    1 2015 2 8

    1 2015 3 11

    2 2015 1 9 [font="Arial Black"]--Note: 9+8+10 is only 27, not 32[/font]

    2 2015 2 8

    2 2015 3 10

    Like I said, unless I just don't understand what the heck a "period" actually is, there seems to be a couple of pieces of the pie missing.

    The 28 day periods will be different for each person as the first period will begin from their first SignIn record. So for person 1 that is 2015-01-01, after 28 days the next period begins. However for person 2 their first record was 2015-10-02 so that is when their first period starts. Only records that fall within a completed 28 day period should be included which is why the counts are slightly less as some records don't qualify.

    Hopefully that is a little clearer?

    I have just thought of a potential problem though. It is quite possible for someone not to sign in for months (possibly even years) at a time which at the moment would be classed as a completed period (full 28 days) with a sign in count of 0. This will skew the average, would using something like NULLIF(SignInCount, 0) be recommended here or would I be better off trying to amend the inline calendar table here?

  • Raastarr (1/6/2016)


    The 28 day periods will be different for each person as the first period will begin from their first SignIn record. So for person 1 that is 2015-01-01, after 28 days the next period begins. However for person 2 their first record was 2015-10-02 so that is when their first period starts. Only records that fall within a completed 28 day period should be included which is why the counts are slightly less as some records don't qualify.

    That is a great example of the relevance of giving all information up front. A lot of people now spent a lot of time looking for solutions based on the idea that your company uses fixed 28-day periods (it may be rare, but there are still a few companies in my country that pay wages every 4 weeks instead of every month - that was what I was thinking about when I read this thread).

    If the periods are different for everyone, then using a calendar table will not help you at all. That technique is mostly useful for storing fixed date-related information.

    To divide time in 28-day periods since the start date of a person, use DATEDIFF(day, StartDate, CurrentDate) and divide by 28 using integer division. That gives you a number that changes once per 28 days. Depending on the exact definition of the 28-day periods, you may or may not have to subtract one day from the DATEDIFF result before doing the division.

    I have just thought of a potential problem though. It is quite possible for someone not to sign in for months (possibly even years) at a time which at the moment would be classed as a completed period (full 28 days) with a sign in count of 0. This will skew the average, would using something like NULLIF(SignInCount, 0) be recommended here or would I be better off trying to amend the inline calendar table here?

    And that changes the probel yet again. As mentioned before, a calendar table is not useful in this case. And NULLIF (or the prefered alternative COALESCE) will not help either, this replaces a NULL value in a row but does not create rows, and your problem is that if someone has not signed in during an entire period, there will be no rows for that period.

    The technique to use here is to generate on the spot a virtual table with all the relevant periods. In this case you will need to use a numbers table (that's even simpler than a calendar table - it's a table with just a single column, integer and primary key, that holds all numbers from some starting point (I usually start at zero) up to a very high number (a million works for almost all cases). [Some people prefer to not have a stored numbers table but use a CTE to generate one on the spot; I only resort to that technique if I cannot add tables to the database, because it makes the queries a lot clunkier]). Here is how you can generate all periods you need for all people:

    FROM dbo.Persons AS p

    CROSS APPLY (SELECT DATEADD(day, n.N * 28, p.StartDate) AS PeriodStart, DATEADD(day, (n.N + 1) * 28, p.StartDate) AS PeriodEnd

    FROM dbo.Numbers AS n

    WHERE n.N >= 0

    AND n.N <= DATEDIFF(day, p.StartDate, <<highest possible date>>) / 28) AS periods

    Using this as a starting point, you can then join this to the table with signon data; in the join condition you want to include SignOnDate >= periods.PeriodStart AND SignOnDate < periods.PeriodEnd

    Make sure to use an outer join, not an inner join, in order to not lose the periods without signons.

    To top it off, use GROUP BY <<unique columns for person>>, periods.PeriodStart, periods,PeriodEnd. In the SELECT list, include those same columns, plus COUNT(SignOnDate) to count the total number of signons in that period. (Do not use COUNT(*) - that counts rows and a period without signon is still a row - however, the row has a NULL SignOnDate, so COUNT(SignOnDate) will not count it).

    EDIT: Somehow the forum software messed up my smaller than and larger than symbols, I hope I have them all repaired now


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Raastarr (1/6/2016)


    Jeff Moden (1/5/2016)


    Raastarr (1/4/2016)


    Thanks for all of the responses.

    I've gone with the first solution from Eirikur as [font="Arial Black"]it did exactly what was required [/font]:-).

    Careful now... Does it? With some serious respect for a usually faultless colleague, I'm thinking it doesn't.

    Let's see some proof... How many entries does your test data contain for persons 1 and 2? The answer is easy to find out.

    --===== Find the number of unique date entries for each PersonID

    SELECT PersonID, UniqueSignInDates = COUNT(DISTINCT SignInDate)

    FROM #SignIn

    GROUP BY PersonID

    ORDER BY PersonID

    ;

    GO

    Here are the count results from that.

    PersonID UniqueSignInDates

    ----------- -----------------

    1 31

    2 32

    Let's see the previous results from the code you've adopted along with a couple of observational notes...

    PersonId SI_YEAR GROUP_ISOWEEK SIGN_IN_COUNT

    ----------- ----------- --------------- -------------

    1 2015 1 10 [font="Arial Black"]--Note: 10+8+11 is only 29, not 31[/font]

    1 2015 2 8

    1 2015 3 11

    2 2015 1 9 [font="Arial Black"]--Note: 9+8+10 is only 27, not 32[/font]

    2 2015 2 8

    2 2015 3 10

    Like I said, unless I just don't understand what the heck a "period" actually is, there seems to be a couple of pieces of the pie missing.

    The 28 day periods will be different for each person as the first period will begin from their first SignIn record. So for person 1 that is 2015-01-01, after 28 days the next period begins. However for person 2 their first record was 2015-10-02 so that is when their first period starts. Only records that fall within a completed 28 day period should be included which is why the counts are slightly less as some records don't qualify.

    Hopefully that is a little clearer?

    I have just thought of a potential problem though. It is quite possible for someone not to sign in for months (possibly even years) at a time which at the moment would be classed as a completed period (full 28 days) with a sign in count of 0. This will skew the average, would using something like NULLIF(SignInCount, 0) be recommended here or would I be better off trying to amend the inline calendar table here?

    Yes. That still makes the code incorrect because one of the rows ignored is the first row for person 1.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply