Counting Daily Active Accounts per Category

  • Hello everyone,

    I am trying to get the number of daily active accounts per given category. I have the following table that contains accounts, their category, and when they moved into that category.

    `

    ID     Category       Category_Date

    11       1                        2021-01-05

    12      3                       2021-01-05

    11       2                       2021-01-18

    12      2                       2021-01-19

    14      5                      2021-02-01

    `

    where ID is the unique identifier. The category can change and the table will be updated automatically with the date of update.

    What I'm trying to do is calculate the daily active accounts in each category, meaning ID 11 will be included in the daily active count for Category 1 on January 14th despite the category_date being the 5th and will be in the daily count of category 2 on January 30th despite the new category date being 18th.

    In the end, I'm trying to have a table that has dates as the first column, the category numbers as the second column, and the number of active accounts per day per category in the final column.

    I tried using COUNT(ID) OVER(partition by category, category_date), but this gives me the number of accounts in categories that were changed/updated on the date. Is this possible to do on SQL? Any help would be greatly appreciated.

  • Is this the sort of thing you are after?

    --Set up test data
    DROP TABLE IF EXISTS #CatTable;

    CREATE TABLE #CatTable (Id INT, Category INT, CategoryDate DATE);

    INSERT #CatTable (Id, Category, CategoryDate)
    VALUES
    (11, 1, '20210105')
    ,(12, 3, '20210105')
    ,(11, 2, '20210118')
    ,(12, 2, '20210119')
    ,(14, 5, '20210201');

    --Create and populate temp table to hold all dates between min and max dates from data
    DECLARE
    @MinDate DATE
    , @MaxDate DATE;

    SELECT
    @MinDate = MIN(ct.CategoryDate)
    , @MaxDate = MAX(ct.CategoryDate)
    FROM #CatTable ct;

    DROP TABLE IF EXISTS #Dates;

    CREATE TABLE #Dates (CategoryDate DATE PRIMARY KEY CLUSTERED);

    INSERT #Dates (CategoryDate)
    SELECT y.d
    FROM
    (
    SELECT d = DATEADD(DAY, x.rn - 1, @MinDate)
    FROM
    (
    SELECT TOP (DATEDIFF(DAY, @MinDate, @MaxDate) + 1)
    rn = ROW_NUMBER() OVER (ORDER BY s1.object_id)
    FROM sys.all_objects s1
    CROSS JOIN sys.all_objects s2
    ORDER BY s1.object_id
    ) x
    ) y;

    --Return results

    WITH categories
    AS
    (SELECT
    ct.Id
    , ct.Category
    , ct.CategoryDate
    , EndDate = ISNULL(
    DATEADD(
    DAY
    , -1
    , LEAD(ct.CategoryDate, 1, NULL) OVER (PARTITION BY ct.Id ORDER BY ct.CategoryDate)
    )
    , @MaxDate
    )
    FROM #CatTable ct)
    SELECT
    d.CategoryDate
    , CategoryCount = COUNT(1)
    FROM categories
    JOIN #Dates d
    ON d.CategoryDate >= categories.CategoryDate
    AND d.CategoryDate <= categories.EndDate
    GROUP BY d.CategoryDate ORDER BY d.CategoryDate

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Perhaps:

    -- Consumable test data
    CREATE TABLE #t
    (
    ID int NOT NULL
    ,Category int NOT NULL
    ,Category_Date date NOT NULL
    ,PRIMARY KEY (ID, Category_Date)
    );
    INSERT INTO #t
    VALUES (11, 1, '20210105')
    ,(12, 3, '20210105')
    ,(11, 2, '20210118')
    ,(12, 2, '20210119')
    ,(14, 5, '20210201');

    -- Maybe
    WITH CategoryRange
    AS
    (
    SELECT Category, Category_Date AS StartDate
    ,COALESCE(LEAD(Category_Date) OVER (PARTITION BY ID ORDER BY Category_Date), '99991231') AS EndDate
    FROM #t
    )
    ,DateRange
    AS
    (
    SELECT Category_Date AS StartDate
    ,COALESCE(LEAD(Category_Date) OVER (ORDER BY Category_Date), '99991231') AS EndDate
    FROM #t
    GROUP BY Category_Date
    )
    ,Gaps
    AS
    (
    SELECT C.Category, D.StartDate, D.EndDate
    ,COUNT(1) AS ActiveAccounts
    ,CASE
    WHEN LAG(D.EndDate) OVER (PARTITION BY C.Category, COUNT(1) ORDER BY D.StartDate) = D.StartDate
    THEN 0
    ELSE 1
    END AS Gap
    FROM CategoryRange C
    JOIN DateRange D
    ON C.StartDate < D.EndDate
    AND C.EndDate > D.StartDate
    GROUP BY C.Category, D.StartDate, D.EndDate
    )
    ,Grps
    AS
    (
    SELECT Category, ActiveAccounts, StartDate, EndDate
    ,SUM(GAP) OVER (PARTITION BY Category, ActiveAccounts ORDER BY StartDate) AS Grp
    FROM Gaps
    )
    SELECT MIN(StartDate) AS StartDate
    ,MAX(EndDate) AS EndDate
    ,Category, ActiveAccounts
    FROM Grps
    GROUP BY Category, ActiveAccounts, Grp;

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply