Pulling stats for multiple columns out of a table efficiently

  • Dear Experts

    I have a table that accumulates daily stats, defined so:

    CREATE TABLE [dbo].[Stats2]

           ([StatsDate] [datetime] PRIMARY KEY,
           [TypeA_Calls] [int] NULL,
           [TypeB_Calls] [int] NULL,
           [TypeC_Calls] [int] NULL);

    I need a SELECT statement, or sequence of statements, which can pull out the earliest and latest date on which each stat is nonzero. This is of course would be easy to do if there were only one stat (using the TypeA_calls field as an example):

    SELECT

           min(statsdate) as First_TypeA_Date,
           max(statsdate) as Last_TypeA_Date

           from dbo.Stats
           where typea_calls>0;

    However, I cannot see how to generalize this to multiple columns of stats  (I have only defined 3 columns in my example table for brevity, but in the real-world table there are about 20 of them!). The complicating factor seems to be that each of the WHERE clauses has to be tailored to the specific column.

    I could of course have 3 (or 20-odd) SELECT statements, each putting one column’s worth of stats into a temporary table and then reporting from that, but I can’t help thinking there must be a way of avoiding 3 (or 20-odd) logical table scans.

    Or maybe I could do this using a cursor… ?

    Yours, seeking enlightenment.

    MarkD

  • drop table if exists #Stats2;
    go
    CREATE TABLE [dbo].#Stats2
    ([StatsDate] [datetime] PRIMARY KEY,
    [TypeA_Calls] [int] NULL,
    [TypeB_Calls] [int] NULL,
    [TypeC_Calls] [int] NULL);

    insert #Stats2 values
    ('2020-12-01', null, null, 23),
    ('2020-12-02', 1, null, 23),
    ('2020-12-03', 2, null, 23),
    ('2020-12-04', null, 4, null),
    ('2020-12-05', null, 2, 47),
    ('2020-12-06', 3, 3, 3);

    select min(case when TypeA_Calls is not null then StatsDate else null end) min_a,
    max(case when TypeA_Calls is not null then StatsDate else null end) max_a,
    min(case when TypeB_Calls is not null then StatsDate else null end) min_b,
    max(case when TypeB_Calls is not null then StatsDate else null end) max_b,
    min(case when TypeC_Calls is not null then StatsDate else null end) min_c,
    max(case when TypeC_Calls is not null then StatsDate else null end) max_c
    from #Stats2;

    Output

    min_amax_amin_bmax_bmin_cmax_c
    2020-12-02 00:00:00.0002020-12-06 00:00:00.0002020-12-04 00:00:00.0002020-12-06 00:00:00.0002020-12-01 00:00:00.0002020-12-06 00:00:00.000

    Alternately the CASE expressions (or IIF functions) could be in the FROM clause

    select min(dtm.a) min_a, max(dtm.a) max_a,
    min(dtm.b) min_b, max(dtm.b) max_b,
    min(dtm.c) min_c, max(dtm.c) max_c
    from #Stats2
    cross apply (values (iif(TypeA_Calls is not null, StatsDate, null),
    iif(TypeB_Calls is not null, StatsDate, null),
    iif(TypeC_Calls is not null, StatsDate, null))) dtm(a,b,c);

    • This reply was modified 3 years, 11 months ago by  Steve Collins.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Can you change the data model? A more  EAV-like data model might be more appropriate in this case -- e.g.,

    CREATE TABLE [dbo].[StatsDefinitions]

    (
    [StatsID] [int] NOT NULL PRIMARY KEY,
    [StatsName] varchar(30) NOT NULL,
    [StatsDescrtiption] varchar(255) NOT NULL
    );

    CREATE TABLE [dbo].[Stats]

    ([StatsDate] [datetime] NOT NULL,
    [StatsID] [int] NULL,
    [StatsValue] [int] NULL)
    PRIMARY KEY PK_Stats (StatsDate ASC, StatsID ASC)
    );

    Then your select looks like this:

    SELECT
    StatsDefinition.StatsID,
    StatsDefinition.StatsName,
    min(statsdate) as First_Stat_Date,
    max(statsdate) as Last_Stat_Date
    from dbo.Stats
    INNER JOIN dbo.StatsDefinitions ON StatsDefinitions.StatsID = Stats.StatsID
    where Stats.StatsValue > 0;

    If this has to be flattened into 60 First/Last columns -- First_Type<N>_Date, Last_Type<N>_Date -- then do you return any row in which any of the 30 values is greater than zero?

  • Two  - no, three - really instructive answers here.

    I think Steve Collins' first suggestion was the sort of thing I had at the back of my mind. That CROSS APPLY was nifty though. I wonder which one scales better?

    I agree with ratbak about using a more EAV approach, but I suspect that external circumstances will disallow this. I don't know for sure as this question came from a colleague.

    Re ratbak's question, if I understand it, I think the answer is yes - that is, if any of the stats for a particular day is non-zero, we would want to display all of them in cross-tab fashion, otherwise omit that day.

     

    Many thanks

    MarkD

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply