Aggregation - is this an appropriate problem for BI to solve?

  • First let me say I have no experience at all with BI. I have a feeling it could help us out tremendously, but I haven't had time to get to a class to find out for sure, and my attempts at self-teaching have ended in frustration.

    Our application collects readings from thousands of sensors every 15 minutes. As you can imagine, over time this leads to very large tables (currently ~300 million rows) and a need to aggregate the data for easier processing and display.

    Our business requirements dictate that we aggregate on both an hourly and a daily basis. Currently we use a C# Windows service to do this aggregation, and it has worked fine for 2+ years. However, that solution is starting to show its age and is struggling to keep up without dragging down overall performance. Every time the service runs, it looks for readings where the 'processed' flag is set to 0 and generates the aggregate for the corresponding time period. It then sets the 'processed' flag to 1 for all readings in that time period.

    The structure of the tables is listed below. Please have a look and let me know whether you think BI could replace our current aggregation method with some kind of cube, or whether we should just optimize our current process. BTW, there is a lot of room for optimization (getting rid of ReadingId and keying off SensorId and ReadingTime, dumping new rows into a small staging table and having the aggregation service move them to the main Reading table after processing, etc.). Rough SQL sketches of the key change and of the hourly step are included below for reference.

    CREATE TABLE [dbo].[Reading](
        [ReadingId] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
        [SensorId] [int] NOT NULL,
        [ReadingTime] [datetime] NOT NULL,
        [ReadingValue] [float] NOT NULL,
        [HourlyProcessed] [bit] NULL DEFAULT (0),
        [DailyProcessed] [bit] NULL DEFAULT (0),
        CONSTRAINT [PK_Reading] PRIMARY KEY NONCLUSTERED
        (
            [ReadingId] ASC
        ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF, FILLFACTOR = 80) ON [PRIMARY]
    ) ON [PRIMARY]
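
    In case it makes the "key off SensorId and ReadingTime" idea above more concrete, the change I have in mind would look roughly like this (just a sketch, not what we actually run; the index name is made up, and it assumes a sensor never reports two readings with the same timestamp):

    -- Sketch only: cluster the Reading table on (SensorId, ReadingTime) so readings
    -- for one sensor and time range are stored together. Hypothetical index name.
    CREATE UNIQUE CLUSTERED INDEX [IX_Reading_SensorId_ReadingTime]
        ON [dbo].[Reading] ([SensorId] ASC, [ReadingTime] ASC)
        WITH (FILLFACTOR = 90)
        ON [PRIMARY];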

    CREATE TABLE [dbo].[ReadingHourlyAggregate](
        [ReadingHourlyAggregateId] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
        [SensorId] [int] NOT NULL,
        [ReadingTime] [datetime] NOT NULL,
        [AverageReadingValue] [float] NOT NULL,
        [ReadingSum] [float] NULL,
        [ReadingCount] [int] NULL,
        [ReadingMax] [float] NULL,
        [ReadingMin] [float] NULL,
        CONSTRAINT [PK_ReadingHourlyAggregate] PRIMARY KEY NONCLUSTERED
        (
            [ReadingHourlyAggregateId] ASC
        ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY]
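
    For reference, the hourly step the service performs is roughly equivalent to the following set-based SQL (a sketch only; the service actually does this in C#, and a real implementation would still need to handle late-arriving readings):

    -- Sketch of the hourly aggregation as one set-based pass over the schema above.
    -- Only hours that have fully elapsed are aggregated, so the current hour is skipped.
    DECLARE @HourCutoff datetime = DATEADD(HOUR, DATEDIFF(HOUR, 0, GETDATE()), 0);

    BEGIN TRANSACTION;

    INSERT INTO dbo.ReadingHourlyAggregate
        (SensorId, ReadingTime, AverageReadingValue, ReadingSum, ReadingCount, ReadingMax, ReadingMin)
    SELECT
        r.SensorId,
        DATEADD(HOUR, DATEDIFF(HOUR, 0, r.ReadingTime), 0),   -- truncate reading time to the hour
        AVG(r.ReadingValue),
        SUM(r.ReadingValue),
        COUNT(*),
        MAX(r.ReadingValue),
        MIN(r.ReadingValue)
    FROM dbo.Reading AS r
    WHERE r.HourlyProcessed = 0
      AND r.ReadingTime < @HourCutoff
    GROUP BY r.SensorId, DATEADD(HOUR, DATEDIFF(HOUR, 0, r.ReadingTime), 0);

    UPDATE dbo.Reading
    SET HourlyProcessed = 1
    WHERE HourlyProcessed = 0
      AND ReadingTime < @HourCutoff;

    COMMIT;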

    Regards,

    Ryan

  • This could be a very lengthy reply, but I'll cut it down to a few lines.

    You're aggregating data, so a data warehouse could really help. For a start, a database is far better suited to aggregating data in tables than any programming language is (horses for courses). If you get even a semi-decent sized server and build SSIS packages to do your aggregation, you'll probably find it rips through the work compared to your current solution.

  • Doesn't even need to be SSIS... can be some decent stored procs. But the point is an excellent one...
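
    For example, a daily roll-up can be as simple as a proc along these lines (just a sketch; it assumes a [ReadingDailyAggregate] table shaped like the hourly one, which isn't shown in the original post):

    -- Sketch of a set-based daily roll-up. The dbo.ReadingDailyAggregate table is
    -- hypothetical and assumed to have the same columns as dbo.ReadingHourlyAggregate.
    CREATE PROCEDURE dbo.usp_BuildDailyAggregates
        @DayStart datetime  -- midnight of the day to aggregate
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.ReadingDailyAggregate
            (SensorId, ReadingTime, AverageReadingValue, ReadingSum, ReadingCount, ReadingMax, ReadingMin)
        SELECT
            r.SensorId,
            @DayStart,
            AVG(r.ReadingValue),
            SUM(r.ReadingValue),
            COUNT(*),
            MAX(r.ReadingValue),
            MIN(r.ReadingValue)
        FROM dbo.Reading AS r
        WHERE r.ReadingTime >= @DayStart
          AND r.ReadingTime <  DATEADD(DAY, 1, @DayStart)
        GROUP BY r.SensorId;
    END;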

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
