January 9, 2008 at 3:48 pm
First let me say I have no experience at all with BI. I have a feeling it could help us out tremendously, but I haven't had time to get to a class to find out for sure, and my attempts at self-teaching have ended in frustration.
Our application collects readings from thousands of sensors every 15 minutes. As you can imagine, over time this leads to very large tables (currently ~300 million rows) and a need to aggregate this data for easier processing and display.
Our business requirements dictate that we aggregate on both an hourly and a daily basis. Currently we use a C# Windows service to do this aggregation, which has worked fine for 2+ years. However, that solution is starting to show its age and is having trouble keeping up without killing performance. Every time the service runs, it looks for readings where the 'processed' flag is set to 0 and generates the aggregate for the correct time period. It then sets the 'processed' flag to 1 for all readings in that time period.
The structure of the tables is listed below. Please have a look and let me know if you think BI could replace our current aggregation method with some kind of cube, or if you think we should just optimize our current process. BTW, there is a lot of room for optimization (getting rid of the ReadingId key and keying off of SensorId and ReadingTime, using a small staging table to dump rows into and having the aggregation service move them to the main Reading table after it processes them, etc.).
CREATE TABLE [dbo].[Reading](
    [ReadingId] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
    [SensorId] [int] NOT NULL,
    [ReadingTime] [datetime] NOT NULL,
    [ReadingValue] [float] NOT NULL,
    [HourlyProcessed] [bit] NULL DEFAULT (0),
    [DailyProcessed] [bit] NULL DEFAULT (0),
    CONSTRAINT [PK_Reading] PRIMARY KEY NONCLUSTERED
    (
        [ReadingId] ASC
    ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[ReadingHourlyAggregate](
    [ReadingHourlyAggregateId] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
    [SensorId] [int] NOT NULL,
    [ReadingTime] [datetime] NOT NULL,
    [AverageReadingValue] [float] NOT NULL,
    [ReadingSum] [float] NULL,
    [ReadingCount] [int] NULL,
    [ReadingMax] [float] NULL,
    [ReadingMin] [float] NULL,
    CONSTRAINT [PK_ReadingHourlyAggregate] PRIMARY KEY NONCLUSTERED
    (
        [ReadingHourlyAggregateId] ASC
    ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
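As a sketch of the clustered-key optimization mentioned above (keying off of SensorId and ReadingTime instead of the surrogate ReadingId), something along these lines; the index name is made up here, and it assumes no sensor ever reports two readings with the same timestamp:

```sql
-- Sketch only: cluster Reading on (SensorId, ReadingTime) so that the
-- per-sensor, per-time-window scans the aggregation does read contiguous pages.
-- Assumes (SensorId, ReadingTime) is unique; the PK on ReadingId stays nonclustered.
CREATE UNIQUE CLUSTERED INDEX [IX_Reading_SensorId_ReadingTime]
    ON [dbo].[Reading] ([SensorId] ASC, [ReadingTime] ASC)
    WITH (FILLFACTOR = 90);
```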
Regards,
Ryan
January 11, 2008 at 9:39 am
This could be a very lengthy reply, but I'll cut it down to a few lines.
You're aggregating data, so a data warehouse could really help. For a start, a database is far better suited to aggregating data in tables than any programming language is (horses for courses). If you get even a semi-decent sized server and build SSIS packages to do your aggregation, you'll probably find it rips through it compared to your current solution.
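To illustrate what set-based aggregation looks like against the tables posted above (a sketch, not a tuned implementation; the DATEADD/DATEDIFF expression is one common idiom for truncating a datetime to the hour):

```sql
-- One set-based statement replaces the row-by-row work of the C# service:
-- roll every unprocessed reading up into its hourly bucket.
INSERT INTO dbo.ReadingHourlyAggregate
    (SensorId, ReadingTime, AverageReadingValue,
     ReadingSum, ReadingCount, ReadingMax, ReadingMin)
SELECT
    SensorId,
    DATEADD(hour, DATEDIFF(hour, 0, ReadingTime), 0),  -- start of the hour
    AVG(ReadingValue),
    SUM(ReadingValue),
    COUNT(*),
    MAX(ReadingValue),
    MIN(ReadingValue)
FROM dbo.Reading
WHERE HourlyProcessed = 0
GROUP BY SensorId, DATEADD(hour, DATEDIFF(hour, 0, ReadingTime), 0);
```

The flag update (SET HourlyProcessed = 1 for the same rows) would need to run in the same transaction so readings aren't aggregated twice or skipped.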
January 11, 2008 at 8:36 pm
Doesn't even need to be SSIS... can be some decent stored procs. But the point is an excellent one...
--Jeff Moden
Change is inevitable... Change for the better is not.
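For the record, a bare-bones stored procedure along those lines might look like this. This one does the daily rollup from the hourly table rather than rescanning Reading, which is one way to cut the work down; it assumes a ReadingDailyAggregate table shaped like the hourly one, which wasn't posted, and the procedure name is made up:

```sql
CREATE PROCEDURE dbo.usp_AggregateDailyFromHourly
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @cutoff datetime;
    -- Only roll up hours from days that are fully complete (before today).
    SET @cutoff = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0);

    INSERT INTO dbo.ReadingDailyAggregate
        (SensorId, ReadingTime, AverageReadingValue,
         ReadingSum, ReadingCount, ReadingMax, ReadingMin)
    SELECT
        SensorId,
        DATEADD(day, DATEDIFF(day, 0, ReadingTime), 0),  -- start of the day
        -- NB: the daily average must be weighted by reading counts;
        -- AVG(AverageReadingValue) over the hourly rows would be wrong.
        SUM(ReadingSum) / SUM(ReadingCount),
        SUM(ReadingSum),
        SUM(ReadingCount),
        MAX(ReadingMax),
        MIN(ReadingMin)
    FROM dbo.ReadingHourlyAggregate
    WHERE ReadingTime < @cutoff
      AND NOT EXISTS (SELECT 1 FROM dbo.ReadingDailyAggregate d
                      WHERE d.SensorId = ReadingHourlyAggregate.SensorId
                        AND d.ReadingTime = DATEADD(day, DATEDIFF(day, 0,
                              ReadingHourlyAggregate.ReadingTime), 0))
    GROUP BY SensorId, DATEADD(day, DATEDIFF(day, 0, ReadingTime), 0);
END;
```

The NOT EXISTS guard stands in for the DailyProcessed flag, since that flag lives on Reading, not on the hourly table; late-arriving readings for an already-rolled-up day would still need separate handling.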