May 9, 2010 at 9:01 am
Hi All,
I am pretty much a newbie in SQL Server, so I need some assistance (from the SQL gurus) on how I should build my aggregation tables.
In my system database, we have a primary table which stores detail-level data. Around 500,000 records are written to this table every day, and I will have to aggregate the data in this table using SUM and AVERAGE functions. The primary table is also purged regularly, so at any given time it stores data only for the last 15 days. If I create simple views on the primary table, I would get aggregated data only for the last 15 days, whereas I need to store the aggregated data for around 1 year. So I would basically have to store the aggregated data in some other table so that I can retain it for a period of 1-2 years.
I am not quite sure what approach I should take for this requirement. Any suggestions would be really helpful.
Thanks.
May 9, 2010 at 3:50 pm
I think using additional tables to store the report data you need is a good way to deal with that many records.
If you are using MS SQL Server Express, I recommend storing the reports in a different instance to make access faster.
Regards,
Eko Indriyawan
May 10, 2010 at 2:54 am
You need to have a look at Aggregated Fact tables.
May 10, 2010 at 5:53 am
That sounds like quite a job. 500,000 records in one day is no small order of information to deal with.
It would probably help if you provided a general breakdown of the table's schema. Are we dealing with a lot of columns of comma-delimited data, or is it mostly well-constrained numbers?
I'm not sure it would be necessary to store the table in a separate database instance. Having a separate table in the same db shouldn't really hurt performance, and if it would be accessed very heavily in the future, you could look at storing it on its own filegroup.
Shooting from the hip, my suggestion would be to build a nice normalized table to hold the aggregate information, and then run a SQL Server Agent job sometime after midnight that contains the INSERT statement aggregating the previous day's data.
Something like:
INSERT AggregateTable (SaleDate, ProductID, AvePrice)
SELECT
/* could even store this as an integer primary key in the aggregate table */
CONVERT(varchar(8), DATEADD(dd, -1, CURRENT_TIMESTAMP), 112)
,ProductID
,AVG(Price)
FROM dbo.YourBigTable b
/* range predicate instead of DATEDIFF so an index on b.Date can be used */
WHERE b.Date >= DATEADD(dd, -1, CONVERT(date, CURRENT_TIMESTAMP))
AND b.Date < CONVERT(date, CURRENT_TIMESTAMP)
GROUP BY ProductID;
That of course depends on the indexes on YourBigTable. The SELECT part is the fun part you get to optimize.
Let me know how it goes.
Keith Wiggans
May 10, 2010 at 11:04 am
Hi kwiggans,
Correct me if I'm wrong about multi-instance.
By the way, have you tried using and managing multiple instances?
Just for your information, based on my experience using multiple instances to manage large volumes of records, the performance of my application for data access, transactions, etc. was faster than with just one instance.
I think using multiple instances can be likened to multi-threading.
I'm talking about MS SQL Server Express here, not the Enterprise edition, because MS SQL Server Enterprise already supports multi-threading.
Regards,
Eko Indriyawan
May 10, 2010 at 3:07 pm
I could absolutely see how using multiple instances of SQL Server would improve performance if, say, most of the tables in one instance were being read while the other instance was being used for writes.
I believe we are talking about data warehousing here, right? I have not had the opportunity to set up a system like that, but I would love to try it.
I think it's essential to allow smart users quick access to readily accessible data. I was proposing more of a quick-and-dirty approach in case there wasn't time to implement a full-scale multi-instance setup.
Keith Wiggans
May 10, 2010 at 4:51 pm
I don't think we have enough information to help you much, but if you are going to store the information so that you have a history, I suspect you'll need to store the SUM of the values and the associated counts. If you only store the average for a day, it becomes difficult to calculate an accurate average for a week or any other time period. Since your source table is only getting around 500K rows per day, you could also create a History table with all the detail and store a rolling window of 1-2 years, possibly with a partition scheme of some kind (monthly, perhaps?).
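To illustrate why storing SUM and COUNT matters, here is a quick sketch in Python (the prices and day sizes are made-up numbers, just for illustration). Averaging stored daily averages weights a 1-sale day the same as a 4-sale day; dividing the stored totals gives the true average:

```python
# Two days of sales with different volumes.
day1_prices = [10.0, 10.0, 10.0, 10.0]  # 4 sales, daily average 10.0
day2_prices = [20.0]                    # 1 sale,  daily average 20.0

# Wrong: averaging the stored daily averages weights both days equally.
avg_of_avgs = (sum(day1_prices) / len(day1_prices)
               + sum(day2_prices) / len(day2_prices)) / 2

# Right: store SUM and COUNT per day, then divide the combined totals.
total_sum = sum(day1_prices) + sum(day2_prices)    # 60.0
total_count = len(day1_prices) + len(day2_prices)  # 5

true_avg = total_sum / total_count

print(avg_of_avgs)  # 15.0 -- overstates the real average
print(true_avg)     # 12.0 -- the accurate two-day average
```

The same arithmetic applies whether the history table rolls up to weeks, months, or years: as long as each row carries its SUM and COUNT, any coarser average can be rebuilt exactly.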
If you have some more detail you want to share we might be able to help a bit more.