October 15, 2007 at 10:11 am
Hello Everyone,
I am currently looking at designing a reporting database for users to query data; massive, resource-intensive reports run against it. Below are the current challenges that we face:
The database is getting huge in size due to expected data growth
The size of the database is making crucial tasks such as data backup take ages
The size of the database means that archiving/housekeeping needs to be done from time to time.
Performance is slowly degrading.
The data isn’t normalized, and normalizing it wouldn’t be worth the time and effort.
The data is static, so once written to the database, is very unlikely to change.
Proposed solution
At the moment, I am looking for new ideas.
The new idea must solve the following problems:
Allow for fast querying of data
Allow for faster loading of data from flat files
I look forward to hearing from you all.
Thanks
October 17, 2007 at 10:15 pm
Even though you can't use full partitioning, I would think that partitioned views could help ... especially if the older data can be effectively held in "archive" tables on different filegroups. Then you may not need to run full backups, just filegroup backups. If you can mark some filegroups as read-only, you won't have to back them up at all on a regular basis.
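Something along these lines, for example (the database, filegroup and table names below are invented, and the filegroups would need to exist already):

-- Monthly "archive" tables, each on its own filegroup. The CHECK constraints
-- on the date column keep the ranges from overlapping and let the optimizer
-- skip months that a query cannot touch.
CREATE TABLE dbo.Sales_2007_09
(
    SaleID   int      NOT NULL,
    SaleDate datetime NOT NULL CHECK (SaleDate >= '20070901' AND SaleDate < '20071001'),
    Amount   money    NOT NULL,
    CONSTRAINT PK_Sales_2007_09 PRIMARY KEY (SaleDate, SaleID)
) ON FG_2007_09;

CREATE TABLE dbo.Sales_2007_10
(
    SaleID   int      NOT NULL,
    SaleDate datetime NOT NULL CHECK (SaleDate >= '20071001' AND SaleDate < '20071101'),
    Amount   money    NOT NULL,
    CONSTRAINT PK_Sales_2007_10 PRIMARY KEY (SaleDate, SaleID)
) ON FG_2007_10;
GO

-- Partitioned view that unions the monthly tables; reports query the view.
CREATE VIEW dbo.Sales
AS
SELECT SaleID, SaleDate, Amount FROM dbo.Sales_2007_09
UNION ALL
SELECT SaleID, SaleDate, Amount FROM dbo.Sales_2007_10;
GO

-- Once a month is closed, mark its filegroup read-only and take one final
-- filegroup backup; after that it can drop out of the regular backup cycle.
ALTER DATABASE Reporting MODIFY FILEGROUP FG_2007_09 READONLY;
BACKUP DATABASE Reporting
    FILEGROUP = 'FG_2007_09'
    TO DISK = 'E:\Backups\Reporting_FG_2007_09.bak';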
Matt.
October 18, 2007 at 3:54 pm
I have a few more questions to clarify your situation so that I can help:
The database is getting huge in size due to expected data growth --> What are the current sizes of the data and log portions? How much data growth occurs, and over what interval?
The size of the database is making crucial tasks such as data backup take ages --> How long is 'ages'? Minutes? Hours?
The size of the database means that archiving/house keeping needs to be done from time to time. --> This is normal for any database system that holds data historically.
Performance is slowly degrading. --> What kind of optimizations are you running? Update Statistics? sp_recompile and sp_refreshview after Update Statistics? Index reorgs/rebuilds? (A bare-bones example is at the end of this post.)
The data isn’t normalized and wouldn’t be worth the time and effort of normalization --> Un-normalized data is a relational database's Achilles' heel. I really suggest some normalization.
The data is static, so once written to the database, is very unlikely to change. --> OK, partitioning may help.
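For reference, a bare-bones optimization pass might look something like this (the table, index and view names are only examples; adjust to your own objects):

-- Refresh out-of-date statistics database-wide (uses the default sampling rate).
EXEC sp_updatestats;

-- Flush stale plans and refresh view metadata so they pick up the new statistics.
EXEC sp_recompile 'dbo.Sales';
EXEC sp_refreshview 'dbo.vSalesSummary';

-- Rebuild a heavily fragmented index (SQL Server 2005 syntax) ...
ALTER INDEX IX_Sales_SaleDate ON dbo.Sales REBUILD;

-- ... or just reorganize it if fragmentation is moderate.
ALTER INDEX IX_Sales_SaleDate ON dbo.Sales REORGANIZE;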
Regards
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."
October 21, 2007 at 1:53 pm
Hello,
Thanks, guys, for your answers. I will answer the questions above:
The current size of the database is around 2 terabytes, and the monthly filegroups hold about 300 GB of data, most of which is static and unnormalised.
At the moment, the fastest full backup takes around 30 hours; that is the best we can get right now.
The performance jobs in place include updating and defragmenting the indexes.
Normalisation is always the best thing to do, but the problem is that it would take a considerable amount of time; we are talking months here. Besides, a lot of things are tied into the unnormalised data, so any change now would mean a complete rebuild of the entire framework: scheduled reports, all user reports, DTS/SSIS packages, crucial system processes, retraining users on the new structure, and so on.
Taking all the above into consideration, I hope that someone can come up with a solution here.
Thanks
October 21, 2007 at 9:43 pm
30 hours to back up... are you going straight to tape? Take a look at a backup compression product like LiteSpeed (Quest), SQL Backup (Red Gate) or SQL Safe (Idera) with a backup straight to disk.
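Even the native BACKUP command, taken straight to disk and striped across several files on separate drives, is usually a big improvement over tape, and it gives you a baseline to compare the compression products against. A rough illustration (the database name and paths are made up):

-- Weekly full backup, striped across three disk files on separate spindles.
BACKUP DATABASE Reporting
    TO DISK = 'E:\Backups\Reporting_1.bak',
       DISK = 'F:\Backups\Reporting_2.bak',
       DISK = 'G:\Backups\Reporting_3.bak'
    WITH INIT, STATS = 5;

-- Daily differential in between, which only picks up changed extents
-- (cheap when most of the data never changes).
BACKUP DATABASE Reporting
    TO DISK = 'E:\Backups\Reporting_Diff.bak'
    WITH DIFFERENTIAL, INIT, STATS = 5;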
With a database that big, disk management is a big deal. Things like filegroups can make your life much easier, especially since the data does not change once written: static data broken up into logical "chunks" is much easier to manage than volatile data.
With static data that breaks up easily (e.g. by month) partitioning is definitely an option as well.
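If you can make the case for Enterprise Edition, true table partitioning by month would look roughly like this (the function, scheme, filegroup and table names and the boundary dates are all made up, and the filegroups have to exist first):

-- Partition function: monthly boundaries on a datetime column.
CREATE PARTITION FUNCTION pfMonthly (datetime)
AS RANGE RIGHT FOR VALUES ('20070901', '20071001', '20071101');

-- Partition scheme: one filegroup per partition (number of boundaries + 1).
CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly
TO (FG_2007_08, FG_2007_09, FG_2007_10, FG_2007_11);

-- The table is created on the scheme, partitioned by SaleDate.
CREATE TABLE dbo.SalesFact
(
    SaleID   int      NOT NULL,
    SaleDate datetime NOT NULL,
    Amount   money    NOT NULL
) ON psMonthly (SaleDate);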
Joe
October 22, 2007 at 7:42 am
Sounds like you're getting yourself into quite a box. It's difficult to offer precise suggestions, since you haven't told us much about the application you're supporting. However, a couple of thoughts:
- On the subject of normalization, you can limit the impact on the user community with a careful migration path that uses views to replace the existing denormalized tables (a rough sketch is at the bottom of this post). While not a quick fix (and it can be difficult to wean people off the views later), it will let you proceed with the most valuable changes.
- Two terabytes is a substantial (but not unmanageable) data source. Consider the typical usage carefully. For example, you could use partitioned views (e.g. by month), but make sure you cluster the tables on the criteria most often queried against. Slicing up your I/O often produces the most dramatic performance gains.
- Reconsider the SQL Server Standard vs. Enterprise question. We've all had to deal with the budget issue at some point, but weigh the FTE support needed to perform the optimizations manually against the extra features supplied with Enterprise. It's all about cost/benefit.
- You don't say what type of reporting application this is. If it's a data warehousing application (which I suspect it may be), consider implementing Analysis Services. Of course, the disruption and change are much greater, but the potential benefits are also larger. Note: again, partition support is in Enterprise, so you would have to move up an edition.
- Reconsider the granularity of the data, particularly the older data. If you summarize it, you'll save on disk space, I/O, etc.
If you get a chance, post your schema. There may be some easy optimizations you've missed.
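As a rough sketch of the view approach mentioned above (every name below is invented; your schema will obviously differ): rename the wide denormalized table, build the normalized tables behind the scenes, and put a view with the old name in front of them so existing reports, DTS/SSIS packages and processes keep working while you migrate.

-- 1. Move the old wide table out of the way.
EXEC sp_rename 'dbo.SalesReport', 'SalesReport_Old';
GO

-- 2. New normalized tables (illustrative only).
CREATE TABLE dbo.Customer
(
    CustomerID   int          NOT NULL PRIMARY KEY,
    CustomerName varchar(100) NOT NULL
);

CREATE TABLE dbo.Sale
(
    SaleID     int      NOT NULL PRIMARY KEY,
    CustomerID int      NOT NULL REFERENCES dbo.Customer (CustomerID),
    SaleDate   datetime NOT NULL,
    Amount     money    NOT NULL
);
GO

-- 3. A view with the old table's name so existing queries still resolve.
CREATE VIEW dbo.SalesReport
AS
SELECT s.SaleID, s.SaleDate, s.Amount, c.CustomerID, c.CustomerName
FROM dbo.Sale AS s
JOIN dbo.Customer AS c
    ON c.CustomerID = s.CustomerID;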