April 21, 2010 at 11:13 am
So I'm brand new to this company and here is their scenario. The company as a whole gets about 4 to 5 new contracts a year and each contract may generate multiple projects. So of course there is a Projects table that averages about 5 million records. This Projects table has about 4 other child tables and each of their record counts average 10 million or greater (lets refer to the Projects table and its child tables as the "Project Set". In the recent past they have experienced a lot of performance issues with slow query times etc. Our Team Lead says they've done everything as far as proper usage of indexes to help speed performance but slow queries is still an issue because of the sheer number of records.
So in order to improve performance, our Team Lead is proposing generating a "Project Set" for every project. So if 3 new projects are created (H123, H456, and H789) then a "Project Set" of tables is created for each project (i.e. H123_ProjectDetails, H123_ProjectResources, H456_ProjectDetails, H456_ProjectResources, H789_ProjectDetails, H789_ProjectResources). By doing this he says we will be querying against an average of about 500,000 records which will of course improve performance.
His line of thinking is that he expects for us to only get about 15 to 20 new projects a year and so the number of tables in the database won't get too bloated. Plus as completed projects reach a certain age we can copy these tables into an archive database and then drop those tables altogether from the production database.
Of course my argument is that I would think this breaks normalization and basic db design and also what happens when the company grows (which it is steadily growing at a good pace) and we start getting 100 new projects a year? Nevertheless, this is what he has decided to do.
My spidey sense seems to think that this is a mistake but I am giving him the benefit of the doubt because I've never worked with tables this large. So is this design a common solution to improve performance or is there a better way of handling this?
Also I just noticed that I placed this topic under "Sql Server 2K5" but we are actually using 2008.
April 21, 2010 at 11:26 am
More than my spidey sense is tingling because of this..
Just think about the queries required to get data from all those tables.. That raises questions right there..
Here is my big question.. Does all of this data need to be in these tables RIGHT NOW. What I mean is that do they need to have access to 55M rows of data in their live system?
This SCREAMS archive to me.
They claim to have done the work on proper indexing, check anyway..
But there are also a lot of other questions I would ask..
How big is the database? 40-60M rows of data can vary widely on total size..
SQL Version and Edition?
What about the hardware?
Is it old?
How many processors?
What kind?
Does SQL have access to all of them?
How much memory total?
How much is SQL allowed to have?
How much is SQL using?
Is SQL starving the OS and the OS is thrashing?
32/64 bit?
Local disk?
How fast are the disks?
SAN?
Have the poorly performing queries been looked at in depth to not only see how they use indexes but whether they are the best wy to perform the action?
All of these affect performance.. And quite frankly SQL 2005 should be giving good performance even on this size of load.
CEWII
April 21, 2010 at 11:32 am
We are using Sql Server 2008. I have no idea what the specs are pertaining to the database servers but I agree that's a good place to start asking questions at. I will pass on the questions you raised and see what happens. Thanks!
April 21, 2010 at 11:43 am
SQL 2008 is even better. You would not believe the anemic hardware I've seen and then heard people complain about SQL not performing.. I have to fight the urge to say "what did you expect, 2 processors and 2GB of memory and 100GB of live data.."
CEWII
April 21, 2010 at 12:22 pm
To a degree, what he's proposing is partitioning your data. That's actually a fine way to go, but you can actually work with sQL SErver to have it do a lot of the work for you. Look into BOL for details on it, but I think that's the direction he's trying to head in.
It's not as scary as it sounds, although it could be.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply