March 29, 2011 at 7:03 am
Not sure where to post this and I'm not sure if I am going to gain anything by posting this. At the very least I could warn other developers out there.
I have a very badly designed database. It has grown out of control and there is very little I can do about it.
I cannot change the structure due to the application design. I'm also struggling to design an archive system, also due to the application design. The system is approaching a terrabyte and the more data that is loaded the slower the system becomes due to inneficient design.
Our immediate problem is this. Sometimes when we do a large load of data the system grinds to a halt until we rebuild the database. The batch requests per second is much lower than usual, as well the very little IO, however the CPU is suffering. Our only solution is a rebuild. After shutting down application servers, we BCP the data out, drop the table, create the table, BCP the data back in, add the indexes. This is the only method we have had success with when this anomaly happens.
March 29, 2011 at 7:22 am
What other solutions have you tried?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 29, 2011 at 7:55 am
Have you look at locks and wait states to understand why it's doing this? You should be able to at least narrow down the problem.
App code or not (and I assume we're not talking 3rd party here), if things are this bad, that you have to take the productoin system off-line, I don't know of many businesses that couldn't find a bit of time for some refactoring. It sounds like you need it.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 29, 2011 at 7:59 am
Are those tables heaps or clusters?
What's logical fragmentation? forwarding pointers? avg page space used? at the point that you need to do a 'rebuild'?
What version of SQL are you using? What edition?
p.s. How experienced are your DBAs?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 29, 2011 at 8:02 am
I'm thinking when he does a load of data, the statistics go out of date, and "it grinds to a halt"? updating statistics right after the large load seems to be one of the first things i'd look at.
Lowell
March 29, 2011 at 8:22 am
If you can't change the design or application. then you could change the hardware... I have seen some good performance improvements from using SSDs on badly designed databases, though at over 1TB it will be expensive..
March 29, 2011 at 12:42 pm
Sometimes when we do a large load of data the system grinds to a halt until we rebuild the database
Just a thought, do you drop the indexes prior to a large load of data and then rebuild same indexes afterwards?
It could be the updating on the indexes during the large data load that causes some (if not all) of your slowdown.
---------------------------------------------
If data can screw you, it will; never assume the data are correct.
March 29, 2011 at 12:45 pm
Also, there may be a little bit of back-end tweaking that can be done, whilst using views to present the same logical output to the application. Not knowing the structure of the db, this is just a thought aired.
---------------------------------------------
If data can screw you, it will; never assume the data are correct.
March 29, 2011 at 2:51 pm
KevinGurney (3/29/2011)
Also, there may be a little bit of back-end tweaking that can be done, whilst using views to present the same logical output to the application. Not knowing the structure of the db, this is just a thought aired.
I'd second this. You're in a lousy position. Views to imitate schema may be one of your ways out of this. Throw in indexed views and you might be looking at a possible short term solution while you puzzle out what the table level should actually look like.
Can you take us step by step through your 'large data load', including existing rowcounts, expected inclusion volume, number of tables, if you use staging tables to cleanse the data if there's any cleanup to be done, etc?
A reindexing every load isn't unheard of, but it does depend on the volume. It also depends on the 'noise' area in the table. Most huge tables have a very small area that's being heavily updated with the rest of it just being there for reference, if at all. I've seen as low as 2% fragmentation on a huge table be enough to take the processing speed of a system down.
At the time, I didn't have easy partitioning available. However, if you can isolate this to being a portion of your issue (which is what it sounds like to me offhand), it may be a possibility. The biggest reason to do this is reindexing will be much faster in the area(s) of the partition that don't involve the remaining large volume.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 30, 2011 at 3:47 am
Hi, and thanks for all the replies. Yesterday was a long day and we finally got the rebuilds complete last night. I'm not happy at all with the solution, as it is unheard of to take a production system offline to rebuild tables in order to get the system working again. I'm also unhappy that we cannot quite pinpoint the underlying problem.
Allow me to answer your questions:
What other solutions have you tried?
We have tried "ALTER INDEX WITH REORGANIZE", "ALTER INDEX WITH REBUILD" without success.
Have you look at locks and wait states to understand why it's doing this? You should be able to at least narrow down the problem.
The excessive wait stats were all SOS_SCHEDULER_YIELD. At the time there were very few requests and very little IO, but the CPU was running between 85% and 100%.
App code or not (and I assume we're not talking 3rd party here), if things are this bad, that you have to take the production system offline, I don't know of many businesses that couldn't find a bit of time for some refactoring. It sounds like you need it.
3rd party (and 4th party) stuff. This software has a data layer that builds SQL on the fly. It creates temporary stored procedures (CREATE PROCEDURE #aabb001 @P1, @P2 AS .....) then it executes these with the necessary parameters. The customer does not want us to create stored procedures as they do not want to be vendor dependant. It is a nightmare to maintain and optimise.
Are those tables heaps or clusters?
Clusters
What's logical fragmentation?
I couldn't take a reading, but file growth is set at 1Gb per file. They are on a dedicated disk array on a SAN.
forwarding pointers? avg page space used? at the point that you need to do a 'rebuild'?
Not sure. I only checked index fragmentation which was 99% on one of the tables, but an index rebuild did not help much.
What version of SQL are you using? What edition?
SQL Server 2008 R2 (RTM) - 10.50.1600.1
p.s. How experienced are your DBAs?
My boss was working with me on the issue and he knows his SQL very well. Been around the block a few times. My DBA skills are ok but I'm primarily a developer.
I'm thinking when he does a load of data, the statistics go out of date, and "it grinds to a halt"? updating statistics right after the large load seems to be one of the first things i'd look at.
I did not think of statistics as Auto Create Stats and Auto Update Stats are both on.
If you can't change the design or application. then you could change the hardware... I have seen some good performance improvements from using SSDs on badly designed databases, though at over 1TB it will be expensive..
Too expensive for this client, besides the disks were completely underutilised yesterday.
Just a thought, do you drop the indexes prior to a large load of data and then rebuild same indexes afterwards?
Not at all.
Also, there may be a little bit of back-end tweaking that can be done, whilst using views to present the same logical output to the application. Not knowing the structure of the db, this is just a thought aired.
We currently have a partitioning proposal with the client as well as an archiving proposal. The archiving is difficult due to the applications data layer, but we want to implement distributed views with insert and update triggers.
Can you take us step by step through your 'large data load', including existing rowcounts, expected inclusion volume, number of tables, if you use staging tables to cleanse the data if there's any cleanup to be done, etc?
The first time this happened, in December, the client did an annual run that updated each and every member on the system. The update actually created a new version of each so there was a large number of new rows in a large number of tables, although I cannot give you numbers. Monday nights update was very different. We needed to add new properties to all the members. The strange thing was that my script to add the new properties was run two weeks ago with no ill effect. My script added the properties with a default value of NULL. The script added about 250,000 rows to 5 tables, so over a million new rows. On Monday evening their application, via the data layer, performed an update in a loop to set the value to a valid string. They manage to update 91,000 rows in about 10 hours before stopping the process. This is when the system croaked. Don't ask me why they decided to perform an update like this, it is very typical of the incompetence we have to deal with on a daily basis. I have since informed them that a SQL script could perform the update in a set in a matter of minutes, but that is besides the point.
March 30, 2011 at 4:48 am
March 30, 2011 at 4:58 am
I suspect that you may need to get a specialist in (on-site) for a couple of days. I can recommend some good people here. Sounds like it's maybe too big/complex a problem for the forums.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 30, 2011 at 5:14 am
GilaMonster (3/30/2011)
I suspect that you may need to get a specialist in (on-site) for a couple of days. I can recommend some good people here. Sounds like it's maybe too big/complex a problem for the forums.
Thanks for the offer, but we are in South Africa. 🙂
I will figure this one out though. Will wait for it to raise its head again.
March 30, 2011 at 5:20 am
Sean Pearce (3/30/2011)
GilaMonster (3/30/2011)
I suspect that you may need to get a specialist in (on-site) for a couple of days. I can recommend some good people here. Sounds like it's maybe too big/complex a problem for the forums.Thanks for the offer, but we are in South Africa. 🙂
😀
So am I. PM me.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 30, 2011 at 5:32 am
... and Gail is the one I would call to save my life with a problem like this. Can't ask for a better consultant.
Good luck!
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply