July 21, 2014 at 3:26 pm
I have come across a question on of the job interviews, i have already answered the question but was curious how other DBA's would answer this.
"With a quick rollout of a new site, very little attention was given to optimization across the SQL server farms. The new site has now grown to a point where scalability needs attention. You’ve already swept through all the tables and created indexes where they would be helpful, but you still need to reduce the load on the SQL servers by 50%. List 3 other areas to look for optimization opportunities. (Personal experiences are preferred over hypothetical scenarios). "
Below is what i have listed, i am sure there could be more. Trying to find out what are key things other's would have targeted.
1) Most of the times applications time out due to blocking, first and foremost check if there is any blocking on the databases. If no blocking then check for long running queries, i have a custom script that i can fire off to identify top 5 long running queries. Some queries in my case naturally run longer, in my case there is usually a job which is calling a stored procedure which is chewing up all the resources. As immediate fix I would stop the job and monitor performance through custom script or tool ( we use XYZ for performance monitoring) . Take the code of the job and optimize it to make sure it doesn't cause any issue. If there aren't any long running queries then i will look at some server dashboard reports, i have come across issues where one of the NIC cards dropped and overall throughput to the SAN has also increased. In this case i will have to work with Infra team to resolve the issue and then review the metrics to make sure it is within our baseline.
2) i) We have noticed some significant temp db issues for our OLTP environment. When there are more concurrent sessions we noticed high wait times on tempdb datafiles, we were able to resolve this by splitting tempdb into 8 different files with equal size and each file was pre-grown to same size. We have noticed significant improvement in the application as now all the temp objects are scaled out across multiple files. This setting is good for OLTP but OLAP we used SSD.
ii) The other setting i would look at is modifying MAXDOP and CTP values, SQL server ships with default values which might not be suitable for all the environments. I was able to find the optimal values by monitoring CXPacket wait type using custom scripts. I also did enough load testing to identify appropriate value. After changing MAXDOP to 4 and CTP to 20 , i have noticed significant improvement in overall performance. I was able to measure this by monitoring key counters like CXPacket wait types.
iii) Another area that i have looked in past is how SQL server is managing memory, i have enforced a standard to have min and max memory setting on every sql server. We had a server which did not have memory cap, this was causing paging issues where sql server would use local Pagefile to get the data which is way slower than fetching data from memory. To fix this i had to do two things a) set min/max memory setting , typically i assign no more than 80% of total memory to sql b) grant sql service account access to locked pages in memory, this will prevent sql server from paging out.
July 21, 2014 at 3:37 pm
Reports and/or traces on long running and/or high I/O queries. Mark for review and optimization.
Perfmon bottlenecking. Try to find out if it's memory pressure, CPU pressure, I/O pressure, etc. Attempt to determine the cause of the load and/or slowness.
Fire half the users... oh, errr, no?
Review storage. Determine if the SAN is sharing spindles, if there's fiber concerns, are the controllers maxed out, etc.
Perform blocking sampling, determine if there's recurring concurrency issues. Dig in and find out why.
SELECT * FROM sys.sql_modules where definition like '%cursor%'
Ask the users where the problems, or worst offenders, seem to be. Work with front end developers to see what the calls are, and determine if it's reasonably static data that could be cached with a few extra parameters to confirm that the data hasn't changed since last cache for updates. Polling on max(Rowversion) in a table is GREAT for that, btw.
There's a few hundred ways to start poking around, but in general, I always start with a single question.
"Where does it seem to hurt?"
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
July 21, 2014 at 3:45 pm
The one thing I would start with is gathering performance metrics. Know what is happening on the system. What do the wait stats look like. Do we have queuing somewhere. Gather the information first, then start making determinations on additional actions such as addressing blocking, adjusting files and file storage, adjusting cost threshold for parallelism (well, maybe do that immediately). But before I started outlining possible solutions, I'd absolutely suggest getting the metrics in line. Last thing you want to do is say "Well, at the last company this query hint helped, so we'll put it EVERYWHERE" or something like that. And I've seen that solution enacted. You must start from a position of knowledge.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 21, 2014 at 4:26 pm
curious_sqldba (7/21/2014)
I have come across a question on of the job interviews, i have already answered the question but was curious how other DBA's would answer this."With a quick rollout of a new site, very little attention was given to optimization across the SQL server farms. The new site has now grown to a point where scalability needs attention. You’ve already swept through all the tables and created indexes where they would be helpful, but you still need to reduce the load on the SQL servers by 50%. List 3 other areas to look for optimization opportunities. (Personal experiences are preferred over hypothetical scenarios). "
If we have already swept through creating indexes, then I should sweep through finding those indexes that are no longer being used. A high number of unused indexes could reduce the load a good bit right off. Other than that I would capture another baseline, since we already have been doing that too (right?) and compare that to the previous baseline to see what may have been fixed by creating those new indexes.
I would then start sniffing through the plan cache and wait stats to see where the pain points are located.
Shawn Melton
Twitter: @wsmelton
Blog: wsmelton.github.com
Github: wsmelton
July 21, 2014 at 4:59 pm
The question ends with the following...
List 3 other areas to look for optimization opportunities. (Personal experiences are preferred over hypothetical scenarios). "
My answer would be...
Based on nearly 20 years of personal experience in the area of optimization, the 3 areas that I'd look into are...
1. Crap code. :blink:
2. Crap code. :pinch:
3. Crap code. :sick:
To find these problems, I'd use the little proc I wrote called "sp_ShowWorst", which is similar to the "Performance - Top Queries by..." reports available in SSMS at the instance level except it also gives the saved execution plan and a whole lot of other information (Adam Machanic also has one that takes a lot more parameters than mine, depending on what your needs are).
Then, "IT DEPENDS" comes into full swing. Depending on what was actually found, queries may truly be crap code and need to be rewritten, may need to be just tweaked for index usage or other optimizations, ad hoc and ORM code might be moved to stored procedures or retuned at the non-SQL Server source, or certain common frequent lookup results could be cached on the Web Server (or whatever).
It may also be that indexes aren't actually doing any good and need to be dropped to decrease timeouts and deadlocks, etc, etc.
The bottom line is that it's most likely going to be some form of code change (or maybe just adding the correct index if you REALLY get lucky and, no, adding covering indexes to everything is NOT the answer). The very same methods I described above have recently allowed me to take a server from 40-45% average CPU usage across 16 CPUs to just 5-7% (depending on the time of day), to reduce memory requirements, and to reduce logical reads measured in the quadrillions of bytes in an 8 hour period to just several billion.
The sick part about all of this is that writing the code correctly to begin with would allow us to avoid all of this without taking any extra development time. In most cases, writing the code correctly usually takes less time although I'll admit that you have to actually know what you're doing instead of thinking that ORMs and embedded code can actually do it all. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply