Maintenance Plan Review

  • Ah, that was a great way to start the morning. Thanks, Ken.

    It's a small, small company and everyone who works here regularly (unlike me, the glorious contractor!) is highly overworked and seriously underpaid. They have a web app that was, apparently, created by monkeys at a consulting firm and delivered as compiled code that no one here can touch. So restarting the service multiple times a day was the answer--can't argue with success!

    They are now embarking on writing their own web site code--pity them, because we know how THAT is going to go for a while. Me, I'm just the database schmoe for a few months, so I have it easy!

    Now, back to the issues at hand. I did just get lucky last week. I'm sure some maintenance helped, but yesterday morning there was a 'need to restart the service' emergency call. At that point the whole site was down, just as everyone was stumbling into the office trying to find their heads after the weekend. The service was restarted and that 'fixed' the issue.

    I spent the day looking at the web trace data collection and the error logs. It looks like they periodically have bouts where a request is blocked about once per second. I then figured out how to catch the deadlocks in the log files and caught one bout at 7 pm last night and a smaller one this morning.
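    For reference, the deadlock-capture step above can be sketched like this, assuming it was done with trace flag 1222 (which writes the full deadlock graph to the ERRORLOG):

    ```sql
    -- Write deadlock graphs to the ERRORLOG from now on, server-wide.
    DBCC TRACEON (1222, -1);

    -- Pull matching lines back out of the current ERRORLOG (file 0).
    -- sp_readerrorlog is undocumented but ships with SQL Server.
    EXEC sp_readerrorlog 0, 1, 'deadlock';
    ```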

    Now, how the heck do you get that info into a readable form once it's in the log files???

  • Re: Log files

    You mean the SQL ERRORLOG or were you using Profiler?

  • SQL Error log. Just setting up Profiler.

  • Duh. My bad. You specifically said "error logs".

    Because this is a web app front end, I'm thinking you have a generic user account that [everything] webby uses. It might also use connection pooling, which leaves a fingerprint of a ton of connections that are mostly sleeping when you run sp_who. Either or both together cramp troubleshooting on the database side a bit.
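    If you want to see that pooling fingerprint without wading through sp_who output, a rough query along these lines groups the sessions (sys.dm_exec_sessions is available on SQL 2005+; your pooled login will likely show up as a single web account):

    ```sql
    -- Count sessions per login/host and how many are just sleeping.
    SELECT login_name,
           host_name,
           COUNT(*) AS sessions,
           SUM(CASE WHEN status = 'sleeping' THEN 1 ELSE 0 END) AS sleeping
    FROM   sys.dm_exec_sessions
    WHERE  is_user_process = 1
    GROUP  BY login_name, host_name
    ORDER  BY sessions DESC;
    ```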

    The ERRORLOG file really isn't going to help much. You'll need SQL Profiler. Using this tool is somewhat of a black art, but don't panic. Ow. Only kidding. <g> You want to get as granular as you can with your filters. There are typically two culprits at either end of the spectrum: the "little" queries that are fired off in a loop 100,000 times, or the heavy hitters that do something stupid like "SELECT DISTINCT *..."

    Try filtering for the heavy hitters first. Set the Profiler filter for Duration > maybe 500. CPU, Reads and Writes are also a good starting point. You may have to adjust the bar a bit. Start at what might be a middle range. You'll soon have a feel for where to point the spotlight on just the long-running queries.

    Run the output to a file local to the server, on a drive that has some decent free space. Have it roll over files at around 500 MB. If you're granular enough you shouldn't have a problem. For the time being, WATCH THIS STREAM LIVE. If things get crazy with the output, shut it down, review what you caught and see how you might adjust the filter. Heck, you might get lucky and actually find what you're looking for in that mess.

    Between Duration, CPU, Reads and Writes, you should get a decent picture of the heavy hitters. Let it run throughout the day. Be cautious about leaving it running overnight unless you know for sure you won't overdose on output.

    Good luck, and reply as things develop. I'm sure my suggestions are basic and might draw a bit of heat from other posters, but IMHO it's always best to start simple. Clearly this problem has been around for a while, so any clues you uncover will eventually lead you to the answer.

    BTW: After you have your trace file, you can import it into a table (on your dev server of course) and query the heck out of it.
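    A minimal sketch of that import, assuming an example path for where the trace was written:

    ```sql
    -- Load the Profiler/server-side trace file into a work table on dev.
    -- The path is an example; point it at your actual .trc file.
    -- DEFAULT tells fn_trace_gettable to read all rollover files in the set.
    SELECT *
    INTO   dbo.TraceResults
    FROM   sys.fn_trace_gettable('C:\Traces\WebAppTrace.trc', DEFAULT);
    ```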

  • Thanks for the filters. I had the columns I wanted in the trace but needed to restrict it. Initially, running it dragged the server down to the point of having to kill the trace. Better now.

    Nothing surprising there so far. Lots of bad code, including your fav, 'SELECT DISTINCT *'. Happily(?), however, that query appears to return no data and is executed over a table with 2 rows and 4 columns. More concerning is the giant wad of dynamic SQL for retrieving categories that executes multiple times per minute, creating a temp table which is then indexed prior to figuring out which categories should be pushed upward. Unfortunately, I can't find any sign that the dynamic SQL is being built server-side--nothing in sys.sql_modules when searching with a fragment from a comment string inside the dynamic SQL. And then there are the hundreds of instances of SELECT * FROM tblCategory, each of which brings back 35K rows.

    I haven't had the trace running through a crash yet but I'm sure that will happen...

  • Dynamic SQL can be a necessary evil in certain cases, but it sounds like in this case the chimps didn't know how to write good code, so they just "built" the query they needed from concatenated strings of SQL. I guess the best you might be able to do is trap the query and optimize the objects it uses (e.g. build helper indexes, build statistics on certain columns, etc.).

    This is where Profiler can help. Import the trace file into a table like I said earlier, then start doing some analysis to see which queries are unique, with a count of how many times they ran and an average of their duration, CPU, reads and writes. This will at least give you an idea of where the dynamic SQL might be falling down.
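    That analysis might look something like this, assuming the trace landed in a hypothetical dbo.TraceResults table:

    ```sql
    -- How often each distinct statement ran and its average cost.
    -- TextData is ntext in the raw trace, so cast it before grouping.
    SELECT CAST(TextData AS NVARCHAR(4000)) AS QueryText,
           COUNT(*)      AS Executions,
           AVG(Duration) AS AvgDuration,   -- microseconds in 2005+ trace files
           AVG(CPU)      AS AvgCPU,
           AVG(Reads)    AS AvgReads,
           AVG(Writes)   AS AvgWrites
    FROM   dbo.TraceResults
    WHERE  TextData IS NOT NULL
    GROUP  BY CAST(TextData AS NVARCHAR(4000))
    ORDER  BY Executions DESC;
    ```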

    I know this is probably crazy talk but can you get the vendor involved? If your employer paid for that "compiled, no touch" code it usually means "it's compiled, you can't support it." Maybe you can throw this steaming pile of ...code... back over the fence.

    BTW: Dynamic SQL doesn't have the benefit of the optimizer remembering the execution plan for it. Generally. There might be a cache hit if the string is [exactly] the same as one executed earlier, but if there's a different variable or different value somewhere, not so much.
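    To illustrate the difference (the table and column here are borrowed from earlier in this thread, purely for example):

    ```sql
    -- Concatenated strings only get a plan-cache hit on an exact text
    -- match, so every new literal value compiles a new plan:
    EXEC ('SELECT * FROM tblCategory WHERE CategoryID = 42');

    -- Parameterized dynamic SQL lets the optimizer reuse one plan
    -- across different values:
    EXEC sp_executesql
         N'SELECT * FROM tblCategory WHERE CategoryID = @id',
         N'@id INT',
         @id = 42;
    ```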

    Adam Machanic wrote a "who_is_active" proc (http://sqlblog.com/blogs/adam_machanic/archive/2012/03/22/released-who-is-active-v11-11.aspx) that might be helpful for troubleshooting your blocking problem. Personally, I wish it wasn't in "stored proc" form because some places are [really] skittish about creating permanent objects. Running a simple query is a little more ...ahem... "under the radar". Ironically, the stored proc runs dynamic SQL. In this case, it's OK because it's not being executed thousands of times an hour.
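    For the blocking angle specifically, a call along these lines (assuming the proc is installed as sp_WhoIsActive; both parameters are documented options of Who is Active) sorts the lead blockers to the top:

    ```sql
    -- Surface the blocking chain with the lead blockers first.
    EXEC sp_WhoIsActive
         @find_block_leaders = 1,
         @sort_order = '[blocked_session_count] DESC';
    ```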

  • Yes, of course this dynamic SQL is just garbage. It joins (unfiltered) 5 tables and a view and then adds two CROSS APPLYs to the output. All of that is part of a loop that builds the final chunk of product data, including image files, in segments. All things considered, the performance is pretty good!

    I've used more than my fair share of dynamic SQL, but always for valid performance reasons; this code should not exist.

    I've been using Adam's WhoIsActive for a while, but I didn't catch that release. Thanks. As you guessed earlier, though, everyone is just in there as 'WebClient'. And once another round of the 'the site is hung' fire drill starts, there's little time for analysis.

    My best guess as to what's happening: even when blocking is going on, everything can resolve unless the app inappropriately aborts the thread that won a deadlock, leaving its connection hanging. I can see clusters of request timeouts scattered throughout the day, but only occasionally does it go to the extent that the users report the db being 'hung'. Whenever that occurs, I also see a thread-abort error in the web app trace log.

    I could be way out in left field with that idea. The first day I was here the db 'hung' and we ran a version of WhoIsActive. Every time we killed the blocking spid, the next spid blocked everything; after killing a dozen spids the problem still continued, and we bounced the service. On that day it all had to do with writing temp tables--hence that night I split tempdb out to its own drive and multiple files. I need to catch it again and see if the behavior is the same.
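    For anyone following along, the tempdb split described above amounts to something like this (the drive letter, file names and sizes are made up for the example; the file move takes effect at the next service restart):

    ```sql
    -- Move the primary tempdb data file to its own drive...
    ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');

    -- ...and add another equally sized data file to spread allocations.
    ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdb2.ndf', SIZE = 1GB);
    ```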

    No dice on getting the consulting company to fix this. All ties have been severed and relations are ugly. The internal answer is to build a new site from the ground up. I've been down this road before and it doesn't look like they have a good idea of what the scope of that project is. Looks like a train wreck coming...

  • I've had the opportunity to see this Qure Workload Analyzer gizmo in action (http://www.dbsophic.com/qure-analyzer.html) and it [might] help your cause. I think it's still free. The one requirement that might make it difficult for you is that you need a dev database loaded up with a pretty recent copy of prod. The idea is to capture all activity (a decent representation of what 'the usual' would be), then use the app to load the trace file and point it at dev. It re-runs the activity, pounding the heck out of the db, and comes up with some recommendations. If you have the resources, it's worth trying. On the 'up' side, gathering [all] activity even for a brief period can be a valuable reference; 15 minutes or so might be enough.

    Because you're dealing with a third-party app and the vendor relationship is shot, your options are sort of limited. I really hate to sound like a butthead, but sometimes hardware [is] the answer. If you're running SQL 2005, I'm certain it's 32-bit running on a 32-bit OS, which probably only recognizes 3 GB of your server's memory unless you have some hotshot server guy who tweaked it with the ol' PAE trick.

    With enough horsepower you can make a brick fly, and bricks don't have wings. If you could persuade management to upgrade to SQL 2008 64-bit, you'd be closer to making this pig ...sorry... "brick" fly. SQL 2005 [is] at the end of its support life cycle... <wink><wink><nudge><nudge>

    I'll put a quarter in the "you contradicted your earlier post, Ken" jar.

  • Must be frustrating (for all concerned) to be a consultant consulting on things you aren't qualified to consult on!! :ermm: How did that happen - did they bring you in as a DEV and tag you as the DBA after the fact?

    For severely problematic database applications like this you REALLY need to have experience doing performance analysis and tuning. You could go back and forth for days if not weeks and still not have a functioning product. Consider having the company bring in an expert to identify/solve issues while also mentoring you on how to do the same so you can keep the trains on the tracks in the future.

    Good luck in any case.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Ah, yes, the imperfect-world thing. Nowhere on my resume do I call myself a DBA; I market myself only as a SQL Server T-SQL developer, and it was a point of discussion during the interview. At that time the hire was for a developer to handle the data layer for the rewrite of the company's retail site. They hired a high-end consulting firm to review their data solution (actual DBA consultants). Fabulously, the only thing they could find for me was the PowerPoint presentation and none of the reports with actual details on what was reviewed. I just got back from a weekend away and it looks like there is now a link to some additional reports that the consulting company left. However, their assessment seems to be that the real issue lies in how the web application uses the database, rather than in the database itself.

    Another development over the weekend is the hire of a CIO.

    Upon my arrival here, they had a few roadblocks to the development work, and since production was 'hanging' and the 'fix' was to bounce the SQL service, it seemed prudent for me to give it a try.

    It is a very small company and, as you may have guessed, they are pretty disorganized and overworked.

    Looks like they had multiple episodes of high request timeouts while I was out. However, the SQL service hasn't been restarted since Thursday, so perhaps they only bounced it once.

    Based on request-timeout activity over the weekend and this morning, I might finally capture on a trace an episode that ends in a user asking for a 'reboot'. Perhaps that will shed some light on things.

  • I believe the phrase is "accidental DBA" <grin>

    Hopefully the new CIO will recognize this mess for what it is and allocate resources/budget/etc. to fix it properly.

    Re: "High end consulting firm"

    <sarcasm>

    Golly. Clearly [that] was money well spent.

    </sarcasm>

    Let us know what you find. There may be others out there with the same problem that might benefit from your discoveries.

    <LOL> Maybe some of them have the same crappy app written by the same chimps...

  • An excellent term, accidental DBA. An accident for me, but not a surprise. There is no one on this team with whom I can meaningfully discuss the difference between a database developer and a DBA. I went through that schtick at the interview and it apparently didn't sink in. I keep being in meetings where someone on the phone asks if I'm 'the new DBA'.

    Perhaps it's the same issue I had when I was employed by a very small company: they need a DBA but can't or don't want to pay for a real one. However, I actually think it's just that they don't know or understand the difference. Even not being a DBA, my knowledge of SQL Server is still years ahead of the other developers here.

    In any case, they are stuck with my skills for the moment and I will train them as I go. I knew this was a potential train wreck when I signed on: a development department of 2 trying to perform a complete rewrite of an enterprise web application in 4 months. Yeah, that sounds good, because those 2 guys were just sitting on their thumbs before this project. YIKES!!!

  • Yeah, that's the way it usually goes. You see it in the classifieds all the time - "Wanted: IT Manager. Responsibilities include administration of Exchange, SQL Server, Oracle, PostgreSQL, Windows file servers, print servers, fax servers, FTP servers, BlackBerry servers, printers, fax machines, phones, mobile devices and workstations with Windows and Office. Programming experience a plus. Occasional after-hours support."

    Basically a one-person-IT-department.

    <lol> Your 'interview' comment reminds me of an interview I suffered through a while ago. They were asking [really detailed] "What would you do if...?" questions. Finally I asked why they were asking these types of questions and they admitted they had some broken databases and wanted to know how to fix them. I ended the interview and left.

  • Free consulting from an interviewee!!! NICE! :w00t:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Fly Girl

    Did you get a chance to practice your DR yet? Reply if you run into any snags.

    Cheers,

    Ken
