SQLServerCentral Article

When Do You Pull Your Hair Out?

,

When Do You Pull Your Hair Out?

Another "In the Real World" adventure.

I'm sure that after reading about a few of my "In The Real World" adventures, there are more than a couple of you that are starting to doubt my abilities. I know, I know, I probably would too since it seems like I've had more than my share of disasters.

However, I received a few comments that state there are at least a couple of you that like them and have learned a few things, hopefully saving some late nights and angry spouses, so I'm once again opening up to look at an issue. I hope you enjoy it and please feel free to send comments, good or bad.

Which Batch Is It?

We were embarking upon a major upgrade to our intranet and extranet site. We had purchased some fancy content management / deployment/ caching solution for our dot com site and wanted to leverage that and make the internal sites more consistent, eliminating some of the one off development that had taken place over the last few years.

Naturally the site is backed by SQL Server. Or at least partially backed since quite a bit of caching occurs. Suffice it to say this is a major project, using quite a few people full time, probably a dozen, plus part timers from all facets of IT, network, infrastructure, etc.


As with more than a few projects in my career, I got called a couple weeks ago to join in. 2 weeks before it was going live. In fact, as I write this, it's supposed to go live tonight. And it's delayed.

Whew.

At least for me. However, I do get to support it and we were all set to go live until I got a call this morning. Now I've been called a few times to check things over the last few months, have found quite a few problems in various things, butusually I've been easily able to trace down the problem. Today I got "there's a problem with the database server".

Duh. Like I ever get called when the web server tips over or the application server decides to go to lunch. So I fire up Terminal Services and connect to the server. Right click the task bar, select Task Manager, and do a spot check on the Performance tab. CPUs idling nicely around 10%, memory at steady state, 100MB physical memory available. 2 (hyper-threaded) CPUs, 4GB RAM, what I expect. A quick bop to the Processes tab (sorted by CPU% descending) shows the same thing. The idle process eats up around 80-90% of the CPU, SQL Server taking the rest.

OK, the server seems fine from the ceiling, let's dive in. Open Query Analyzer (Real DBAs use QA, not that EM pig). Things not working, got a reference to the ContentLink table in email, so a quick

select top 10
   *
  from ContentLink

Lightning-fast-sub-second-quick-as-you-can-lickety-split it comes back. My baby is humming!

OK, so that isn't working. I've only been working this for a couple minutes, but already a couple of the people on the project have wandered down to my floor. Understandable since they are supposed to go live tonight, but I'd prefer they call, not stand around behind me. Ah well, the price of being popular (or in this case, a nice scapegoat), so I continue.

Let's try the query that someone was having issues with.

select top 10
   *
  from ContentLink
  where ContentLinkKey = 3452

Click ALT-X and.................

Nothing. A nice imitation of the earth, at least as nice as a 10x10 pixel image can be, is rotating and the handy dandy counter in the lower right corner of QA diligently counts off the seconds, but no data.

I hate that red "Stop" icon in QA. Much prefer it to be greyed out. Oh well, need to look intelligent, people are watching.

"That's strange, " I say thoughtfully, and hopefully it appears I know what I'm doing. I'm slightly worried I have another "Lost Data Chains" incident. That I'm worried about because it makes the product look bad and makes me look bad for not being proactive fixing it. I decide to divert their attention (on the off change they know more than how to spell S-Q-L).


"Seems like something's blocked, " stroking my chin, I think I look smart. Quickly I cancel and run an "sp_who2".

Shazam!!! I'm good. While resisting the urge to do a little "go steve, go steve, it's your birthday" dance, I scroll down and see there are numbers in the "BlkBy" column. Well actually there are always numbers, but in this case many of them are not zero.

I mutter a little as I trace back through the blocks. Of the 120 connections, something like 20 are blocked with 5 different SPIDs, but I quickly narrow it down to the offending spid. SPID 54.

"It's SPID 54, " I say as though everyone in the world knows. They should, I highlighted 54 on my screen. I look back to a series of blank zombie stares and realize I'm working too hard to impress this group. I probably had them at the Task Manager phase.

Sheep I think and run a dbcc inputbuffer(54) and a dbcc opentran to get the last batch and the time the transaction started. The batch isn't familiar and is more than 255 characters. I paste it on the screen.

begin transaction
if (select * from ContentLink where ParentKey = 2345)
   update ContentLink
      set ContentLinkKey = 5
  where ContentLinkKey = 2345
else
   insert ContentLink select 2345, 5

"Well that's not right. That's a syntax error". I hear a voice behind me.

I'm glad someone is paying attention. I quickly realized the problem, but I hate pointing out issues in other people's code if I can avoid it. Nice to have them see their own mistakes. We pull up the code from a test server and see

begin transaction
if (select * from ContentLink where ParentKey = 2345)
   update ContentLink
      set ContentLinkKey = 5
  where ContentLinkKey = 2345
else
   insert ContentLink select 2345, 5
commit transaction

We make the change and things are working again. A little testing and we are looking good.

Conclusions

The lack of a closed transaction was causing locks to stack up. What was incredibly difficult in finding this problem was the large amount of code in this system. There are many places were embedded SQL is built by processes for use in migrating data from one server to another. And a tremendous number of read queries for the content application. Wading through all this to find them problem transaction (there were dozens of correctly structured transactions) took quite a bit of time.


However, being able to "guess" that there was an open transaction helped in setting Profiler filters and narrowing the list of queries down. Hopefully none of you run into this, but if you see locking issues, check for transactions.


As always I welcome your feedback using the "Your Opinion" tab below.

Steve Jones

©dkRanch.net July 2003


Return to Steve Jones Home

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating