Super long running queries (bad users)

  • I agree that killing connections is certainly not a good idea but have you thought about creating separate workload groups and classifications in Resource Governor - the idea is to assign all the "cowboys" to a limited workload group and assign your maintenance group to a highly rsourceful workload group. This way, you can ensure resources for your high value tasks, while limiting resources for the "cowboys". When the system isn't under stress, all the groups can potentially work well but when there's too much stress on the system, then your maintenance tasks will be surely prioritized as they will have more resources assigned to them. I still think this is just a temporary solution and the ideal one is to train the rookies and maybe a few "chalk and talk" sessions can get things back on track. Just a thought.

  • I like the idea of the wall of shame rather than the kill list. You may want to make it a little closer to home though and have it automatically fire off an email to the offender showing their runtime and the jobs it is blocking. You may also note in the emails that repeat offenders after X times will be cc'd to their manager as well. It's one thing to leave a mess in the sandbox, but dumping concrete in deserves a special kind of mutual respect.

  • sstanek 27058 (2/26/2015)


    I like the idea of the wall of shame rather than the kill list. You may want to make it a little closer to home though and have it automatically fire off an email to the offender showing their runtime and the jobs it is blocking. You may also note in the emails that repeat offenders after X times will be cc'd to their manager as well. It's one thing to leave a mess in the sandbox, but dumping concrete in deserves a special kind of mutual respect.

    A followup post seemed to indicate that the ad-hoc SQL users are employed by external clients. So a "wall of shame" or sending emails to the user's manager would potentially create political problems for which we as DBAs are not equipped to handle.

    Also, if you tell the user "I killed your process because ...", then that just opens a can of worms. First, they take it personally that YOU took action to kill THEIR process. Second, it implies that killing processes is done at the DBA's discretion, so they will attempt to argue with you about it and will ultimately escalate the issue to your manager.

    However, if you tell the user that there is this thing called a "resource governor" which is policy based, and you as the DBA have essentially no room to make exceptions, then that implies the only solution to the problem is for the user to optimize their SQL.

    Taking this approach is probably in the best interests of everyone involved:

    "Sorry, dude, I feel your pain... but these processes that run late into the night just won't work, and you're going to have to find a way to make them run in less than an hour. Maybe I can help. I've been looking at the SQL and here are few a few suggestions ..."

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Maybe the post was unclear. I agreed to not killing, and suggested an email to alert the users that their queries were excessively long duration to help with process improvement. After all, if no one says anything they assume all is well.

    If it is external users repeatedly hitting this then routing the emails to their 'handler' would be appropriate, as well as a link to the documented indexes suggested for large data queries ...

    Yes, we should not incite a flame war. Yes, to lessening of drama. But also yes, to pointing out issues as issues. Sometimes you have to tell them to put on their big boy britches when their work causes extra work for others.

  • As the main question asked is if there is a way to automatically blow off into oblivion queries running over n minutes:

    I have to agree that is not the way to go about it. I am not a fan of a wall of shame either for it delivers a message on how you want to react to adverse situations. This is not something I would highlight on my resume for what I see so far is a lack of effective communication. This needs to be brought up to your client (whome you of course report to ) and make them aware of the costs involved ASAP. Then a meeting should probably be held where you explain how you managed to optimise badly written code (though I wouldn't use those words) and how to have more fast coordinated stuff going forward. We all learn as we go along.

    ----------------------------------------------------

  • Eric M Russell (2/12/2015)


    Jeff Moden (2/12/2015)


    Killing even the most ridiculous users sessions is a really good way to need an updated resume. I know it takes a whole lot longer to do but proper use of the resource governor and training the cowboys will be a whole lot more effective over time. Management buy-in for all of it in the form of a company policy would help protect you and to set expectations.

    Just killing stuff is going to perpetuate and maybe make the problem worse.

    I agree 95%, but in a data warehouse environment, insuring a successful DDL script deployment or critical data reload often times requires a window where the DBA has exclusive use of the database. That's impossible if ad-hoc users have SQL brain farts running 24/7. The DBA should schedule maintenance for a time when management approves, and then give users due notice, but when these maintenance windows arrive, it's time to kick the kids out of the pool.

    Real mission critical querying takes place during regular business hours.

    I've never seen a DBA lose their job because they aborted zombie SQL processes that ran for half and day and into the night. In fact, often times it's management who orders the kill (behind the scenes with a wink and a nod). But I have seen a DBA lose their job because the database was generally no longer available to executive level users due to missed deployments or runaway resource consumption caused by lower level script kiddies.

    In an organization where SQL coders act like cowboys, the DBA has to play the role of Sheriff, and that sometimes means (trouble)shooting... to kill.

    I actually have to disagree about the timing of mission critical querying. Well meaning folks that need to get something big done know that the hinnies will be pasted to the wall if they seriously interfere with "normal" daylight hours and so will run their "critical" queries at night. And, make no mistake about it, some of them run queries that are very important to the business.

    I do, however, absolutely agree with you that it's just as or more mission critical for the DBA to have some windows of opportunity. As you said, the best thing would be to publish when those windows are with all the appropriate warnings if you intend to kill SPIDs.

    Shifting gears a bit, if it's truly a 24/7 shop, it's not like you can set the database to single user or kick anyone out of the database to get something done. That's when the "sheriff" finds a way to get all the Indians and Cowboys across the river to the other side using just one canoe. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • How about creating a readonly snapshot of the database and giving having them run their ownernight queries against the snapshot. That should give them what they need and give you a database without blocking queries. Lots of things to consider like how/when to create the snapshot and performance, disk space, alter connecting strings and other things but just throwing it out.

    https://msdn.microsoft.com/en-ie/library/ms175158(v=sql.100).aspx

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply