October 2, 2023 at 12:06 pm
I also see that the dev team uses the DISTINCT operator as a common practice. Is there a proper way to rewrite the queries to avoid DISTINCT but still get the desired results? If there are any good articles, please suggest them and I will share them with the dev team. I see this DISTINCT usage in almost all the join queries.
Regards,
Sam
Whenever I see a DISTINCT, I always question whether it is really needed, which makes me look very closely at what the code is doing and ask the developer WHY it needs a DISTINCT.
The same applies when I see a GROUP BY without any aggregation being done in the SELECT (some developers try to outsmart the DBAs by using a GROUP BY to replace a DISTINCT... bad, bad developers!!).
There are valid uses of it, though, and not all are wrong; the case above may be a "valid" one.
Cases where it most likely isn't valid are those where the GROUP BY columns include "amounts"; in 99.99% of those cases the DISTINCT should not be used.
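As a rough illustration of what I mean (the table and column names are invented, not from your system), the two patterns and one way to avoid them look something like this:

-- DISTINCT papering over row duplication caused by a one-to-many join
SELECT DISTINCT c.CustomerID, c.CustomerName
FROM dbo.Customer AS c
JOIN dbo.CustomerOrder AS o ON o.CustomerID = c.CustomerID;

-- GROUP BY with no aggregate, used only to fake a DISTINCT
SELECT c.CustomerID, c.CustomerName
FROM dbo.Customer AS c
JOIN dbo.CustomerOrder AS o ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.CustomerName;

-- one possible rewrite: don't join the child rows at all, just test for their existence,
-- so there is nothing to de-duplicate in the first place
SELECT c.CustomerID, c.CustomerName
FROM dbo.Customer AS c
WHERE EXISTS (SELECT 1 FROM dbo.CustomerOrder AS o WHERE o.CustomerID = c.CustomerID);

Whether the EXISTS rewrite applies depends on whether the query actually needs columns from the joined table; when it does, the DISTINCT may be one of the valid cases.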
regarding your "NO LOCK" - with very very few exceptions it should not be used. and only after signoff from the business acknowledging that their data can be inaccurate and that they accept the risk.
October 2, 2023 at 4:16 pm
vsamantha35 wrote:Jeff Moden wrote:You said in your narrative in your original post that "it reads last 30 mins of data from the tables involved" and yet ALL of your queries have the following criteria.
AND C1.LAST_UPDATE_DATE <= DATEADD(MINUTE, -30, GETDATE())
Do you see anything wrong with that criteria??? 😉
Jeff, could you please elaborate? I didn't get it. Do you foresee any problem? Do you mean that, apart from the last 30 minutes, it is reading a hell of a lot of remaining data? Or that they should be adding a more meaningful filter to reduce the number of rows? Is that what you are asking?
Regards,
Sam
Jeff is right - I missed that detail when I asked you about what this was used for.
What Jeff is saying is that the code retrieves everything that is OLDER than 30 minutes (i.e. the <= (less than or equal) on LAST_UPDATE_DATE <= DATEADD(MINUTE, -30, GETDATE())), while what you stated is that "it reads the LAST 30 mins of data from the tables involved".
So either the code is wrong or your statement is wrong (both Jeff and I think it's the code).
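To make it concrete (only the criteria is shown, with the rest of the query left out), the difference is just the direction of the comparison:

-- what the code currently does: keeps everything OLDER than 30 minutes
AND C1.LAST_UPDATE_DATE <= DATEADD(MINUTE, -30, GETDATE())

-- what "the last 30 minutes" would look like, assuming that really is the intent
AND C1.LAST_UPDATE_DATE >= DATEADD(MINUTE, -30, GETDATE())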
@Sam... Frederico's assessment about what I meant is spot on. So is his later assessment of DISTINCT and using GROUP BY for the sole purpose of making a distinct return.
Also, if you really are using RCSI, then using (NOLOCK) is a complete waste of code real estate. On top of that, using (NOLOCK) by itself was deprecated years ago; MS says the proper form is WITH (NOLOCK) now. I'm pretty sure that Frederico and the others will join me in saying that the proper form is to actually NOT use it for production work.
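Just to illustrate the syntax point (the table and column names here are invented for the example):

SELECT t.SomeColumn FROM dbo.SomeTable t (NOLOCK);        -- old form without WITH; deprecated syntax
SELECT t.SomeColumn FROM dbo.SomeTable t WITH (NOLOCK);   -- the form MS documents now
SELECT t.SomeColumn FROM dbo.SomeTable t;                 -- what I'd actually want to see in production, letting RCSI do its work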
--Jeff Moden
Change is inevitable... Change for the better is not.
October 2, 2023 at 7:01 pm
Jeff Moden wrote: Also, if you really are using RCSI, then using (NOLOCK) is a complete waste of code real estate. On top of that, using (NOLOCK) by itself was deprecated years ago; MS says the proper form is WITH (NOLOCK) now. I'm pretty sure that Frederico and the others will join me in saying that the proper form is to actually NOT use it for production work.
I think it is worse than just a complete waste of code: using NOLOCK with RCSI essentially disables RCSI and switches the isolation to READ UNCOMMITTED instead of the database-defined READ COMMITTED SNAPSHOT isolation.
I am actually surprised the code works; normally table aliases are defined prior to table hints.
If you really want to use READ UNCOMMITTED instead of the database-defined RCSI, then it would be much better to set the isolation level directly using: SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
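Something along these lines (the table and column names are just placeholders):

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT C1.SomeColumn
FROM dbo.SomeTable AS C1
WHERE C1.LAST_UPDATE_DATE >= DATEADD(MINUTE, -30, GETDATE());

-- put the session back on the database default (READ COMMITTED, which is RCSI here) when done
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

That keeps the isolation choice in one visible place instead of scattering table hints around, but accepting dirty reads at all still needs the business sign-off frederico mentioned.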
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
October 2, 2023 at 7:08 pm
Jeffrey Williams wrote: Jeff Moden wrote: Also, if you really are using RCSI, then using (NOLOCK) is a complete waste of code real estate. On top of that, using (NOLOCK) by itself was deprecated years ago; MS says the proper form is WITH (NOLOCK) now. I'm pretty sure that Frederico and the others will join me in saying that the proper form is to actually NOT use it for production work.
I think it is worse than just a complete waste of code: using NOLOCK with RCSI essentially disables RCSI and switches the isolation to READ UNCOMMITTED instead of the database-defined READ COMMITTED SNAPSHOT isolation.
I am actually surprised the code works; normally table aliases are defined prior to table hints.
If you really want to use READ UNCOMMITTED instead of the database-defined RCSI, then it would be much better to set the isolation level directly using: SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
I've not yet got a database that uses RCSI, but I will in the very near future, so you have my immense thanks for the info in your first paragraph above.
And, yeah... totally agreed on the rest. The first thing they need to do, though, is fix that bloody criteria that I pointed out.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2023 at 3:23 pm
Hi frederico/Jeff Moden/Jeffrey,
I don't quite know how to ask for this help, but I'm asking anyway. This is eating my brain. Please forgive me.
We see these kinds of poorly written queries every now and then with the agile process. Being from a DB operations background, I am seeking some help from all of you.
I need a little help building up a case/story about this function call: how bad it is and what kind of effect it will have on overall system performance on a server with 256 GB of RAM and 8 logical processors, if it is processing 40-50 million rows every 30 minutes (and even though the first execution hasn't finished, they are making another function call). I can see a high-CPU condition, and all other queries are waiting in the runnable queue whenever this function runs, but I am unable to explain to the dev team and management how bad it is. I want to put some numbers on it and explain to a layman/developer/manager how bad it is to run this on prod. Right now, after I raised a concern, they disabled it on sub-prod, but sooner or later they will move it to prod.
Again, thanks for taking the time to provide those useful, meaningful recommendations. Very thankful for that.
Regards,
Sam
If the dev is stating they only want the latest 30 minutes of data - then the code is wrong. That code is pulling all of the data except the last 30 minutes - which means any results they are getting are invalid.
Start there - once that has been fixed, then worry about performance impact.
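When you do get to the performance side and want numbers for management, a rough starting point (nothing here is specific to your system) is the plan cache DMVs, for example:

SELECT TOP (20)
       OBJECT_NAME(st.objectid, st.dbid)  AS ModuleName,
       qs.execution_count,
       qs.total_worker_time / 1000        AS TotalCpuMs,
       qs.total_logical_reads,
       qs.total_elapsed_time / 1000       AS TotalElapsedMs
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;

CPU time and logical reads per execution, multiplied by how often the function runs, usually make the cost obvious even to a non-technical audience. Just remember those numbers only cover what is still in the plan cache.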
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
October 4, 2023 at 12:07 am
frederico wrote: So either the code is wrong or your statement is wrong (both Jeff and I think it's the code).
The developer whom I spoke to mentioned that they need to process only the last 30 minutes of data.
Regards,
Sam
So tell the developer to fix the code. All they need to do with the bad criteria that I pointed out is change the <= to a >= and Bob's your uncle, because, as the others have stated, the current criteria finds everything BUT the last 30 minutes. Do the flip like I just said and most of your issues with this problem will simply vanish, because then you'll only be pulling the last 30 minutes, as desired.
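In other words, something like this (only the criteria is shown; computing the boundary once into a variable is optional, but it keeps every table in the join using exactly the same cutoff):

DECLARE @Cutoff datetime = DATEADD(MINUTE, -30, GETDATE());

-- was:  AND C1.LAST_UPDATE_DATE <= DATEADD(MINUTE, -30, GETDATE())
AND C1.LAST_UPDATE_DATE >= @Cutoff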
--Jeff Moden
Change is inevitable... Change for the better is not.
October 4, 2023 at 8:22 am
Thank you, Jeffrey and Jeff. I got it. I can sense that I shouldn't get into all sorts of unnecessary things but rather focus on the problem.
I have asked them to make those code changes and let us know. Thank you for being patient with me. Sometimes I get a little anxious and nervous. Thank you very much.
Feeling gratitude!
Regards,
Sam