October 2, 2023 at 12:06 pm
I also see that the dev team uses the DISTINCT operator as a common practice. Is there a proper way to rewrite the queries to avoid DISTINCT but still get the desired results? If there are any good articles, please suggest them and I will share them with the dev team. I see this DISTINCT usage in almost all the join queries.
Regards,
Sam
Whenever I see a DISTINCT, I always question whether it is really needed, which makes me look very closely at what the code is doing and ask the developer WHY it needs a DISTINCT.
The same applies when I see a GROUP BY without any aggregation being done in the SELECT (some developers try to outsmart the DBAs by using a GROUP BY to replace a DISTINCT... bad, bad developers!!).
There are valid uses of it, though, and not all are wrong; the case above may be a "valid" one.
Cases where it most likely isn't valid are those where the GROUP BY columns include "amounts"; in 99.99% of those cases the DISTINCT should not be used.
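As a rough illustration of what I mean (the table and column names are invented, not from your system), the two patterns and one way to avoid them look something like this:

-- DISTINCT papering over row duplication caused by a one-to-many join
SELECT DISTINCT c.CustomerID, c.CustomerName
FROM dbo.Customer AS c
JOIN dbo.CustomerOrder AS o ON o.CustomerID = c.CustomerID;

-- GROUP BY with no aggregate, used only to fake a DISTINCT
SELECT c.CustomerID, c.CustomerName
FROM dbo.Customer AS c
JOIN dbo.CustomerOrder AS o ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.CustomerName;

-- one possible rewrite: don't join the child rows at all, just test for their existence,
-- so there is nothing to de-duplicate in the first place
SELECT c.CustomerID, c.CustomerName
FROM dbo.Customer AS c
WHERE EXISTS (SELECT 1 FROM dbo.CustomerOrder AS o WHERE o.CustomerID = c.CustomerID);

Whether the EXISTS rewrite applies depends on whether the query actually needs columns from the joined table; when it does, the DISTINCT may be one of the valid cases.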
regarding your "NO LOCK" - with very very few exceptions it should not be used. and only after signoff from the business acknowledging that their data can be inaccurate and that they accept the risk.
October 2, 2023 at 4:16 pm
vsamantha35 wrote:Jeff Moden wrote:You said in your narrative in your original post that "it reads last 30 mins of data from the tables involved" and yet ALL of your queries have the following criteria.
AND C1.LAST_UPDATE_DATE <= DATEADD(MINUTE, -30, GETDATE())
Do you see anything wrong with that criteria??? 😉
Jeff, could you please elaborate? I didn't get it. Do you foresee any problem? Do you mean that, apart from the last 30 minutes, it is reading a hell of a lot of remaining data? Or that they should be adding a more meaningful filter to reduce the number of rows? Is that what you are asking?
Regards,
Sam
Jeff is right - I missed that detail when I asked you about what this was used for.
What Jeff is saying is that the code retrieves everything that is OLDER than 30 minutes (i.e. the <= (less than or equal) on LAST_UPDATE_DATE <= DATEADD(MINUTE, -30, GETDATE())), while what you stated is that "it reads the LAST 30 mins of data from the tables involved".
So either the code is wrong or your statement is wrong (both Jeff and I think it's the code).
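To make it concrete (only the criteria is shown, with the rest of the query left out), the difference is just the direction of the comparison:

-- what the code currently does: keeps everything OLDER than 30 minutes
AND C1.LAST_UPDATE_DATE <= DATEADD(MINUTE, -30, GETDATE())

-- what "the last 30 minutes" would look like, assuming that really is the intent
AND C1.LAST_UPDATE_DATE >= DATEADD(MINUTE, -30, GETDATE())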
@Sam... Frederico's assessment about what I meant is spot on. So is his later assessment of DISTINCT and using GROUP BY for the sole purpose of making a distinct return.
Also, if you really are using RCSI, then using (NOLOCK) is a complete waste of code real estate. On top of that, using (NOLOCK) by itself was deprecated years ago; MS says the proper form is WITH (NOLOCK) now. I'm pretty sure that Frederico and the others will join me in saying that the proper form is to actually NOT use it for production work.
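Just to illustrate the syntax point (the table and column names here are invented for the example):

SELECT t.SomeColumn FROM dbo.SomeTable t (NOLOCK);        -- old form without WITH; deprecated syntax
SELECT t.SomeColumn FROM dbo.SomeTable t WITH (NOLOCK);   -- the form MS documents now
SELECT t.SomeColumn FROM dbo.SomeTable t;                 -- what I'd actually want to see in production, letting RCSI do its work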
--Jeff Moden
Change is inevitable... Change for the better is not.
October 2, 2023 at 7:01 pm
Jeff Moden wrote: Also, if you really are using RCSI, then using (NOLOCK) is a complete waste of code real estate. On top of that, using (NOLOCK) by itself was deprecated years ago; MS says the proper form is WITH (NOLOCK) now. I'm pretty sure that Frederico and the others will join me in saying that the proper form is to actually NOT use it for production work.
I think it is worse than just a complete waste of code: using NOLOCK with RCSI essentially disables RCSI and switches the isolation to READ UNCOMMITTED instead of the database-defined READ COMMITTED SNAPSHOT isolation.
I am actually surprised the code works; normally table aliases are defined prior to table hints.
If you really want to use READ UNCOMMITTED instead of the database-defined RCSI, then it would be much better to set the isolation level directly using: SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
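Something along these lines (the table and column names are just placeholders):

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT C1.SomeColumn
FROM dbo.SomeTable AS C1
WHERE C1.LAST_UPDATE_DATE >= DATEADD(MINUTE, -30, GETDATE());

-- put the session back on the database default (READ COMMITTED, which is RCSI here) when done
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

That keeps the isolation choice in one visible place instead of scattering table hints around, but accepting dirty reads at all still needs the business sign-off frederico mentioned.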
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
October 2, 2023 at 7:08 pm
Jeffrey Williams wrote: Jeff Moden wrote: Also, if you really are using RCSI, then using (NOLOCK) is a complete waste of code real estate. On top of that, using (NOLOCK) by itself was deprecated years ago; MS says the proper form is WITH (NOLOCK) now. I'm pretty sure that Frederico and the others will join me in saying that the proper form is to actually NOT use it for production work.
I think it is worse than just a complete waste of code: using NOLOCK with RCSI essentially disables RCSI and switches the isolation to READ UNCOMMITTED instead of the database-defined READ COMMITTED SNAPSHOT isolation.
I am actually surprised the code works; normally table aliases are defined prior to table hints.
If you really want to use READ UNCOMMITTED instead of the database-defined RCSI, then it would be much better to set the isolation level directly using: SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
I've not yet got a database that uses RCSI, but I will in the very near future, so you have my immense thanks for the info in your first paragraph above.
And, yeah... totally agreed on the rest. The first thing they need to do, though, is fix that bloody criteria that I pointed out.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2023 at 3:23 pm
Hi frederico/Jeff Moden/Jeffrey,
I don't quite know how to ask for this help, but I'm asking anyway. This is eating my brain. Please forgive me.
We see these kinds of poorly written queries every now and then with the agile process. Being from a DB operations background, I am seeking some help from all of you.
I need a little help building up a case/story about this function call: how bad it is and what kind of effect it will have on overall system performance on a server with 256 GB of RAM and 8 logical processors, if it is processing 40-50 million rows every 30 minutes (and even though the first execution hasn't finished, they are making another function call). I can see a high-CPU condition, and all other queries are waiting in the runnable queue whenever this function runs, but I am unable to explain to the dev team and management how bad it is. I want to put some numbers on it and explain to a layman/developer/manager how bad it is to run this on prod. Right now, after I raised a concern, they disabled it on sub-prod, but sooner or later they will move it to prod.
Again, thanks for taking the time to provide those useful, meaningful recommendations. Very thankful for that.
Regards,
Sam
If the dev is stating they only want the latest 30 minutes of data - then the code is wrong. That code is pulling all of the data except the last 30 minutes - which means any results they are getting are invalid.
Start there - once that has been fixed, then worry about performance impact.
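When you do get to the performance side and want numbers for management, a rough starting point (nothing here is specific to your system) is the plan cache DMVs, for example:

SELECT TOP (20)
       OBJECT_NAME(st.objectid, st.dbid)  AS ModuleName,
       qs.execution_count,
       qs.total_worker_time / 1000        AS TotalCpuMs,
       qs.total_logical_reads,
       qs.total_elapsed_time / 1000       AS TotalElapsedMs
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;

CPU time and logical reads per execution, multiplied by how often the function runs, usually make the cost obvious even to a non-technical audience. Just remember those numbers only cover what is still in the plan cache.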
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
October 4, 2023 at 12:07 am
frederico wrote: So either the code is wrong or your statement is wrong (both Jeff and I think it's the code).
The developer whom I spoke to mentioned that they need to process only the last 30 minutes of data.
Regards,
Sam
So tell the developer to fix the code. All they need to do with the bad criteria that I pointed out is change the <= to a >= and Bob's your uncle, because, as the others have stated, the current criteria finds everything BUT the last 30 minutes. Do the flip like I just said and most of your issues with this problem will simply vanish, because then you'll only be pulling the last 30 minutes, as desired.
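In other words, something like this (only the criteria is shown; computing the boundary once into a variable is optional, but it keeps every table in the join using exactly the same cutoff):

DECLARE @Cutoff datetime = DATEADD(MINUTE, -30, GETDATE());

-- was:  AND C1.LAST_UPDATE_DATE <= DATEADD(MINUTE, -30, GETDATE())
AND C1.LAST_UPDATE_DATE >= @Cutoff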
--Jeff Moden
Change is inevitable... Change for the better is not.
October 4, 2023 at 8:22 am
Thank you, Jeffrey and Jeff. I got it. I can sense that I shouldn't get into all sorts of unnecessary things but rather focus on the problem.
I have asked them to make those code changes and let us know. Thank you for being patient with me. Sometimes I get a little anxious and nervous. Thank you very much.
Feeling gratitude!
Regards,
Sam