December 6, 2022 at 7:27 pm
Thank you Grant Sir & Jeff.
No trace flags has been enabled on the instance. Also, we are rebuilding the indexes every 30 mins. still we are seeing this issue of SELECT on this table taking more time because of which some API calls are failing. There is no blocking , waittype =null when I used sp_Whoisactive.
Few followup questions.
=========================
Looking at the plan, how to can say it is scanning version rows? do we need check anything in xml plan and search for any keyword?
Is there a way to tell, there are AG unfinished transactions in which this particular table is involved?
Currently, AG is configured as readable SYNCRONOUS COMMIT? will there be any difference if we change it to ASYNC commit mode? What if we turn off RCSI on the database? they have set it 6-7 years back. They kept it because this application has huge BLOCKING as major bottleneck.
Other thing is, if ghost record count=0 which means that GHOST CLEANUP Task is working fine for every ~5 secs.
But what are these ghost_version_records? I am not able to visualize this situation on how these versions are maintained. This piece I am not getting it. Can you please elaborate with any small example which can lead to ghost_version_records / versions at different timestamps?
Gone this post which gave me idea on how these deleted records are ghosted.
http://www.sqlskills.com/BLOGS/PAUL/post/Inside-the-Storage-Engine-Ghost-cleanup-in-depth.aspx
Little background for the AG.
=====================
The database is around 4 TB size and tempdb of 2.5TB in size.
The databases data files and log files are separated.
Tempdb is located in separate drives.
1. On Primary replica, all major writes/reads happen w.r.t specific INFORMATICA application.
2. On Secondary replica, all the full backups and downstream applications which does reporting will hit the secondary replica with ApplicationReadIntent=True;
Note: readable secondaries was very recently setup 1 month before and from last week the application team is complaining that simple SELECT is taking 7-20 secs.
This specific application has major issues with BLOCKING as major bottleneck. They have cascading deletes happening. Several times told dev team to do deletes in small batches, but still we see heavy blocking caused by the deletes during weekends. They say they are using batch deletes, but deletes takes a lot of time.
I am having the suspicion that since they are doing cascade delete operations, for each parent table value (lets 101) and this value is present in 10 different child tables with 100 million rows in each child table, then specifying batch size of 1000 while performing delete on parent table, will it guarantee delete of only 1000 rows in child table as well when doing cascading delete? I am not sure that is happening now. Whenever this happens, we see high cpu as well.
If anyone has any inputs/ solution on handling cascading deletes in batches to keep the txn(s) short to lessen blocking please let me know.
Little background on the table in question
========================================================
The table is an INFORMATICA metadata table. It's kind of holds locking info such as on which tables what lock is taken and so n so forth. On this table there will be fast inserts/deletes happening all the time. If we keep doing a SELECT in a indefinite loop (1=1) with a delay , we hardly see 3-4 records and they are deleted from time to time very fastly.
Not really sure if this table is becoming a contention point/HOT Spot? don't know how we can prove it.
Will check with our manager if we can open an MS case by any chance.
Thanks for help everyone.
It's not traditional ghost rows. Those are from a simple delete and you'd never see what you're seeing (I stand by my very first post in this thread on that count). These are version rows because of the read committed snapshot isolation mode. This is further exacerbated by you synchronous commit. That means that ALL transactions must complete on the AG, however long it takes, until clean up occurs on the original server. Combine that with RCSI and your server must maintain versions of the data that actually exists (even though it has been deleted), and the stuff that has been deleted, the ghost version rows, not ghost rows, ghost version, which is what you're seeing scans of, because, that's how all this combined works.
Yeah, it's a mess, but it is expected. Follow the logic through and it'll make sense.
Also, Paul's discussion of ghost rows, well, I'd never argue with Paul about most things, and won't on this. However, that doesn't take into account the synchronous AG and RCSI. Both those came out after Paul's post was written. Nothing there is wrong. It's just not complete because it predates two additional behaviors within SQL Server.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 6, 2022 at 11:11 pm
Awesome info, Grant. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2022 at 9:36 am
Thanks everyone.
Viewing 5 posts - 16 through 19 (of 19 total)
You must be logged in to reply to this topic. Login to reply