June 3, 2021 at 11:06 am
We've made the switch to SQL Server 2019 and have implemented Always On Availability Groups. We have two nodes (primary and secondary SQL Servers) and a file share witness. Both the primary and secondary SQL Servers are on VMs, and SQL Server is the only thing running on those VMs.
Everything seemed to be going great until we migrated our data warehouse SSIS project. One step randomly causes the following error to appear:
SQL Server hosting availability group 'xxxxx' did not receive a process event signal from the Windows Server Failover Cluster within the lease timeout period.
The step in question loads over 17 million records and can take 12 or more minutes to run. Due to third-party constraints, we have to reload this piece of the warehouse in its entirety every night; it's not possible to get only the updated, inserted, and deleted records. Sometimes this piece runs fine, and when it does fail, we are able to re-run the package from the point of failure without issue.
We've increased the session timeout from 10 seconds to 20 seconds, and last night, after it bombed again, we increased the lease timeout from 20 to 30 seconds and the health check timeout from 30 to 40 seconds.
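For reference, two of those three settings can be changed with T-SQL; the AG and replica names below are placeholders. The lease timeout itself is a property of the cluster resource, so it has to be changed through Failover Cluster Manager (or the cluster PowerShell cmdlets) rather than T-SQL.

    -- Placeholder names: replace MyAG / SQLNODE2 with your availability group and replica.
    -- Session timeout is set per replica, in seconds:
    ALTER AVAILABILITY GROUP [MyAG]
        MODIFY REPLICA ON N'SQLNODE2' WITH (SESSION_TIMEOUT = 20);

    -- Health check timeout is set at the AG level, in milliseconds:
    ALTER AVAILABILITY GROUP [MyAG]
        SET (HEALTH_CHECK_TIMEOUT = 40000);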
We are checking with our network team to see if anything else (backups, etc.) is running on the VMs in the timeframe this step generally runs in.
What else should we be looking at?
June 3, 2021 at 3:47 pm
There are a lot more questions that need to be researched and answered:
Is the secondary replica readable, or is it failover-only?
Are the source and destination both SQL Server?
Which server is the package running on?
Are the packages deployed to the Integration Services catalog?
Are you using a table source or a SQL command?
What batch size have you set?
Additionally, extracting and loading 17 million rows shouldn't take 12 minutes on a decently sized system. I would focus on identifying what is taking so long to either extract that data or load it.
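One quick check while the load is running is to see what the session is actually waiting on. A minimal sketch (filter it down to the load's session ID in practice):

    -- What are active sessions waiting on right now?
    SELECT r.session_id,
           r.status,
           r.command,
           r.wait_type,
           r.wait_time,       -- ms spent in the current wait
           r.last_wait_type
    FROM sys.dm_exec_requests AS r
    WHERE r.session_id > 50;  -- skip most system sessions

If the waits are mostly PAGEIOLATCH or WRITELOG, the bottleneck is the IO subsystem rather than the package itself.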
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
June 3, 2021 at 5:46 pm
Secondary is non-readable. Its sole purpose is failover, making it the primary.
Source and destination are SQL Server.
Package is running on Server A, which is the primary server.
Yes, packages are in the Integration Services catalog.
We are using a SQL command (INSERT INTO ... SELECT FROM Staging_Table, etc.).
I'm not aware of a way to set a batch size when doing INSERT INTO ... SELECT FROM.
June 5, 2021 at 11:15 am
Hi, check resource utilization, especially IO (disk latency, disk queue length), CPU, and SQL dumps. This error is often caused by extremely high IO utilization from SQL Server, so high that Windows has problems with other processes (e.g., you can't RDP to the server, or it can't respond to the cluster). Most of the time the disk read queue is overloaded by SQL Server during, for example, a backup configured with too many files, a DBCC CHECKDB, or a query doing many reads.
If so, then check whether any other heavy processes are running at the same time, or find some way to optimize this process.
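As a quick first look from SQL Server's side, per-file average IO stalls can be pulled from the file stats DMV; a rough sketch:

    -- Average IO stall per read and write, per database file.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           vfs.file_id,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    ORDER BY avg_read_ms DESC;

Sustained averages well above 20-30 ms on the files this load touches would point at the storage.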
Is the whole process just loading stg_tab and inserting into tab from stg_tab, with no data validation, etc.?
June 7, 2021 at 3:13 pm
Secondary is non-readable. Its sole purpose is failover, making it the primary.
Source and destination are SQL Server.
Package is running on Server A, which is the primary server.
Yes, packages are in the Integration Services catalog.
We are using a SQL command (INSERT INTO ... SELECT FROM Staging_Table, etc.).
I'm not aware of a way to set a batch size when doing INSERT INTO ... SELECT FROM.
Is the cluster the source system - or the destination?
Are the source/destination systems in the same data center - same VM farm?
Now, you stated you are using an INSERT INTO? If that is correct, why are you doing that? You should be using an OLEDB Source connected directly to the source system and an OLEDB Destination, again connected directly. On the OLEDB Destination, you need to set the batch/commit sizes rather than using the default values.
If you are actually using INSERT INTO, it seems you are using SSIS to run code that pulls data across a linked server. That just doesn't make any sense; if you are going to do that, then why involve SSIS?
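If you do stay with INSERT INTO ... SELECT, there is no batch-size option on the statement itself; you batch it manually by committing in chunks so each transaction stays small. A rough sketch, with placeholder table and column names, assuming the staging table has an ordered key and the target is emptied before the reload:

    -- Placeholder names: Staging_Table / Target_Table / Id / Col1 / Col2.
    DECLARE @BatchSize int = 100000,
            @LastId    bigint = 0,
            @Rows      int = 1;

    WHILE @Rows > 0
    BEGIN
        INSERT INTO dbo.Target_Table (Id, Col1, Col2)
        SELECT TOP (@BatchSize) s.Id, s.Col1, s.Col2
        FROM dbo.Staging_Table AS s
        WHERE s.Id > @LastId
        ORDER BY s.Id;                    -- deterministic chunks

        SET @Rows = @@ROWCOUNT;
        SELECT @LastId = ISNULL(MAX(Id), @LastId) FROM dbo.Target_Table;  -- advance the watermark
    END;

Each iteration commits on its own, which keeps the transaction log and the IO bursts much smaller than one 17-million-row transaction.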
Since you are using the catalog, you can review the reports for the project and see which step is taking the most time. Once you have that, you can start reviewing the project to determine what can be done to improve the performance.
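Those reports sit on top of views in the SSISDB catalog, so the durations can also be queried directly; a minimal sketch (the project name is a placeholder):

    -- Longest-running executables (tasks) across recent executions.
    SELECT e.execution_id,
           es.execution_path,
           es.execution_duration,   -- milliseconds
           es.execution_result      -- 0 = success, 1 = failure
    FROM SSISDB.catalog.executions AS e
    JOIN SSISDB.catalog.executable_statistics AS es
        ON es.execution_id = e.execution_id
    WHERE e.project_name = N'MyWarehouseProject'
    ORDER BY es.execution_duration DESC;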
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs