June 12, 2019 at 3:22 pm
We recently upgraded to SQL Server 2016 sp2 from 2008r2. We left the compatibility level 100(sql server 2008r2). We are now experiencing some application timeouts which we almost never did previously. Now the application(believe it or not) is still in VB6. We are not experiencing timeouts with sql server agent jobs or any sprocs that are running within SQL Server. Just when the application calls the database. I'm thinking of creating an extended event(and if you have one already please recommend) to capture the timeout with the hopes that I can gleen the sql out of it.
Any recommendations would be greatly appreciated.
June 12, 2019 at 7:33 pm
My opinion, that is one of those "it depends" problems. Is the timeout in code execution OR is the timeout in the connection?
If the timeout is in code execution, do you know which stored procedures are misbehaving?
If it is in the connection, is your firewall configured properly for the end users?
One fun one we ran into recently was with a stored procedure that had existed for ages. Having VB6 call it we got no errors or warnings or anything. But when the C# app being designed to replace the VB6 one was called we got errors. The problem was with data type implicit conversions resulting in data being truncated. There are a few easy fixes to our problem, we went with explicit conversion of the data type to a smaller one.
If it is in code execution AND you still have access to the 2008 R2 instance, might not hurt to look at what the remote query execution time was set to. If you don't, might not hurt to try increasing the default timeout.
That being said, it is more of a "band-aid" solution. When you say the query doesn't time out when run from SSMS, how long does it take to run? Are there improvements you can make to get better performance out of the query?
Extending the timeout might help get the application working, but as data grows the performance is likely to get worse and worse. Improving query performance may be a better starting point to fix this rather than extending the timeout.
Now, if it is timing out on connecting to the SQL instance, I would expect the problem to be more on the network or configuration side of things. When you upgraded, did you do an in-place upgrade or did you install a new instance and migrate the database over? There are benefits and risks to both methods, but I personally prefer the new instance migration method as it gives me a chance to validate the instance level settings and review the logins plus it gives you a very quick emergency rollback. It will take more time in the long run as you then need to map the users up to the logins for each database, but that isn't that tricky to do. You will also likely have the SQL instance running on a different port or different machine. This means that the application would need to be updated to handle connecting to the new machine/instance.
Either way, did you update statistics after performing the upgrade? That could be hurting query performance...
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
June 13, 2019 at 9:39 am
Thanks for your reply. Yes we did a side by side install. The timeouts are execution timeouts not connection timeouts and it happens sporadically. I did update statistics with full scan. I'm going to create an extended event to capture the sql that is timing out cause I can't always get the development group to give me the sql code. At this point being the dba, all I can do is try and insure the sql is optimized(i.e. recommend rewriting or enhancing indexes(i.e. epecially for key lookups). It's important to note that we left the compatibility level 100(2008r2) cause we didn't have a chance to test the compatibility level 130 for 2016.
June 13, 2019 at 9:40 pm
You could try turning on the legacy cardinality estimator. It'll almost be like running it without doing the upgrade unfortunately, but might be able to narrow things down a bit.
Since the timeouts are inconsistent it does make troubleshooting more difficult too. If it was always with a specific stored procedure call with specific arguments, you could look into why that is happening. It may be an easy fix in the one stored procedure (such as parameter sniffing), or it could be some weird complicated thing.
Since it is sporadically happening, it makes me think it is either a parameter sniffing problem or a blocking problem... or the absolute worst problem that I have run across - the "we didn't change our code so it must be the database" problem. That problem is they did change their code that day and were convinced it would not cause any problems. I got the call on that one and had asked about it and was told "no code was changed", so I dug into the database for a few hours tracing everything I could think of... then thought to check the timestamp on the file...
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply