December 27, 2013 at 9:15 am
I'll capture the wait types the next time this issue occurs, currently it's not happening on any servers, but that will surely change within a day or so. I'll also capture the actual execution plan from when it's happening.
Is there any other information that you would like me to have?
I read up on parameter sniffing, it sounds reasonable except for the idea that we would be getting "out of the ordinary" parameters, the parameter being passed is always a bigint, would this occur from one bigint to another? From the sql technet articles I read through, it looks like this mostly happens for like statements.
December 27, 2013 at 9:20 am
Banks1850 (12/27/2013)
I read up on parameter sniffing, it sounds reasonable except for the idea that we would be getting "out of the ordinary" parameters, the parameter being passed is always a bigint, would this occur from one bigint to another? From the sql technet articles I read through, it looks like this mostly happens for like statements.
Parameter sniffing has nothing to do with data types and is not restricted to LIKE comparisons.
btw, why the multiple redundant indexes?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 27, 2013 at 9:31 am
they're not redundant, 1st post is for the versionfileinfo table
and second is for the userfileinfo table
in terms of our application, one is for our block storage, and the other is for the user files that are backed up.
the query in question is nests a versionfileinfo query inside a userfileinfo query so I thought I should add both tables and all the indexes for both.
the reason why I mentioned the "Like" part of parameter sniffing was because the Technet article referenced it as especially prone to this activity.
http://blogs.technet.com/b/mdegre/archive/2012/03/19/what-is-parameter-sniffing.aspx
December 27, 2013 at 9:37 am
Banks1850 (12/27/2013)
they're not redundant, 1st post is for the versionfileinfo tableand second is for the userfileinfo table
No, you have redundant indexes on each table
CREATE CLUSTERED INDEX [ixc_BackupSetId] ON [dbo].[VersionFileInfo]
(
[BackupSetId] ASC
)
CREATE NONCLUSTERED INDEX [ix_Cleanup] ON [dbo].[VersionFileInfo]
(
[BackupSetId] ASC,
[VersionFileInfoId] ASC
)
INCLUDE ( [VersionFileLocalNameId],
[FirstUserFileInfoId],
[LastUserFileInfoId])
CREATE NONCLUSTERED INDEX [ix_JoinColumns] ON [dbo].[VersionFileInfo]
(
[BackupSetId] ASC,
[VersionFileLocalNameId] ASC,
[FirstUserFileInfoId] ASC,
[BlockNumber] ASC
)
INCLUDE ( [VersionFileInfoId],
[LastUserFileInfoId],
[FileInfoId])
Two nonclustered indexes both with the clustered index key as the leading column.There are very few cases where that's useful. Are the queries that use the nonclustered indexes so time-sensitive (to the ms level) that they can't rather seek on the clustered index?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 27, 2013 at 10:18 am
oh, I see, yes, that's not my call, I'm not allowed to actually change the database schema at all, I'd have to go to our lead engineer and put in a ticket to see if they are using it or planning to use it in the future. To be honest, I haven't worried about it that much because of all the issues. This is such a new schema that I haven't even wrapped my head around it completely yet. It's less than 3 months old and we've had these issues for 2 plus months now so...
December 27, 2013 at 1:17 pm
oh, forgot to add, the wait info is null on those statements.
December 27, 2013 at 2:20 pm
And the last_wait_type column?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 28, 2013 at 5:54 am
Banks, do you have a detailed description what has been changed compared to the previous version (that seems to have performed as expected)? Especially in terms of communication between app and database...
Maybe this will help to narrow down the issue.
As already suggested: can you reproduce the long running code using SSMS while the app shows such a slow response?
What did the Profiler show in terms of the queries in question? Especially in terms of duration? Can you confirm that the duration is significantly higher than "usual"?
December 28, 2013 at 7:48 am
LutzM (12/28/2013)
Banks, do you have a detailed description what has been changed compared to the previous version (that seems to have performed as expected)? Especially in terms of communication between app and database...Maybe this will help to narrow down the issue.
As already suggested: can you reproduce the long running code using SSMS while the app shows such a slow response?
What did the Profiler show in terms of the queries in question? Especially in terms of duration? Can you confirm that the duration is significantly higher than "usual"?
pretty much everything has changed, they completely rebuilt the main tables (the two I listed) and now use a whole different schema. They also rebuilt the application and how it connects. It's essentially a brand new program connecting to a brand new schema but a schema with full tables and indexes already in it. That's the problem, the engineering department is small, no ability to "test" the product with high stress, add to that the fact that we are connecting to client machines which we have no control over, and you pretty much run into every possible connectivity issue imaginable.
I am of the belief that the SQL batches are waiting for a request from the application, but the application is stuck in a loop because of a socket disconnect, which won't resolve itself until that socket times out. But I have no way to prove this.
the traces show the individual queries in the stored procedures that are sitting there, are running very fast, but I don't know how to look at the connection and see what's wrong. If I could somehow prove that sql is not actually doing any work, but is in fact just waiting on an external thread to finish, then I could hand this off to engineering and wash my hands of this whole mess. Unfortunately I have no idea of how to do this. Which was why my initial request was a way to look at the appdomain.
December 28, 2013 at 7:51 am
GilaMonster (12/27/2013)
And the last_wait_type column?
Gail, last wait type for all of them is SOS_SCHEDULER_YIELD
December 28, 2013 at 9:02 am
Banks1850 (12/28/2013)
GilaMonster (12/27/2013)
And the last_wait_type column?Gail, last wait type for all of them is SOS_SCHEDULER_YIELD
Interesting.....
Definitely need to see an actual plan for that procedure when things are going badly.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 29, 2013 at 4:29 am
Lowell (12/27/2013)
Banks1850 (12/27/2013)
statistics are updated daily and automatically for all tables right after the index maintenance is performed.and there are none that are listed as out of date when I check.
I'm speaking from experience here: once a day is sometimes not enough for frequently updated tables.
stats get auto updated if 20% +500 rows of data are inserted/updated/deleted.
20% on a MillionBillion row table is a huge amount and probably won't happen before the next stats scheduled job, and it only takes a fraction(ie half a percent to 1%) for the stats to throw of existing execution plans out of whack.
especially since you mentioned the issue goes away when you bounce the service, because new execution plans are built that now take into consideration the current stats, I'm more than sure that your issue is stale statistics.
So it looks like SSChampion was right (thanks!), in my checklist of things to check when the issue began occuring again, I ran the "exec sp_updatestats" sp, and low and behold, within 5 minutes, the server calmed down and the queries began performing correctly again. Even though the statistics were updated the afternoon before. I guess the query I was using to check for stats out of date had a 1 day minimum, which was why it wasn't showing them as out of date, even though they were.
So Im going to do some digging into how fast these statistics go out of date and set up a job to check and update them much more frequently if required.
Thanks Guys.
Oh, question, does anyone have a good script that records statistics metadata (so I can show the proof to my bosses)? I'd like to 1) be able to show what I'm doing and why and 2) it would be nice if I could log some of this stuff to tell when these kick off and for what reason).
Viewing 12 posts - 16 through 26 (of 26 total)
You must be logged in to reply to this topic. Login to reply