September 7, 2012 at 12:24 pm
I am running Data Collector from my SQL 2008 R2 and SQL 2012 servers to a SQL 2012 Data Collector Warehouse database.
I noticed that it's getting unbearably slow when I try to drill down to get query detail information (after you look at the summary report and it lists the top 10 queries, then click on a particular query).
It seems that one particular part of the execution plan of the snapshots.rpt_query_stats procedure seems to be the culprit. (see screenshot at Query Plan)
The collector warehouse database is currently at about 60 GB and the table for the query stats has about 30 GB of data.
I am only keeping about 7 days worth of data. I am hesitant to reduce the amount of data so I would like to know if there is a fix for this issue before reducing the data retention.
Thank you,
John
September 18, 2012 at 1:57 pm
Ok, no one got back to me on this and I was finally able to pinpoint the problem. It's definitely an issue with SQL Server 2012's version of the data collector warehouse DB schema (as of version 11.0.2325 at least).
I compared the schema of the 2008 R2 and the 2012 warehouse databases. I found out that the snapshots.rpt_query_stats procedure definition has changed, for the worse.
If you look at around line 73, you'll see that the FROM definition changed
SQL 2008 R2
FROM snapshots.query_stats AS stat
SQL 2012
FROM
(
SELECT *, DENSE_RANK() OVER (ORDER BY plan_handle, creation_time) AS plan_number
FROM snapshots.query_stats
) AS stat
Cost of query relative to batch in SQL 2008R2 is 57%.
Cost of query relative to batch in SQL 2012 is 100%.
The results of this is that when I click on a query in the query stats report to get the detail information, the query that took only a few seconds on my 60 GB SQL 2008 R2 database now takes about 8 minutes on my SQL 2012 database. Even when I reduce the data retention to 3 days dropping the database size to about 10 GB, the query is still taking 1.5 minutes.
Can someone please confirm my results?
Can someone also please tell me how to get this reported as a bug for 2012? (or if it's already reported as a bug)
September 18, 2012 at 2:17 pm
Could be an indexing issue as well. Not using MDW so only a guess.
September 18, 2012 at 2:57 pm
Yeah, it is doing a scan. I don't want to touch it because it's a Microsoft database schema. I'd like to know how to report this in if someone can confirm it's actually an issue and I'm not mis-diagnosing this.
Thanks,
John
September 18, 2012 at 3:38 pm
May be MS, but that shouldn't stop you from adding an index if it helps. I worked ith PeopleSoft developers and added several indexes to PeopleSoft systems to support system performance. We didn't wait for Oracle to make those changes (and sometimes the did and others they didn't). We simply supported them as enhancements if they didn't.
September 19, 2012 at 12:53 pm
I captured the query that is causing the issue and ran it individually so I could get the statistics IO and the actual execution plan.
Here is the query I ran (with the server name replaced):
SET STATISTICS IO ON
exec snapshots.rpt_query_stats
@instance_name=N'MyServer'
,@end_time='2012-09-19 18:20:02'
,@time_window_size=240
,@sql_handle_str=N'0x030009007c46a5375dfbc80082a000000100000000000000'
,@statement_start_offset=4852
,@statement_end_offset=9230
Here is what I'm seeing. I don't see any quick indexing fix for this. Can you take a look and see what you think?
I/O Stats:
Table 'snapshots_internal'. Scan count 1, logical reads 712, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'snapshot_timetable_internal'. Scan count 1, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'snapshots_internal'. Scan count 35, logical reads 110, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'snapshot_timetable_internal'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#A6609559'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'source_info_internal'. Scan count 0, logical reads 32, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'snapshot_timetable_internal'. Scan count 0, logical reads 32, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'snapshots_internal'. Scan count 0, logical reads 678, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'query_stats'. Scan count 1, logical reads 326201, physical reads 2, read-ahead reads 3690, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Execution Plan Screenshot:
Sort Warning Screenshot from Execution Plan:
Clustered Index Scan from Execution Plan:
September 21, 2012 at 8:21 am
My fix to this was just to replace the FROM statement that has the ranking function with the original FROM statement. I know it was probably a bug fix, but it was a matter of potentially having the plan count wrong and being able to generate the report in 5 seconds vs having the correct plan count right and waiting 8 minutes.
Or, here is the 2008 R2 version of the procedure that you could just run on your 2012 Management Data Warehouse database (make sure you make a copy of the original just in case):
ALTER PROCEDURE [snapshots].[rpt_query_stats]
@instance_name sysname,
@start_time datetime = NULL,
@end_time datetime = NULL,
@time_window_size smallint,
@time_interval_min smallint = 1,
@sql_handle_str varchar(130),
@statement_start_offset int,
@statement_end_offset int
AS
BEGIN
SET NOCOUNT ON;
-- @end_time should never be NULL when we are called from the Query Stats report
-- Convert snapshot_time (datetimeoffset) to a UTC datetime
IF (@end_time IS NULL)
SET @end_time = CONVERT (datetime, SWITCHOFFSET (CAST ((SELECT MAX(snapshot_time) FROM core.snapshots) AS datetimeoffset(7)), '+00:00'));
IF (@start_time IS NULL)
BEGIN
-- If time_window_size and time_interval_min are set use them
-- to determine the start time
-- Otherwise use the earliest available snapshot_time
IF @time_window_size IS NOT NULL AND @time_interval_min IS NOT NULL
BEGIN
SET @start_time = DATEADD(minute, @time_window_size * @time_interval_min * -1.0, @end_time);
END
ELSE
BEGIN
-- Convert min snapshot_time (datetimeoffset) to a UTC datetime
SET @start_time = CONVERT (datetime, SWITCHOFFSET (CAST ((SELECT MIN(snapshot_time) FROM core.snapshots) AS datetimeoffset(7)), '+00:00'));
END
END
DECLARE @end_snapshot_time_id int;
SELECT @end_snapshot_time_id = MAX(snapshot_time_id) FROM core.snapshots WHERE snapshot_time <= @end_time;
DECLARE @start_snapshot_time_id int;
SELECT @start_snapshot_time_id = MIN(snapshot_time_id) FROM core.snapshots WHERE snapshot_time >= @start_time;
DECLARE @interval_sec int;
SET @interval_sec = DATEDIFF (s, @start_time, @end_time);
DECLARE @sql_handle-2 varbinary(64)
SET @sql_handle-2 = snapshots.fn_hexstrtovarbin (@sql_handle_str)
SELECT
REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (
LEFT (LTRIM (stmtsql.query_text), 100)
, CHAR(9), ' '), CHAR(10), ' '), CHAR(13), ' '), ' ', ' '), ' ', ' '), ' ', ' ') AS flat_query_text,
t.*,
master.dbo.fn_varbintohexstr (t.sql_handle) AS sql_handle_str,
stmtsql.*
FROM
(
SELECT
stat.sql_handle, stat.statement_start_offset, stat.statement_end_offset, snap.source_id,
SUM (stat.snapshot_execution_count) AS execution_count,
SUM (stat.snapshot_execution_count) / (@interval_sec / 60) AS executions_per_min,
SUM (stat.snapshot_worker_time / 1000) AS total_cpu,
SUM (stat.snapshot_worker_time / 1000) / @interval_sec AS avg_cpu_per_sec,
SUM (stat.snapshot_worker_time / 1000.0) / CASE SUM (stat.snapshot_execution_count) WHEN 0 THEN 1 ELSE SUM (stat.snapshot_execution_count) END AS avg_cpu_per_exec,
SUM (stat.snapshot_physical_reads) AS total_physical_reads,
SUM (stat.snapshot_physical_reads) / @interval_sec AS avg_physical_reads_per_sec,
SUM (stat.snapshot_physical_reads) / CASE SUM (stat.snapshot_execution_count) WHEN 0 THEN 1 ELSE SUM (stat.snapshot_execution_count) END AS avg_physical_reads_per_exec,
SUM (stat.snapshot_logical_writes) AS total_logical_writes,
SUM (stat.snapshot_logical_writes) / @interval_sec AS avg_logical_writes_per_sec,
SUM (stat.snapshot_logical_writes) / CASE SUM (stat.snapshot_execution_count) WHEN 0 THEN 1 ELSE SUM (stat.snapshot_execution_count) END AS avg_logical_writes_per_exec,
SUM (stat.snapshot_elapsed_time / 1000) AS total_elapsed_time,
SUM (stat.snapshot_elapsed_time / 1000) / @interval_sec AS avg_elapsed_time_per_sec,
SUM (stat.snapshot_elapsed_time / 1000.0) / CASE SUM (stat.snapshot_execution_count) WHEN 0 THEN 1 ELSE SUM (stat.snapshot_execution_count) END AS avg_elapsed_time_per_exec,
COUNT(*) AS row_count, COUNT (DISTINCT stat.creation_time) AS plan_count
FROM snapshots.query_stats AS stat
INNER JOIN core.snapshots snap ON stat.snapshot_id = snap.snapshot_id
WHERE
snap.instance_name = @instance_name
AND stat.sql_handle = @sql_handle-2
AND stat.statement_start_offset = @statement_start_offset
AND stat.statement_end_offset = @statement_end_offset
AND snap.snapshot_time_id BETWEEN @start_snapshot_time_id AND @end_snapshot_time_id
GROUP BY stat.sql_handle, stat.statement_start_offset, stat.statement_end_offset, snap.source_id
) t
LEFT OUTER JOIN snapshots.notable_query_text sql ON t.sql_handle = sql.sql_handle and sql.source_id = t.source_id
OUTER APPLY snapshots.fn_get_query_text (t.source_id, t.sql_handle, t.statement_start_offset, t.statement_end_offset) AS stmtsql
-- These trace flags are necessary for a good plan, due to the join on ascending PK w/range filter
OPTION (QUERYTRACEON 2389, QUERYTRACEON 2390)
END
GO
Hopefully this will get fixed in an upcoming patch of 2012.
June 2, 2014 at 5:32 am
Thank you for providing us this solution. I was looking to this issue for several weeks. My query_stats table is also very big: 30,000,000 rows for only 1 managed instance with 7 days data retention.
I put page compression on the table, and changed the frequency of the data collections to 120 sec.
Tried to add indexes from the DMV missing indexes, but nothing helped. I faced the same problem as you, having to wait more than 10 minutes to get the query details report. Now with the 2008R2 version of rpt_query_stats, I get it in 5 seconds.
You make me very very happy today. FYI, my SQL 2012 version is 11.0.3393.
June 4, 2015 at 4:42 am
You also made my day a bg thnaks... :o)..., I had actuallly made a MS Support case, but they couldn't help me...
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply