August 11, 2017 at 4:43 pm
Below is one of the reporting query that is taking almost like more than 9 minutes to complete and as per the business they want me to look into tweaking or optimizing the code maybe by adding indexes or rewriting the query. Unfortunately, I am not that well versed in dealing with complex T-SQL. I have attached all the necessary schema and indexes that are on the table. Let me know if I need to provide anything else.
August 12, 2017 at 3:17 pm
ffarouqi - Friday, August 11, 2017 4:43 PMBelow is one of the reporting query that is taking almost like more than 9 minutes to complete and as per the business they want me to look into tweaking or optimizing the code maybe by adding indexes or rewriting the query. Unfortunately, I am not that well versed in dealing with complex T-SQL. I have attached all the necessary schema and indexes that are on the table. Let me know if I need to provide anything else.
In looking at your execution plan, I noticed that the actual number of rows is quite different than the estimated number of rows. That begs the question - how old are the statistics on those tables? The optimizer makes decisions based on the statistics, so if they're off, the odds of getting an optimal plan go down. I'd check this first. Please let us know if that does the trick.
August 13, 2017 at 12:28 pm
There are umpteen things at play here and given the way you phrased the OP and the blatantly obvious issues with the query itself I will state right up front that you have no hope of tuning this query given your knowledge. That isn't a personal condemnation, btw, just a VERY experienced observation. You need to get a professional involved. And quite possibly need some bigger hardware - which is what, btw?
1) You are scanning a 47.5M row table that is 184GB in size
2) You are SORTING said 184GB of data. That's a MASSIVE temp object operation.
3) You are carrying 1.8GB of data in 439K rows all the way to the client. That alone could take minutes of time AFTER the query itself is completed on SQL Server.
4) You are doing two HASH joins on said 439K rows
5) You are doing a second SORT on the 439K rows that due to the 1/4 actual estimated row count is spooling a bit to disk. This estimate being off is NOT even a rounding error in the grand scheme of things however.
6) You query wanted 239 GIGABYTES OF MEMORY, but was able to get "only" 24GB.
As for making it faster, hardware? Not trying to send 439K rows/1.8GB back to the client? Look for way to get the "top 1" without the full row number mechanism (with it's SORT). Given 439K total rows other methods such as correlated TOP 1 would likely suck too, but quite possibly not nearly as badly. Indexing could well help here (to avoid the row number SORT), but would bloat that massive table even further.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 14, 2017 at 6:22 am
I've had a lot of success rewriting these type of queries (where it's selecting ROWNUMBER = 1) as a cte with a select distinct followed by another nested query with a "cross apply(select top(1) " on the first query.
It depends on the data distribution but if you have a lot of rows with the same ObjectType, ObjectID then it should help
You will need to make sure the indexes specified at the end are on the tables.
;WITH cte AS
(
SELECT DISTINCT
ObjectType,
ObjectID
FROM dbo.SfQueuedTxUpdates
),
cte_MaxStatus AS
(
SELECT s.QueuedTxUpdateID,
cte.ObjectID,
s.QueuedAt,
s.ErrorMessage,
s.fkSfUpdateStatus,
s.fkSfUpdateAction,
cte.ObjectType,
s.ProcessedAt
FROM cte
CROSS APPLY(SELECT TOP(1)
QueuedTxUpdateID,
QueuedAt,
ErrorMessage,
fkSfUpdateStatus,
fkSfUpdateAction,
ProcessedAt
FROM dbo.SfQueuedTxUpdates s
WHERE s.ObjectID = cte.ObjectID
AND s.ObjectType = cte.ObjectType
ORDER BY QueuedAt DESC) AS s
)
SELECT MOT.ObjectName, /*mot.Description, */ ObjectID,
QueuedAt,
ErrorMessage,
ua.SfUpdateAction,
us.SfUpdateStatus,
cte_MaxStatus.ProcessedAt
FROM cte_MaxStatus
LEFT JOIN dbo.SfUpdateAction AS UA
ON cte_MaxStatus.fkSfUpdateAction = ua.SfUpdateActionID
LEFT JOIN dbo.SfUpdateStatus AS US
ON cte_MaxStatus.fkSfUpdateStatus = US.SfUpdateStatusID
LEFT JOIN Applications_HFF_30.dbo.ModuleObjectTypes AS MOT
ON cte_MaxStatus.ObjectType = MOT.ObjectTypeID
WHERE cte_MaxStatus.fkSfUpdateStatus <> 3
Indexes on: CREATE INDEX IX_SfQueuedTxUpdates_1 ON SfQueuedTxUpdates(ObjectID,ObjectType,QueuedAt)
CREATE INDEX IX_SfUpdateAction_1 ON SfUpdateAction(SfUpdateActionID)
CREATE INDEX IX_SfUpdateStatus_1 ON SfUpdateStatus(SfUpdateStatusID)
CREATE INDEX IX_ModuleObjectTypes_1 ON ModuleObjectTypes(ObjectTypeID)
August 14, 2017 at 6:41 am
Ed Wagner - Saturday, August 12, 2017 3:17 PMffarouqi - Friday, August 11, 2017 4:43 PMBelow is one of the reporting query that is taking almost like more than 9 minutes to complete and as per the business they want me to look into tweaking or optimizing the code maybe by adding indexes or rewriting the query. Unfortunately, I am not that well versed in dealing with complex T-SQL. I have attached all the necessary schema and indexes that are on the table. Let me know if I need to provide anything else.In looking at your execution plan, I noticed that the actual number of rows is quite different than the estimated number of rows. That begs the question - how old are the statistics on those tables? The optimizer makes decisions based on the statistics, so if they're off, the odds of getting an optimal plan go down. I'd check this first. Please let us know if that does the trick.
Where are you seeing that? For all the tables that appear in the execution plan, I'm seeing an exact match except on the largest table, which only has a difference of 40 rows compared to more than 47 million.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 14, 2017 at 7:44 am
Jonathan AC Roberts - Monday, August 14, 2017 6:22 AMI've had a lot of success rewriting these type of queries (where it's selecting ROWNUMBER = 1) as a cte with a select distinct followed by another nested query with a "cross apply(select top(1) " on the first query.
It depends on the data distribution but if you have a lot of rows with the same ObjectType, ObjectID then it should help
You will need to make sure the indexes specified at the end are on the tables.
;WITH cte AS
(
SELECT DISTINCT
ObjectType,
ObjectID
FROM dbo.SfQueuedTxUpdates
),
cte_MaxStatus AS
(
SELECT s.QueuedTxUpdateID,
cte.ObjectID,
s.QueuedAt,
s.ErrorMessage,
s.fkSfUpdateStatus,
s.fkSfUpdateAction,
cte.ObjectType,
s.ProcessedAt
FROM cte
CROSS APPLY(SELECT TOP(1)
QueuedTxUpdateID,
QueuedAt,
ErrorMessage,
fkSfUpdateStatus,
fkSfUpdateAction,
ProcessedAt
FROM dbo.SfQueuedTxUpdates s
WHERE s.ObjectID = cte.ObjectID
AND s.ObjectType = cte.ObjectType
ORDER BY QueuedAt DESC) AS s
)
SELECT MOT.ObjectName, /*mot.Description, */ ObjectID,
QueuedAt,
ErrorMessage,
ua.SfUpdateAction,
us.SfUpdateStatus,
cte_MaxStatus.ProcessedAt
FROM cte_MaxStatus
LEFT JOIN dbo.SfUpdateAction AS UA
ON cte_MaxStatus.fkSfUpdateAction = ua.SfUpdateActionID
LEFT JOIN dbo.SfUpdateStatus AS US
ON cte_MaxStatus.fkSfUpdateStatus = US.SfUpdateStatusID
LEFT JOIN Applications_HFF_30.dbo.ModuleObjectTypes AS MOT
ON cte_MaxStatus.ObjectType = MOT.ObjectTypeID
WHERE cte_MaxStatus.fkSfUpdateStatus <> 3Indexes on:
CREATE INDEX IX_SfQueuedTxUpdates_1 ON SfQueuedTxUpdates(ObjectID,ObjectType,QueuedAt)
CREATE INDEX IX_SfUpdateAction_1 ON SfUpdateAction(SfUpdateActionID)
CREATE INDEX IX_SfUpdateStatus_1 ON SfUpdateStatus(SfUpdateStatusID)
CREATE INDEX IX_ModuleObjectTypes_1 ON ModuleObjectTypes(ObjectTypeID)
Do you think the index on QueuedAt would benefit if I do a descending and avoid the expensive sort operation.
August 14, 2017 at 7:47 am
Jeff Moden - Monday, August 14, 2017 6:41 AMEd Wagner - Saturday, August 12, 2017 3:17 PMffarouqi - Friday, August 11, 2017 4:43 PMBelow is one of the reporting query that is taking almost like more than 9 minutes to complete and as per the business they want me to look into tweaking or optimizing the code maybe by adding indexes or rewriting the query. Unfortunately, I am not that well versed in dealing with complex T-SQL. I have attached all the necessary schema and indexes that are on the table. Let me know if I need to provide anything else.In looking at your execution plan, I noticed that the actual number of rows is quite different than the estimated number of rows. That begs the question - how old are the statistics on those tables? The optimizer makes decisions based on the statistics, so if they're off, the odds of getting an optimal plan go down. I'd check this first. Please let us know if that does the trick.
Where are you seeing that? For all the tables that appear in the execution plan, I'm seeing an exact match except on the largest table, which only has a difference of 40 rows compared to more than 47 million.
In the Sort operator that spilled to disk, 438,837 actual versus 138,641 estimated. Same thing in the downstream Hash Match. Looking at it again, the disparity started in the Filter operator and everything downstream from that was equal, so I just didn't look far enough. I had a whole slew of stuff typed up in that post before I narrowed it down...mistakenly. Thanks for pointing it out.
August 14, 2017 at 8:42 am
ffarouqi - Monday, August 14, 2017 7:44 AMJonathan AC Roberts - Monday, August 14, 2017 6:22 AMCREATE INDEX IX_SfQueuedTxUpdates_1 ON SfQueuedTxUpdates(ObjectID,ObjectType,QueuedAt)
Do you think the index on QueuedAt would benefit if I do a descending and avoid the expensive sort operation.
I'm not sure, if it does make a difference it would be a very small one.
The best thing to do is try it and see which one is faster.
There is another issue which is index maintenance, if QueuedAt is inserted in ascending numerical order and you have DESC on the column, the index will become fragmented faster than if you had ASC.
August 14, 2017 at 10:52 pm
Jonathan AC Roberts - Monday, August 14, 2017 8:42 AMffarouqi - Monday, August 14, 2017 7:44 AMJonathan AC Roberts - Monday, August 14, 2017 6:22 AMCREATE INDEX IX_SfQueuedTxUpdates_1 ON SfQueuedTxUpdates(ObjectID,ObjectType,QueuedAt)
Do you think the index on QueuedAt would benefit if I do a descending and avoid the expensive sort operation.
I'm not sure, if it does make a difference it would be a very small one.
The best thing to do is try it and see which one is faster.
Thanks! a ton Jonathan. You really rock. You rewritten code really helped to reduce the execution time way below to 1 hr and 51 secs.
August 15, 2017 at 11:36 am
ffarouqi - Monday, August 14, 2017 10:52 PMThanks! a ton Jonathan. You really rock. You rewritten code really helped to reduce the execution time way below to 1 hr and 51 secs.
Thanks, glad it worked, hope it wasn't 1 hr and 51 seconds though!
August 15, 2017 at 6:14 pm
Jonathan AC Roberts - Tuesday, August 15, 2017 11:36 AMffarouqi - Monday, August 14, 2017 10:52 PMThanks! a ton Jonathan. You really rock. You rewritten code really helped to reduce the execution time way below to 1 hr and 51 secs.Thanks, glad it worked, hope it wasn't 1 hr and 51 seconds though!
Good catch...my bad...sorry for the typo. It was obviously 1 hr and 51 minutes.
Thanks! once again for your help.
August 15, 2017 at 7:43 pm
ffarouqi - Monday, August 14, 2017 10:52 PMJonathan AC Roberts - Monday, August 14, 2017 8:42 AMffarouqi - Monday, August 14, 2017 7:44 AMJonathan AC Roberts - Monday, August 14, 2017 6:22 AMCREATE INDEX IX_SfQueuedTxUpdates_1 ON SfQueuedTxUpdates(ObjectID,ObjectType,QueuedAt)
Do you think the index on QueuedAt would benefit if I do a descending and avoid the expensive sort operation.
I'm not sure, if it does make a difference it would be a very small one.
The best thing to do is try it and see which one is faster.Thanks! a ton Jonathan. You really rock. You rewritten code really helped to reduce the execution time way below to 1 hr and 51 secs.
I'm.... confused. Didn't your original code only take 9 minutes according to your original post?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 15, 2017 at 8:25 pm
Jeff Moden - Tuesday, August 15, 2017 7:43 PMffarouqi - Monday, August 14, 2017 10:52 PMJonathan AC Roberts - Monday, August 14, 2017 8:42 AMffarouqi - Monday, August 14, 2017 7:44 AMJonathan AC Roberts - Monday, August 14, 2017 6:22 AMCREATE INDEX IX_SfQueuedTxUpdates_1 ON SfQueuedTxUpdates(ObjectID,ObjectType,QueuedAt)
Do you think the index on QueuedAt would benefit if I do a descending and avoid the expensive sort operation.
I'm not sure, if it does make a difference it would be a very small one.
The best thing to do is try it and see which one is faster.Thanks! a ton Jonathan. You really rock. You rewritten code really helped to reduce the execution time way below to 1 hr and 51 secs.
I'm.... confused. Didn't your original code only take 9 minutes according to your original post?
Oh yeah...I totally missed it once again. Yeah it took around 1 minute 51 secs...sorry about the confusion.
August 16, 2017 at 4:13 am
SELECT ObjectType, ObjectID, MAX_QueuedAt = MAX(QueuedAt)
INTO #SfQueuedTxUpdates
FROM dbo.SfQueuedTxUpdates
GROUP BY ObjectType, ObjectID
SELECT MOT.ObjectName, ms.ObjectID, ms.QueuedAt, ms.ErrorMessage, For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 16, 2017 at 5:28 am
ChrisM@Work - Wednesday, August 16, 2017 4:13 AMPersonally, I'd opt for a phase separation between the aggregation and the rest of the query, like this:SELECT ObjectType, ObjectID, MAX_QueuedAt = MAX(QueuedAt)
INTO #SfQueuedTxUpdates
FROM dbo.SfQueuedTxUpdates
GROUP BY ObjectType, ObjectIDSELECT MOT.ObjectName, ms.ObjectID, ms.QueuedAt, ms.ErrorMessage,
ua.SfUpdateAction, us.SfUpdateStatus, ms.ProcessedAt
FROM #SfQueuedTxUpdates t
CROSS APPLY (
SELECT TOP(1) -- might not be required
q.QueuedTxUpdateID, q.ObjectID, q.QueuedAt, q.ErrorMessage,
q.fkSfUpdateStatus, q.fkSfUpdateAction, q.ObjectType, q.ProcessedAt
FROM dbo.SfQueuedTxUpdates q
WHERE q.ObjectType = t.ObjectType
AND q.ObjectID = t.ObjectID
AND q.QueuedAt = t.MAX_QueuedAt
) x
LEFT JOIN dbo.SfUpdateAction UA
ON x.fkSfUpdateAction = ua.SfUpdateActionID
LEFT JOIN dbo.SfUpdateStatus US
ON x.fkSfUpdateStatus = US.SfUpdateStatusID
LEFT JOIN Applications_HFF_30.dbo.ModuleObjectTypes MOT
ON x.ObjectType = MOT.ObjectTypeID
WHERE x.fkSfUpdateStatus <> 3
On my machine with the test data I created the method I suggested is a bit faster. Maybe that method is faster for different data distributions.set statistics io,time off
if object_id('dbo.Test1','U') is not null
drop table dbo.Test1
go
create table dbo.Test1(ObjectType int not null,ObjectId int not null,QueuedAt int not null,myText nvarchar(100))
go
insert into dbo.Test1 (ObjectType,ObjectId,QueuedAt,myText)
select top(10000000) t1.n,t2.n,t3.n,newid()
from dbo.tally t1
inner join dbo.tally t2
on t2.n between 1 and 100
inner join dbo.tally t3
on t3.n between 1 and 200
where t1.n between 1 and 2000
order by 1,2,3
go
create index IX_Text1_1 on dbo.Test1(ObjectType,ObjectId,QueuedAt )
go
if object_id('tempdb..#tResult0','U') is not null
drop table #tResult0
if object_id('tempdb..#tResult1','U') is not null
drop table #tResult1
if object_id('tempdb..#t','U') is not null
drop table #t
if object_id('tempdb..#tResult2','U') is not null
drop table #tResult2
go
set statistics io,time on
PRINT '*********** Original ***********'
;with cte_MaxStatus as
(
select ROW_NUMBER() OVER (PARTITION BY ObjectType,ObjectId ORDER BY QueuedAt DESC) as RowNum,
ObjectType,ObjectId,QueuedAt,myText
from dbo.Test1
)
select ObjectType,ObjectId, QueuedAt, myText
into #tResult0
from cte_MaxStatus
where RowNum =1
go
PRINT '*********** CTE with CROSS APPLY ***********'
;with cte as
(
select distinct ObjectType,ObjectId
from dbo.Test1
)
select cte.ObjectType,cte.ObjectId, t2.QueuedAt,t2.myText
into #tResult1
from cte
cross apply(select top(1) QueuedAt,myText
from dbo.Test1 t2
where t2.ObjectId=cte.ObjectId
and t2.ObjectType =cte.ObjectType
order by QueuedAt DESC) as t2
go
print '*********** Start phase separation ***********'
select t.ObjectType,t.ObjectId ,MAX(t.QueuedAt) QueuedAt
into #t
from dbo.Test1 t
group by t.ObjectType,t.ObjectId
select t1.ObjectType,t1.ObjectId, t1.QueuedAt, t2.myText
into #tResult2
from #t t1
cross apply(select top(1) myText
from dbo.Test1 t2
where t2.ObjectId=t1.ObjectId
and t2.ObjectType =t1.ObjectType
and t2.QueuedAt =t1.QueuedAt ) as t2
print '*********** END ***********'
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply