June 10, 2013 at 11:46 pm
Hey folks,
I'm looking for some help in understanding why my query is behaving differently under certain circumstances. The query is as follows:
with blah_index As
(Select ROW_NUMBER() Over (Order By blahdate desc, blahid asc) As RowNum
, *
from blah
where blahcode not in ('D')
and (blahstatus is null or blahstatus in ('A','B','C'))
and blahtype not in ('type1','type2')
and blahcheck = 1
and ((blahtray = 1) or (blahtray = 2))
and blahname in (select filterValue
from blah_filters
where colName = 'blahname'
and comparitor = 'and')
)
select *
from blah_index
where RowNum between 1 and 1000
order by RowNum asc;
Now, if the filter value is sufficiently restrictive (i.e. the column values are highly varied), the query works great and is nice and fast, particularly with the indexes that have been set up on the table. However, if the filter is on a column that only contains a couple of distinct values, the execution plan changes and performs considerably worse. I'm noticing a difference in where the plan places the 'distinct sort' for the IN clause. In the cases where it executes well, the sort is almost at the end of the chain. In the cases where it performs badly, it's further up the chain, after the nested joins.
The odd thing is though, if I manually insert the filter values as a string into that query, it's super fast again. So, what's causing the execution plan to be so bad?
I've attached a couple of images of the differences between the plans. The 'goodrun' takes a matter of ms to run. The 'badrun' takes some 30 seconds to execute.
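One thing I'm planning to try (an untested sketch, using the same table/column names as above) is pre-loading the filter values into a temp table first, so the optimiser has real statistics for the IN list, much like hard-coding the literals gives it:

```sql
-- Untested sketch: materialise the filter values first so the optimiser
-- has accurate statistics for them, similar to hard-coding the literals.
select distinct filterValue
into #filters
from blah_filters
where colName = 'blahname'
  and comparitor = 'and';

with blah_index as
(select ROW_NUMBER() over (order by blahdate desc, blahid asc) as RowNum
      , *
 from blah
 where blahcode not in ('D')
   and (blahstatus is null or blahstatus in ('A','B','C'))
   and blahtype not in ('type1','type2')
   and blahcheck = 1
   and ((blahtray = 1) or (blahtray = 2))
   and blahname in (select filterValue from #filters)
)
select *
from blah_index
where RowNum between 1 and 1000
order by RowNum asc;

drop table #filters;
```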
June 11, 2013 at 12:11 am
Instead of posting pictures of the execution plans, how about attaching the actual execution plans (as .sqlplan files) instead. The pictures don't tell as much as the execution plans will.
June 11, 2013 at 1:07 am
Lynn Pettis (6/11/2013)
Instead of posting pictures of the execution plans, how about attaching the actual execution plans (as .sqlplan files) instead. The pictures don't tell as much as the execution plans will.
Hi Lynn... sorry, I'm not sure if I can/should do that without having the table/proc names redacted (dunno what my company policy would be regarding that). I'll chase it up tomorrow and post the actual plans if it's not an issue.
June 11, 2013 at 3:19 am
June 11, 2013 at 6:17 pm
Ok, I'll modify my original question.
Why does this first query perform so much worse than the second one?
First query:
with doc_index As
(Select ROW_NUMBER() Over (Order By docdate desc, docid asc) As RowNum
, a.*
from doclist a
inner join DocList_Filters b
on b.colName = 'STATUS'
and a.Status in ( select distinct filterValue
from DocList_Filters
where colName = 'STATUS'
and comparitor = 'and')
where doccode not in ('D')
and (statuscode is null or statuscode in ('A','I','D'))
and doctype not in ('type1','type2')
and docchecked = 1
and ((doctray = 1) or (doctray = 2))
)
select *
from doc_index
where RowNum between 1 and 1000
order by RowNum asc;
Second query:
with doc_index As
(Select ROW_NUMBER() Over (Order By docdate desc, docid asc) As RowNum
, a.*
from doclist a
where doccode not in ('D')
and (statuscode is null or statuscode in ('A','I','D'))
and doctype not in ('type1','type2')
and docchecked = 1
and ((doctray = 1) or (doctray = 2))
and a.status in ('active','inactive')
)
select *
from doc_index
where RowNum between 1 and 1000
order by RowNum asc;
The filters table only contains the 2 records that are manually used in the second query.
The doclist table contains 66 columns and about a million rows.
Is that enough info?
June 12, 2013 at 3:22 am
June 12, 2013 at 7:28 am
mrbonk (6/11/2013)
Ok, I'll modify my original question.
Why does this first query perform so much worse than the second one?
First query:
with doc_index As
(Select ROW_NUMBER() Over (Order By docdate desc, docid asc) As RowNum
, a.*
from doclist a
inner join DocList_Filters b
on b.colName = 'STATUS'
and a.Status in ( select distinct filterValue
from DocList_Filters
where colName = 'STATUS'
and comparitor = 'and')
where doccode not in ('D')
and (statuscode is null or statuscode in ('A','I','D'))
and doctype not in ('type1','type2')
and docchecked = 1
and ((doctray = 1) or (doctray = 2))
)
select *
from doc_index
where RowNum between 1 and 1000
order by RowNum asc;
Second query:
with doc_index As
(Select ROW_NUMBER() Over (Order By docdate desc, docid asc) As RowNum
, a.*
from doclist a
where doccode not in ('D')
and (statuscode is null or statuscode in ('A','I','D'))
and doctype not in ('type1','type2')
and docchecked = 1
and ((doctray = 1) or (doctray = 2))
and a.status in ('active','inactive')
)
select *
from doc_index
where RowNum between 1 and 1000
order by RowNum asc;
The filters table only contains the 2 records that are manually used in the second query.
The doclist table contains 66 columns and about a million rows.
Is that enough info?
First, here is a rewrite of both of your queries in reverse order (second query first):
Select top 1000
a.*
from
doclist a
where
a.doccode not in ('D')
and (a.statuscode is null or a.statuscode in ('A','I','D'))
and a.doctype not in ('type1','type2')
and a.docchecked = 1
and ((a.doctray = 1) or (a.doctray = 2))
and a.status in ('active','inactive')
order by
a.docdate desc,
a.docid asc;
Select top 1000
a.*
from
doclist a
inner join DocList_Filters b
on a.Status = b.filterValue
where
a.doccode not in ('D')
and (a.statuscode is null or a.statuscode in ('A','I','D'))
and a.doctype not in ('type1','type2')
and a.docchecked = 1
and ((a.doctray = 1) or (a.doctray = 2))
and b.colName = 'Status'
and b.comparitor = 'and'
order by
a.docdate desc,
a.docid asc;
Now, why is your first query less performant than the second? Here is what I saw in it:
1. It accesses 2 tables, one of them twice; once in a join and again as a subquery in that join.
2. The second query was only reading data from a single table.
Now, questions:
1. Do both queries return the same result sets?
2. Let us know if my rewrites do the same, please.
June 12, 2013 at 7:57 am
I just rewrote the first query a second time:
Select top 1000
a.*
from
doclist a
where
a.doccode not in ('D')
and (a.statuscode is null or a.statuscode in ('A','I','D'))
and a.doctype not in ('type1','type2')
and a.docchecked = 1
and ((a.doctray = 1) or (a.doctray = 2))
and a.Status in (select
b.filterValue
from
DocList_Filters b
where
b.colName = 'Status'
and b.comparitor = 'and')
order by
a.docdate desc,
a.docid asc;
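You could also try the subquery version with OPTION (RECOMPILE) appended (just a guess, I haven't run it against your data); it forces a fresh plan on each execution, so the estimates are built from the filter table's contents at that moment, much like the hard-coded literals:

```sql
-- Same shape as the query above, with OPTION (RECOMPILE) added so the
-- plan is compiled against the filter table's current contents.
Select top 1000
    a.*
from
    doclist a
where
    a.doccode not in ('D')
    and (a.statuscode is null or a.statuscode in ('A','I','D'))
    and a.doctype not in ('type1','type2')
    and a.docchecked = 1
    and ((a.doctray = 1) or (a.doctray = 2))
    and a.Status in (select b.filterValue
                     from DocList_Filters b
                     where b.colName = 'Status'
                       and b.comparitor = 'and')
order by
    a.docdate desc,
    a.docid asc
option (recompile);
```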
June 12, 2013 at 5:09 pm
Hi Lynn,
Firstly, thank you so much for spending the time helping out!
The rewritten queries perform approximately the same as my original ones. As soon as the subquery is present to get the filter values from the second table, it takes considerably longer to execute than the literal-value version. Something else I have to consider is that this forms part of a JIT loading implementation, so I can't just use TOP 1000 to get the results. I need to be able to specify arbitrary start/end rows, hence my use of the ROW_NUMBER() call to produce a unique index from a sort column that's passed into the proc.
I've been working on this for about a week now, and it's starting to look like there's no viable solution that works within both the software architecture and the database structure I'm unfortunately stuck with for the time being. Something that *does* work is to use 2 CTEs, with the first one performing a gross paring of the starting set, so that the second CTE and the remaining select don't have such a volume of data to deal with. It's not the preferred solution, but it's the one I'm going to have to run with until I have the ability to restructure the woeful database design I've been presented with on this one!
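Roughly, the two-CTE shape looks like this (a sketch only; @startRow/@endRow stand in for the paging parameters, and it assumes docid uniquely identifies a row):

```sql
-- Sketch of the two-CTE approach: the first CTE pares the set down to just
-- the keys that survive the filters; the second numbers that reduced set
-- and joins back for the full 66 columns.
with pared as
(select docid, docdate
 from doclist
 where doccode not in ('D')
   and (statuscode is null or statuscode in ('A','I','D'))
   and doctype not in ('type1','type2')
   and docchecked = 1
   and ((doctray = 1) or (doctray = 2))
   and status in (select filterValue
                  from DocList_Filters
                  where colName = 'STATUS'
                    and comparitor = 'and')
),
doc_index as
(select ROW_NUMBER() over (order by p.docdate desc, p.docid asc) as RowNum
      , d.*
 from pared p
 inner join doclist d
    on d.docid = p.docid
)
select *
from doc_index
where RowNum between @startRow and @endRow
order by RowNum asc;
```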
For the sake of completeness, I'll post the queries I end up with when this is finalised.
June 12, 2013 at 5:24 pm
What is also missing is the DDL for the tables you are using along with the current indexes defined on those tables.
Part of the problem you are having may also be attributed to poor indexing of the tables.
June 12, 2013 at 6:09 pm
Lynn Pettis (6/12/2013)
What is also missing is the DDL for the tables you are using along with the current indexes defined on those tables. Part of the problem you are having may also be attributed to poor indexing of the tables.
Part of the requirement here is that records be presented in a default sort order of date desc, then docid asc. Additionally, the user can specify an arbitrary third sort column, which can be asc or desc. Indexing the primary table with the default sort in mind results in reasonable performance (not great, but acceptable). However, allowing any one of another 65 columns to be specified in the sort makes the server work overtime trying to achieve this. A less than ideal solution to this has been to add suitable indexes to specifically handle the sort orders that might be specified (where the index would be restrictive enough to be used), but obviously this has resulted in more space being consumed in indexes than there is actual data in the table. However, it makes the queries perform at an acceptable level again, so has to remain.
I'm sure a suitably normalised table structure with appropriate indexes would outperform this monstrosity hands down, but it's not an option in the short term :crazy:
June 12, 2013 at 6:14 pm
mrbonk (6/12/2013)
Lynn Pettis (6/12/2013)
What is also missing is the DDL for the tables you are using along with the current indexes defined on those tables. Part of the problem you are having may also be attributed to poor indexing of the tables.
Part of the requirement here is that records be presented in a default sort order of date desc, then docid asc. Additionally, the user can specify an arbitrary third sort column, which can be asc or desc. Indexing the primary table with the default sort in mind results in reasonable performance (not great, but acceptable). However, allowing any one of another 65 columns to be specified in the sort makes the server work overtime trying to achieve this. A less than ideal solution to this has been to add suitable indexes to specifically handle the sort orders that might be specified (where the index would be restrictive enough to be used), but obviously this has resulted in more space being consumed in indexes than there is actual data in the table. However, it makes the queries perform at an acceptable level again, so has to remain.
I'm sure a suitably normalised table structure with appropriate indexes would outperform this monstrosity hands down, but it's not an option in the short term :crazy:
This arbitrary 3rd column is only for the final sort or does it also impact the select of the rows?
Is this query contained in a stored procedure?
June 12, 2013 at 6:16 pm
And how is this third column passed to the query?
June 12, 2013 at 8:50 pm
Lynn Pettis (6/12/2013)
This arbitrary 3rd column is only for the final sort or does it also impact the select of the rows? Is this query contained in a stored procedure?
Unfortunately no... it forms part of the index.
This is all in a stored proc which has to be dynamically generated (for a variety of reasons). The user selected sort column name is passed in as a parameter, along with a number of other parameters.
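For illustration, the dynamically generated proc is built along these lines (a simplified sketch; the proc and parameter names are hypothetical, and the exact ORDER BY composition depends on the requirement):

```sql
-- Simplified sketch of the dynamic-sort proc (names hypothetical).
-- The sort column is validated against sys.columns and wrapped in
-- QUOTENAME() before concatenation, so it can't inject SQL.
create procedure dbo.GetDocPage
    @sortCol  sysname,
    @sortDir  varchar(4),  -- 'asc' or 'desc'
    @startRow int,
    @endRow   int
as
begin
    if not exists (select 1 from sys.columns
                   where object_id = object_id('dbo.doclist')
                     and name = @sortCol)
        return;  -- unknown column: refuse to build the statement
    if @sortDir not in ('asc', 'desc')
        set @sortDir = 'asc';

    declare @sql nvarchar(max) = N'
        with doc_index as
        (select ROW_NUMBER() over (order by docdate desc, docid asc, '
            + quotename(@sortCol) + N' ' + @sortDir + N') as RowNum, *
         from dbo.doclist)
        select *
        from doc_index
        where RowNum between @start and @end
        order by RowNum;';

    exec sp_executesql @sql,
         N'@start int, @end int',
         @start = @startRow, @end = @endRow;
end;
```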