improve the query performance

Question

improve the query performance

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply

lnoland SSCommitted Points: 1753 More actions · Answer 1

dlam 18073 (1/20/2011)
DateCall >=@callFrom and DateCall < @callTo <<< does this part of code already set the range of dateCall
without
not exists
(
select 1
FROM tblCallLog
WHERE phNu=r1.phNu
AND DateCall > @callTo
)
also it select all of the max date call with in the range over all of the max date call
for example:
test data
402111, 2010-01-10 15:15:00.000, n, 32226001, b smith
402222, 2010-03-21 15:16:00.000, n, 32226001, b smith
402333, 2010-10-19 15:17:00.000, n, 32226001, b smith
402444, 2010-11-21 16:01:00.000, n, 32226001, b smith
with this query
select *
from tblCallLog r
Where DateCall =
(
select max(DateCall)
from tblCall r1
where r1.phNu=r.phNu
and logStatus in (@a, @b-2, @b2,@c,@d,@d2,@d3,@n,@n2,@n3,@n4,@r,@r2,@s,@s2,@u,@w,@n5)
and DateCall >=@callFrom and DateCall < @callTo
group by phNu
)
call date range
DateCall >='2010-10-01' and DateCall < '2010-11-30 23:59'
the outcome is >> 402444, 2010-11-21 16:01:00.000, n, 32226001, b smith
but is there any different between puting the where clause in the sub-query and out side the sub-query

I am not sure I fully understand what you are trying to tell me here, but I can tell you that in your example you did not include any data which the clause you removed was designed to handle since the latest call in your example also fell within the date range you set -- what does the query return if you add a call for that same number but with the call taking place in the month of December 2010? Without the "not exists" clause, it should continue to return the same record it is returning now, but as I understand your stated requirements, it should not return anything. If you add back in the "not exists" clause that should change it so it won't return anything.

As for the question of putting the where clause in the subquery or outside of the subquery, I am not sure how to answer as the code I gave you had two subqueries and three where clauses so I don't know which where clause you are talking about relocating nor to where you are contemplating relocating it. Your general question, however, seems to be due to a lack of understanding of the function of the "not exists" subquery. Its purpose is to eliminate any numbers which have calls which were made after the upper bound of your range. Without it, all of the calls reported will be within the range you have set, but all of the calls reported will not necessarily be the latest calls for the respective number. You should be able to prove that to yourself by adding in the additional data as I indicated above. In your original query, you accomplished this goal by first finding the latest call for every number and then looking only at those to see which ones were in range -- but you wanted to improve the performance. I tried to improve the performance by not looking at every single number to find the latest call but, instead, finding the latest of the calls within the stated range and then eliminating any of them for which the related number has calls later than the stated range. To do the elimination, you need the "not exists" clause.

- Les

dlam 18073 Mr or Mrs. 500 Points: 548 More actions · Answer 2

I understand your query now, I know its abit late to say that.

Related question about the code below

select *

from tblCallLog r

Where DateCall =

(

select max(DateCall)

from tblCall r1

where r1.phNu=r.phNu

and logStatus in (@a, @b-2, @b2,@c,@d,@d2,@d3,@n,@n2,@n3,@n4,@r,

@R2,@s,@s2,@u,@w,@n5)

and DateCall >=@callFrom and DateCall < @callTo

and not exists

(

select 1

FROM tblCallLog

WHERE phNu=r1.phNu

AND DateCall > @callTo

)

group by phNu

)

my query below

select *

from tblCallLog r

where DateCall=

(select max(DateCall)

from tblCallLog

where phNu=r.phNu

group by phNu) and logStatus in(@a, @b-2, @b2,@c,@d,@d2,@d3,@n,@n2,@n3,@n4,

@r,@r2,@s,@s2,@u,@w,@n5)

and DateCall >= @callFrom and DateCall < @callTo

order by DateCall

why those where conditions put in the subquery is more efficient,

both query also checks the same table more than 1 time,

yes, I asked the similar question on my previous post, but still not get it

lnoland SSCommitted Points: 1753 More actions · Answer 3

dlam 18073 (1/23/2011)
I understand your query now, I know its abit late to say that.
Related question about the code below
select *
from tblCallLog r
Where DateCall =
(
select max(DateCall)
from tblCall r1
where r1.phNu=r.phNu
and logStatus in (@a, @b-2, @b2,@c,@d,@d2,@d3,@n,@n2,@n3,@n4,@r,
@R2,@s,@s2,@u,@w,@n5)
and DateCall >=@callFrom and DateCall < @callTo
and not exists
(
select 1
FROM tblCallLog
WHERE phNu=r1.phNu
AND DateCall > @callTo
)
group by phNu
)
my query below
select *
from tblCallLog r
where DateCall=
(select max(DateCall)
from tblCallLog
where phNu=r.phNu
group by phNu) and logStatus in(@a, @b-2, @b2,@c,@d,@d2,@d3,@n,@n2,@n3,@n4,
@r,@r2,@s,@s2,@u,@w,@n5)
and DateCall >= @callFrom and DateCall < @callTo
order by DateCall
why those where conditions put in the subquery is more efficient,
both query also checks the same table more than 1 time,
yes, I asked the similar question on my previous post, but still not get it

First of all, I can't say definitively that my query is any more efficient than yours -- even if I had your table definitions, index definitions, test data, etc., the query optimizer on my server might compile the queries differently than yours does, though if we had everything the same, similar software releases, etc., they likely would be pretty similar. But from an intuitive view of what tasks need to be done in the different cases, I would expect it to be somewhat more efficient. The reason for that is that, your query explicitly indicates that the latest call for all numbers should be determined, and then that list of calls should be filtered to eliminate any which don't have the correct status or for which the call is not within the range of dates specified. Let's say that the database has 2,000,000,000 calls and 5,000,000 phone numbers with calls -- that means that there are 5,000,000 latest calls to look at, but most likely, very few of those calls meet your other criteria. For the moment, let's ignore the call status since it is pretty much the same in both queries, and just say that:

all of the 2,000,000,000 calls have the correct status;
300 numbers have 1000 calls in the time period specified;
but only 50 of them are the latest call in that time period (which would mean that 250 of the numbers had their latest call after the time period);

That means that, if your query is done exactly as written, SQL Server is going to look at 2,000,000,000 calls, determine the 5,000,000 latest calls, and then pick the 50 of those which are within the range.

In my query, if done pretty much as written, it would look at 2,000,000,000 calls, find the 1000 which are within the range, then for each of the 300 numbers check to see if they have a single call later than the time period (it doesn't have to be the latest call for the number -- just one call which is later than the time period), eliminating 250 numbers, and then for the remaining 50, find the latest call for each number from within 1000 calls in the time period. (Actually, it doesn't even have to look at 1000 -- it only has to look at calls for the 50 numbers involved which given the numbers suggests that we would be looking at fewer than 200 calls)

Even with the right index, finding the latest call for a number from within a group will likely involve a few comparisons per number -- doing so for 50 numbers and looking over a limited range which includes 1000 calls is apt to be considerably quicker than doing so for 5,000,000 numbers over all times possible involving 2,000,000,000 calls. With the right index, finding the calls within a range should be relatively quick, as will determining if there are any calls outside of the range. So, pretty much the most expensive thing here is finding the latest call but you start off doing that for every number in the database even though you aren't interested in the vast majority of them.

The big unknown here is: "how does the query optimizer see the problem?" It might be possible that the query optimizer has sufficient logic to look at your query and break down the requirements in the same way I did and conclude that it could do the job by approaching it the same way I did in writing my query, in which case, the execution plans would likely be the same -- but I seriously doubt that it could actually do that. So, my guess is that my approach will produce a more efficient execution plan (presuming, of course, that you also have reasonable indexes defined on the tables so that it doesn't wind up in either case just looping through all 2,000,000,000 calls over and over). Only you can verify whether that is true, however -- it's your database.

- Les

dlam 18073 Mr or Mrs. 500 Points: 548 More actions · Answer 4

The thing I didnt do at the beginning, was test the query in a very large database, was large enough I thought, then the process timeout in the application. This time, I will put it into a large enough database to test every query that I got.

Thank so much to you to answer the question in detail, and give me many ideas in how to see the problem.:-)

lnoland SSCommitted Points: 1753 More actions · Answer 5

dlam 18073 (1/24/2011)
The thing I didnt do at the beginning, was test the query in a very large database, was large enough I thought, then the process timeout in the application. This time, I will put it into a large enough database to test every query that I got.
Thank so much to you to answer the question in detail, and give me many ideas in how to see the problem.:-)

You're welcome. I hope it proves helpful.

- Les