Why does this take so long?

Question

Post reply

Why does this take so long?

Michael Miller-334399

SSC Journeyman

Points: 96
More actions
June 1, 2006 at 3:38 pm

#190810

Here's my query:
SELECT VenueID
FROM VirtualVenues V
WHERE IncludeRandom = 1
AND Inactive = 0
AND SiteID=39 AND
(VenueID NOT IN
(SELECT M.VirtualVenue
FROM Meetings M LEFT JOIN Reservations R ON ReservationMeetingID=MeetingID
WHERE Is_Master=0
AND M.Cancelled=0
AND R.Cancelled=0
AND ( (DATEADD(mi, (ResSetupMin*-1), DateTimeStart)
BETWEEN CAST('6/2/2006 4:00:00 PM' AS datetime)
AND CAST('6/2/2006 5:00:00 PM' AS datetime))
OR (CAST('6/2/2006 4:00:00 PM' AS datetime)
BETWEEN DATEADD(mi, (ResSetupMin*-1), DateTimeStart)
AND DATEADD(mi, (ResClosureMin-1), DateTimeEnd)))))
ORDER BY VenueID
This query takes about a minute to complete. If I set:
IncludeRandom = 0
Inactive = 0
It completes in about 3 seconds but that's not the data I want. If I set:
IncludeRandom = 0
Inactive = 1
It completes in about 3 seconds but that's not the data I want. If I set:
IncludeRandom = 1
Inactive = 1
The query returns no data. That's fine, but not what I need.
The problem here is that the first query will return the data I need but takes way too long and my connection times out. The other variations finish quickly enough but are not the data sets I'm looking for. Why does changing the true/false have such a great effect on how long the query takes to run?
I've looked at this in MS SQL Query Analyzer and found that the execution plan is very different for my query as opposed to the variations. I'm not a SQL expert, so if I'm doing something stupid, please let me know.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

PW-201837 SSC-Insane Points: 20805 More actions · Answer 1

SELECT M.VirtualVenue

FROM Meetings M LEFT JOIN Reservations R ON ReservationMeetingID=MeetingID

WHERE Is_Master=0

AND M.Cancelled=0

AND R.Cancelled=0

If you LEFT JOIN a table, but then reference one of that table's columns in the WHERE, it implicitly becomes an INNER JOIN.

NOT IN can usually be replaced by a correlated NOT EXISTS sub-query which performs better, but it's not clear how to do that in your case when there is dodgy LEFT JOIN logic that needs to be fixed first.

stax68 SSChampion Points: 11711 More actions · Answer 2

I would guess that the difference in your query times is just because of the difference in number of rows processed. Do the other combinations return very few rows?

The left join should probably be inner anyway. PW: I didn't think that a NOT EXISTS was treated differently from a NOT IN?

I think the problem is most likely within the subquery. You have functions around the date columns which will prevent indexes from being used. You should consider having an indexed calculated column to hold the adjusted start and end times. You really need usable indexes on those times I think.

I'm assuming that the date logic is correct: you want to ensure that no reservation is ongoing at 4pm, and that no reservation is due to start in the hour commencing then (using adjusted start and end times).

Can you confirm which table each of the fields comes from?

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

Sergiy SSC Guru Points: 110208 More actions · Answer 3

You've got whole bunch of big NO's in your query. Every one of them will kill performance because of full table scans and huge datasets being copied to tempdb.

1. Checking flags. Because there are only 2 values SQL Server cannot make any use of indexes and has to scan entire table.

2. "VenueID NOT IN (SELECT..."

Server has to create derived table in tempdb, populate it and, because there are no indexes on this table, commit full HASH JOIN to VirtualVenues with following filtering not matched rows out. Very expensive operation.

3. DATEADD(mi, (ResSetupMin*-1), DateTimeStart) BETWEEN ...

Because the checked value is calculated in WHERE clause SQL Server cannot use indexes ad must perform full table scan.

4. ... OR (CAST('6/2/2006 4:00:00 PM' AS datetime) BETWEEN ...

OR in WHERE clause is not recommended because it forces repeating selection, it's actually 2 queries with different execution plans in one. Because both parts of your WHERE clause are causing table scan OR will cause double table scan.

5. Because you use values from table Reservations in WHERE clause without handling NULLS you can replace LEFT join with INNER join. Resultset will be the same, but query will perform faster.

Enough?

_____________
Code for TallyGenerator

Michael Miller-334399 SSC Journeyman Points: 96 More actions · Answer 4

Thanks everyone, I take all these comments to heart. I'm not sure I understand the advantages/disadvantages of various joins, is there an online resource that discusses the related concepts? It's tough to find things like that as most info I find is, "This is this thing and here's what it does, here's how to do it.." but there's rarely an explanation as to -why- you'd do one thing vs. another.

It may be interesting to note that this by it self:

SELECT M.VirtualVenue

FROM Meetings M LEFT JOIN Reservations R ON ReservationMeetingID=MeetingID

WHERE Is_Master=0

AND M.Cancelled=0

AND R.Cancelled=0

AND ( (DATEADD(mi, (ResSetupMin*-1), DateTimeStart)

BETWEEN CAST('6/2/2006 4:00:00 PM' AS datetime)

AND CAST('6/2/2006 5:00:00 PM' AS datetime))

OR (CAST('6/2/2006 4:00:00 PM' AS datetime)

BETWEEN DATEADD(mi, (ResSetupMin*-1), DateTimeStart)

AND DATEADD(mi, (ResClosureMin-1), DateTimeEnd)))

The query returns the data within a second. So while I'll adjust things based on comments above, I'm really wondering why changing the values in the other part of the query causes such a great difference. I would think that the amount of work would be the same whether you are looking for something that is true or look for it to be false.

pics of the execution plans can be found here:

http://www.ncsa.uiuc.edu/People/mimiller/querypics/query.html

Perhaps this sheds some light on what's going on. If it's just a matter of following the advice above, let me know, but I'm open to additional suggestions as well.

Thanks for all your help so far.

myk

stax68 SSChampion Points: 11711 More actions · Answer 5

you can use SET SHOWPLAN or SET STATISTICS to generate a textual query plan. Have a look at BOL.

sql-server-performance.com might have some stuff on optimisation. For database design theory, you need a good book on datamodelling or databse design.

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

stax68 SSChampion Points: 11711 More actions · Answer 6

Sergiy, when you give directions I bet you say 'No, I wouldn't start from here'. Let's try and be a bit constructive.

>1. Checking flags. Because there are only 2 values SQL Server cannot make any use of indexes and has to scan entire table.

So is the moral don't ever check flags? Or perhaps, if you want to check flags, you might want to put them in a composite index with some other search args, or even data (covering index).

>2. "VenueID NOT IN (SELECT..." Server has to create derived table in tempdb, populate it and, because there are no indexes on this table, commit full HASH JOIN to VirtualVenues with following filtering not matched rows out. Very expensive operation.

I'd agree that these sort of subqueries (if large) are usually better replaced by joins - in theis case, that would mean something like two left joins with a DISTINCT and a WHERE...NOT NULL. Depends on the data. If the subquery you are checking is small, like a list of 12 venues which are booked up at a given hour, then you might want to encourage the optimiser to generate this rowset before peforming other joins. I'd still use a left join rather than a NOT IN, but I'm not sure how much of a perfomance difference it would make, if any.

>3. DATEADD(mi, (ResSetupMin*-1), DateTimeStart) BETWEEN ... Because the checked value is calculated in WHERE clause SQL Server cannot use indexes ad must perform full table scan.

Yes, so index the calculated values as suggested above.

>4. ... OR (CAST('6/2/2006 4:00:00 PM' AS datetime) BETWEEN ... OR in WHERE clause is not recommended because it forces repeating selection, it's actually 2 queries with different execution plans in one. Because both parts of your WHERE clause are causing table scan OR will cause double table scan.

Another unacceptable starting point eh? The need for OR in where clauses. Two passes through the data? Even if the two disjucts refer to the same column(s)? Have the optimiser dev team been on holiday for the last 10 years?

>5. Because you use values from table Reservations in WHERE clause without handling NULLS you can replace LEFT join with INNER join. Resultset will be the same, but query will perform faster.

Then we are all agreed on that, anyway.

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

Michael Miller-334399 SSC Journeyman Points: 96 More actions · Answer 7

So I came up with this:

SELECT Distinct VenueID

FROM VirtualVenues V join (Meetings M join Reservations R ON ReservationMeetingID=MeetingID) on M.VirtualVenue=VenueID

WHERE V.IncludeRandom = 1

AND V.Inactive = 0

AND V.SiteID=39

and M.Is_Master=0

AND M.Cancelled=0

AND R.Cancelled=0

AND ( (DATEADD(mi, (R.ResSetupMin*-1), M.DateTimeStart)

NOT BETWEEN CAST('6/2/2006 4:00:00 PM' AS datetime)

AND CAST('6/2/2006 5:00:00 PM' AS datetime))

OR (CAST('6/2/2006 4:00:00 PM' AS datetime)

NOT BETWEEN DATEADD(mi, (R.ResSetupMin*-1), M.DateTimeStart)

AND DATEADD(mi, (R.ResClosureMin-1), M.DateTimeEnd)))

ORDER BY VenueID

This completes in less than a second and the execution plan is very different. More complicated, but I guess it helps. I'll have to read more about using indexes as several questions come to mind and I need to keep this DB in production as I make changes.

Please let me know if you see additional improvements that could be made.

thanks again,

myk

stax68 SSChampion Points: 11711 More actions · Answer 8

Good plan - but I think the OR should be an AND. Shame the date and minutes fields aren't in the same table.

Here's a couple of alternative versions:

declare

@seekstart datetime

set @seekstart = '6/2/2006 4:00:00 PM'

declare

@seekend = dateadd(h,@seekdatetime,1)

SELECT

Distinct VenueID

from

VirtualVenues V

join

Meetings M

on

M.VirtualVenue=VenueID

join

Reservations R

on

ReservationMeetingID=MeetingID

WHERE

V.IncludeRandom = 1

AND

V.Inactive = 0

AND

V.SiteID=39

and

M.Is_Master=0

AND

M.Cancelled=0

AND

R.Cancelled=0

AND

M.DateTimeStart NOT BETWEEN dateadd(mi,R.ResSetupMin,@seekstart)

AND dateadd(mi,R.ResSetupMin,@seekend)

--version 1-----

AND

@seekstart NOT BETWEEN DATEADD(mi, (R.ResSetupMin*-1), M.DateTimeStart)

AND DATEADD(mi, (R.ResClosureMin-1), M.DateTimeEnd)

----------------

--version 2-----

AND

(DATEADD(mi, (R.ResSetupMin), @seekstart) < M.DateTimeStart

OR DATEADD(mi, (R.ResClosureMin-1)*-1,@seekend) > M.DateTimeEnd)

----------------

ORDER

BY VenueID

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

stax68 SSChampion Points: 11711 More actions · Answer 9

Should have added that your joins would need to be left joins, and the where clauses amended to handle NULLs, otherwise new venues withut any reservations would never show up...though perhaps it might be more efficient just to do a UNION ALL to add the 'virgin' records.

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

Michael Miller-334399 SSC Journeyman Points: 96 More actions · Answer 10

thanks, I guess I don't understand why OR's take more time than AND's, but I'll take your word for it and investigate more on my own. I understand what you mean about NULLs causing new venues to be left out. But I'm not sure how to express the idea that venueA doesn't have any reservations and therefore can be included. Can you elaborate a little more? I'll look up UNION ALL and be checking back here as well. Thanks for all your help.

Sergiy SSC Guru Points: 110208 More actions · Answer 11

Second AND is applied to subset returned by first AND.

Second OR is applied to the same original set of rows as first one. Results of both OR's are united.

_____________
Code for TallyGenerator