June 1, 2006 at 3:38 pm
Here's my query:
SELECT VenueID
FROM VirtualVenues V
WHERE IncludeRandom = 1
AND Inactive = 0
AND SiteID=39 AND
(VenueID NOT IN
(SELECT M.VirtualVenue
FROM Meetings M LEFT JOIN Reservations R ON ReservationMeetingID=MeetingID
WHERE Is_Master=0
AND M.Cancelled=0
AND R.Cancelled=0
AND ( (DATEADD(mi, (ResSetupMin*-1), DateTimeStart)
BETWEEN CAST('6/2/2006 4:00:00 PM' AS datetime)
AND CAST('6/2/2006 5:00:00 PM' AS datetime))
OR (CAST('6/2/2006 4:00:00 PM' AS datetime)
BETWEEN DATEADD(mi, (ResSetupMin*-1), DateTimeStart)
AND DATEADD(mi, (ResClosureMin-1), DateTimeEnd)))))
ORDER BY VenueID
This query takes about a minute to complete. If I set:
IncludeRandom = 0
Inactive = 0
It completes in about 3 seconds but that's not the data I want. If I set:
IncludeRandom = 0
Inactive = 1
It completes in about 3 seconds but that's not the data I want. If I set:
IncludeRandom = 1
Inactive = 1
The query returns no data. That's fine, but not what I need.
The problem here is that the first query will return the data I need but takes way too long and my connection times out. The other variations finish quickly enough but are not the data sets I'm looking for. Why does changing the true/false have such a great effect on how long the query takes to run?
I've looked at this in MS SQL Query Analyzer and found that the execution plan is very different for my query as opposed to the variations. I'm not a SQL expert, so if I'm doing something stupid, please let me know.
June 1, 2006 at 4:03 pm
SELECT M.VirtualVenue
FROM Meetings M LEFT JOIN Reservations R ON ReservationMeetingID=MeetingID
WHERE Is_Master=0
AND M.Cancelled=0
AND R.Cancelled=0
If you LEFT JOIN a table, but then reference one of that table's columns in the WHERE, it implicitly becomes an INNER JOIN.
NOT IN can usually be replaced by a correlated NOT EXISTS sub-query which performs better, but it's not clear how to do that in your case when there is dodgy LEFT JOIN logic that needs to be fixed first.
June 1, 2006 at 5:13 pm
I would guess that the difference in your query times is just because of the difference in number of rows processed. Do the other combinations return very few rows?
The left join should probably be inner anyway. PW: I didn't think that a NOT EXISTS was treated differently from a NOT IN?
I think the problem is most likely within the subquery. You have functions around the date columns which will prevent indexes from being used. You should consider having an indexed calculated column to hold the adjusted start and end times. You really need usable indexes on those times I think.
I'm assuming that the date logic is correct: you want to ensure that no reservation is ongoing at 4pm, and that no reservation is due to start in the hour commencing then (using adjusted start and end times).
Can you confirm which table each of the fields comes from?
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 1, 2006 at 7:59 pm
You've got whole bunch of big NO's in your query. Every one of them will kill performance because of full table scans and huge datasets being copied to tempdb.
1. Checking flags. Because there are only 2 values SQL Server cannot make any use of indexes and has to scan entire table.
2. "VenueID NOT IN (SELECT..."
Server has to create derived table in tempdb, populate it and, because there are no indexes on this table, commit full HASH JOIN to VirtualVenues with following filtering not matched rows out. Very expensive operation.
3. DATEADD(mi, (ResSetupMin*-1), DateTimeStart) BETWEEN ...
Because the checked value is calculated in WHERE clause SQL Server cannot use indexes ad must perform full table scan.
4. ... OR (CAST('6/2/2006 4:00:00 PM' AS datetime) BETWEEN ...
OR in WHERE clause is not recommended because it forces repeating selection, it's actually 2 queries with different execution plans in one. Because both parts of your WHERE clause are causing table scan OR will cause double table scan.
5. Because you use values from table Reservations in WHERE clause without handling NULLS you can replace LEFT join with INNER join. Resultset will be the same, but query will perform faster.
Enough?
_____________
Code for TallyGenerator
June 2, 2006 at 7:27 am
Thanks everyone, I take all these comments to heart. I'm not sure I understand the advantages/disadvantages of various joins, is there an online resource that discusses the related concepts? It's tough to find things like that as most info I find is, "This is this thing and here's what it does, here's how to do it.." but there's rarely an explanation as to -why- you'd do one thing vs. another.
It may be interesting to note that this by it self:
SELECT M.VirtualVenue
FROM Meetings M LEFT JOIN Reservations R ON ReservationMeetingID=MeetingID
WHERE Is_Master=0
AND M.Cancelled=0
AND R.Cancelled=0
AND ( (DATEADD(mi, (ResSetupMin*-1), DateTimeStart)
BETWEEN CAST('6/2/2006 4:00:00 PM' AS datetime)
AND CAST('6/2/2006 5:00:00 PM' AS datetime))
OR (CAST('6/2/2006 4:00:00 PM' AS datetime)
BETWEEN DATEADD(mi, (ResSetupMin*-1), DateTimeStart)
AND DATEADD(mi, (ResClosureMin-1), DateTimeEnd)))
The query returns the data within a second. So while I'll adjust things based on comments above, I'm really wondering why changing the values in the other part of the query causes such a great difference. I would think that the amount of work would be the same whether you are looking for something that is true or look for it to be false.
pics of the execution plans can be found here:
http://www.ncsa.uiuc.edu/People/mimiller/querypics/query.html
Perhaps this sheds some light on what's going on. If it's just a matter of following the advice above, let me know, but I'm open to additional suggestions as well.
Thanks for all your help so far.
myk
June 2, 2006 at 8:03 am
you can use SET SHOWPLAN or SET STATISTICS to generate a textual query plan. Have a look at BOL.
sql-server-performance.com might have some stuff on optimisation. For database design theory, you need a good book on datamodelling or databse design.
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 2, 2006 at 8:31 am
Server has to create derived table in tempdb, populate it and, because there are no indexes on this table, commit full HASH JOIN to VirtualVenues with following filtering not matched rows out. Very expensive operation.
I'd agree that these sort of subqueries (if large) are usually better replaced by joins - in theis case, that would mean something like two left joins with a DISTINCT and a WHERE...NOT NULL. Depends on the data. If the subquery you are checking is small, like a list of 12 venues which are booked up at a given hour, then you might want to encourage the optimiser to generate this rowset before peforming other joins. I'd still use a left join rather than a NOT IN, but I'm not sure how much of a perfomance difference it would make, if any.
Because the checked value is calculated in WHERE clause SQL Server cannot use indexes ad must perform full table scan.
Yes, so index the calculated values as suggested above.
OR in WHERE clause is not recommended because it forces repeating selection, it's actually 2 queries with different execution plans in one. Because both parts of your WHERE clause are causing table scan OR will cause double table scan.
Another unacceptable starting point eh? The need for OR in where clauses. Two passes through the data? Even if the two disjucts refer to the same column(s)? Have the optimiser dev team been on holiday for the last 10 years?
Then we are all agreed on that, anyway.
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 2, 2006 at 10:56 am
So I came up with this:
SELECT Distinct VenueID
FROM VirtualVenues V join (Meetings M join Reservations R ON ReservationMeetingID=MeetingID) on M.VirtualVenue=VenueID
WHERE V.IncludeRandom = 1
AND V.Inactive = 0
AND V.SiteID=39
and M.Is_Master=0
AND M.Cancelled=0
AND R.Cancelled=0
AND ( (DATEADD(mi, (R.ResSetupMin*-1), M.DateTimeStart)
NOT BETWEEN CAST('6/2/2006 4:00:00 PM' AS datetime)
AND CAST('6/2/2006 5:00:00 PM' AS datetime))
OR (CAST('6/2/2006 4:00:00 PM' AS datetime)
NOT BETWEEN DATEADD(mi, (R.ResSetupMin*-1), M.DateTimeStart)
AND DATEADD(mi, (R.ResClosureMin-1), M.DateTimeEnd)))
ORDER BY VenueID
This completes in less than a second and the execution plan is very different. More complicated, but I guess it helps. I'll have to read more about using indexes as several questions come to mind and I need to keep this DB in production as I make changes.
Please let me know if you see additional improvements that could be made.
thanks again,
myk
June 2, 2006 at 11:28 am
Good plan - but I think the OR should be an AND. Shame the date and minutes fields aren't in the same table.
Here's a couple of alternative versions:
@seekstart datetime
@seekend = dateadd(h,@seekdatetime,1)
Distinct VenueID
VirtualVenues V
Meetings M
M.VirtualVenue=VenueID
Reservations R
ReservationMeetingID=MeetingID
V.IncludeRandom = 1
V.Inactive = 0
V.SiteID=39
M.Is_Master=0
M.Cancelled=0
R.Cancelled=0
M.DateTimeStart NOT BETWEEN dateadd(mi,R.ResSetupMin,@seekstart)
@seekstart NOT BETWEEN DATEADD(mi, (R.ResSetupMin*-1), M.DateTimeStart)
AND DATEADD(mi, (R.ResClosureMin-1), M.DateTimeEnd)
(DATEADD(mi, (R.ResSetupMin), @seekstart) < M.DateTimeStart
BY VenueID
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 3, 2006 at 3:28 am
Should have added that your joins would need to be left joins, and the where clauses amended to handle NULLs, otherwise new venues withut any reservations would never show up...though perhaps it might be more efficient just to do a UNION ALL to add the 'virgin' records.
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 5, 2006 at 7:15 am
thanks, I guess I don't understand why OR's take more time than AND's, but I'll take your word for it and investigate more on my own. I understand what you mean about NULLs causing new venues to be left out. But I'm not sure how to express the idea that venueA doesn't have any reservations and therefore can be included. Can you elaborate a little more? I'll look up UNION ALL and be checking back here as well. Thanks for all your help.
June 5, 2006 at 7:34 am
Second AND is applied to subset returned by first AND.
Second OR is applied to the same original set of rows as first one. Results of both OR's are united.
_____________
Code for TallyGenerator
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply