January 23, 2004 at 2:11 pm
I was asked to help a developer with a slow query. While making changes, I generated a query with a GROUP BY that produces several records for each group. In addition, the where clause no longer limits the records included in the resultset. Can somebody explain why having a derived table in the sub query causes this behavior? I'm using SQL Server 2000.
-- Create and load a test table.
CREATE TABLE #temp (
[calendar_sid] [int] NOT NULL,
[case_sid] [int] NOT NULL,
[defendant_sid] [int] NULL,
[calendar_time] [datetime] NULL ,
CONSTRAINT [PK_case_calendar] PRIMARY KEY CLUSTERED
(
[calendar_sid]
) ON [PRIMARY] ,
) ON [PRIMARY]
SET NOCOUNT ON
INSERT INTO #temp
SELECT calendar_sid, case_sid, defendant_sid, calendar_time
FROM case_calendar
SET NOCOUNT OFF
-- This query works as expected - one record per c.defendant_sid
SELECT c.defendant_sid,
count(*) as [count],
CAST(MIN(c.calendar_time) as char(20)) as [MIN(calendar_time)],
( SELECT CAST(MIN(t.calendar_time) as char(20))
FROM #temp t
WHERE t.defendant_sid = c.defendant_sid
) as [Min2]
FROM #temp c
WHERE c.defendant_sid < 10
GROUP BY c.defendant_sid
ORDER BY c.defendant_sid
-- This query does not work as expected - many records
-- per c.defendant_sid and c.defendant_sid < 10 not true
SELECT c.defendant_sid,
count(*) as [count],
CAST(MIN(c.calendar_time) as char(20)) as [MIN(calendar_time)],
(SELECT CAST(MIN(a.calendar_time) as char(20))
FROM (
SELECT t.calendar_time
FROM #temp t
WHERE t.defendant_sid = c.defendant_sid
) as a
) as [Min2]
FROM #temp c
WHERE c.defendant_sid < 10
GROUP BY c.defendant_sid
ORDER BY c.defendant_sid
drop table #temp
Results:
defendant_sid count MIN(calendar_time) Min2
------------- ----------- -------------------- --------------------
3 1 May 1 2003 10:00AM May 1 2003 10:00AM
6 2 Mar 1 2003 10:00AM Mar 1 2003 10:00AM
9 2 Feb 1 2003 1:00PM Feb 1 2003 1:00PM
(3 row(s) affected)
defendant_sid count MIN(calendar_time) Min2
------------- ----------- -------------------- --------------------
3 1 May 1 2003 10:00AM May 1 2003 10:00AM
3 2 Mar 1 2003 10:00AM May 1 2003 10:00AM
3 2 Feb 1 2003 1:00PM May 1 2003 10:00AM
6 1 May 1 2003 10:00AM Mar 1 2003 10:00AM
6 2 Mar 1 2003 10:00AM Mar 1 2003 10:00AM
6 2 Feb 1 2003 1:00PM Mar 1 2003 10:00AM
6 1 May 1 2003 10:00AM Mar 1 2003 10:00AM
6 2 Mar 1 2003 10:00AM Mar 1 2003 10:00AM
6 2 Feb 1 2003 1:00PM Mar 1 2003 10:00AM
9 1 May 1 2003 10:00AM Feb 1 2003 1:00PM
9 2 Mar 1 2003 10:00AM Feb 1 2003 1:00PM
9 2 Feb 1 2003 1:00PM Feb 1 2003 1:00PM
9 1 May 1 2003 10:00AM Feb 1 2003 1:00PM
9 2 Mar 1 2003 10:00AM Feb 1 2003 1:00PM
9 2 Feb 1 2003 1:00PM Feb 1 2003 1:00PM
13 1 May 1 2003 10:00AM Mar 27 2003 9:00AM
13 2 Mar 1 2003 10:00AM Mar 27 2003 9:00AM
13 2 Feb 1 2003 1:00PM Mar 27 2003 9:00AM
13 1 May 1 2003 10:00AM Mar 27 2003 9:00AM
13 2 Mar 1 2003 10:00AM Mar 27 2003 9:00AM
13 2 Feb 1 2003 1:00PM Mar 27 2003 9:00AM
14 1 May 1 2003 10:00AM Apr 28 2003 1:00PM
...
1149 2 Feb 1 2003 1:00PM Jan 22 2004 1:30PM
1150 1 May 1 2003 10:00AM Jan 22 2004 1:00PM
1150 2 Mar 1 2003 10:00AM Jan 22 2004 1:00PM
1150 2 Feb 1 2003 1:00PM Jan 22 2004 1:00PM
1150 1 May 1 2003 10:00AM Jan 22 2004 1:00PM
1150 2 Mar 1 2003 10:00AM Jan 22 2004 1:00PM
1150 2 Feb 1 2003 1:00PM Jan 22 2004 1:00PM
(1236 row(s) affected)
PS I probably will not use a subquery in the final result, but I'm curious about this.
RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉
January 23, 2004 at 3:44 pm
It looks like the sort of bug where the parser designers never tested a correlated derived table in a select list used with aggregates. This should instead give you an error message, as derived tables consisting of correlated subqueries are generally not allowed. I think it's performing the correlation before aggregating, essentially overriding the aggregation of the first MIN(). Your first formulation is better, anyway, and you could also (as you like derived tables ) try:
SELECT c.defendant_sid,
count(*) as [count],
CAST(MIN(c.calendar_time) as char(20)) as [MIN(calendar_time)],
CAST(x.xMin) as char(20)) as [Min2]
FROM #temp c JOIN
(SELECT Defendent_SId, MIN(Calendar_Time) xMin
FROM #temp
GROUP BY Defendant_SId) x ON x.Defendant_SId = c.Defendant_SId
WHERE c.defendant_sid < 10
GROUP BY c.defendant_sid, x.xMin
ORDER BY c.defendant_sid
--Jonathan
January 23, 2004 at 4:03 pm
I think I'll skip the derived table. I can never tell what I'm going to get from the developers . The original query is worse as this stuff was in the on clause of a join . It does get interesting from time to time to see a query that works, but is not compatible with sanity. BTW, the select from a correlated subquery in the derived table did work in the on clause of the join.
RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉
January 23, 2004 at 4:21 pm
Could you post the example of the correlated subquery used as a derived table? I thought derived tables are not evaluated dependently... The only time I've seen a correlation work in a derived table was when it was self-contained; i.e. no tables outside the derived table were referenced.
--Jonathan
January 26, 2004 at 2:19 pm
I took a look at the execution plans for the example code included above. They both use parallelism, but the second one (with the problem) is more complicated and has a "Nested Loops/Inner Join" step which the first query does not. I'm no expert when it comes to plans - guess it's time to find a good reference, if there is one. Recommendations?
*** I'll put the SHOWPLAN output in the next post because of it's ugly format. ***
Below is the original query that led me to experiment with the above. It does work - the correlated derived table is in the on clause, so I guess different behavior is expected. I added an index on Calendar_time and pre-calculated datetimes (no cast or dateadd) which reduced the query time from 30 to 2 seconds.
In the final version, I got rid of the correlation and used nested derived tables to select one calendar_sid for each defendant. I also moved two inner joins to the WHERE as EXISTS clauses. The result now takes 10ms.
DECLARE @StartDate datetime
SET @StartDate = '2004-01-23'
-- Proc code follows
DECLARE @StringStartDate varchar(10)
SET @StringStartDate = CONVERT(varchar(10), @StartDate, 101)
SELECT DISTINCT
n.formatted_name,
d.booking_number,
d.booking_date,
dbo.fn_ListReferredCharges(d.defendant_sid) as [Booking Charges],
dbo.fn_ListFiledCharges(d.defendant_sid) as [Filed Charges],
fc.calendar_time,
fc.hearing_description,
s.status_name,
c.RA_case_numbers,
d.defendant_sid
FROM dbo.vw_defendant_unrestricted d
LEFT OUTER JOIN dbo.vw_person_name_unrestricted n ON
n.name_sid = d.name_sid
LEFT OUTER JOIN dbo.vw_cases_unrestricted c ON
c.case_sid = d.case_sid
LEFT OUTER JOIN dbo.ref_case_status s ON
s.status_id = d.defendant_status
LEFT OUTER JOIN dbo.case_calendar fc ON
fc.defendant_sid = d.defendant_sid AND
fc.calendar_time = (
SELECT MIN(calendar_time)
FROM (
SELECT calendar_time
FROM case_calendar
WHERE defendant_sid = d.defendant_sid AND
calendar_time > cast((@StringStartDate + ' 1:30 PM') as datetime)
  as a
 
INNER JOIN dbo.case_calendar cc ON
cc.defendant_sid = d.defendant_sid AND
cc.hearing_type IN ('FAR', 'MAR') AND
cc.court_department = 'M' AND
DATEADD(mi, -810, cc.calendar_time) = @StringStartDate
INNER JOIN dbo.case_activity ca ON
ca.case_sid = c.case_sid AND
ca.activity_id = 'INTK' AND
ca.priority = 1
WHERE d.custody_code = 'IC'
RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉
January 26, 2004 at 2:22 pm
StmtText
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SELECT c.defendant_sid,
count(*) as [count],
CAST(MIN(c.calendar_time) as char(20)) as [MIN(calendar_time)],
( SELECT CAST(MIN(t.calendar_time) as char(20))
FROM #temp t
WHERE t.defendant_sid = c.defendant_sid
) as [Min2]
FROM #temp c
WHERE c.defendant_sid < 10
GROUP BY c.defendant_sid
ORDER BY c.defendant_sid
(1 row(s) affected)
StmtText
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|--Compute Scalar(DEFINE[Expr1003]=Convert([Expr1002]), [Expr1007]=Convert([Expr1005])))
|--Stream Aggregate(GROUP BY[c].[defendant_sid]) DEFINE[Expr1005]=MIN([t].[calendar_time]), [Expr1001]=ANY([Expr1001]), [Expr1002]=ANY([Expr1002])))
|--Sort(ORDER BY[c].[defendant_sid] ASC))
|--Parallelism(Gather Streams)
|--Hash Match(Left Outer Join, HASH[c].[defendant_sid])=([t].[defendant_sid]), RESIDUAL[t].[defendant_sid]=[c].[defendant_sid]))
|--Bitmap(HASH[c].[defendant_sid]), DEFINE[Bitmap1012]))
| |--Compute Scalar(DEFINE[Expr1001]=Convert([globalagg1009])))
| |--Stream Aggregate(GROUP BY[c].[defendant_sid]) DEFINE[globalagg1009]=SUM([partialagg1008]), [Expr1002]=MIN([partialagg1010])))
| |--Parallelism(Repartition Streams, PARTITION COLUMNS[c].[defendant_sid]), ORDER BY[c].[defendant_sid] ASC))
| |--Stream Aggregate(GROUP BY[c].[defendant_sid]) DEFINE[partialagg1008]=Count(*), [partialagg1010]=MIN([c].[calendar_time])))
| |--Sort(ORDER BY[c].[defendant_sid] ASC))
| |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________00000007A633].[PK_case_calendar] AS [c]), WHERE[c].[defendant_sid]<10))
|--Parallelism(Repartition Streams, PARTITION COLUMNS[t].[defendant_sid]), WHEREPROBE([Bitmap1012])=TRUE))
|--Clustered Index Scan(OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________00000007A633].[PK_case_calendar] AS [t]))
(14 row(s) affected)
StmtText
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SELECT c.defendant_sid,
count(*) as [count],
CAST(MIN(c.calendar_time) as char(20)) as [MIN(calendar_time)],
(SELECT CAST(MIN(a.calendar_time) as char(20))
FROM (
SELECT t.calendar_time
FROM #temp t
WHERE t.defendant_sid = c.defendant_sid
) as a
) as [Min2]
FROM #temp c
WHERE c.defendant_sid < 10
GROUP BY c.defendant_sid
ORDER BY c.defendant_sid
(1 row(s) affected)
StmtText
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|--Compute Scalar(DEFINE[Expr1004]=Convert([Expr1002]), [Expr1008]=Convert([Expr1006])))
|--Nested Loops(Inner Join)
|--Parallelism(Gather Streams, ORDER BY[c].[defendant_sid] ASC))
| |--Sort(ORDER BY[c].[defendant_sid] ASC))
| |--Hash Match(Right Outer Join, HASH[t].[defendant_sid])=([c].[defendant_sid]), RESIDUAL[t].[defendant_sid]=[c].[defendant_sid]))
| |--Hash Match(Aggregate, HASH[t].[defendant_sid]), RESIDUAL[t].[defendant_sid]=[t].[defendant_sid]) DEFINE[Expr1006]=MIN([t].[calendar_time])))
| | |--Parallelism(Repartition Streams, PARTITION COLUMNS[t].[defendant_sid]))
| | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________00000007A633].[PK_case_calendar] AS [t]))
| |--Parallelism(Repartition Streams, PARTITION COLUMNS[c].[defendant_sid]))
| |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________00000007A633].[PK_case_calendar] AS [c]))
|--Table Spool
|--Compute Scalar(DEFINE[Expr1001]=Convert([Expr1015])))
|--Stream Aggregate(GROUP BY[c].[defendant_sid]) DEFINE[Expr1015]=Count(*), [Expr1002]=MIN([c].[calendar_time])))
|--Sort(ORDER BY[c].[defendant_sid] ASC))
|--Clustered Index Scan(OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________00000007A633].[PK_case_calendar] AS [c]), WHERE[c].[defendant_sid]<10))
(15 row(s) affected)
PS I used cut and paste from QA. I do not know why the "Sad" and "Wink" faces are displayed in the preview - I did not add them. A bug?
RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉
January 26, 2004 at 2:53 pm
Okay, I see why that "correlated derived table" works. It doesn't need to be a derived table, and the query optimizer automatically reduces it to a simple correlated subquery.
--Jonathan
January 26, 2004 at 2:56 pm
Simply turn on Query|Show Execution Plan in QA, run both queries in the same batch, and then compare the relative times in the Execution Plan window. Try my variant with the subquery joined, too.
--Jonathan
January 26, 2004 at 4:49 pm
Here it is one more time - my post gets lost from time to time....
The developer simplified the problem by removing the derived table. I think this is the best form.
LEFT OUTER JOIN dbo.case_calendar fc ON
fc.calendar_sid = (
SELECT TOP 1 calendar_sid
FROM case_calendar
WHERE defendant_sid = d.defendant_sid AND
calendar_time > @StringStartDateTime
ORDER BY calendar_time
)
RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉
January 26, 2004 at 5:09 pm
-- Query 1: Query Cost (relative to batch): 48.89%
SELECT c.defendant_sid,
count(*) as [count],
CAST(MIN(c.calendar_time) as char(20)) as [MIN(calendar_time)],
( SELECT CAST(MIN(t.calendar_time) as char(20))
FROM #temp t
WHERE t.defendant_sid = c.defendant_sid
) as [Min2]
FROM #temp c
WHERE c.defendant_sid < 10
GROUP BY c.defendant_sid
ORDER BY c.defendant_sid
-- Query 2: Query Cost (relative to batch): 51.11%
SELECT c.defendant_sid,
count(*) as [count],
CAST(MIN(c.calendar_time) as char(20)) as [MIN(calendar_time)],
CAST(x.xMin as char(20)) as [Min2]
FROM #temp c JOIN
(SELECT defendant_sid, MIN(calendar_time) xMin
FROM #temp
GROUP BY Defendant_sid) x ON
x.Defendant_sid = c.Defendant_sid
WHERE c.defendant_sid < 10
GROUP BY c.defendant_sid, x.xMin
ORDER BY c.defendant_sid
Not much different.
RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉
January 26, 2004 at 5:11 pm
...and thanks for the tip. These queries are fast enough that comparing relative times in one batch via the execution plan window is nice.
RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉
January 26, 2004 at 5:28 pm
Here is another batch that includes the buggy result. In the last query, the output has the same number of records as in #temp (776937) rather than 1. This is despite the where and group by clauses.
-- Query 1: Query Cost (relative to batch): 0.02%
SELECT c.defendant_sid,
count(*) as [count],
CAST(MIN(c.calendar_time) as char(20)) as [MIN(calendar_time)],
( SELECT CAST(MIN(t.calendar_time) as char(20))
FROM #temp t
WHERE t.defendant_sid = c.defendant_sid
) as [Min2]
FROM #temp c
WHERE c.defendant_sid < 10
GROUP BY c.defendant_sid
ORDER BY c.defendant_sid
-- Query 2: Query Cost (relative to batch): 0.02%
SELECT c.defendant_sid,
count(*) as [count],
CAST(MIN(c.calendar_time) as char(20)) as [MIN(calendar_time)],
CAST(x.xMin as char(20)) as [Min2]
FROM #temp c JOIN
(SELECT defendant_sid, MIN(calendar_time) xMin
FROM #temp
GROUP BY Defendant_sid) x ON
x.Defendant_sid = c.Defendant_sid
WHERE c.defendant_sid < 10
GROUP BY c.defendant_sid, x.xMin
ORDER BY c.defendant_sid
-- Query 3: Query Cost (relative to batch): 0.13%
SELECT c.defendant_sid,
count(*) as [count],
CAST(MIN(c.calendar_time) as char(20)) as [MIN(calendar_time)],
( SELECT TOP 1 t.calendar_time
FROM #temp t
WHERE t.defendant_sid = c.defendant_sid
ORDER BY t.calendar_time
) as [Min2]
FROM #temp c
WHERE c.defendant_sid < 10
GROUP BY c.defendant_sid
ORDER BY c.defendant_sid
-- Query 4: Query Cost (relative to batch): 99.82%
--
-- (776937 row(s) affected)!!! -- should be 1
SELECT c.defendant_sid,
count(*) as [count],
CAST(MIN(c.calendar_time) as char(20)) as [MIN(calendar_time)],
(SELECT CAST(MIN(a.calendar_time) as char(20))
FROM (
SELECT t.calendar_time
FROM #temp t
WHERE t.defendant_sid = c.defendant_sid
) as a
) as [Min2]
FROM #temp c
WHERE c.defendant_sid < 10
GROUP BY c.defendant_sid
ORDER BY c.defendant_sid
RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply