March 12, 2008 at 3:19 am
For this particular case of continuous values, 'NOT BETWEEN 214 AND 217' runs twice as faster as the best solution (NOT EXISTS) !
March 12, 2008 at 4:28 am
This article, and its many replies, is a great example of why I like this site so much! Valuable points are made, with time taken to describe exactly how answers were derived, and comments flesh out the original discussion quite nicely.
Overall, thanks very much for this - Sharon.
March 12, 2008 at 10:14 am
Paulo G., that was a great catch!
BETWEEN didn't even cross my mind when looking at this originally. I was looking more at the generalities of which techniques worked best with a multi-value filter and totally missed the fact that my filter data were sequential.
If we change our sample data to something like (8, 214, 271, 310), NOT EXISTS keeps its performance edge, but the best solution should take into account the filter data, and the BETWEEN solution crushed the NOT EXISTS solution for the given sample (214-217).
Thanks for the new angle!
KenJ
March 17, 2008 at 2:09 am
as an aside set base generation is a tad quicker
set statistics io on
set statistics time on
CREATE TABLE tbl_IN_VS_AND (filterCriterion_sv int PRIMARY KEY CLUSTERED NOT NULL)
Insert tbl_IN_VS_AND
SELECT 100000 * U.I + 10000 * V.I + 1000 * W.I + 100 * X.I + 10 * Y.I + Z.I filterCriterion_sv FROM
(SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) U,
(SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) V,
(SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) W,
(SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) X,
(SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) Y,
(SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) Z
set statistics io off
set statistics time off
at about 10 seconds on a dual core at least.
to compare hardware the first two queries in the post on a baseline of 100 I get
43766 ms and 43750 ms elapsed
Biggest shock was the times of
SELECT @results = count(filterCriterion_sv)
FROM tbl_IN_VS_AND
WHERE filterCriterion_sv NOT IN (Select 214 I UNION SELECT 215 UNION SELECT 216 UNION SELECT 217)
SELECT @results = count(filterCriterion_sv)
FROM tbl_IN_VS_AND
WHERE filterCriterion_sv NOT IN (214,215,216,217)
were 27176 ms
and 43486 ms
while
SELECT @results = count(filterCriterion_sv)
FROM tbl_IN_VS_AND
WHERE NOT EXISTS(SELECT * FROM
(
SELECT 214 AS filterValue_val UNION ALL
SELECT 215 UNION ALL
SELECT 216 UNION ALL
SELECT 217 ) AS tbl
WHERE tbl.filterValue_val = tbl_IN_VS_AND.filterCriterion_sv )
ended up was still 14470 ms so twice as good as the union
I guess it must not construct the in predicate into a working table before starting the query. (until you tell it to via union)
July 19, 2008 at 3:58 pm
In my experience it is always faster to join with "exists" then do a painful "in". "In" is only good for several values. With ssis, you can actually use a pre-existing table to insert your values into and join aginst.
Viewing 5 posts - 16 through 19 (of 19 total)
You must be logged in to reply to this topic. Login to reply