October 17, 2011 at 2:33 am
Gianluca Sartori (10/17/2011)
Cadavre (10/17/2011)
Gianluca Sartori (10/14/2011)
I couldn't resist!Me either π
I already knew it was slow, but it was such a fun problem!
It's something you don't stumble upon very often.
Agreed. Which is why I set-up the original test - whenever I do come across this sort of problem the only solution that ever comes to mind is a case statement. So seeing yours and Chris' solutions was extremely interesting, a different way to consider the problem (even if in this particular case, the case method was quicker).
October 17, 2011 at 2:37 am
Another (slower) method:
SELECT id, col = CASE WHEN v1 <= ALL (SELECT v FROM ( VALUES (v2), (v3), (v4), (v5), (v6)) AS src (v))
THEN 1
WHEN v2 <= ALL (SELECT v FROM ( VALUES (v1), (v3), (v4), (v5), (v6)) AS src (v))
THEN 2
WHEN v3 <= ALL (SELECT v FROM ( VALUES (v1), (v2), (v4), (v5), (v6)) AS src (v))
THEN 3
WHEN v4 <= ALL (SELECT v FROM ( VALUES (v1), (v2), (v3), (v5), (v6)) AS src (v))
THEN 4
WHEN v5 <= ALL (SELECT v FROM ( VALUES (v1), (v2), (v3), (v4), (v6)) AS src (v))
THEN 5
ELSE 6
END
FROM #Tab1 t
-- Gianluca Sartori
October 17, 2011 at 2:55 am
Gianluca Sartori (10/17/2011)
Cadavre (10/17/2011)
Gianluca Sartori (10/14/2011)
I couldn't resist!Me either π
I already knew it was slow, but it was such a fun problem!
It's something you don't stumble upon very often.
As I noted earlier it's similar to address "crunching" or "shuffling", where your row contains say add1, add2, add3, add4 and postcode. add2 is null or blank for some rows, but you don't want an empty row on the address label, so you shift the contents of the columns leftwards from add3 to fill the space.
The number of empty address elements can vary from 0 to perhaps 4 or 5, and the number of address elements can vary too - up to add7 + postcode.
We've got functions to deal with this. One of them is a pivot type similar to Gianluca's, the other is a cross apply. When I've got a bit of time I'll set up a test to confirm which is quickest, but results so far indicate it's the cross apply+sort version.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 17, 2011 at 3:01 am
Other two (slower) methods:
-- UNPIVOT + ROW NUMBER on the "2nd dimension"
SELECT id, col
FROM (
SELECT *, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY value) AS RN
FROM #Tab1
UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u
) AS src
WHERE RN = 1
-- UNPIVOT + ROW NUMBER on the "3rd dimension"
SELECT id, col = (
SELECT col
FROM (
SELECT col, RN = ROW_NUMBER() OVER (ORDER BY value)
FROM (
SELECT id, v1, v2, v3, v4, v5, v6
) AS v
UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u
) AS src
WHERE RN = 1
)
FROM #Tab1 AS T
I think nothing can beat Lowell's "big case" here.
-- Gianluca Sartori
October 17, 2011 at 3:52 am
Another method 3x faster than original stoned snail CROSS APPLY:
SELECT t.id, t.v1, t.v2, t.v3, t.v4, t.v5, t.v6, x.ColIndex
FROM @Tab1 t
CROSS APPLY (
SELECT rn = ROW_NUMBER() OVER (ORDER BY v), ColIndex
FROM (VALUES (1,v1),(2,v2),(3,v3),(4,v4),(5,v5),(6,v6)) v (ColIndex,v)
) x (rn,ColIndex)
WHERE x.rn = 1
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 17, 2011 at 4:13 am
IF object_id('tempdb..#Tab1') IS NOT NULL
BEGIN
DROP TABLE #Tab1
END
--1,000,000 Random rows of data
SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,
v1, v2, v3, v4, v5, v6
INTO #Tab1
FROM (SELECT (ABS(CHECKSUM(NewId())) % 10) + 1 AS v1,
(ABS(CHECKSUM(NewId())) % 10) + 1 AS v2,
(ABS(CHECKSUM(NewId())) % 10) + 1 AS v3,
(ABS(CHECKSUM(NewId())) % 10) + 1 AS v4,
(ABS(CHECKSUM(NewId())) % 10) + 1 AS v5,
(ABS(CHECKSUM(NewId())) % 10) + 1 AS v6
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3) a
WHERE (v1 <> v2 AND v1 <> v3 AND v1 <> v4 AND v1 <> v5 AND v1 <> v6) AND
(v2 <> v3 AND v2 <> v4 AND v2 <> v5 AND v2 <> v6) AND (v3 <> v4 AND v3 <> v5 AND v3 <> v6) AND
(v4 <> v5 AND v4 <> v6) AND (v5 <> v6)
--Add a Primary Key
ALTER TABLE #Tab1
ADD CONSTRAINT Tab1_PK_ID
PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100
DECLARE @VAR INT
PRINT '========== BASELINE =========='
SET STATISTICS TIME ON
SELECT @VAR = ID FROM #Tab1
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== CROSS JOIN =========='
SET STATISTICS TIME ON
SELECT @VAR = x.ColIndex
FROM #Tab1 t
CROSS APPLY (
SELECT TOP 1 ColIndex
FROM (SELECT ColIndex = 1, ColName = v1 UNION ALL SELECT 2, v2
UNION ALL SELECT 3, v3 UNION ALL SELECT 4, v4
UNION ALL SELECT 5, v5 UNION ALL SELECT 6, v6) d
ORDER BY ColName) x(ColIndex)
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== CASE =========='
SET STATISTICS TIME ON
SELECT @VAR = CASE WHEN (v1 <= v2) AND (v1 <= v3) AND (v1 <= v4) AND (v1 <= v5) AND (v1 <= v6)
THEN 1
WHEN (v2 <= v1) AND (v2 <= v3) AND (v2 <= v4) AND (v2 <= v5) AND (v2 <= v6)
THEN 2
WHEN (v3 <= v1) AND (v3 <= v2) AND (v3 <= v4) AND (v3 <= v5) AND (v3 <= v6)
THEN 3
WHEN (v4 <= v1) AND (v4 <= v2) AND (v4 <= v3) AND (v4 <= v5) AND (v4 <= v6)
THEN 4
WHEN (v5 <= v1) AND (v5 <= v2) AND (v5 <= v3) AND (v5 <= v4) AND (v5 <= v6)
THEN 5
ELSE 6
END
FROM #Tab1 t
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== UNPIVOT =========='
SET STATISTICS TIME ON
SELECT @VAR = (
SELECT SUBSTRING(col,2,1)
FROM (
SELECT ID, v1, v2, v3, v4, v5, v6
) AS v
UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u
WHERE value = (
SELECT MIN(value)
FROM (
SELECT *
FROM (
SELECT ID, v1, v2, v3, v4, v5, v6
) AS v1
UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u
) AS src
)
)
FROM #Tab1 AS T
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== Another CASE =========='
SET STATISTICS TIME ON
SELECT @VAR = CASE WHEN v1 <= ALL (SELECT v FROM ( VALUES (v2), (v3), (v4), (v5), (v6)) AS src (v))
THEN 1
WHEN v2 <= ALL (SELECT v FROM ( VALUES (v1), (v3), (v4), (v5), (v6)) AS src (v))
THEN 2
WHEN v3 <= ALL (SELECT v FROM ( VALUES (v1), (v2), (v4), (v5), (v6)) AS src (v))
THEN 3
WHEN v4 <= ALL (SELECT v FROM ( VALUES (v1), (v2), (v3), (v5), (v6)) AS src (v))
THEN 4
WHEN v5 <= ALL (SELECT v FROM ( VALUES (v1), (v2), (v3), (v4), (v6)) AS src (v))
THEN 5
ELSE 6
END
FROM #Tab1 t
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== UNPIVOT + ROW NUMBER on the "2nd dimension" =========='
SET STATISTICS TIME ON
SELECT @VAR = col
FROM (
SELECT ID, SUBSTRING(col,2,1) AS col,
ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY value) AS RN
FROM #Tab1
UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u
) AS src
WHERE RN = 1
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== UNPIVOT + ROW NUMBER on the "3rd dimension" =========='
SET STATISTICS TIME ON
SELECT @VAR = (
SELECT col
FROM (
SELECT SUBSTRING(col,2,1) AS col,
RN = ROW_NUMBER() OVER (ORDER BY value)
FROM (
SELECT ID, v1, v2, v3, v4, v5, v6
) AS v
UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u
) AS src
WHERE RN = 1
)
FROM #Tab1 AS T
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== CROSS APPLY =========='
SET STATISTICS TIME ON
SELECT @VAR = x.ColIndex
FROM #Tab1 t
CROSS APPLY (
SELECT rn = ROW_NUMBER() OVER (ORDER BY v), ColIndex
FROM (VALUES (1,v1),(2,v2),(3,v3),(4,v4),(5,v5),(6,v6)) v (ColIndex,v)
) x (rn,ColIndex)
WHERE x.rn = 1
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
(1000000 row(s) affected)
========== BASELINE ==========
SQL Server Execution Times:
CPU time = 156 ms, elapsed time = 153 ms.
================================================================================
========== CROSS JOIN ==========
SQL Server Execution Times:
CPU time = 9563 ms, elapsed time = 9580 ms.
================================================================================
========== CASE ==========
SQL Server Execution Times:
CPU time = 516 ms, elapsed time = 515 ms.
================================================================================
========== UNPIVOT ==========
SQL Server Execution Times:
CPU time = 2859 ms, elapsed time = 755 ms.
================================================================================
========== Another CASE ==========
SQL Server Execution Times:
CPU time = 2641 ms, elapsed time = 808 ms.
================================================================================
========== UNPIVOT + ROW NUMBER on the "2nd dimension" ==========
SQL Server Execution Times:
CPU time = 10717 ms, elapsed time = 5288 ms.
================================================================================
========== UNPIVOT + ROW NUMBER on the "3rd dimension" ==========
SQL Server Execution Times:
CPU time = 16624 ms, elapsed time = 4391 ms.
================================================================================
========== CROSS APPLY ==========
SQL Server Execution Times:
CPU time = 3687 ms, elapsed time = 995 ms.
================================================================================
October 18, 2011 at 9:30 am
This looks like fun, am I too late? π
I figured this looked like a classic example of a situation where a function might be useful. I'd prefer to farm this out to C# via CLR, but I guess that's breaking the rules...
So, my function is sql only and looks like this:
create function dbo.ChooseMin (
@v1 int
, @v2 int
, @v3 int
, @v4 int
, @v5 int
, @v6 int
) returns table
as
return
with cte as (
select 1 as pos,@v1 as val
union all
select 2,@v2
union all
select 3,@v3
union all
select 4,@v4
union all
select 5,@v5
union all
select 6,@v6
)
select pos, val
from cte
where val = (select min(val) from cte)
Running the test:
DECLARE @VAR INT
PRINT '========== BASELINE =========='
SET STATISTICS TIME ON
SELECT @VAR = ID FROM #Tab1
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== CROSS JOIN =========='
SET STATISTICS TIME ON
SELECT @VAR = x.ColIndex
FROM #Tab1 t
CROSS APPLY (
SELECT TOP 1 ColIndex
FROM (SELECT ColIndex = 1, ColName = v1 UNION ALL SELECT 2, v2
UNION ALL SELECT 3, v3 UNION ALL SELECT 4, v4
UNION ALL SELECT 5, v5 UNION ALL SELECT 6, v6) d
ORDER BY ColName) x(ColIndex)
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== CASE =========='
SET STATISTICS TIME ON
SELECT @VAR = CASE WHEN (v1 <= v2) AND (v1 <= v3) AND (v1 <= v4) AND (v1 <= v5) AND (v1 <= v6)
THEN 1
WHEN (v2 <= v1) AND (v2 <= v3) AND (v2 <= v4) AND (v2 <= v5) AND (v2 <= v6)
THEN 2
WHEN (v3 <= v1) AND (v3 <= v2) AND (v3 <= v4) AND (v3 <= v5) AND (v3 <= v6)
THEN 3
WHEN (v4 <= v1) AND (v4 <= v2) AND (v4 <= v3) AND (v4 <= v5) AND (v4 <= v6)
THEN 4
WHEN (v5 <= v1) AND (v5 <= v2) AND (v5 <= v3) AND (v5 <= v4) AND (v5 <= v6)
THEN 5
ELSE 6
END
FROM #Tab1 t
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== UNPIVOT =========='
SET STATISTICS TIME ON
SELECT @VAR = (
SELECT SUBSTRING(col,2,1)
FROM (
SELECT ID, v1, v2, v3, v4, v5, v6
) AS v
UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u
WHERE value = (
SELECT MIN(value)
FROM (
SELECT *
FROM (
SELECT ID, v1, v2, v3, v4, v5, v6
) AS v1
UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u
) AS src
)
)
FROM #Tab1 AS T
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== Another CASE =========='
SET STATISTICS TIME ON
SELECT @VAR = CASE WHEN v1 <= ALL (SELECT v FROM ( VALUES (v2), (v3), (v4), (v5), (v6)) AS src (v))
THEN 1
WHEN v2 <= ALL (SELECT v FROM ( VALUES (v1), (v3), (v4), (v5), (v6)) AS src (v))
THEN 2
WHEN v3 <= ALL (SELECT v FROM ( VALUES (v1), (v2), (v4), (v5), (v6)) AS src (v))
THEN 3
WHEN v4 <= ALL (SELECT v FROM ( VALUES (v1), (v2), (v3), (v5), (v6)) AS src (v))
THEN 4
WHEN v5 <= ALL (SELECT v FROM ( VALUES (v1), (v2), (v3), (v4), (v6)) AS src (v))
THEN 5
ELSE 6
END
FROM #Tab1 t
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== UNPIVOT + ROW NUMBER on the "2nd dimension" =========='
SET STATISTICS TIME ON
SELECT @VAR = col
FROM (
SELECT ID, SUBSTRING(col,2,1) AS col,
ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY value) AS RN
FROM #Tab1
UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u
) AS src
WHERE RN = 1
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== UNPIVOT + ROW NUMBER on the "3rd dimension" =========='
SET STATISTICS TIME ON
SELECT @VAR = (
SELECT col
FROM (
SELECT SUBSTRING(col,2,1) AS col,
RN = ROW_NUMBER() OVER (ORDER BY value)
FROM (
SELECT ID, v1, v2, v3, v4, v5, v6
) AS v
UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u
) AS src
WHERE RN = 1
)
FROM #Tab1 AS T
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== CROSS APPLY =========='
SET STATISTICS TIME ON
SELECT @VAR = x.ColIndex
FROM #Tab1 t
CROSS APPLY (
SELECT rn = ROW_NUMBER() OVER (ORDER BY v), ColIndex
FROM (VALUES (1,v1),(2,v2),(3,v3),(4,v4),(5,v5),(6,v6)) v (ColIndex,v)
) x (rn,ColIndex)
WHERE x.rn = 1
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== CROSS APPLY w FUNCTION =========='
SET STATISTICS TIME ON
SELECT @VAR = x.pos
from #tab1
cross apply dbo.ChooseMin(v1,v2,v3,v4,v5,v6) x
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
And the results:
========== BASELINE ==========
SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 134 ms.
================================================================================
========== CROSS JOIN ==========
SQL Server Execution Times:
CPU time = 7909 ms, elapsed time = 7918 ms.
================================================================================
========== CASE ==========
SQL Server Execution Times:
CPU time = 406 ms, elapsed time = 406 ms.
================================================================================
========== UNPIVOT ==========
SQL Server Execution Times:
CPU time = 1981 ms, elapsed time = 1981 ms.
================================================================================
========== Another CASE ==========
SQL Server Execution Times:
CPU time = 1888 ms, elapsed time = 1882 ms.
================================================================================
========== UNPIVOT + ROW NUMBER on the "2nd dimension" ==========
SQL Server Execution Times:
CPU time = 7909 ms, elapsed time = 13626 ms.
================================================================================
========== UNPIVOT + ROW NUMBER on the "3rd dimension" ==========
SQL Server Execution Times:
CPU time = 11653 ms, elapsed time = 11686 ms.
================================================================================
========== CROSS APPLY ==========
SQL Server Execution Times:
CPU time = 2371 ms, elapsed time = 2413 ms.
================================================================================
========== CROSS APPLY w FUNCTION ==========
SQL Server Execution Times:
CPU time = 1139 ms, elapsed time = 1147 ms.
================================================================================
So, I still don't beat the case statement. But I had fun, so that's all that matters π
But, this is on my own laptop running SQL Express. I'd be interested to see whether the function based approach might parallelise a little better and run faster on SQL Server Pro/Enterprise and mutiple cores.
Unfortunately I'm in an environment where running your own code for fun isn't allowed!
Regards, Iain
October 18, 2011 at 10:16 am
irobertson (10/18/2011)
So, I still don't beat the case statement. But I had fun, so that's all that matters π
Actually, the CASE version isn't even completely optimized. It's doing a bunch of extraneous comparisons, and removing those extraneous comparisons will speed it up even more. (It takes a bit of logic to show that the following is equivalent to the original except in the case where one or more values is NULL, but the original will always output 6 if any of the fields is null.)
CASE
WHEN (v1 <= v2) AND (v1 <= v3) AND (v1 <= v4) AND (v1 <= v5) AND (v1 <= v6)
THEN 1
WHEN (v2 <= v3) AND (v2 <= v4) AND (v2 <= v5) AND (v2 <= v6)
THEN 2
WHEN (v3 <= v4) AND (v3 <= v5) AND (v3 <= v6)
THEN 3
WHEN (v4 <= v5) AND (v4 <= v6)
THEN 4
WHEN (v5 <= v6)
THEN 5
ELSE 6
END
FROM #Tab1 t
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 18, 2011 at 10:28 pm
Lowell (10/14/2011)
maybe it's just me, but both my solution and Chris's seem to produce the exact same execution plan?, but Chris's is 48% relative to the batch where mine was 17%? so the big case is better?
Just a followup on this... "Percent Relative to Batch" can have very little to do with actual resource usage and/or performance. In fact, it can actually be 180 out (ie: totally incorrect). Test both pieces of code in SQL Profiler or using SET STATISTICS ON like some of the posts have above. Never make the decision as to which code is more performant by looking at the Percent of Batch. It lies a whole lot. π
--Jeff Moden
Change is inevitable... Change for the better is not.
October 20, 2011 at 11:29 am
If this were Oracle, I could do it all with an inline expression using the DECODE() and LEAST(<value>, <value>, ....) functions.
However, after deriving a version of your table that doesn't contain repeating groups, it's just a matter of ranking and filtering the result.
select id, n from
(
select id, n, v, rank() over (partition by id order by v asc) v_rank
from
(
select id, 1 as n, v1 as v from @Tab1 union all
select id, 2 as n, v2 as v from @Tab1 union all
select id, 3 as n, v3 as v from @Tab1 union all
select id, 4 as n, v4 as v from @Tab1 union all
select id, 5 as n, v5 as v from @Tab1 union all
select id, 6 as n, v6 as v from @Tab1
) x
) x2
where v_rank = 1;
id n
----------- -----------
1 1
2 3
3 2
4 6
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
October 20, 2011 at 8:43 pm
That's essentially the same as the UNPIVOT solution already posted, Eric. I've not tested it but I'm pretty sure the CASE statement that Drew wrote will beat it fairly well as it did the UNPIVOT.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 16 through 25 (of 25 total)
You must be logged in to reply to this topic. Login to reply