July 7, 2011 at 8:51 pm
Please someone tell me what I did wrong:
You had a typo in your solution (used hobby1 twice from the first table), but that wouldn't effect the performance.
But doesnt this build a test case with no nulls in it? Not sure if the performance will change at all , but if we're going to measure these solutions there should be nulls to be stripped out.
July 7, 2011 at 10:37 pm
Nevyn (7/7/2011)
Please someone tell me what I did wrong:
You had a typo in your solution (used hobby1 twice from the first table), but that wouldn't effect the performance.
<edit>I think I found it, but I agree, it probably did not affect performance. That said, there is a better effort later in this post :-)</edit>
But doesnt this build a test case with no nulls in it? Not sure if the performance will change at all , but if we're going to measure these solutions there should be nulls to be stripped out.
A fair point...I am not sure how or if it should measurably affect performance but I changed it up in the code sample below.
Regarding these two requirements mikes84:
1) Do you want duplicates shown in the results?
2) Does order matter?
Using our two tables hobby1 and hobby2 here is an example that should answer both questions:
If...
hobby1.hobby1 = tennis
hobby1.hobby2 = cricket
hobby2.hobby1 is null
hobby2.hobby2 = tennis
...what should the result set look like?
Here is another set of code with more varied test data, including nulls, as well as a Union/Crosstab method, a variation of the Union All/Crosstab, which removes duplicates but does not offer ordering based on column name.
> APPLY method - removes dups, can be ordered by column name
> String manipulation - preserves dups, can be ordered by column name
> Unpivot/Pivot - preserves dups, can be ordered by column name
> Union All/Crosstab - preserves dups, can be ordered by column name
> Union/Crosstab - removes dups, cannot be ordered by column name
> Parsename - preserves dups, can be ordered by column name
SET NOCOUNT ON ;
GO
SET STATISTICS TIME OFF ;
PRINT REPLICATE('-', 75) + '
build temp tables'
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'hobby1')
AND type IN ( N'U' ) )
DROP TABLE hobby1 ;
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'hobby2')
AND type IN ( N'U' ) )
DROP TABLE hobby2 ;
CREATE TABLE hobby1
(
id INT IDENTITY(1, 1),
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
CREATE TABLE hobby2
(
id INT,
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
INSERT INTO dbo.hobby1
(
hobby1,
hobby2
)
SELECT TOP ( 250 )
CAST(NEWID() AS VARCHAR(36)),
NULL
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2
UNION ALL
SELECT TOP ( 250 )
NULL,
CAST(NEWID() AS VARCHAR(36))
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2
UNION ALL
SELECT TOP ( 250 )
CAST(NEWID() AS VARCHAR(36)),
CAST(NEWID() AS VARCHAR(36))
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2
UNION ALL
SELECT TOP ( 250 )
NULL,
NULL
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2 ;
INSERT INTO dbo.hobby2
(
id,
hobby1,
hobby2
)
SELECT id,
NULL,
CAST(NEWID() AS VARCHAR(36))
FROM dbo.hobby1
WHERE id <= 150
UNION ALL
SELECT id,
hobby2,
CAST(NEWID() AS VARCHAR(36))
FROM dbo.hobby1
WHERE id BETWEEN 151 AND 550
UNION ALL
SELECT id,
NULL,
CAST(NEWID() AS VARCHAR(36))
FROM dbo.hobby1
WHERE id BETWEEN 551 AND 950
UNION ALL
SELECT id,
hobby1,
CAST(NEWID() AS VARCHAR(36))
FROM dbo.hobby1
WHERE id BETWEEN 951 AND 1000 ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
APPLY method'
SELECT 'APPLY method'
SET STATISTICS TIME ON ;
-- now transform it by moving all nulls to the end without RBAR
SELECT t.id,
c1toc4 AS newc1,
newc2,
newc3,
newc4
FROM (
SELECT h1.id AS id,
h1.hobby1 AS c1,
h1.hobby2 AS c2,
h2.hobby1 AS c3,
h2.hobby2 AS c4
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t -- first CROSS APPLY is just to provide shortcuts for the
-- various COALESCE statements
CROSS APPLY (
SELECT c1toc4 = COALESCE(c1, c2, c3, c4),
c2toc4 = COALESCE(c2, c3, c4),
c3toc4 = COALESCE(c3, c4)
) cx
-- second CROSS APPLY figures out what goes in the second column
-- we know the 1st column is always c1toc4 - i.e. the first non-null
-- so here we compare the first non-null in cols 2-4 with c1toc4 and c3toc4
-- if it matches either or those we need to either take column 4 or
-- null (if col4 matches c1toc4 we have already used it for new column 1)
CROSS APPLY (
SELECT newc2 = COALESCE(NULLIF(COALESCE(NULLIF(c2toc4,
c1toc4), c3toc4), c1toc4), NULLIF(c4, c1toc4))
) cx2
-- and this is for the third column
CROSS APPLY (
SELECT newc3 = COALESCE(NULLIF(NULLIF(c3toc4, newc2), c1toc4),
NULLIF(NULLIF(c4, newc2), c1toc4))
) cx3
-- and this is for the fourth column
CROSS APPLY (
SELECT newc4 = NULLIF(NULLIF(NULLIF(c4, newc3), newc2), c1toc4)
) cx4 ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
String manipulation'
SELECT 'String manipulation'
SET STATISTICS TIME ON ;
SELECT t.id,
RTRIM(SUBSTRING(t.s, 1, 36)),
RTRIM(SUBSTRING(t.s, 37, 36)),
RTRIM(SUBSTRING(t.s, 73, 36)),
RTRIM(SUBSTRING(t.s, 109, 36))
FROM (
SELECT h1.id,
ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +
ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36
- LEN(h2.hobby1)),'') + ISNULL(h2.hobby2
+ SPACE(36- LEN(h2.hobby2)),'') s
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Unpivot/Pivot'
SELECT 'Unpivot/Pivot'
SET STATISTICS TIME ON ;
WITH OneTableCTE
AS (
SELECT a.id,
a.hobby1 AS 'h1',
a.hobby2 AS 'h2',
b.hobby1 AS 'h3',
b.hobby2 AS 'h4'
FROM hobby1 a
INNER JOIN hobby2 b ON a.id = b.id
),
UnPivotCTE
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )
AS fixedorder
FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )
AS Normalized
)
SELECT OneTableCTE.id,
[1],
[2],
[3],
[4]
FROM (
SELECT id,
hobby,
fixedorder
FROM UnPivotCTE
) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )
AS PivotTable
RIGHT OUTER JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Union All/Cross tab'
SELECT 'Union All/Cross tab'
SET STATISTICS TIME ON ;
WITH cte(id, hobby, hobby_num)
AS (
SELECT h.id,
h.hobby1,
1
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby1,
2
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby1,
3
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby2,
4
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
),
cte2(id, hobby, row_num)
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)
FROM cte
WHERE hobby IS NOT NULL
)
SELECT id,
MAX(CASE WHEN row_num = 1 THEN hobby
END) AS hobby1,
MAX(CASE WHEN row_num = 2 THEN hobby
END) AS hobby2,
MAX(CASE WHEN row_num = 3 THEN hobby
END) AS hobby3,
MAX(CASE WHEN row_num = 4 THEN hobby
END) AS hobby4
FROM cte2
GROUP BY id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Union/Cross tab'
SELECT 'Union/Cross tab'
SET STATISTICS TIME ON ;
WITH cte(id, hobby)
AS (
SELECT h.id,
h.hobby1
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION
SELECT h.id,
h.hobby2
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION
SELECT h.id,
h.hobby1
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
UNION
SELECT h.id,
h.hobby2
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
),
cte2(id, hobby, row_num)
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id)
FROM cte
WHERE hobby IS NOT NULL
)
SELECT id,
MAX(CASE WHEN row_num = 1 THEN hobby
END) AS hobby1,
MAX(CASE WHEN row_num = 2 THEN hobby
END) AS hobby2,
MAX(CASE WHEN row_num = 3 THEN hobby
END) AS hobby3,
MAX(CASE WHEN row_num = 4 THEN hobby
END) AS hobby4
FROM cte2
GROUP BY id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Parsename'
SELECT 'Parsename'
SET STATISTICS TIME ON ;
SELECT h1.id,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id ;
SET STATISTICS TIME OFF ;
GO
Common results on my machine:
---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
APPLY method
SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 260 ms.
---------------------------------------------------------------------------
String manipulation
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 191 ms.
---------------------------------------------------------------------------
Unpivot/Pivot
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 248 ms.
---------------------------------------------------------------------------
Union All/Cross tab
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 127 ms.
---------------------------------------------------------------------------
Union/Cross tab
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 131 ms.
---------------------------------------------------------------------------
Parsename
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 137 ms.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 7, 2011 at 10:43 pm
mister.magoo (7/7/2011)
mikes84 (7/6/2011)
I have a situation where I'm querying two tables, each that has up to two hobbies listed. The first table has hobby1 and hobby2, and the second table has hobby3 and hobby4. It's possible for there to be values in any of the fields. So, it could look like:hobby1, hobby2, hobby3, hobby4
basketball, tennis, NULL, NULL
or
hobby1, hobby2, hobby3, hobby4
NULL, NULL, basketball, tennis
However, it could also look like:
hobby1, hobby2, hobby3, hobby4
basketball, NULL, tennis, NULL
Ideally, I want it to look like the first example, where the values are brought to the left. I could use COALESCE in the first and second examples and have it work, but it wouldn't work on the third example. Any ideas how I could get something like the third example to look like the first one?
Mike
Hi Orlando (and other contributors),
I don't understand the test data you are using... perhaps it's me but the question was about shuffling values into the leftmost columns where there were nulls between them.
The fact that the APPLY method was removing dups was lost on me initially. I did not break anything, it was working as you had intended per the initial post all along.
The test data you generate has no NULLS and duplicate entries.
As far as I can see, the CROSS APPLY method I provided is the only one that is correctly processing this data into the two leftmost columns and removing duplicate entries (the removal of duplicates may not be a stated requirement, but it suits me to include it and it makes sense as you wouldn't want duplicates anyway)...
Or am I completely missing the point?
I agree with you. I think what you have is a practical solution...why not de-dup? and why not preserve the order of the columns as they are examined per the written query?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 7, 2011 at 11:17 pm
I decided to build a large amount of test data that more closely matches the OP's given conditions. I'll be back with a possible solution soon. Not to worry about the 100,000 rows in each table... this runs very, very fast. Also note that the column names of the Hobby Names match the OP's.
--=====================================================================================================================
-- Create and populate the test tables. This is NOT a part of the solution.
--=====================================================================================================================
--===== Declare and preset some obviously named local variables
DECLARE @RowsInFirstTable INT,
@RowsInSecondTable INT
;
SELECT @RowsInFirstTable = 100000,
@RowsInSecondTable = 100000
;
--===== Conditionally drop the test tables to make reruns in SSMS easier
IF OBJECT_ID('tempdb..#HobbyName','U') IS NOT NULL DROP TABLE #HobbyName; --Used to build a lot of test data
IF OBJECT_ID('tempdb..#Hobby1','U') IS NOT NULL DROP TABLE #Hobby1;
IF OBJECT_ID('tempdb..#Hobby2','U') IS NOT NULL DROP TABLE #Hobby2;
--===== Create a list of HobbyNames so we can randomly populate the test tables.
-- The NULL's are simply part of the possible things that could be a hobby.
SELECT HobbyNameID = IDENTITY(INT,1,1),
HobbyName = CAST(d.HobbyName AS VARCHAR(15))
INTO #HobbyName
FROM (
SELECT NULL UNION ALL
SELECT NULL UNION ALL
SELECT 'BaseBall' UNION ALL
SELECT 'Basketball' UNION ALL
SELECT 'Tennis' UNION ALL
SELECT 'FootBall' UNION ALL
SELECT 'Hockey' UNION ALL
SELECT 'Rugby' UNION ALL
SELECT 'Golf' UNION ALL
SELECT 'Bowling'
) d (HobbyName)
;
ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100
;
--===== Create the "first" table and populate it on the fly
SELECT TOP (@RowsInFirstTable)
Hobby1ID = IDENTITY(INT,1,1),
Hobby1 = (SELECT TOP 1 HobbyName FROM #HobbyName WHERE HobbyNameID = ABS(CHECKSUM(NEWID()))%10+1),
Hobby2 = (SELECT TOP 1 HobbyName FROM #HobbyName WHERE HobbyNameID = ABS(CHECKSUM(NEWID()))%10+1)
INTO #Hobby1
FROM sys.all_columns ac1,
sys.all_columns ac2
;
--===== Create the "second" table and populate it on the fly
SELECT TOP (@RowsInSecondTable)
Hobby2ID = IDENTITY(INT,1,1),
Hobby3 = (SELECT TOP 1 HobbyName FROM #HobbyName WHERE HobbyNameID = ABS(CHECKSUM(NEWID()))%10+1),
Hobby4 = (SELECT TOP 1 HobbyName FROM #HobbyName WHERE HobbyNameID = ABS(CHECKSUM(NEWID()))%10+1)
INTO #Hobby2
FROM sys.all_columns ac1,
sys.all_columns ac2
;
--===== Add PK's to both tables
ALTER TABLE #Hobby1 ADD PRIMARY KEY CLUSTERED (Hobby1ID) WITH FILLFACTOR = 100;
ALTER TABLE #Hobby2 ADD PRIMARY KEY CLUSTERED (Hobby2ID) WITH FILLFACTOR = 100;
--Jeff Moden
Change is inevitable... Change for the better is not.
July 7, 2011 at 11:41 pm
For what it's worth, here's my rendition of how this problem might be solved. It doesn't preserve dupes and it sorts the HobbyNames on each row alphabetically. As always, details are in the code. As a sidebar, it only "dips" each table once and there is no join. I also did my own form of unpivot simply because I don't care for the UNPIVOT syntax. 😀 I'll let Orlando test it on his machine rather than make any performance claims. (Orlando, the test data above is set for 100,000 rows... change it {just a couple of variable entries} to whatever you were using for the other's code. Thanks)
--=====================================================================================================================
-- Now, solve the problem. The details are, of course, in the comments.
-- This method only "dips" each table once.
--=====================================================================================================================
WITH
cteEnumerate AS
( --=== This CTE creates an EAV stacked Table removing dupes and NULLs in the process.
-- Then it numbers (enumerates) the item's "column numbers" for reassembly in the outer SELECT coming up.
-- As a pleasent side effect, the HobbyNames are sorted in alphabetical order.
-- This step is a form of "pre-aggregation"
SELECT ColNum = ROW_NUMBER() OVER (PARTITION BY eav.HobbyID ORDER BY eav.HobbyName),
eav.HobbyID,
eav.HobbyName
FROM ( --=== This does the EAV "stacking", removal of dupes, removal of NULLs as combined info from both tables
SELECT --===== Unpivot the "first" hobby table removing nulls and dupes
DISTINCT
HobbyID = h1.Hobby1ID,
HobbyName = upvt.HobbyName
FROM #Hobby1 h1
CROSS APPLY ( --=== This does a high speed "Unpivot" of the first table
SELECT Hobby1 UNION ALL
SELECT Hobby2
) upvt (HobbyName)
WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK
UNION -- We use union to get rid of dupes between tables ----------------------------------------------------
SELECT --===== Unpivot the "second" hobby table removing nulls and dupes
DISTINCT
HobbyID = h2.Hobby2ID,
HobbyName = upvt.HobbyName
FROM #Hobby2 h2
CROSS APPLY ( --=== This does a high speed "Unpivot" of the second table
SELECT Hobby3 UNION ALL
SELECT Hobby4
) upvt (HobbyName)
WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK
) eav
) --=== Now, we reassemble the data using a high-speed cross tab.
-- For convenience of anticipated queries, we also return the number of hobbies for each "HobbyID"
SELECT HobbyID,
Hobby1 = MAX(CASE WHEN ColNum = 1 THEN HobbyName ELSE '' END),
Hobby2 = MAX(CASE WHEN ColNum = 2 THEN HobbyName ELSE '' END),
Hobby3 = MAX(CASE WHEN ColNum = 3 THEN HobbyName ELSE '' END),
Hobby4 = MAX(CASE WHEN ColNum = 4 THEN HobbyName ELSE '' END),
HobbyCount = COUNT(*)
FROM cteEnumerate
GROUP BY HobbyID
;
--Jeff Moden
Change is inevitable... Change for the better is not.
July 8, 2011 at 12:11 am
I have some re-thinking to do and some more learning to do from your work Jeff...I had suspicions before but it is abundantly clear to me now just how bad my approach is to conjuring up test data :laugh:
Jeff, I modded your test data build script so we could run the existing solutions against the concrete tables we were using before. It just seemed easier than modding all the solution scripts but that means there is a risk that I munged your code in the process but I think I did OK.
> APPLY method - removes dups, can be ordered by column name
> String manipulation - preserves dups, can be ordered by column name
> Unpivot/Pivot - preserves dups, can be ordered by column name
> Union All/Crosstab - preserves dups, can be ordered by column name, does not return rows where all hobbies are null
> Union/Crosstab - removes dups, cannot be ordered by column name, does not return rows where all hobbies are null
> Parsename - preserves dups, can be ordered by column name
> Jeff's Rendition - removes dups, cannot be ordered by column name, does not return rows where all hobbies are null
Here is the complete set of tests:
SET NOCOUNT ON ;
GO
SET STATISTICS TIME OFF ;
GO
PRINT REPLICATE('-', 75) + '
build temp tables'
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'hobby1')
AND type IN ( N'U' ) )
DROP TABLE hobby1 ;
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'hobby2')
AND type IN ( N'U' ) )
DROP TABLE hobby2 ;
CREATE TABLE hobby1
(
id INT IDENTITY(1, 1),
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
CREATE TABLE hobby2
(
id INT IDENTITY(1, 1),
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
--===== Add PK's to both tables
ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;
ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;
GO
--=====================================================================================================================
-- Create and populate the test tables. This is NOT a part of the solution.
--=====================================================================================================================
--===== Declare and preset some obviously named local variables
DECLARE @RowsInFirstTable INT,
@RowsInSecondTable INT ;
SELECT @RowsInFirstTable = 100000,
@RowsInSecondTable = 100000 ;
IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL
DROP TABLE #HobbyName;
--===== Create a list of HobbyNames so we can randomly populate the test tables.
-- The NULL's are simply part of the possible things that could be a hobby.
SELECT HobbyNameID = IDENTITY( INT,1,1),
HobbyName = CAST(d.HobbyName AS VARCHAR(15))
INTO #HobbyName
FROM (
SELECT NULL
UNION ALL
SELECT NULL
UNION ALL
SELECT 'BaseBall'
UNION ALL
SELECT 'Basketball'
UNION ALL
SELECT 'Tennis'
UNION ALL
SELECT 'FootBall'
UNION ALL
SELECT 'Hockey'
UNION ALL
SELECT 'Rugby'
UNION ALL
SELECT 'Golf'
UNION ALL
SELECT 'Bowling'
) d ( HobbyName ) ;
ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;
--===== Create the "first" table and populate it on the fly
INSERT INTO dbo.hobby1
(
hobby1,
hobby2
)
SELECT TOP ( @RowsInFirstTable )
Hobby1 = (
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
),
Hobby2 = (
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
)
FROM sys.all_columns ac1,
sys.all_columns ac2 ;
--===== Create the "second" table and populate it on the fly
INSERT INTO dbo.hobby2
(
hobby1,
hobby2
)
SELECT TOP ( @RowsInSecondTable )
(
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
),
(
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
)
FROM sys.all_columns ac1,
sys.all_columns ac2 ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
APPLY method'
SELECT 'APPLY method'
SET STATISTICS TIME ON ;
-- now transform it by moving all nulls to the end without RBAR
SELECT t.id,
c1toc4 AS newc1,
newc2,
newc3,
newc4
FROM (
SELECT h1.id AS id,
h1.hobby1 AS c1,
h1.hobby2 AS c2,
h2.hobby1 AS c3,
h2.hobby2 AS c4
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t -- first CROSS APPLY is just to provide shortcuts for the
-- various COALESCE statements
CROSS APPLY (
SELECT c1toc4 = COALESCE(c1, c2, c3, c4),
c2toc4 = COALESCE(c2, c3, c4),
c3toc4 = COALESCE(c3, c4)
) cx
-- second CROSS APPLY figures out what goes in the second column
-- we know the 1st column is always c1toc4 - i.e. the first non-null
-- so here we compare the first non-null in cols 2-4 with c1toc4 and c3toc4
-- if it matches either or those we need to either take column 4 or
-- null (if col4 matches c1toc4 we have already used it for new column 1)
CROSS APPLY (
SELECT newc2 = COALESCE(NULLIF(COALESCE(NULLIF(c2toc4,
c1toc4), c3toc4), c1toc4), NULLIF(c4, c1toc4))
) cx2
-- and this is for the third column
CROSS APPLY (
SELECT newc3 = COALESCE(NULLIF(NULLIF(c3toc4, newc2), c1toc4),
NULLIF(NULLIF(c4, newc2), c1toc4))
) cx3
-- and this is for the fourth column
CROSS APPLY (
SELECT newc4 = NULLIF(NULLIF(NULLIF(c4, newc3), newc2), c1toc4)
) cx4 ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
String manipulation'
SELECT 'String manipulation'
SET STATISTICS TIME ON ;
SELECT t.id,
RTRIM(SUBSTRING(t.s, 1, 36)),
RTRIM(SUBSTRING(t.s, 37, 36)),
RTRIM(SUBSTRING(t.s, 73, 36)),
RTRIM(SUBSTRING(t.s, 109, 36))
FROM (
SELECT h1.id,
ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +
ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36
- LEN(h2.hobby1)),'') + ISNULL(h2.hobby2
+ SPACE(36- LEN(h2.hobby2)),'') s
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Unpivot/Pivot'
SELECT 'Unpivot/Pivot'
SET STATISTICS TIME ON ;
WITH OneTableCTE
AS (
SELECT a.id,
a.hobby1 AS 'h1',
a.hobby2 AS 'h2',
b.hobby1 AS 'h3',
b.hobby2 AS 'h4'
FROM hobby1 a
INNER JOIN hobby2 b ON a.id = b.id
),
UnPivotCTE
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )
AS fixedorder
FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )
AS Normalized
)
SELECT OneTableCTE.id,
[1],
[2],
[3],
[4]
FROM (
SELECT id,
hobby,
fixedorder
FROM UnPivotCTE
) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )
AS PivotTable
RIGHT OUTER JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Union All/Cross tab'
SELECT 'Union All/Cross tab'
SET STATISTICS TIME ON ;
WITH cte(id, hobby, hobby_num)
AS (
SELECT h.id,
h.hobby1,
1
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby2,
2
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby1,
3
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby2,
4
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
),
cte2(id, hobby, row_num)
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)
FROM cte
WHERE hobby IS NOT NULL
)
SELECT id,
MAX(CASE WHEN row_num = 1 THEN hobby
END) AS hobby1,
MAX(CASE WHEN row_num = 2 THEN hobby
END) AS hobby2,
MAX(CASE WHEN row_num = 3 THEN hobby
END) AS hobby3,
MAX(CASE WHEN row_num = 4 THEN hobby
END) AS hobby4
FROM cte2
GROUP BY id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Union/Cross tab'
SELECT 'Union/Cross tab'
SET STATISTICS TIME ON ;
WITH cte(id, hobby)
AS (
SELECT h.id,
h.hobby1
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION
SELECT h.id,
h.hobby2
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION
SELECT h.id,
h.hobby1
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
UNION
SELECT h.id,
h.hobby2
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
),
cte2(id, hobby, row_num)
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id)
FROM cte
WHERE hobby IS NOT NULL
)
SELECT id,
MAX(CASE WHEN row_num = 1 THEN hobby
END) AS hobby1,
MAX(CASE WHEN row_num = 2 THEN hobby
END) AS hobby2,
MAX(CASE WHEN row_num = 3 THEN hobby
END) AS hobby3,
MAX(CASE WHEN row_num = 4 THEN hobby
END) AS hobby4
FROM cte2
GROUP BY id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Parsename'
SELECT 'Parsename'
SET STATISTICS TIME ON ;
SELECT h1.id,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Jeff''s Rendition'
SELECT 'Jeff''s Rendition'
SET STATISTICS TIME ON ;
--=====================================================================================================================
-- Now, solve the problem. The details are, of course, in the comments.
-- This method only "dips" each table once.
--=====================================================================================================================
--=====================================================================================================================
-- Now, solve the problem. The details are, of course, in the comments.
-- This method only "dips" each table once.
--=====================================================================================================================
WITH
cteEnumerate AS
( --=== This CTE creates an EAV stacked Table removing dupes and NULLs in the process.
-- Then it numbers (enumerates) the item's "column numbers" for reassembly in the outer SELECT coming up.
-- As a pleasent side effect, the HobbyNames are sorted in alphabetical order.
-- This step is a form of "pre-aggregation"
SELECT ColNum = ROW_NUMBER() OVER (PARTITION BY eav.HobbyID ORDER BY eav.HobbyName),
eav.HobbyID,
eav.HobbyName
FROM ( --=== This does the EAV "stacking", removal of dupes, removal of NULLs as combined info from both tables
SELECT --===== Unpivot the "first" hobby table removing nulls and dupes
DISTINCT
HobbyID = h1.id,
HobbyName = upvt.HobbyName
FROM dbo.hobby1 h1
CROSS APPLY ( --=== This does a high speed "Unpivot" of the first table
SELECT hobby1 UNION ALL
SELECT hobby2
) upvt (HobbyName)
WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK
UNION -- We use union to get rid of dupes between tables ----------------------------------------------------
SELECT --===== Unpivot the "second" hobby table removing nulls and dupes
DISTINCT
HobbyID = h2.id,
HobbyName = upvt.HobbyName
FROM dbo.hobby2 h2
CROSS APPLY ( --=== This does a high speed "Unpivot" of the second table
SELECT hobby1 UNION ALL
SELECT hobby2
) upvt (HobbyName)
WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK
) eav
) --=== Now, we reassemble the data using a high-speed cross tab.
-- For convenience of anticipated queries, we also return the number of hobbies for each "HobbyID"
SELECT HobbyID,
Hobby1 = MAX(CASE WHEN ColNum = 1 THEN HobbyName ELSE '' END),
Hobby2 = MAX(CASE WHEN ColNum = 2 THEN HobbyName ELSE '' END),
Hobby3 = MAX(CASE WHEN ColNum = 3 THEN HobbyName ELSE '' END),
Hobby4 = MAX(CASE WHEN ColNum = 4 THEN HobbyName ELSE '' END),
HobbyCount = COUNT(*)
FROM cteEnumerate
GROUP BY HobbyID
;
SET STATISTICS TIME OFF ;
GO
And here is a typical set of results on my machine after checking about a dozen or so runs:
---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
APPLY method
SQL Server Execution Times:
CPU time = 17799 ms, elapsed time = 18257 ms.
---------------------------------------------------------------------------
String manipulation
SQL Server Execution Times:
CPU time = 921 ms, elapsed time = 2009 ms.
---------------------------------------------------------------------------
Unpivot/Pivot
SQL Server Execution Times:
CPU time = 1546 ms, elapsed time = 2138 ms.
---------------------------------------------------------------------------
Union All/Cross tab
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 796 ms, elapsed time = 1804 ms.
---------------------------------------------------------------------------
Union/Cross tab
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 780 ms, elapsed time = 1967 ms.
---------------------------------------------------------------------------
Parsename
SQL Server Execution Times:
CPU time = 1466 ms, elapsed time = 1968 ms.
---------------------------------------------------------------------------
Jeff's Rendition
SQL Server Execution Times:
CPU time = 1435 ms, elapsed time = 2181 ms.
Edit 7/8/2011 7:20 Mtn: corrected property of Jeff's rendition, it "cannot be ordered by column name"
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 8, 2011 at 1:25 am
Wow! that shows how slow my isnull/nullif/apply method is!
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
July 8, 2011 at 1:51 am
A question which i hope someone might be able to answer about the "Union/Cross tab" solution.
Okay so lets say we have a result looking like 1, "Tennis", Null, "Golf", "Bowling".
This first CTE will convert this to
1 "Tennis"
1 null
1 "Golf"
1 "Bowling"
The second CTE adds a rownumber to each row order by id. So it should create
1, 1, "Tennis"
1, 2, null
1, 3, Golf"
1, 4, "Bowling"
But how can we be sure it actually does that? There is no order by that forces it to come in that order. Is there a feature to CTEs so that they always come in a certain way without the order by?
Have worked like 30min with CTEs (still stuck with 2000) so all very new 🙂
/T
July 8, 2011 at 6:31 am
Regarding these two requirements mikes84:
1) Do you want duplicates shown in the results?
2) Does order matter?
Using our two tables hobby1 and hobby2 here is an example that should answer both questions:
If...
hobby1.hobby1 = tennis
hobby1.hobby2 = cricket
hobby2.hobby1 is null
hobby2.hobby2 = tennis
...what should the result set look like?
1) I don't want duplicates in the results
2) Order does matter. The order is: hobby1.hobby1, hobby1.hobby2, hobby2.hobby1, hobby2.hobby2
If any hobbies are null, the next non-null hobby should be in its place.
In the case of the sample you provided, it would be
tennis, cricket, tennis, null
I can't remember if your XML-based solution met both of these requirements.
Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?
Mike Scalise, PMP
https://www.michaelscalise.com
July 8, 2011 at 6:45 am
Modding my code so it should do what you now want.
SELECT t.id,
RTRIM(SUBSTRING(t.s, 1, 36)),
RTRIM(SUBSTRING(t.s, 37, 36)),
RTRIM(SUBSTRING(t.s, 73, 36)),
RTRIM(SUBSTRING(t.s, 109, 36))
FROM (
SELECT h1.id,
ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +
case when h1.hobby2 in (h1.hobby1) then '' else ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') end +
case when h2.hobby1 in (h1.hobby1, h1.hobby2) then '' else ISNULL(h2.hobby1 + SPACE(36 - LEN(h2.hobby1)),'') end +
case when h2.hobby2 in (h1.hobby1, h1.hobby2, h2.hobby1) then '' else ISNULL(h2.hobby2 + SPACE(36 - LEN(h2.hobby2)),'') end s
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t ;
Think it works atleast 😀
/T
July 8, 2011 at 6:48 am
I think you misunderstood the question, Mike. They are asking if the same hobby (Tennis in the example) appears twice in the four possible hobby records whether you want one of them eliminated.
July 8, 2011 at 6:52 am
mikes84 (7/8/2011)
Regarding these two requirements mikes84:
1) Do you want duplicates shown in the results?
2) Does order matter?
Using our two tables hobby1 and hobby2 here is an example that should answer both questions:
If...
hobby1.hobby1 = tennis
hobby1.hobby2 = cricket
hobby2.hobby1 is null
hobby2.hobby2 = tennis
...what should the result set look like?
1) I don't want duplicates in the results
2) Order does matter. The order is: hobby1.hobby1, hobby1.hobby2, hobby2.hobby1, hobby2.hobby2
If any hobbies are null, the next non-null hobby should be in its place.
In the case of the sample you provided, it would be
tennis, cricket, tennis, null
I can't remember if your XML-based solution met both of these requirements.
Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?
From your description I would think you wanted:
tennis, cricket, null, null
Please confirm.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 8, 2011 at 7:23 am
tommyh (7/8/2011)
A question which i hope someone might be able to answer about the "Union/Cross tab" solution.Okay so lets say we have a result looking like 1, "Tennis", Null, "Golf", "Bowling".
This first CTE will convert this to
1 "Tennis"
1 null
1 "Golf"
1 "Bowling"
The second CTE adds a rownumber to each row order by id. So it should create
1, 1, "Tennis"
1, 2, null
1, 3, Golf"
1, 4, "Bowling"
But how can we be sure it actually does that? There is no order by that forces it to come in that order. Is there a feature to CTEs so that they always come in a certain way without the order by?
We can't 😀 You cannot guarantee the order inside a derived table or CTE <edit>unless you use TOP but that does not apply in this problem case</edit>. Only the outermost query can ask for and be guaranteed ordering. That's why the "Union/Crosstab" method has the property "cannot be ordered by column name". The "Union All/Crosstab" assigns a value to each row based on the column it came from and it uses that for later pivoting so it can guarantee column ordering.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 8, 2011 at 7:38 am
Jeff Moden (7/7/2011)
For what it's worth, here's my rendition of how this problem might be solved. It doesn't preserve dupes and it sorts the HobbyNames on each row alphabetically. As always, details are in the code. As a sidebar, it only "dips" each table once and there is no join. I also did my own form of unpivot simply because I don't care for the UNPIVOT syntax. 😀 I'll let Orlando test it on his machine rather than make any performance claims. (Orlando, the test data above is set for 100,000 rows... change it {just a couple of variable entries} to whatever you were using for the other's code. Thanks)
Cool stuff, Jeff!
So do you ever use PIVOT and/or UNPIVOT, or are workarounds almost always better/faster?
July 8, 2011 at 9:30 am
Here is the code including the latest string manipulation approach from tommyh and a variation of the "Union All/Crosstab" approach I put together, both of which remove dups yet still allow for a column order:
SET STATISTICS TIME OFF ;
GO
SET NOCOUNT ON ;
GO
PRINT REPLICATE('-', 75) + '
build temp tables'
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'hobby1')
AND type IN ( N'U' ) )
DROP TABLE hobby1 ;
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'hobby2')
AND type IN ( N'U' ) )
DROP TABLE hobby2 ;
CREATE TABLE hobby1
(
id INT IDENTITY(1, 1) NOT NULL,
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
CREATE TABLE hobby2
(
id INT IDENTITY(1, 1) NOT NULL,
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
GO
--===== Add PK's to both tables
ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;
ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;
GO
--============================================================================
-- Create and populate the test tables. This is NOT a part of the solution.
--============================================================================
--===== Declare and preset some obviously named local variables
DECLARE @RowsInFirstTable INT,
@RowsInSecondTable INT ;
SELECT @RowsInFirstTable = 200000,
@RowsInSecondTable = 200000 ;
IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL
DROP TABLE #HobbyName;
--===== Create a list of HobbyNames so we can randomly populate the test tables.
-- The NULL's are simply part of the possible things that could be a hobby.
SELECT HobbyNameID = IDENTITY( INT,1,1),
HobbyName = CAST(d.HobbyName AS VARCHAR(15))
INTO #HobbyName
FROM (
SELECT NULL
UNION ALL
SELECT NULL
UNION ALL
SELECT 'BaseBall'
UNION ALL
SELECT 'Basketball'
UNION ALL
SELECT 'Tennis'
UNION ALL
SELECT 'FootBall'
UNION ALL
SELECT 'Hockey'
UNION ALL
SELECT 'Rugby'
UNION ALL
SELECT 'Golf'
UNION ALL
SELECT 'Bowling'
) d ( HobbyName ) ;
ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;
--===== Create the "first" table and populate it on the fly
INSERT INTO dbo.hobby1
(
hobby1,
hobby2
)
SELECT TOP ( @RowsInFirstTable )
Hobby1 = (
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
),
Hobby2 = (
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
)
FROM sys.all_columns ac1,
sys.all_columns ac2 ;
--===== Create the "second" table and populate it on the fly
INSERT INTO dbo.hobby2
(
hobby1,
hobby2
)
SELECT TOP ( @RowsInSecondTable )
(
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
),
(
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
)
FROM sys.all_columns ac1,
sys.all_columns ac2 ;
GO
------------------------------------------------------------------------------------
/*
PRINT REPLICATE('-', 75) + '
APPLY method (no dups, ordered by column)'
SELECT 'APPLY method (no dups, ordered by column)'
SET STATISTICS TIME ON ;
-- now transform it by moving all nulls to the end without RBAR
SELECT t.id,
c1toc4 AS newc1,
newc2,
newc3,
newc4
FROM (
SELECT h1.id AS id,
h1.hobby1 AS c1,
h1.hobby2 AS c2,
h2.hobby1 AS c3,
h2.hobby2 AS c4
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t -- first CROSS APPLY is just to provide shortcuts for the
-- various COALESCE statements
CROSS APPLY (
SELECT c1toc4 = COALESCE(c1, c2, c3, c4),
c2toc4 = COALESCE(c2, c3, c4),
c3toc4 = COALESCE(c3, c4)
) cx
-- second CROSS APPLY figures out what goes in the second column
-- we know the 1st column is always c1toc4 - i.e. the first non-null
-- so here we compare the first non-null in cols 2-4 with c1toc4 and c3toc4
-- if it matches either or those we need to either take column 4 or
-- null (if col4 matches c1toc4 we have already used it for new column 1)
CROSS APPLY (
SELECT newc2 = COALESCE(NULLIF(COALESCE(NULLIF(c2toc4,
c1toc4), c3toc4), c1toc4), NULLIF(c4, c1toc4))
) cx2
-- and this is for the third column
CROSS APPLY (
SELECT newc3 = COALESCE(NULLIF(NULLIF(c3toc4, newc2), c1toc4),
NULLIF(NULLIF(c4, newc2), c1toc4))
) cx3
-- and this is for the fourth column
CROSS APPLY (
SELECT newc4 = NULLIF(NULLIF(NULLIF(c4, newc3), newc2), c1toc4)
) cx4 ;
SET STATISTICS TIME OFF ;
GO
*/
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
String manipulation (dups, ordered by column)'
SELECT 'String manipulation (dups, ordered by column)'
SET STATISTICS TIME ON ;
SELECT t.id,
RTRIM(SUBSTRING(t.s, 1, 36)),
RTRIM(SUBSTRING(t.s, 37, 36)),
RTRIM(SUBSTRING(t.s, 73, 36)),
RTRIM(SUBSTRING(t.s, 109, 36))
FROM (
SELECT h1.id,
ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +
ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36
- LEN(h2.hobby1)),'') + ISNULL(h2.hobby2
+ SPACE(36- LEN(h2.hobby2)),'') s
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
String manipulation (no dups, ordered by column)'
SELECT 'String manipulation (no dups, ordered by column)'
SET STATISTICS TIME ON ;
SELECT t.id,
RTRIM(SUBSTRING(t.s, 1, 36)),
RTRIM(SUBSTRING(t.s, 37, 36)),
RTRIM(SUBSTRING(t.s, 73, 36)),
RTRIM(SUBSTRING(t.s, 109, 36))
FROM (
SELECT h1.id,
ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +
case when h1.hobby2 in (h1.hobby1) then '' else ISNULL(h1.hobby2 +
SPACE(36 - LEN(h1.hobby2)), '') end +
case when h2.hobby1 in (h1.hobby1, h1.hobby2) then '' else ISNULL(h2.hobby1 +
SPACE(36 - LEN(h2.hobby1)),'') end +
case when h2.hobby2 in (h1.hobby1, h1.hobby2, h2.hobby1) then '' else
ISNULL(h2.hobby2 + SPACE(36 - LEN(h2.hobby2)),'') end s
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Unpivot/Pivot (dups, ordered by column)'
SELECT 'Unpivot/Pivot (dups, ordered by column)'
SET STATISTICS TIME ON ;
WITH OneTableCTE
AS (
SELECT a.id,
a.hobby1 AS 'h1',
a.hobby2 AS 'h2',
b.hobby1 AS 'h3',
b.hobby2 AS 'h4'
FROM hobby1 a
INNER JOIN hobby2 b ON a.id = b.id
),
UnPivotCTE
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )
AS fixedorder
FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )
AS Normalized
)
SELECT OneTableCTE.id,
[1],
[2],
[3],
[4]
FROM (
SELECT id,
hobby,
fixedorder
FROM UnPivotCTE
) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )
AS PivotTable
RIGHT OUTER JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Union All/Cross tab (dups, ordered by column)'
SELECT 'Union All/Cross tab (dups, ordered by column)'
SET STATISTICS TIME ON ;
WITH cte(id, hobby, hobby_num)
AS (
SELECT h.id,
h.hobby1,
1
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby2,
2
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby1,
3
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby2,
4
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
),
cte2(id, hobby, row_num)
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)
FROM cte
WHERE hobby IS NOT NULL
)
SELECT id,
MAX(CASE WHEN row_num = 1 THEN hobby
END) AS hobby1,
MAX(CASE WHEN row_num = 2 THEN hobby
END) AS hobby2,
MAX(CASE WHEN row_num = 3 THEN hobby
END) AS hobby3,
MAX(CASE WHEN row_num = 4 THEN hobby
END) AS hobby4
FROM cte2
GROUP BY id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Union All/Cross tab (no dups, ordered by column)'
SELECT 'Union All/Cross tab (no dups, ordered by column)'
SET STATISTICS TIME ON ;
WITH cte ( id, hobby, hobby_num )
AS (
SELECT h.id,
h.hobby1,
1
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
AND h.hobby1 IS NOT NULL
UNION ALL
SELECT h.id,
h.hobby2,
2
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
AND h.hobby2 IS NOT NULL
UNION ALL
SELECT h.id,
h.hobby1,
3
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
AND h.hobby1 IS NOT NULL
UNION ALL
SELECT h.id,
h.hobby2,
4
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
AND h.hobby2 IS NOT NULL
),
cte2 ( id, hobby, hobby_num, hobby_partition_num )
AS (
-- partition so we can omit dups later
SELECT id,
hobby,
hobby_num,
ROW_NUMBER() OVER ( PARTITION BY id, hobby ORDER BY hobby, hobby_num )
FROM cte
),
cte3 ( id, hobby, row_num )
AS (
-- remove dups
SELECT id,
hobby,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobby_num )
FROM cte2
WHERE hobby_partition_num = 1
)
SELECT id,
MAX(CASE WHEN row_num = 1 THEN hobby
END) AS hobby1,
MAX(CASE WHEN row_num = 2 THEN hobby
END) AS hobby2,
MAX(CASE WHEN row_num = 3 THEN hobby
END) AS hobby3,
MAX(CASE WHEN row_num = 4 THEN hobby
END) AS hobby4
FROM cte3
GROUP BY id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Union/Cross tab (no dups, not ordered by column)'
SELECT 'Union/Cross tab (no dups, not ordered by column)'
SET STATISTICS TIME ON ;
WITH cte(id, hobby)
AS (
SELECT h.id,
h.hobby1
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION
SELECT h.id,
h.hobby2
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION
SELECT h.id,
h.hobby1
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
UNION
SELECT h.id,
h.hobby2
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
),
cte2(id, hobby, row_num)
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id)
FROM cte
WHERE hobby IS NOT NULL
)
SELECT id,
MAX(CASE WHEN row_num = 1 THEN hobby
END) AS hobby1,
MAX(CASE WHEN row_num = 2 THEN hobby
END) AS hobby2,
MAX(CASE WHEN row_num = 3 THEN hobby
END) AS hobby3,
MAX(CASE WHEN row_num = 4 THEN hobby
END) AS hobby4
FROM cte2
GROUP BY id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Parsename (dups, ordered by column)'
SELECT 'Parsename (dups, ordered by column)'
SET STATISTICS TIME ON ;
SELECT h1.id,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Jeff''s Rendition (no dups, not ordered by column)'
SELECT 'Jeff''s Rendition (no dups, not ordered by column)'
SET STATISTICS TIME ON ;
--======================================================================
-- Now, solve the problem. The details are, of course, in the comments.
-- This method only "dips" each table once.
--======================================================================
--======================================================================
-- Now, solve the problem. The details are, of course, in the comments.
-- This method only "dips" each table once.
--======================================================================
WITH
cteEnumerate AS
( --=== This CTE creates an EAV stacked Table removing dupes and NULLs in the process.
-- Then it numbers (enumerates) the item's "column numbers" for reassembly in the
-- outer SELECT coming up.
-- As a pleasent side effect, the HobbyNames are sorted in alphabetical order.
-- This step is a form of "pre-aggregation"
SELECT ColNum = ROW_NUMBER() OVER (PARTITION BY eav.HobbyID ORDER BY eav.HobbyName),
eav.HobbyID,
eav.HobbyName
FROM (
--=== This does the EAV "stacking", removal of dupes, removal of NULLs as combined info
-- from both tables
SELECT --===== Unpivot the "first" hobby table removing nulls and dupes
DISTINCT
HobbyID = h1.id,
HobbyName = upvt.HobbyName
FROM dbo.hobby1 h1
CROSS APPLY ( --=== This does a high speed "Unpivot" of the first table
SELECT hobby1 UNION ALL
SELECT hobby2
) upvt (HobbyName)
WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK
UNION -- We use union to get rid of dupes between tables -----------------------------
SELECT --===== Unpivot the "second" hobby table removing nulls and dupes
DISTINCT
HobbyID = h2.id,
HobbyName = upvt.HobbyName
FROM dbo.hobby2 h2
CROSS APPLY ( --=== This does a high speed "Unpivot" of the second table
SELECT hobby1 UNION ALL
SELECT hobby2
) upvt (HobbyName)
WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK
) eav
) --=== Now, we reassemble the data using a high-speed cross tab.
-- For convenience of anticipated queries, we also return the number of hobbies for each "HobbyID"
SELECT HobbyID,
Hobby1 = MAX(CASE WHEN ColNum = 1 THEN HobbyName ELSE '' END),
Hobby2 = MAX(CASE WHEN ColNum = 2 THEN HobbyName ELSE '' END),
Hobby3 = MAX(CASE WHEN ColNum = 3 THEN HobbyName ELSE '' END),
Hobby4 = MAX(CASE WHEN ColNum = 4 THEN HobbyName ELSE '' END),
HobbyCount = COUNT(*)
FROM cteEnumerate
GROUP BY HobbyID
;
SET STATISTICS TIME OFF ;
GO
Typical results when processing 100,000 rows in each table:
---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 891 ms, elapsed time = 1875 ms.
---------------------------------------------------------------------------
String manipulation (no dups, ordered by column)
SQL Server Execution Times:
CPU time = 1031 ms, elapsed time = 1506 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1218 ms, elapsed time = 1641 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 625 ms, elapsed time = 1618 ms.
---------------------------------------------------------------------------
Union All/Cross tab (no dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 1000 ms, elapsed time = 2384 ms.
---------------------------------------------------------------------------
Union/Cross tab (no dups, not ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 610 ms, elapsed time = 1582 ms.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1062 ms, elapsed time = 1513 ms.
---------------------------------------------------------------------------
Jeff's Rendition (no dups, not ordered by column)
SQL Server Execution Times:
CPU time = 1156 ms, elapsed time = 1785 ms.
Typical results when processing 200,000 rows in each table:
---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1485 ms, elapsed time = 2860 ms.
---------------------------------------------------------------------------
String manipulation (no dups, ordered by column)
SQL Server Execution Times:
CPU time = 2125 ms, elapsed time = 3133 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2157 ms, elapsed time = 3166 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 1281 ms, elapsed time = 2864 ms.
---------------------------------------------------------------------------
Union All/Cross tab (no dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 2000 ms, elapsed time = 4827 ms.
---------------------------------------------------------------------------
Union/Cross tab (no dups, not ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 1250 ms, elapsed time = 2863 ms.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2141 ms, elapsed time = 2931 ms.
---------------------------------------------------------------------------
Jeff's Rendition (no dups, not ordered by column)
SQL Server Execution Times:
CPU time = 2297 ms, elapsed time = 3598 ms.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 15 posts - 31 through 45 (of 92 total)
You must be logged in to reply to this topic. Login to reply