July 8, 2011 at 9:57 am
opc.three (7/8/2011)
mikes84 (7/8/2011)
Regarding these two requirements mikes84:
1) Do you want duplicates shown in the results?
2) Does order matter?
Using our two tables hobby1 and hobby2 here is an example that should answer both questions:
If...
hobby1.hobby1 = tennis
hobby1.hobby2 = cricket
hobby2.hobby1 is null
hobby2.hobby2 = tennis
...what should the result set look like?
1) I don't want duplicates in the results
2) Order does matter. The order is: hobby1.hobby1, hobby1.hobby2, hobby2.hobby1, hobby2.hobby2
If any hobbies are null, the next non-null hobby should be in its place.
In the case of the sample you provided, it would be
tennis, cricket, tennis, null
I can't remember if your XML-based solution met both of these requirements.
Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?
From your description I would think you wanted:
tennis, cricket, null, null
Please confirm.
PS I forgot to mention...you can ditch my XML solution for this problem case. I only threw it out there because I thought that making use of the // notation in XQuery was an interesting way to simulate the T-SQL UNION ALL but it would take someone with a lot more knowledge about XQuery than I to make that method viable...that thing will chew through CPU cycles as if it were it's only job in life the way it sits now 😀
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 8, 2011 at 10:02 am
tommyh (7/8/2011)
Modding my code so it should do what you now want.
SELECT t.id,
RTRIM(SUBSTRING(t.s, 1, 36)),
RTRIM(SUBSTRING(t.s, 37, 36)),
RTRIM(SUBSTRING(t.s, 73, 36)),
RTRIM(SUBSTRING(t.s, 109, 36))
FROM (
SELECT h1.id,
ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +
case when h1.hobby2 in (h1.hobby1) then '' else ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') end +
case when h2.hobby1 in (h1.hobby1, h1.hobby2) then '' else ISNULL(h2.hobby1 + SPACE(36 - LEN(h2.hobby1)),'') end +
case when h2.hobby2 in (h1.hobby1, h1.hobby2, h2.hobby1) then '' else ISNULL(h2.hobby2 + SPACE(36 - LEN(h2.hobby2)),'') end s
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t ;
Think it works atleast 😀
/T
Thanks, tommy.
Mike Scalise, PMP
https://www.michaelscalise.com
July 8, 2011 at 10:03 am
opc.three (7/8/2011)
mikes84 (7/8/2011)
Regarding these two requirements mikes84:
1) Do you want duplicates shown in the results?
2) Does order matter?
Using our two tables hobby1 and hobby2 here is an example that should answer both questions:
If...
hobby1.hobby1 = tennis
hobby1.hobby2 = cricket
hobby2.hobby1 is null
hobby2.hobby2 = tennis
...what should the result set look like?
1) I don't want duplicates in the results
2) Order does matter. The order is: hobby1.hobby1, hobby1.hobby2, hobby2.hobby1, hobby2.hobby2
If any hobbies are null, the next non-null hobby should be in its place.
In the case of the sample you provided, it would be
tennis, cricket, tennis, null
I can't remember if your XML-based solution met both of these requirements.
Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?
From your description I would think you wanted:
tennis, cricket, null, null
Please confirm.
I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.
Mike Scalise, PMP
https://www.michaelscalise.com
July 8, 2011 at 10:24 am
mikes84 (7/8/2011)
...truncated quotes of quotes, etc....
I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.
Thanks for confirming. If you said you did not want dups I was going to ask why you provided the PARSENAME solution 😀
Here are the solutions presented so far that would get you there:
SET STATISTICS TIME OFF ;
GO
SET NOCOUNT ON ;
GO
PRINT REPLICATE('-', 75) + '
build temp tables'
SELECT 'build temp tables'
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'hobby1')
AND type IN ( N'U' ) )
DROP TABLE hobby1 ;
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'hobby2')
AND type IN ( N'U' ) )
DROP TABLE hobby2 ;
CREATE TABLE hobby1
(
id INT IDENTITY(1, 1) NOT NULL,
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
CREATE TABLE hobby2
(
id INT IDENTITY(1, 1) NOT NULL,
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
GO
--===== Add PK's to both tables
ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;
ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;
GO
--=====================================================================================================================
-- Create and populate the test tables. This is NOT a part of the solution.
--=====================================================================================================================
--===== Declare and preset some obviously named local variables
DECLARE @RowsInFirstTable INT,
@RowsInSecondTable INT ;
SELECT @RowsInFirstTable = 200000,
@RowsInSecondTable = 200000 ;
IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL
DROP TABLE #HobbyName;
--===== Create a list of HobbyNames so we can randomly populate the test tables.
-- The NULL's are simply part of the possible things that could be a hobby.
SELECT HobbyNameID = IDENTITY( INT,1,1),
HobbyName = CAST(d.HobbyName AS VARCHAR(15))
INTO #HobbyName
FROM (
SELECT NULL
UNION ALL
SELECT NULL
UNION ALL
SELECT 'BaseBall'
UNION ALL
SELECT 'Basketball'
UNION ALL
SELECT 'Tennis'
UNION ALL
SELECT 'FootBall'
UNION ALL
SELECT 'Hockey'
UNION ALL
SELECT 'Rugby'
UNION ALL
SELECT 'Golf'
UNION ALL
SELECT 'Bowling'
) d ( HobbyName ) ;
ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;
--===== Create the "first" table and populate it on the fly
INSERT INTO dbo.hobby1
(
hobby1,
hobby2
)
SELECT TOP ( @RowsInFirstTable )
Hobby1 = (
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
),
Hobby2 = (
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
)
FROM sys.all_columns ac1,
sys.all_columns ac2 ;
--===== Create the "second" table and populate it on the fly
INSERT INTO dbo.hobby2
(
hobby1,
hobby2
)
SELECT TOP ( @RowsInSecondTable )
(
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
),
(
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
)
FROM sys.all_columns ac1,
sys.all_columns ac2 ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
String manipulation (dups, ordered by column)'
SELECT 'String manipulation (dups, ordered by column)'
SET STATISTICS TIME ON ;
SELECT t.id,
RTRIM(SUBSTRING(t.s, 1, 36)),
RTRIM(SUBSTRING(t.s, 37, 36)),
RTRIM(SUBSTRING(t.s, 73, 36)),
RTRIM(SUBSTRING(t.s, 109, 36))
FROM (
SELECT h1.id,
ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +
ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36
- LEN(h2.hobby1)),'') + ISNULL(h2.hobby2
+ SPACE(36- LEN(h2.hobby2)),'') s
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Unpivot/Pivot (dups, ordered by column)'
SELECT 'Unpivot/Pivot (dups, ordered by column)'
SET STATISTICS TIME ON ;
WITH OneTableCTE
AS (
SELECT a.id,
a.hobby1 AS 'h1',
a.hobby2 AS 'h2',
b.hobby1 AS 'h3',
b.hobby2 AS 'h4'
FROM hobby1 a
INNER JOIN hobby2 b ON a.id = b.id
),
UnPivotCTE
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )
AS fixedorder
FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )
AS Normalized
)
SELECT OneTableCTE.id,
[1],
[2],
[3],
[4]
FROM (
SELECT id,
hobby,
fixedorder
FROM UnPivotCTE
) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )
AS PivotTable
RIGHT OUTER JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Union All/Cross tab (dups, ordered by column)'
SELECT 'Union All/Cross tab (dups, ordered by column)'
SET STATISTICS TIME ON ;
WITH cte(id, hobby, hobby_num)
AS (
SELECT h.id,
h.hobby1,
1
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby2,
2
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby1,
3
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby2,
4
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
),
cte2(id, hobby, row_num)
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)
FROM cte
WHERE hobby IS NOT NULL
)
SELECT id,
MAX(CASE WHEN row_num = 1 THEN hobby
END) AS hobby1,
MAX(CASE WHEN row_num = 2 THEN hobby
END) AS hobby2,
MAX(CASE WHEN row_num = 3 THEN hobby
END) AS hobby3,
MAX(CASE WHEN row_num = 4 THEN hobby
END) AS hobby4
FROM cte2
GROUP BY id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
PRINT REPLICATE('-', 75) + '
Parsename (dups, ordered by column)'
SELECT 'Parsename (dups, ordered by column)'
SET STATISTICS TIME ON ;
SELECT h1.id,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id ;
SET STATISTICS TIME OFF ;
GO
And some typical results on my machine for 100K rows:
---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 719 ms, elapsed time = 1717 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1188 ms, elapsed time = 1601 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 546 ms, elapsed time = 1571 ms.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1172 ms, elapsed time = 1507 ms.
And 200K:
---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1625 ms, elapsed time = 2917 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2189 ms, elapsed time = 3343 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 1141 ms, elapsed time = 2932 ms.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2109 ms, elapsed time = 2973 ms.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 8, 2011 at 10:26 am
I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.
Do you want to show 4 nulls if there is a record but no hobbies?
If not, the outer join can be removed from the PIVOT/UNPIVOT (I'm guessing it wont make a huge speed difference though).
One other case we haven't discussed, is it possible to have a record in the first table and no record in the second (as opposed to a record with null hobbies)? That requirement would break some of these solutions as they are.
July 8, 2011 at 10:37 am
Nevyn (7/8/2011)
I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.
Do you want to show 4 nulls if there is a record but no hobbies?
If not, the outer join can be removed from the PIVOT/UNPIVOT (I'm guessing it wont make a huge speed difference though).
One other case we haven't discussed, is it possible to have a record in the first table and no record in the second (as opposed to a record with null hobbies)? That requirement would break some of these solutions as they are.
I did not see a measurable performance difference when changing the JOIN type in the "Unpivot/Pivot". You can toggle whether to return rows with all NULL hobbies in the "Union All/Crosstab" by commenting out "WHERE hobby IS NOT NULL" in cte2. I did not see any noticeable difference in performance when toggling that option either.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 8, 2011 at 11:06 am
opc, thanks for laying all of that out.
Nevyn, yes, I'd like to see all NULLs if wherever there aren't hobbies (even if there's 0/4 hobbies populated I'd like to see 4 NULLs).
Good catch on the other possibility -- however, there will never be a record in one table and not in the other. At the very least, there will be a record with 2 NULL fields in each table.
Mike Scalise, PMP
https://www.michaelscalise.com
July 8, 2011 at 11:17 am
It strikes me that the elapsed time of the tests is limited by the time it takes to display the results - consider using SELECT INTO to write the results to a temporary table.
For fun, I played with the string solution a bit to generate a parallel plan. On my machine, the following code runs in 700ms for 200,000 rows (further improvements may be possible, I didn't touch Tommy's core algorithm).
SELECT
t.id,
oa.h1,
oa.h2,
oa.h3,
oa.h4
INTO #dummy
FROM
(
SELECT
h.id,
h1h1 = h.hobby1,
h1h2 = h.hobby2,
h2h1 = h2.hobby1,
h2h2 = h2.hobby2
FROM dbo.hobby1 AS h
JOIN dbo.hobby2 AS h2 ON
h2.id = h.id
) AS x
CROSS APPLY
(
SELECT
x.id,
s =
ISNULL(x.h1h1 + SPACE(36 - LEN(x.h1h1)), SPACE(0)) +
CASE
WHEN x.h1h2 = x.h1h1 THEN SPACE(0)
ELSE ISNULL(x.h1h2 + SPACE(36 - LEN(x.h1h2)), SPACE(0))
END +
CASE
WHEN x.h2h1 IN (x.h1h1, x.h1h2) THEN SPACE(0)
ELSE ISNULL(x.h2h1 + SPACE(36 - LEN(x.h2h1)), SPACE(0))
END +
CASE
WHEN x.h2h2 IN (x.h1h1, x.h1h2, x.h2h1) THEN SPACE(0)
ELSE ISNULL(x.h2h2 + SPACE(36 - LEN(x.h2h2)), SPACE(0))
END
) AS t
OUTER APPLY
(
SELECT
h1 = RTRIM(SUBSTRING(t.s, 1, 36)),
h2 = RTRIM(SUBSTRING(t.s, 37, 36)),
h3 = RTRIM(SUBSTRING(t.s, 73, 36)),
h4 = RTRIM(SUBSTRING(t.s, 109, 36))
) AS oa
OPTION (HASH JOIN, LOOP JOIN);
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 8, 2011 at 11:47 am
I love this site 😀
SQLkiwi (7/8/2011)
It strikes me that the elapsed time of the tests is limited by the time it takes to display the results -
That's why I typically ignore elapsed time. CPU clock ticks is what always see considered the key indicator to go by.
consider using SELECT INTO to write the results to a temporary table.
For fun, I played with the string solution a bit to generate a parallel plan. On my machine, the following code runs in 700ms for 200,000 rows (further improvements may be possible, I didn't touch Tommy's core algorithm).
It looks like your solution is removing dups. I modified it to leave them in which should make it a bit more efficient..hopefully I did not mess it up. I gave you full credit for the original code in the comments below 😉
I also added the "INTO #dummy" to each solution and here are the results:
SET STATISTICS TIME OFF ;
GO
SET NOCOUNT ON ;
GO
PRINT REPLICATE('-', 75) + '
build temp tables'
SELECT 'build temp tables'
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'hobby1')
AND type IN ( N'U' ) )
DROP TABLE hobby1 ;
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'hobby2')
AND type IN ( N'U' ) )
DROP TABLE hobby2 ;
CREATE TABLE hobby1
(
id INT IDENTITY(1, 1) NOT NULL,
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
CREATE TABLE hobby2
(
id INT IDENTITY(1, 1) NOT NULL,
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
GO
--===== Add PK's to both tables
ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;
ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;
GO
--=====================================================================================================================
-- Create and populate the test tables. This is NOT a part of the solution.
--=====================================================================================================================
--===== Declare and preset some obviously named local variables
DECLARE @RowsInFirstTable INT,
@RowsInSecondTable INT ;
SELECT @RowsInFirstTable = 200000,
@RowsInSecondTable = 200000 ;
IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL
DROP TABLE #HobbyName;
--===== Create a list of HobbyNames so we can randomly populate the test tables.
-- The NULL's are simply part of the possible things that could be a hobby.
SELECT HobbyNameID = IDENTITY( INT,1,1),
HobbyName = CAST(d.HobbyName AS VARCHAR(15))
INTO #HobbyName
FROM (
SELECT NULL
UNION ALL
SELECT NULL
UNION ALL
SELECT 'BaseBall'
UNION ALL
SELECT 'Basketball'
UNION ALL
SELECT 'Tennis'
UNION ALL
SELECT 'FootBall'
UNION ALL
SELECT 'Hockey'
UNION ALL
SELECT 'Rugby'
UNION ALL
SELECT 'Golf'
UNION ALL
SELECT 'Bowling'
) d ( HobbyName ) ;
ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;
--===== Create the "first" table and populate it on the fly
INSERT INTO dbo.hobby1
(
hobby1,
hobby2
)
SELECT TOP ( @RowsInFirstTable )
Hobby1 = (
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
),
Hobby2 = (
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
)
FROM sys.all_columns ac1,
sys.all_columns ac2 ;
--===== Create the "second" table and populate it on the fly
INSERT INTO dbo.hobby2
(
hobby1,
hobby2
)
SELECT TOP ( @RowsInSecondTable )
(
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
),
(
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
)
FROM sys.all_columns ac1,
sys.all_columns ac2 ;
GO
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
IF OBJECT_ID(N'tempdb..#dummy') > 0
DROP TABLE #dummy;
GO
PRINT REPLICATE('-', 75) + '
String manipulation (dups, ordered by column)'
SELECT 'String manipulation (dups, ordered by column)'
SET STATISTICS TIME ON ;
SELECT t.id,
RTRIM(SUBSTRING(t.s, 1, 36)) AS h1,
RTRIM(SUBSTRING(t.s, 37, 36)) AS h2,
RTRIM(SUBSTRING(t.s, 73, 36)) AS h3,
RTRIM(SUBSTRING(t.s, 109, 36)) AS h4
INTO #dummy
FROM (
SELECT h1.id,
ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +
ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36
- LEN(h2.hobby1)),'') + ISNULL(h2.hobby2
+ SPACE(36- LEN(h2.hobby2)),'') s
FROM dbo.hobby1 h1
LEFT JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
IF OBJECT_ID(N'tempdb..#dummy') > 0
DROP TABLE #dummy;
GO
PRINT REPLICATE('-', 75) + '
Unpivot/Pivot (dups, ordered by column)'
SELECT 'Unpivot/Pivot (dups, ordered by column)'
SET STATISTICS TIME ON ;
WITH OneTableCTE
AS (
SELECT a.id,
a.hobby1 AS 'h1',
a.hobby2 AS 'h2',
b.hobby1 AS 'h3',
b.hobby2 AS 'h4'
FROM hobby1 a
LEFT JOIN hobby2 b ON a.id = b.id
),
UnPivotCTE
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )
AS fixedorder
FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )
AS Normalized
)
SELECT OneTableCTE.id,
[1],
[2],
[3],
[4]
INTO #dummy
FROM (
SELECT id,
hobby,
fixedorder
FROM UnPivotCTE
) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )
AS PivotTable
RIGHT JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
IF OBJECT_ID(N'tempdb..#dummy') > 0
DROP TABLE #dummy;
GO
PRINT REPLICATE('-', 75) + '
Union All/Cross tab (dups, ordered by column)'
SELECT 'Union All/Cross tab (dups, ordered by column)'
SET STATISTICS TIME ON ;
WITH cte(id, hobby, hobby_num)
AS (
SELECT h.id,
h.hobby1,
1
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby2,
2
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby1,
3
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby2,
4
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
),
cte2(id, hobby, row_num)
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)
FROM cte
--WHERE hobby IS NOT NULL
)
SELECT id,
MAX(CASE WHEN row_num = 1 THEN hobby
END) AS hobby1,
MAX(CASE WHEN row_num = 2 THEN hobby
END) AS hobby2,
MAX(CASE WHEN row_num = 3 THEN hobby
END) AS hobby3,
MAX(CASE WHEN row_num = 4 THEN hobby
END) AS hobby4
INTO #dummy
FROM cte2
GROUP BY id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
IF OBJECT_ID(N'tempdb..#dummy') > 0
DROP TABLE #dummy;
GO
PRINT REPLICATE('-', 75) + '
Parsename (dups, ordered by column)'
SELECT 'Parsename (dups, ordered by column)'
SET STATISTICS TIME ON ;
SELECT h1.id,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4
INTO #dummy
FROM dbo.hobby1 h1
LEFT JOIN dbo.hobby2 h2 ON h1.id = h2.id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
IF OBJECT_ID(N'tempdb..#dummy') > 0
DROP TABLE #dummy;
GO
PRINT REPLICATE('-', 75) + '
String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups'
SELECT 'String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups'
SET STATISTICS TIME ON ;
SELECT
t.id,
oa.h1,
oa.h2,
oa.h3,
oa.h4
INTO #dummy
FROM
(
SELECT
h.id,
h1h1 = h.hobby1,
h1h2 = h.hobby2,
h2h1 = h2.hobby1,
h2h2 = h2.hobby2
FROM dbo.hobby1 AS h
JOIN dbo.hobby2 AS h2 ON
h2.id = h.id
) AS x
CROSS APPLY
(
SELECT
x.id,
s =
ISNULL(x.h1h1 + SPACE(36 - LEN(x.h1h1)), SPACE(0)) +
ISNULL(x.h1h2 + SPACE(36 - LEN(x.h1h2)), SPACE(0)) +
ISNULL(x.h2h1 + SPACE(36 - LEN(x.h2h1)), SPACE(0)) +
ISNULL(x.h2h2 + SPACE(36 - LEN(x.h2h2)), SPACE(0))
) AS t
OUTER APPLY
(
SELECT
h1 = RTRIM(SUBSTRING(t.s, 1, 36)),
h2 = RTRIM(SUBSTRING(t.s, 37, 36)),
h3 = RTRIM(SUBSTRING(t.s, 73, 36)),
h4 = RTRIM(SUBSTRING(t.s, 109, 36))
) AS oa
OPTION (HASH JOIN, LOOP JOIN);
SET STATISTICS TIME OFF ;
GO
Results for 200K on my machine:
---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1359 ms, elapsed time = 1435 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2095 ms, elapsed time = 1642 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1344 ms, elapsed time = 1419 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2000 ms, elapsed time = 2174 ms.
---------------------------------------------------------------------------
String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups
SQL Server Execution Times:
CPU time = 2031 ms, elapsed time = 1132 ms.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 8, 2011 at 11:55 am
opc.three (7/8/2011)
It looks like your solution is removing dups. I modified it to leave them in which should make it a bit more efficient..hopefully I did not mess it up. I gave you full credit for the original code in the comments below
Yeah no worries - I just picked one of the faster routines from earlier, I haven't been following the discussion closely enough to know what the current requirement is - so thanks for the modification. Not sure I deserve much credit on the code (it's Tommy's) I just added parallelism.
Results for 200K on my machine
Nice. You need more cores though :laugh:
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 8, 2011 at 11:58 am
opc.three (7/8/2011)
Nevyn (7/8/2011)
I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.
Do you want to show 4 nulls if there is a record but no hobbies?
If not, the outer join can be removed from the PIVOT/UNPIVOT (I'm guessing it wont make a huge speed difference though).
One other case we haven't discussed, is it possible to have a record in the first table and no record in the second (as opposed to a record with null hobbies)? That requirement would break some of these solutions as they are.
I did not see a measurable performance difference when changing the JOIN type in the "Unpivot/Pivot". You can toggle whether to return rows with all NULL hobbies in the "Union All/Crosstab" by commenting out "WHERE hobby IS NOT NULL" in cte2. I did not see any noticeable difference in performance when toggling that option either.
To be clear, its not changing join type, you can take the join right out (it joined back to the first CTE only to catch the all null rows.
But even then, my guess is no performance boost.
July 8, 2011 at 12:05 pm
SQLkiwi (7/8/2011)
opc.three (7/8/2011)
It looks like your solution is removing dups. I modified it to leave them in which should make it a bit more efficient..hopefully I did not mess it up. I gave you full credit for the original code in the comments belowYeah no worries - I just picked one of the faster routines from earlier, I haven't been following the discussion closely enough to know what the current requirement is - so thanks for the modification. Not sure I deserve much credit on the code (it's Tommy's) I just added parallelism.
Results for 200K on my machine
Nice. You need more cores though :laugh:
No argument there 😛
Re: the machine I'm running on...it depends on the day but ATM I'm running everything on a single proc XP machine with HT :sick:
Would you mind grabbing the whole chunk of code from my last post and posting the results from running it on your machine?
I just ran it on some multi-core server HW and got this for 200K rows:
---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2110 ms, elapsed time = 2116 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 3980 ms, elapsed time = 1306 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1844 ms, elapsed time = 1840 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 3016 ms, elapsed time = 3012 ms.
---------------------------------------------------------------------------
String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups
SQL Server Execution Times:
CPU time = 2859 ms, elapsed time = 453 ms.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 8, 2011 at 12:15 pm
opc.three (7/8/2011)
Would you mind grabbing the whole chunk of code from my last post and posting the results from running it on your machine?
16-core test machine:
---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1046 ms, elapsed time = 1138 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2994 ms, elapsed time = 849 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1294 ms, elapsed time = 1329 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1716 ms, elapsed time = 1792 ms.
---------------------------------------------------------------------------
String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups
SQL Server Execution Times:
CPU time = 2795 ms, elapsed time = 269 ms.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 8, 2011 at 12:39 pm
Nevyn (7/8/2011)
Cool stuff, Jeff!So do you ever use PIVOT and/or UNPIVOT, or are workarounds almost always better/faster?
I don't use PIVOT because Cross-Tabs are (usually) both faster and easier to read especially if multiple values need to be aggregated and then pivoted. Since Paul taught me the neat trick with Cross-Apply, I feel the same way about UNPIVOT even though the differences in performance are less pronounced. Heh... I also try to avoid words like "always" and "never". 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
July 8, 2011 at 12:46 pm
mikes84 (7/8/2011)
Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?
I suppose but let me ask... what do you want done with the following data since you also don't want dupes?
Table1
1,'BasketBall','Tennis'
Table2
1,'Tennis','BasketBall'
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 46 through 60 (of 92 total)
You must be logged in to reply to this topic. Login to reply