Handle NULLs

Question

Handle NULLs

Viewing 15 posts - 46 through 60 (of 93 total)

You must be logged in to reply to this topic. Login to reply

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 1

opc.three (7/8/2011)
mikes84 (7/8/2011)
Regarding these two requirements mikes84:
1) Do you want duplicates shown in the results?
2) Does order matter?
Using our two tables hobby1 and hobby2 here is an example that should answer both questions:
If...
hobby1.hobby1 = tennis
hobby1.hobby2 = cricket
hobby2.hobby1 is null
hobby2.hobby2 = tennis
...what should the result set look like?
1) I don't want duplicates in the results
2) Order does matter. The order is: hobby1.hobby1, hobby1.hobby2, hobby2.hobby1, hobby2.hobby2
If any hobbies are null, the next non-null hobby should be in its place.
In the case of the sample you provided, it would be
tennis, cricket, tennis, null
I can't remember if your XML-based solution met both of these requirements.
Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?
From your description I would think you wanted:
tennis, cricket, null, null
Please confirm.

PS I forgot to mention...you can ditch my XML solution for this problem case. I only threw it out there because I thought that making use of the // notation in XQuery was an interesting way to simulate the T-SQL UNION ALL but it would take someone with a lot more knowledge about XQuery than I to make that method viable...that thing will chew through CPU cycles as if it were it's only job in life the way it sits now 😀

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Mike Scalise SSCrazy Eights Points: 8145 More actions · Answer 2

tommyh (7/8/2011)
Modding my code so it should do what you now want.
SELECT t.id,
RTRIM(SUBSTRING(t.s, 1, 36)),
RTRIM(SUBSTRING(t.s, 37, 36)),
RTRIM(SUBSTRING(t.s, 73, 36)),
RTRIM(SUBSTRING(t.s, 109, 36))
FROM (
SELECT h1.id,
ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +
case when h1.hobby2 in (h1.hobby1) then '' else ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') end +
case when h2.hobby1 in (h1.hobby1, h1.hobby2) then '' else ISNULL(h2.hobby1 + SPACE(36 - LEN(h2.hobby1)),'') end +
case when h2.hobby2 in (h1.hobby1, h1.hobby2, h2.hobby1) then '' else ISNULL(h2.hobby2 + SPACE(36 - LEN(h2.hobby2)),'') end s
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t ;
Think it works atleast 😀
/T

Thanks, tommy.

Mike Scalise, PMP
https://www.michaelscalise.com

Mike Scalise SSCrazy Eights Points: 8145 More actions · Answer 3

opc.three (7/8/2011)
mikes84 (7/8/2011)
Regarding these two requirements mikes84:
1) Do you want duplicates shown in the results?
2) Does order matter?
Using our two tables hobby1 and hobby2 here is an example that should answer both questions:
If...
hobby1.hobby1 = tennis
hobby1.hobby2 = cricket
hobby2.hobby1 is null
hobby2.hobby2 = tennis
...what should the result set look like?
1) I don't want duplicates in the results
2) Order does matter. The order is: hobby1.hobby1, hobby1.hobby2, hobby2.hobby1, hobby2.hobby2
If any hobbies are null, the next non-null hobby should be in its place.
In the case of the sample you provided, it would be
tennis, cricket, tennis, null
I can't remember if your XML-based solution met both of these requirements.
Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?
From your description I would think you wanted:
tennis, cricket, null, null
Please confirm.

I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.

Mike Scalise, PMP
https://www.michaelscalise.com

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 4

mikes84 (7/8/2011)
...truncated quotes of quotes, etc....
I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.

Thanks for confirming. If you said you did not want dups I was going to ask why you provided the PARSENAME solution 😀

Here are the solutions presented so far that would get you there:

SET STATISTICS TIME OFF ;

GO

SET NOCOUNT ON ;

GO

PRINT REPLICATE('-', 75) + '

build temp tables'

SELECT 'build temp tables'

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby1')

AND type IN ( N'U' ) )

DROP TABLE hobby1 ;

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby2')

AND type IN ( N'U' ) )

DROP TABLE hobby2 ;

CREATE TABLE hobby1

(

id INT IDENTITY(1, 1) NOT NULL,

hobby1 VARCHAR(36),

hobby2 VARCHAR(36)

) ;

CREATE TABLE hobby2

(

id INT IDENTITY(1, 1) NOT NULL,

hobby1 VARCHAR(36),

hobby2 VARCHAR(36)

) ;

GO

--===== Add PK's to both tables

ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

GO

--=====================================================================================================================

-- Create and populate the test tables. This is NOT a part of the solution.

--=====================================================================================================================

--===== Declare and preset some obviously named local variables

DECLARE @RowsInFirstTable INT,

@RowsInSecondTable INT ;

SELECT @RowsInFirstTable = 200000,

@RowsInSecondTable = 200000 ;

IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL

DROP TABLE #HobbyName;

--===== Create a list of HobbyNames so we can randomly populate the test tables.

-- The NULL's are simply part of the possible things that could be a hobby.

SELECT HobbyNameID = IDENTITY( INT,1,1),

HobbyName = CAST(d.HobbyName AS VARCHAR(15))

INTO #HobbyName

FROM (

SELECT NULL

UNION ALL

SELECT NULL

UNION ALL

SELECT 'BaseBall'

UNION ALL

SELECT 'Basketball'

UNION ALL

SELECT 'Tennis'

UNION ALL

SELECT 'FootBall'

UNION ALL

SELECT 'Hockey'

UNION ALL

SELECT 'Rugby'

UNION ALL

SELECT 'Golf'

UNION ALL

SELECT 'Bowling'

) d ( HobbyName ) ;

ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;

--===== Create the "first" table and populate it on the fly

INSERT INTO dbo.hobby1

(

hobby1,

hobby2

)

SELECT TOP ( @RowsInFirstTable )

Hobby1 = (

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

),

Hobby2 = (

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

)

FROM sys.all_columns ac1,

sys.all_columns ac2 ;

--===== Create the "second" table and populate it on the fly

INSERT INTO dbo.hobby2

(

hobby1,

hobby2

)

SELECT TOP ( @RowsInSecondTable )

(

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

),

(

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

)

FROM sys.all_columns ac1,

sys.all_columns ac2 ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

String manipulation (dups, ordered by column)'

SELECT 'String manipulation (dups, ordered by column)'

SET STATISTICS TIME ON ;

SELECT t.id,

RTRIM(SUBSTRING(t.s, 1, 36)),

RTRIM(SUBSTRING(t.s, 37, 36)),

RTRIM(SUBSTRING(t.s, 73, 36)),

RTRIM(SUBSTRING(t.s, 109, 36))

FROM (

SELECT h1.id,

ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +

ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36

- LEN(h2.hobby1)),'') + ISNULL(h2.hobby2

+ SPACE(36- LEN(h2.hobby2)),'') s

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

) t ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Unpivot/Pivot (dups, ordered by column)'

SELECT 'Unpivot/Pivot (dups, ordered by column)'

SET STATISTICS TIME ON ;

WITH OneTableCTE

AS (

SELECT a.id,

a.hobby1 AS 'h1',

a.hobby2 AS 'h2',

b.hobby1 AS 'h3',

b.hobby2 AS 'h4'

FROM hobby1 a

INNER JOIN hobby2 b ON a.id = b.id

),

UnPivotCTE

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )

AS fixedorder

FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )

AS Normalized

)

SELECT OneTableCTE.id,

[1],

[2],

[3],

[4]

FROM (

SELECT id,

hobby,

fixedorder

FROM UnPivotCTE

) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )

AS PivotTable

RIGHT OUTER JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Union All/Cross tab (dups, ordered by column)'

SELECT 'Union All/Cross tab (dups, ordered by column)'

SET STATISTICS TIME ON ;

WITH cte(id, hobby, hobby_num)

AS (

SELECT h.id,

h.hobby1,

1

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby2,

2

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby1,

3

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby1

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby2,

4

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

),

cte2(id, hobby, row_num)

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)

FROM cte

WHERE hobby IS NOT NULL

)

SELECT id,

MAX(CASE WHEN row_num = 1 THEN hobby

END) AS hobby1,

MAX(CASE WHEN row_num = 2 THEN hobby

END) AS hobby2,

MAX(CASE WHEN row_num = 3 THEN hobby

END) AS hobby3,

MAX(CASE WHEN row_num = 4 THEN hobby

END) AS hobby4

FROM cte2

GROUP BY id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Parsename (dups, ordered by column)'

SELECT 'Parsename (dups, ordered by column)'

SET STATISTICS TIME ON ;

SELECT h1.id,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id ;

SET STATISTICS TIME OFF ;

GO

And some typical results on my machine for 100K rows:

---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 719 ms, elapsed time = 1717 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1188 ms, elapsed time = 1601 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 546 ms, elapsed time = 1571 ms.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1172 ms, elapsed time = 1507 ms.

And 200K:

---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1625 ms, elapsed time = 2917 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2189 ms, elapsed time = 3343 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 1141 ms, elapsed time = 2932 ms.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2109 ms, elapsed time = 2973 ms.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Nevyn SSChampion Points: 14210 More actions · Answer 5

I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.

Do you want to show 4 nulls if there is a record but no hobbies?

If not, the outer join can be removed from the PIVOT/UNPIVOT (I'm guessing it wont make a huge speed difference though).

One other case we haven't discussed, is it possible to have a record in the first table and no record in the second (as opposed to a record with null hobbies)? That requirement would break some of these solutions as they are.

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 6

Nevyn (7/8/2011)
I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.
Do you want to show 4 nulls if there is a record but no hobbies?
If not, the outer join can be removed from the PIVOT/UNPIVOT (I'm guessing it wont make a huge speed difference though).
One other case we haven't discussed, is it possible to have a record in the first table and no record in the second (as opposed to a record with null hobbies)? That requirement would break some of these solutions as they are.

I did not see a measurable performance difference when changing the JOIN type in the "Unpivot/Pivot". You can toggle whether to return rows with all NULL hobbies in the "Union All/Crosstab" by commenting out "WHERE hobby IS NOT NULL" in cte2. I did not see any noticeable difference in performance when toggling that option either.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Mike Scalise SSCrazy Eights Points: 8145 More actions · Answer 7

opc, thanks for laying all of that out.

Nevyn, yes, I'd like to see all NULLs if wherever there aren't hobbies (even if there's 0/4 hobbies populated I'd like to see 4 NULLs).

Good catch on the other possibility -- however, there will never be a record in one table and not in the other. At the very least, there will be a record with 2 NULL fields in each table.

Mike Scalise, PMP
https://www.michaelscalise.com

Paul White SSC Guru Points: 150468 More actions · Answer 8

It strikes me that the elapsed time of the tests is limited by the time it takes to display the results - consider using SELECT INTO to write the results to a temporary table.

For fun, I played with the string solution a bit to generate a parallel plan. On my machine, the following code runs in 700ms for 200,000 rows (further improvements may be possible, I didn't touch Tommy's core algorithm).

SELECT

t.id,

oa.h1,

oa.h2,

oa.h3,

oa.h4

INTO #dummy

FROM

(

SELECT

h.id,

h1h1 = h.hobby1,

h1h2 = h.hobby2,

h2h1 = h2.hobby1,

h2h2 = h2.hobby2

FROM dbo.hobby1 AS h

JOIN dbo.hobby2 AS h2 ON

h2.id = h.id

) AS x

CROSS APPLY

(

SELECT

x.id,

s =

ISNULL(x.h1h1 + SPACE(36 - LEN(x.h1h1)), SPACE(0)) +

CASE

WHEN x.h1h2 = x.h1h1 THEN SPACE(0)

ELSE ISNULL(x.h1h2 + SPACE(36 - LEN(x.h1h2)), SPACE(0))

END +

CASE

WHEN x.h2h1 IN (x.h1h1, x.h1h2) THEN SPACE(0)

ELSE ISNULL(x.h2h1 + SPACE(36 - LEN(x.h2h1)), SPACE(0))

END +

CASE

WHEN x.h2h2 IN (x.h1h1, x.h1h2, x.h2h1) THEN SPACE(0)

ELSE ISNULL(x.h2h2 + SPACE(36 - LEN(x.h2h2)), SPACE(0))

END

) AS t

OUTER APPLY

(

SELECT

h1 = RTRIM(SUBSTRING(t.s, 1, 36)),

h2 = RTRIM(SUBSTRING(t.s, 37, 36)),

h3 = RTRIM(SUBSTRING(t.s, 73, 36)),

h4 = RTRIM(SUBSTRING(t.s, 109, 36))

) AS oa

OPTION (HASH JOIN, LOOP JOIN);

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 9

I love this site 😀

SQLkiwi (7/8/2011)
It strikes me that the elapsed time of the tests is limited by the time it takes to display the results -

That's why I typically ignore elapsed time. CPU clock ticks is what always see considered the key indicator to go by.

consider using SELECT INTO to write the results to a temporary table.
For fun, I played with the string solution a bit to generate a parallel plan. On my machine, the following code runs in 700ms for 200,000 rows (further improvements may be possible, I didn't touch Tommy's core algorithm).

It looks like your solution is removing dups. I modified it to leave them in which should make it a bit more efficient..hopefully I did not mess it up. I gave you full credit for the original code in the comments below 😉

I also added the "INTO #dummy" to each solution and here are the results:

SET STATISTICS TIME OFF ;

GO

SET NOCOUNT ON ;

GO

PRINT REPLICATE('-', 75) + '

build temp tables'

SELECT 'build temp tables'

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby1')

AND type IN ( N'U' ) )

DROP TABLE hobby1 ;

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby2')

AND type IN ( N'U' ) )

DROP TABLE hobby2 ;

CREATE TABLE hobby1

(

id INT IDENTITY(1, 1) NOT NULL,

hobby1 VARCHAR(36),

hobby2 VARCHAR(36)

) ;

CREATE TABLE hobby2

(

id INT IDENTITY(1, 1) NOT NULL,

hobby1 VARCHAR(36),

hobby2 VARCHAR(36)

) ;

GO

--===== Add PK's to both tables

ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

GO

--=====================================================================================================================

-- Create and populate the test tables. This is NOT a part of the solution.

--=====================================================================================================================

--===== Declare and preset some obviously named local variables

DECLARE @RowsInFirstTable INT,

@RowsInSecondTable INT ;

SELECT @RowsInFirstTable = 200000,

@RowsInSecondTable = 200000 ;

IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL

DROP TABLE #HobbyName;

--===== Create a list of HobbyNames so we can randomly populate the test tables.

-- The NULL's are simply part of the possible things that could be a hobby.

SELECT HobbyNameID = IDENTITY( INT,1,1),

HobbyName = CAST(d.HobbyName AS VARCHAR(15))

INTO #HobbyName

FROM (

SELECT NULL

UNION ALL

SELECT NULL

UNION ALL

SELECT 'BaseBall'

UNION ALL

SELECT 'Basketball'

UNION ALL

SELECT 'Tennis'

UNION ALL

SELECT 'FootBall'

UNION ALL

SELECT 'Hockey'

UNION ALL

SELECT 'Rugby'

UNION ALL

SELECT 'Golf'

UNION ALL

SELECT 'Bowling'

) d ( HobbyName ) ;

ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;

--===== Create the "first" table and populate it on the fly

INSERT INTO dbo.hobby1

(

hobby1,

hobby2

)

SELECT TOP ( @RowsInFirstTable )

Hobby1 = (

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

),

Hobby2 = (

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

)

FROM sys.all_columns ac1,

sys.all_columns ac2 ;

--===== Create the "second" table and populate it on the fly

INSERT INTO dbo.hobby2

(

hobby1,

hobby2

)

SELECT TOP ( @RowsInSecondTable )

(

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

),

(

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

)

FROM sys.all_columns ac1,

sys.all_columns ac2 ;

GO

------------------------------------------------------------------------------------

IF OBJECT_ID(N'tempdb..#dummy') > 0

DROP TABLE #dummy;

GO

PRINT REPLICATE('-', 75) + '

String manipulation (dups, ordered by column)'

SELECT 'String manipulation (dups, ordered by column)'

SET STATISTICS TIME ON ;

SELECT t.id,

RTRIM(SUBSTRING(t.s, 1, 36)) AS h1,

RTRIM(SUBSTRING(t.s, 37, 36)) AS h2,

RTRIM(SUBSTRING(t.s, 73, 36)) AS h3,

RTRIM(SUBSTRING(t.s, 109, 36)) AS h4

INTO #dummy

FROM (

SELECT h1.id,

ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +

ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36

- LEN(h2.hobby1)),'') + ISNULL(h2.hobby2

+ SPACE(36- LEN(h2.hobby2)),'') s

FROM dbo.hobby1 h1

LEFT JOIN dbo.hobby2 h2 ON h1.id = h2.id

) t ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

IF OBJECT_ID(N'tempdb..#dummy') > 0

DROP TABLE #dummy;

GO

PRINT REPLICATE('-', 75) + '

Unpivot/Pivot (dups, ordered by column)'

SELECT 'Unpivot/Pivot (dups, ordered by column)'

SET STATISTICS TIME ON ;

WITH OneTableCTE

AS (

SELECT a.id,

a.hobby1 AS 'h1',

a.hobby2 AS 'h2',

b.hobby1 AS 'h3',

b.hobby2 AS 'h4'

FROM hobby1 a

LEFT JOIN hobby2 b ON a.id = b.id

),

UnPivotCTE

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )

AS fixedorder

FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )

AS Normalized

)

SELECT OneTableCTE.id,

[1],

[2],

[3],

[4]

INTO #dummy

FROM (

SELECT id,

hobby,

fixedorder

FROM UnPivotCTE

) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )

AS PivotTable

RIGHT JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

IF OBJECT_ID(N'tempdb..#dummy') > 0

DROP TABLE #dummy;

GO

PRINT REPLICATE('-', 75) + '

Union All/Cross tab (dups, ordered by column)'

SELECT 'Union All/Cross tab (dups, ordered by column)'

SET STATISTICS TIME ON ;

WITH cte(id, hobby, hobby_num)

AS (

SELECT h.id,

h.hobby1,

1

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby2,

2

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby1,

3

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby1

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby2,

4

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

),

cte2(id, hobby, row_num)

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)

FROM cte

--WHERE hobby IS NOT NULL

)

SELECT id,

MAX(CASE WHEN row_num = 1 THEN hobby

END) AS hobby1,

MAX(CASE WHEN row_num = 2 THEN hobby

END) AS hobby2,

MAX(CASE WHEN row_num = 3 THEN hobby

END) AS hobby3,

MAX(CASE WHEN row_num = 4 THEN hobby

END) AS hobby4

INTO #dummy

FROM cte2

GROUP BY id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

IF OBJECT_ID(N'tempdb..#dummy') > 0

DROP TABLE #dummy;

GO

PRINT REPLICATE('-', 75) + '

Parsename (dups, ordered by column)'

SELECT 'Parsename (dups, ordered by column)'

SET STATISTICS TIME ON ;

SELECT h1.id,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4

INTO #dummy

FROM dbo.hobby1 h1

LEFT JOIN dbo.hobby2 h2 ON h1.id = h2.id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

IF OBJECT_ID(N'tempdb..#dummy') > 0

DROP TABLE #dummy;

GO

PRINT REPLICATE('-', 75) + '

String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups'

SELECT 'String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups'

SET STATISTICS TIME ON ;

SELECT

t.id,

oa.h1,

oa.h2,

oa.h3,

oa.h4

INTO #dummy

FROM

(

SELECT

h.id,

h1h1 = h.hobby1,

h1h2 = h.hobby2,

h2h1 = h2.hobby1,

h2h2 = h2.hobby2

FROM dbo.hobby1 AS h

JOIN dbo.hobby2 AS h2 ON

h2.id = h.id

) AS x

CROSS APPLY

(

SELECT

x.id,

s =

ISNULL(x.h1h1 + SPACE(36 - LEN(x.h1h1)), SPACE(0)) +

ISNULL(x.h1h2 + SPACE(36 - LEN(x.h1h2)), SPACE(0)) +

ISNULL(x.h2h1 + SPACE(36 - LEN(x.h2h1)), SPACE(0)) +

ISNULL(x.h2h2 + SPACE(36 - LEN(x.h2h2)), SPACE(0))

) AS t

OUTER APPLY

(

SELECT

h1 = RTRIM(SUBSTRING(t.s, 1, 36)),

h2 = RTRIM(SUBSTRING(t.s, 37, 36)),

h3 = RTRIM(SUBSTRING(t.s, 73, 36)),

h4 = RTRIM(SUBSTRING(t.s, 109, 36))

) AS oa

OPTION (HASH JOIN, LOOP JOIN);

SET STATISTICS TIME OFF ;

GO

Results for 200K on my machine:

---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1359 ms, elapsed time = 1435 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2095 ms, elapsed time = 1642 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1344 ms, elapsed time = 1419 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2000 ms, elapsed time = 2174 ms.
---------------------------------------------------------------------------
String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups
SQL Server Execution Times:
CPU time = 2031 ms, elapsed time = 1132 ms.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Paul White SSC Guru Points: 150468 More actions · Answer 10

opc.three (7/8/2011)
It looks like your solution is removing dups. I modified it to leave them in which should make it a bit more efficient..hopefully I did not mess it up. I gave you full credit for the original code in the comments below

Yeah no worries - I just picked one of the faster routines from earlier, I haven't been following the discussion closely enough to know what the current requirement is - so thanks for the modification. Not sure I deserve much credit on the code (it's Tommy's) I just added parallelism.

Results for 200K on my machine

Nice. You need more cores though :laugh:

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Nevyn SSChampion Points: 14210 More actions · Answer 11

opc.three (7/8/2011)
Nevyn (7/8/2011)
I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.
Do you want to show 4 nulls if there is a record but no hobbies?
If not, the outer join can be removed from the PIVOT/UNPIVOT (I'm guessing it wont make a huge speed difference though).
One other case we haven't discussed, is it possible to have a record in the first table and no record in the second (as opposed to a record with null hobbies)? That requirement would break some of these solutions as they are.
I did not see a measurable performance difference when changing the JOIN type in the "Unpivot/Pivot". You can toggle whether to return rows with all NULL hobbies in the "Union All/Crosstab" by commenting out "WHERE hobby IS NOT NULL" in cte2. I did not see any noticeable difference in performance when toggling that option either.

To be clear, its not changing join type, you can take the join right out (it joined back to the first CTE only to catch the all null rows.

But even then, my guess is no performance boost.

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 12

SQLkiwi (7/8/2011)
opc.three (7/8/2011)
It looks like your solution is removing dups. I modified it to leave them in which should make it a bit more efficient..hopefully I did not mess it up. I gave you full credit for the original code in the comments below
Yeah no worries - I just picked one of the faster routines from earlier, I haven't been following the discussion closely enough to know what the current requirement is - so thanks for the modification. Not sure I deserve much credit on the code (it's Tommy's) I just added parallelism.
Results for 200K on my machine
Nice. You need more cores though :laugh:

No argument there 😛

Re: the machine I'm running on...it depends on the day but ATM I'm running everything on a single proc XP machine with HT :sick:

Would you mind grabbing the whole chunk of code from my last post and posting the results from running it on your machine?

I just ran it on some multi-core server HW and got this for 200K rows:

---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2110 ms, elapsed time = 2116 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 3980 ms, elapsed time = 1306 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1844 ms, elapsed time = 1840 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 3016 ms, elapsed time = 3012 ms.
---------------------------------------------------------------------------
String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups
SQL Server Execution Times:
CPU time = 2859 ms, elapsed time = 453 ms.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Paul White SSC Guru Points: 150468 More actions · Answer 13

opc.three (7/8/2011)
Would you mind grabbing the whole chunk of code from my last post and posting the results from running it on your machine?

16-core test machine:

---------------------------------------------------------------------------

build temp tables

---------------------------------------------------------------------------

String manipulation (dups, ordered by column)

SQL Server Execution Times:

CPU time = 1046 ms, elapsed time = 1138 ms.

---------------------------------------------------------------------------

Unpivot/Pivot (dups, ordered by column)

SQL Server Execution Times:

CPU time = 2994 ms, elapsed time = 849 ms.

---------------------------------------------------------------------------

Union All/Cross tab (dups, ordered by column)

SQL Server Execution Times:

CPU time = 1294 ms, elapsed time = 1329 ms.

Warning: Null value is eliminated by an aggregate or other SET operation.

---------------------------------------------------------------------------

Parsename (dups, ordered by column)

SQL Server Execution Times:

CPU time = 1716 ms, elapsed time = 1792 ms.

---------------------------------------------------------------------------

String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups

SQL Server Execution Times:

CPU time = 2795 ms, elapsed time = 269 ms.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 14

Nevyn (7/8/2011)
Cool stuff, Jeff!
So do you ever use PIVOT and/or UNPIVOT, or are workarounds almost always better/faster?

I don't use PIVOT because Cross-Tabs are (usually) both faster and easier to read especially if multiple values need to be aggregated and then pivoted. Since Paul taught me the neat trick with Cross-Apply, I feel the same way about UNPIVOT even though the differences in performance are less pronounced. Heh... I also try to avoid words like "always" and "never". 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 15

mikes84 (7/8/2011)

Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?

I suppose but let me ask... what do you want done with the following data since you also don't want dupes?

Table1

1,'BasketBall','Tennis'

Table2

1,'Tennis','BasketBall'

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)