July 8, 2011 at 6:08 pm
mister.magoo (7/8/2011)
My latest submission is based on the string manipulation, but simplified... on my QUAD core PC I like the results, but let's see how it does on yours.
Very creative. Shame about this: SET CONCAT_NULL_YIELDS_NULL OFF;
In a future version of SQL Server CONCAT_NULL_YIELDS_NULL will always be ON and any applications that explicitly set the option to OFF will generate an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 8, 2011 at 7:48 pm
How should I format my case solution to get in on the testing?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 9, 2011 at 2:27 pm
The Dixie Flatline (7/8/2011)
Fun problem. 😀 This solution might be a little less CPU-intensive because the only string manipulation is done with ISNULL or CASE tests.
The trick is to squeeze out the nulls from right-to-left, instead of left-to-right.
;with
cte1 as (select C1,C2,ISNULL(C3,C4) as C3
, CASE WHEN C3 IS NULL THEN NULL ELSE C4 END as C4 from #temp)
,cte2 as (select C1, ISNULL(C2,C3) as C2
, CASE WHEN C2 IS NULL THEN C4 ELSE C3 end as C3
, CASE WHEN C2 IS NULL THEN NULL ELSE C4 END as C4 from cte1)
,cte3 as (select ISNULL(C1,C2) as C1
, CASE WHEN C1 IS NULL then C3 ELSE C2 end as C2
, CASE WHEN C1 IS NULL THEN C4 ELSE C3 end as C3
, CASE WHEN C1 IS NULL THEN NULL ELSE C4 END as C4 from cte2)
select * from cte3
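For anyone following along, the right-to-left squeeze can be sketched outside T-SQL. This is a purely conceptual Python illustration (None standing in for NULL); each tuple assignment mirrors one of the cte1/cte2/cte3 passes above, using only ISNULL/CASE-style tests and no string manipulation:

```python
def squeeze_nulls(row):
    """Shift non-NULL values left across a 4-column row, one pass per CTE."""
    c1, c2, c3, c4 = row
    # cte1: C3 = ISNULL(C3, C4); C4 survives only if the old C3 was not NULL
    c3, c4 = ((c3 if c3 is not None else c4),
              (c4 if c3 is not None else None))
    # cte2: C2 = ISNULL(C2, C3); C3 and C4 shift left when the old C2 was NULL
    c2, c3, c4 = ((c2 if c2 is not None else c3),
                  (c3 if c2 is not None else c4),
                  (c4 if c2 is not None else None))
    # cte3: C1 = ISNULL(C1, C2); C2..C4 shift left when the old C1 was NULL
    c1, c2, c3, c4 = ((c1 if c1 is not None else c2),
                      (c2 if c1 is not None else c3),
                      (c3 if c1 is not None else c4),
                      (c4 if c1 is not None else None))
    return (c1, c2, c3, c4)

print(squeeze_nulls((None, 'Golf', None, 'Tennis')))  # ('Golf', 'Tennis', None, None)
```

Note how each pass only has to consider one new column on the left, which is why three passes suffice for four columns.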
I just gave the results a once-over, but the result set looks correct. Blazing fast!
The entire set of code with The Dixie Flatline solution modified to work with the concrete test tables:
SET STATISTICS TIME OFF ;
GO
SET NOCOUNT ON ;
GO
PRINT REPLICATE('-', 75) + '
build temp tables'
SELECT 'build temp tables'
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'dbo.hobby1')
AND type IN ( N'U' ) )
DROP TABLE dbo.hobby1 ;
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'dbo.hobby2')
AND type IN ( N'U' ) )
DROP TABLE dbo.hobby2 ;
CREATE TABLE dbo.hobby1
(
id INT IDENTITY(1, 1) NOT NULL,
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
CREATE TABLE dbo.hobby2
(
id INT IDENTITY(1, 1) NOT NULL,
hobby1 VARCHAR(36),
hobby2 VARCHAR(36)
) ;
GO
--===== Add PK's to both tables
ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;
ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;
GO
--=====================================================================================================================
-- Create and populate the test tables. This is NOT a part of the solution.
--=====================================================================================================================
--===== Declare and preset some obviously named local variables
DECLARE @RowsInFirstTable INT,
@RowsInSecondTable INT ;
SELECT @RowsInFirstTable = 200000,
@RowsInSecondTable = 200000 ;
IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL
DROP TABLE #HobbyName;
--===== Create a list of HobbyNames so we can randomly populate the test tables.
-- The NULL's are simply part of the possible things that could be a hobby.
SELECT HobbyNameID = IDENTITY( INT,1,1),
HobbyName = CAST(d.HobbyName AS VARCHAR(15))
INTO #HobbyName
FROM (
SELECT NULL
UNION ALL
SELECT NULL
UNION ALL
SELECT 'BaseBall'
UNION ALL
SELECT 'Basketball'
UNION ALL
SELECT 'Tennis'
UNION ALL
SELECT 'FootBall'
UNION ALL
SELECT 'Hockey'
UNION ALL
SELECT 'Rugby'
UNION ALL
SELECT 'Golf'
UNION ALL
SELECT 'Bowling'
) d ( HobbyName ) ;
ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;
--===== Create the "first" table and populate it on the fly
INSERT INTO dbo.hobby1
(
hobby1,
hobby2
)
SELECT TOP ( @RowsInFirstTable )
Hobby1 = (
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
),
Hobby2 = (
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
)
FROM sys.all_columns ac1,
sys.all_columns ac2 ;
--===== Create the "second" table and populate it on the fly
INSERT INTO dbo.hobby2
(
hobby1,
hobby2
)
SELECT TOP ( @RowsInSecondTable )
(
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
),
(
SELECT TOP 1
HobbyName
FROM #HobbyName
WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1
)
FROM sys.all_columns ac1,
sys.all_columns ac2 ;
GO
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL
DROP TABLE #dummy;
GO
PRINT REPLICATE('-', 75) + '
String manipulation (dups, ordered by column)'
SELECT 'String manipulation (dups, ordered by column)'
SET STATISTICS TIME ON ;
SELECT t.id,
RTRIM(SUBSTRING(t.s, 1, 36)) AS h1,
RTRIM(SUBSTRING(t.s, 37, 36)) AS h2,
RTRIM(SUBSTRING(t.s, 73, 36)) AS h3,
RTRIM(SUBSTRING(t.s, 109, 36)) AS h4
--INTO #dummy
FROM (
SELECT h1.id,
ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +
ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') +
ISNULL(h2.hobby1 + SPACE(36 - LEN(h2.hobby1)), '') +
ISNULL(h2.hobby2 + SPACE(36 - LEN(h2.hobby2)), '') AS s
FROM dbo.hobby1 h1
LEFT JOIN dbo.hobby2 h2 ON h1.id = h2.id
) t ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL
DROP TABLE #dummy;
GO
PRINT REPLICATE('-', 75) + '
Unpivot/Pivot (dups, ordered by column)'
SELECT 'Unpivot/Pivot (dups, ordered by column)'
SET STATISTICS TIME ON ;
WITH OneTableCTE
AS (
SELECT a.id,
a.hobby1 AS 'h1',
a.hobby2 AS 'h2',
b.hobby1 AS 'h3',
b.hobby2 AS 'h4'
FROM hobby1 a
LEFT JOIN hobby2 b ON a.id = b.id
),
UnPivotCTE
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )
AS fixedorder
FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )
AS Normalized
)
SELECT OneTableCTE.id,
[1],
[2],
[3],
[4]
--INTO #dummy
FROM (
SELECT id,
hobby,
fixedorder
FROM UnPivotCTE
) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )
AS PivotTable
RIGHT JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL
DROP TABLE #dummy;
GO
PRINT REPLICATE('-', 75) + '
Union All/Cross tab (dups, ordered by column)'
SELECT 'Union All/Cross tab (dups, ordered by column)'
SET STATISTICS TIME ON ;
WITH cte(id, hobby, hobby_num)
AS (
SELECT h.id,
h.hobby1,
1
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby2,
2
FROM dbo.hobby1 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby2
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby1,
3
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
UNION ALL
SELECT h.id,
h.hobby2,
4
FROM dbo.hobby2 h
WHERE EXISTS ( SELECT *
FROM dbo.hobby1
WHERE id = h.id )
),
cte2(id, hobby, row_num)
AS (
SELECT id,
hobby,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)
FROM cte
WHERE hobby IS NOT NULL
)
SELECT id,
MAX(CASE WHEN row_num = 1 THEN hobby
END) AS hobby1,
MAX(CASE WHEN row_num = 2 THEN hobby
END) AS hobby2,
MAX(CASE WHEN row_num = 3 THEN hobby
END) AS hobby3,
MAX(CASE WHEN row_num = 4 THEN hobby
END) AS hobby4
--INTO #dummy
FROM cte2
GROUP BY id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL
DROP TABLE #dummy;
GO
PRINT REPLICATE('-', 75) + '
Parsename (dups, ordered by column)'
SELECT 'Parsename (dups, ordered by column)'
SET STATISTICS TIME ON ;
SELECT h1.id,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,
REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(
CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +
CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +
CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +
CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4
--INTO #dummy
FROM dbo.hobby1 h1
LEFT JOIN dbo.hobby2 h2 ON h1.id = h2.id ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL
DROP TABLE #dummy;
GO
PRINT REPLICATE('-', 75) + '
String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups'
SELECT 'String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups'
SET STATISTICS TIME ON ;
SELECT
t.id,
oa.h1,
oa.h2,
oa.h3,
oa.h4
--INTO #dummy
FROM
(
SELECT
h.id,
h1h1 = h.hobby1,
h1h2 = h.hobby2,
h2h1 = h2.hobby1,
h2h2 = h2.hobby2
FROM dbo.hobby1 AS h
JOIN dbo.hobby2 AS h2 ON
h2.id = h.id
) AS x
CROSS APPLY
(
SELECT
x.id,
s =
ISNULL(x.h1h1 + SPACE(36 - LEN(x.h1h1)), SPACE(0)) +
ISNULL(x.h1h2 + SPACE(36 - LEN(x.h1h2)), SPACE(0)) +
ISNULL(x.h2h1 + SPACE(36 - LEN(x.h2h1)), SPACE(0)) +
ISNULL(x.h2h2 + SPACE(36 - LEN(x.h2h2)), SPACE(0))
) AS t
OUTER APPLY
(
SELECT
h1 = RTRIM(SUBSTRING(t.s, 1, 36)),
h2 = RTRIM(SUBSTRING(t.s, 37, 36)),
h3 = RTRIM(SUBSTRING(t.s, 73, 36)),
h4 = RTRIM(SUBSTRING(t.s, 109, 36))
) AS oa
OPTION (HASH JOIN, LOOP JOIN);
GO
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL
DROP TABLE #dummy ;
GO
PRINT REPLICATE('-', 75) + '
CTEs - The Dixie Flatline (dups, ordered by column)'
SELECT 'CTEs - The Dixie Flatline (dups, ordered by column)'
SET STATISTICS TIME ON ;
WITH cte1
AS (
SELECT h1.id,
h1.hobby1 AS C1,
h1.hobby2 AS C2,
ISNULL(h2.hobby1, h2.hobby2) AS C3,
CASE WHEN h2.hobby1 IS NULL THEN NULL
ELSE h2.hobby2
END AS C4
FROM dbo.hobby1 h1
JOIN dbo.hobby2 h2 ON h1.id = h2.id
),
cte2
AS (
SELECT id,
C1,
ISNULL(C2, C3) AS C2,
CASE WHEN C2 IS NULL THEN C4
ELSE C3
END AS C3,
CASE WHEN C2 IS NULL THEN NULL
ELSE C4
END AS C4
FROM cte1
),
cte3
AS (
SELECT id,
ISNULL(C1, C2) AS C1,
CASE WHEN C1 IS NULL THEN C3
ELSE C2
END AS C2,
CASE WHEN C1 IS NULL THEN C4
ELSE C3
END AS C3,
CASE WHEN C1 IS NULL THEN NULL
ELSE C4
END AS C4
FROM cte2
)
SELECT *
--INTO #dummy
FROM cte3 ;
SET STATISTICS TIME OFF ;
GO
------------------------------------------------------------------------------------
Results on a dual-core machine:
---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1716 ms, elapsed time = 4077 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2916 ms, elapsed time = 4727 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 1700 ms, elapsed time = 3982 ms.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2683 ms, elapsed time = 4183 ms.
---------------------------------------------------------------------------
String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups
SQL Server Execution Times:
CPU time = 2449 ms, elapsed time = 4031 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
---------------------------------------------------------------------------
CTEs - The Dixie Flatline (dups, ordered by column)
SQL Server Execution Times:
CPU time = 561 ms, elapsed time = 3903 ms.
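As a side note for anyone comparing the entries: the two string-manipulation solutions timed above share the same fixed-width packing trick, i.e. pad each non-NULL value out to 36 characters (matching the VARCHAR(36) columns), concatenate so that NULLs contribute nothing, then slice fixed 36-character columns back out and RTRIM. A minimal Python sketch of that idea, purely illustrative and not part of any timed solution:

```python
def pack_and_split(hobbies, width=36):
    """Left-justify each non-NULL value to a fixed width, concatenate
    (NULLs contribute nothing), then slice fixed-width columns back out."""
    packed = ''.join(h.ljust(width) for h in hobbies if h is not None)
    cols = [packed[i * width:(i + 1) * width].rstrip() for i in range(len(hobbies))]
    # empty slices correspond to squeezed-out NULLs
    return [c if c else None for c in cols]

print(pack_and_split([None, 'Hockey', None, 'Rugby']))  # ['Hockey', 'Rugby', None, None]
```

The `ljust`/slice/`rstrip` steps correspond to the `ISNULL(h + SPACE(36 - LEN(h)), '')` concatenation and the `RTRIM(SUBSTRING(t.s, n, 36))` calls in the T-SQL.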
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 9, 2011 at 10:56 pm
opc.three (7/9/2011)
The entire set of code with The Dixie Flatline solution modified to work with the concrete test tables:
I have to ask... is there a reason why you avoid the use of SELECT/INTO?
--Jeff Moden
Change is inevitable... Change for the better is not.
July 9, 2011 at 11:35 pm
tfifield (7/8/2011)
Jeff Moden (7/8/2011)
Nevyn (7/8/2011)
Cool stuff, Jeff! So do you ever use PIVOT and/or UNPIVOT, or are workarounds almost always better/faster?
I don't use PIVOT because Cross Tabs are (usually) both faster and easier to read, especially if multiple values need to be aggregated and then pivoted. Since Paul taught me the neat trick with CROSS APPLY, I feel the same way about UNPIVOT, even though the differences in performance are less pronounced. Heh... I also try to avoid words like "always" and "never". 😀
Jeff,
Is there a link to Paul's CROSS APPLY method for UNPIVOT? I'd love to see it.
Todd Fifield
I don't have one for UNPIVOT but here's one I have on PIVOT...
http://www.sqlservercentral.com/articles/T-SQL/63681/
"Pre-aggregated Cross Tabs" pretty much blows the doors off of PIVOT. Regular "Cross Tabs" don't do so bad, either. Paul's explanation tells why.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 10, 2011 at 6:37 am
Jeff Moden (7/9/2011)
opc.three (7/9/2011)
The entire set of code with The Dixie Flatline solution modified to work with the concrete test tables:
I have to ask... is there a reason why you avoid the use of SELECT/INTO?
Old habits die hard I guess 😀
I was on a thread recently where Craig showed that SELECT...INTO was faster for creating heaps in tempdb, but that creating a table with a clustered index in place and then doing the insert was faster for creating clustered tables. The tests were not exhaustive, but he concluded he would start creating his temp tables ahead of time if a clustered index was needed.
In my earlier posts on this thread I was not using a clustered index on the hobby1 and hobby2 tables, when I would have been better served using SELECT...INTO to create them. However, when you introduced your test data routine and included a clustered index on each table, that's when the practice of creating the tables ahead of time and using INSERT...INTO became more appropriate.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 11, 2011 at 8:48 am
opc.three (7/10/2011)
However when you introduced your test data routine and included a clustered index on each table that's when the practice of creating the tables ahead of time and using INSERT...INTO became more appropriate.
Heh.. I won't give that an "It Depends". I guess I'll give that a "Whatever works" at this point but only because it's test data and not production data. It's more important that someone is actually using some representative test data. I'll even (almost) tolerate someone making a loop to build the data so long as they're making the effort to build more than the usual small handful of rows. 🙂
I use the SELECT/INTO on test data only because if a data requirement changes as I'm writing code, I don't have to change both the data and the test table... the SELECT/INTO changes it for me. Other than that, "It Depends" comes into play for production building of large tables and it's sometimes very well worth testing both as Craig did.
As a side bar, when I get a little time, I'll have to try some general testing like Craig did (I remember the post, but I'm not sure I can find it easily). I always like to double-check.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 11, 2011 at 9:10 am
Craig's testing showed it for a single-column narrow PK (INT). I wondered whether wider clustering keys would have SELECT...INTO start outperforming the CREATE/INSERT method, and while I did some basic tests using 3-column INT keys showing CREATE/INSERT was still faster, I did not wonder enough to do any exhaustive testing including character-based columns, etc., so I go with Craig's conclusion 😀
From the post http://www.sqlservercentral.com/Forums/FindPost1130873.aspx:
-- Skipping the gory details, the fully pre-built STILL wins.
-- Even though it takes on average about 40ms longer to insert all the data,
-- the subsequent clustered index build takes about 180ms.
-- So, no indexing, go with SELECT INTO. If you need indexing, fully pre-build it.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 11, 2011 at 9:46 am
opc.three (7/11/2011)
Craig's testing showed it for a single-column narrow PK (INT). I wondered whether wider clustering keys would have SELECT...INTO start outperforming the CREATE/INSERT method, and while I did some basic tests using 3-column INT keys showing CREATE/INSERT was still faster, I did not wonder enough to do any exhaustive testing including character-based columns, etc., so I go with Craig's conclusion 😀
From the post http://www.sqlservercentral.com/Forums/FindPost1130873.aspx:
-- Skipping the gory details, the fully pre-built STILL wins.
-- Even though it takes on average about 40ms longer to insert all the data,
-- the subsequent clustered index build takes about 180ms.
-- So, no indexing, go with SELECT INTO. If you need indexing, fully pre-build it.
These results will vary with the flavor of SQLS and with the family of processors it runs on (Sandy Bridge vs. Nehalem, typically).
July 11, 2011 at 10:01 am
Revenant (7/11/2011)
opc.three (7/11/2011)
Craig's testing showed it for a single-column narrow PK (INT). I wondered whether wider clustering keys would have SELECT...INTO start outperforming the CREATE/INSERT method, and while I did some basic tests using 3-column INT keys showing CREATE/INSERT was still faster, I did not wonder enough to do any exhaustive testing including character-based columns, etc., so I go with Craig's conclusion 😀
From the post http://www.sqlservercentral.com/Forums/FindPost1130873.aspx:
-- Skipping the gory details, the fully pre-built STILL wins.
-- Even though it takes on average about 40ms longer to insert all the data,
-- the subsequent clustered index build takes about 180ms.
-- So, no indexing, go with SELECT INTO. If you need indexing, fully pre-build it.
These results will vary with the flavor of SQLS and with the family of processors it runs on (Sandy Bridge vs. Nehalem, typically).
:unsure: How would CPU architecture figure in? Got stats?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 11, 2011 at 10:18 am
opc.three (7/11/2011)
. . . :unsure: How would CPU architecture figure in? Got stats?
Yes, I do have hard measurements. I did comparisons between Nehalem and Sandy Bridge, and at nominally the same clock speed Sandy Bridge is 10 or more percent faster.
As the study was done on the client's penny, I have to ask my boss for (formal) permission to release the original data. I will ask him on Wednesday during our weekly one-on-one and mail it to your private address.
July 11, 2011 at 10:24 am
Revenant (7/11/2011)
opc.three (7/11/2011)
. . . :unsure: How would CPU architecture figure in? Got stats?
Yes, I do have hard measurements. I did comparisons between Nehalem and Sandy Bridge, and at nominally the same clock speed Sandy Bridge is 10 or more percent faster.
Thanks... but 10% better at what? I assume you mean SELECT...INTO, so as to contradict Craig's and my conclusions?
So does this mean we're back to "It Depends" 😛
As the study was done on the client's penny, I have to ask my boss for (formal) permission to release the original data. I will ask him on Wednesday during our weekly one-on-one and mail it to your private address.
Any insight into "How would CPU architecture figure in?"? Cache possibly?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 11, 2011 at 11:11 am
opc.three (7/11/2011)
Any insight into "How would CPU architecture figure in?"? Cache possibly?
Sandy Bridge has on-chip support for parallelization and is able to share load more efficiently, without being told how to go about it.
July 11, 2011 at 11:29 am
Thanks a lot Jeff...another thread completely derailed!
Anyone have thoughts on The Dixie Flatline's fine solution? Hopefully conclusions won't depend on a video GPU or sound card or anything like that :hehe:
Revenant (7/11/2011)
opc.three (7/11/2011)
Any insight into "How would CPU architecture figure in?"? Cache possibly?
Sandy Bridge has on-chip support for parallelization and is able to share load more efficiently, without being told how to go about it.
I am being dense again, Revenant... sorry. I would say it's because it's Monday morning, but that would be wishful thinking on my part in trying to isolate it to one morning of one day of the week.
Can you please clarify your comparisons?
You said:
I did comparisons between Nehalem and Sandy Bridge, and at a nominally same clock speed Sandy Bridge is 10 or more percent faster.
Are you saying that SELECT...INTO is 10% faster on Sandy Bridge than CREATE TABLE + INSERT INTO on Nehalem?
Or is it that SELECT...INTO on Sandy Bridge is 10% faster than SELECT...INTO on Nehalem?
Given that you opened the discussion, hopefully clarifying your conclusions will not violate anything regarding your client's penny.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 11, 2011 at 11:46 am
opc.three (7/11/2011)
Craig's testing showed it for a single-column narrow PK (INT). I wondered whether wider clustering keys would have SELECT...INTO start outperforming the CREATE/INSERT method, and while I did some basic tests using 3-column INT keys showing CREATE/INSERT was still faster, I did not wonder enough to do any exhaustive testing including character-based columns, etc., so I go with Craig's conclusion 😀
From the post http://www.sqlservercentral.com/Forums/FindPost1130873.aspx
I added my opinion to that thread.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi