November 13, 2010 at 5:58 am
Hi all
This is based on this article: http://msdn.microsoft.com/en-us/library/dd425070(SQL.100).aspx
General information
Trace Flag 610 is on, the database is in SIMPLE recovery, and the server is SQL Server 2008 SP1.
What I am doing
I have a table with a clustered index, and it is empty. I run my first batch of inserts into the table, minimal logging works, and the data looks good. I run my second batch of inserts and minimal logging does not seem to work. Just so we are clear, the clustered index we are using is very simple for this test: four values A, B, C, D, 10 million rows each, and we insert the data in clustered-key order: 10M 'A', then 10M 'B', and so on.
Here is a sample of the insert; any thoughts would be helpful.
Thanks
Scott
CREATE TABLE OutPutTable
(
IDRow int NULL
,ColInt int NULL
,ExpRow Char(1) NULL
,ColVarchar varchar(20) NULL
,Colchar char(2) NULL
,ColCSV varchar(80) NULL
,ColMoney money NULL
,ColNumeric numeric(16,4) NULL
,ColDate datetime NULL
,AutoId int IDENTITY(1,1) NOT NULL
)
CREATE CLUSTERED INDEX Clust_IDX ON OutPutTable (ExpRow) WITH (FILLFACTOR = 100)
GO
DBCC TRACEON (610)
GO
--First Batch
INSERT INTO OutPutTable WITH (TABLOCKX)
(
IDRow
,ColInt
,ExpRow
,ColVarchar
,Colchar
,ColCSV
,ColMoney
,ColNumeric
,ColDate
)
SELECT
IDRow
,ColInt
,ExpRow
,ColVarchar
,Colchar
,ColCSV
,ColMoney
,ColNumeric
,ColDate
FROM
SAMPLEDATA
WHERE
ExpRow = 'A'
GO
DBCC TRACEOFF (610)
GO
DBCC TRACEON (610)
GO
--Second Batch
INSERT INTO OutPutTable WITH (TABLOCKX)
(
IDRow
,ColInt
,ExpRow
,ColVarchar
,Colchar
,ColCSV
,ColMoney
,ColNumeric
,ColDate
)
SELECT
IDRow
,ColInt
,ExpRow
,ColVarchar
,Colchar
,ColCSV
,ColMoney
,ColNumeric
,ColDate
FROM
SAMPLEDATA
WHERE
ExpRow = 'B'
GO
DBCC TRACEOFF (610)
GO
November 13, 2010 at 6:46 am
If my understanding of Sunil's blog is correct, you don't need to add any TABLOCK hint when using Trace Flag 610.
You might even be able to load the data in parallel with minimal logging.
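For what it's worth, here is a minimal sketch of that idea, reusing Scott's table names (the fuller demo later in the thread shows the same thing with log-record evidence):
-- Sketch only: with TF 610 on, an ordered INSERT...SELECT into a
-- non-empty clustered table can be minimally logged without TABLOCK
DBCC TRACEON (610);
GO
INSERT OutPutTable (ExpRow)  -- note: no TABLOCK hint
SELECT ExpRow
FROM SAMPLEDATA
WHERE ExpRow = 'B'
ORDER BY ExpRow;             -- clustered-key order
GO
DBCC TRACEOFF (610);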
November 14, 2010 at 5:46 pm
There are a couple of ways in which you might not get minimal logging in your example where ExpRow = 'B'. Are you able to run that query (with Actual Execution Plan ON), save the graphical execution plan (right-click on it), and upload it, please?
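If saving the graphical plan is awkward, a sketch of an alternative that captures the actual plan as XML (runnable text you can paste into a reply):
-- Capture the actual execution plan as XML instead of the graphical plan
SET STATISTICS XML ON;
GO
-- ...run the ExpRow = 'B' insert here...
GO
SET STATISTICS XML OFF;
GO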
November 14, 2010 at 5:49 pm
In the meantime, I knocked together a demo based on your example, which shows TF610 working as advertised:
-- REQUIRES 2008
-- USE A DATABASE IN SIMPLE OR BULK_LOGGED MODE
-- A suitable database on my machine is called 'Sandpit'
USE Sandpit;
GO
-- This is the sample data for bulk-loading
CREATE TABLE dbo.SampleData
(
exp_row CHAR(1) NOT NULL
);
GO
-- Load 9,160 test rows
-- 3,000 each in exp_row 'A', 'B', 'C'
-- 160 rows in exp_row 'D'
WITH Numbers (n)
AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1
FROM master.sys.all_columns AC1,
master.sys.all_columns AC2,
master.sys.all_columns AC3
)
INSERT dbo.SampleData
(exp_row)
SELECT TOP (3000 * 3 + 160)
exp_row = CHAR(65 + (n / 3000))
FROM Numbers;
GO
-- Show the sample data summary
SELECT SD.exp_row,
row_count = COUNT_BIG(*)
FROM dbo.SampleData SD
GROUP BY
SD.exp_row
ORDER BY
SD.exp_row;
-- Create the target of the bulk load
CREATE TABLE dbo.ToLoad
(
row_id INTEGER IDENTITY NOT NULL,
exp_row CHAR(1) NOT NULL,
padding CHAR(1000) NOT NULL DEFAULT SPACE(1000)
);
GO
-- The clustered index on the target table
CREATE CLUSTERED INDEX c ON dbo.ToLoad (exp_row);
GO
-- Truncate the log
CHECKPOINT;
GO
-- Show that the only log entries are the start and end of the checkpoint
SELECT L.Operation,
L.[Checkpoint Begin],
L.[Checkpoint End]
FROM sys.fn_dblog(NULL, NULL) L
GO
-- Ensure TF610 is off
DBCC TRACEOFF (610);
GO
-- ===============
-- === TEST 1 ====
-- ===============
-- Load exp_row A
-- Minimally-logged load into empty clustered table
-- Only TABLOCK required
INSERT dbo.ToLoad WITH (TABLOCK)
(exp_row)
SELECT SD.exp_row
FROM dbo.SampleData SD
WHERE SD.exp_row = 'A'
ORDER BY
SD.exp_row
OPTION (RECOMPILE);
GO
-- Show largest log records relating to the target table
-- (All far too small to be fully logged)
SELECT TOP (10)
test = 'A',
L.Operation,
L.Context,
L.[Log Record Fixed Length],
L.[Log Record Length]
FROM sys.fn_dblog(NULL, NULL) L
WHERE L.AllocUnitName = N'dbo.ToLoad.c'
AND L.Context <> N'LCX_INDEX_INTERIOR'
ORDER BY
L.[Log Record Length] DESC;
GO
-- ===============
-- === TEST 2 ====
-- ===============
-- Load exp_row B
-- Clear the log again
CHECKPOINT;
GO
-- Enable TF610
DBCC TRACEON (610);
GO
-- TABLOCK not required
INSERT dbo.ToLoad
(exp_row)
SELECT SD.exp_row
FROM dbo.SampleData SD
WHERE SD.exp_row = 'B'
ORDER BY
SD.exp_row
OPTION (RECOMPILE);
GO
-- Log records show fully-logged inserts
-- until a new page is allocated then
-- minimal logging for the remaining rows
SELECT TOP (10)
test = 'B',
L.Operation,
L.Context,
L.[Log Record Fixed Length],
L.[Log Record Length]
FROM sys.fn_dblog(NULL, NULL) L
WHERE L.AllocUnitName = N'dbo.ToLoad.c'
AND L.Context <> N'LCX_INDEX_INTERIOR'
ORDER BY
L.[Log Record Length] DESC;
GO
-- Clear the log again
CHECKPOINT;
GO
-- ===============
-- === TEST 3 ====
-- ===============
-- Load exp_row C
-- Disable TF610
DBCC TRACEOFF (610);
GO
-- Minimal logging not possible now
-- (even with TABLOCK)
INSERT dbo.ToLoad WITH (TABLOCK)
(exp_row)
SELECT SD.exp_row
FROM dbo.SampleData SD
WHERE SD.exp_row = 'C'
ORDER BY
SD.exp_row
OPTION (RECOMPILE);
GO
-- Log records show all inserts were fully logged
-- even after a new page/extent is allocated
SELECT TOP (10)
test = 'C',
L.Operation,
L.Context,
L.[Log Record Fixed Length],
L.[Log Record Length]
FROM sys.fn_dblog(NULL, NULL) L
WHERE L.AllocUnitName = N'dbo.ToLoad.c'
AND L.Context <> N'LCX_INDEX_INTERIOR'
ORDER BY
L.[Log Record Length] DESC;
GO
-- Clear the log again
CHECKPOINT;
GO
-- ===============
-- === TEST 4 ====
-- ===============
-- Load exp_row D (small number of rows)
-- Enable TF610
DBCC TRACEON (610);
GO
INSERT dbo.ToLoad
(exp_row)
SELECT SD.exp_row
FROM dbo.SampleData SD
WHERE SD.exp_row = 'D'
ORDER BY
SD.exp_row
OPTION (RECOMPILE);
GO
-- Log records show inserts were fully logged
SELECT TOP (10)
test = 'D',
L.Operation,
L.Context,
L.[Log Record Fixed Length],
L.[Log Record Length]
FROM sys.fn_dblog(NULL, NULL) L
WHERE L.AllocUnitName = N'dbo.ToLoad.c'
AND L.Context <> N'LCX_INDEX_INTERIOR'
ORDER BY
L.[Log Record Length] DESC;
GO
DBCC TRACEOFF (610);
-- Clean up
DROP TABLE
dbo.SampleData,
dbo.ToLoad;
GO
edit: error in comment block
November 14, 2010 at 7:27 pm
I haven't looked at the article yet, but BOL states that minimal logging will work on a table with a clustered index only if the table starts out empty. A second load on the table will use minimal logging only if you truncate the table first... trace flag or no trace flag, TABLOCKX or no TABLOCKX... period. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
November 14, 2010 at 7:47 pm
Heh... OK... Now I get to pry the shoe out of my mouth. :blush: It looks like TF 610 does, in fact, help with converting fully logged inserts to minimally logged inserts. Oddly enough, it still doesn't matter for anything I do because I never do bulk inserts directly to the final table. I always put them into a new or newly truncated staging table first so I can validate the data, etc, before it goes anywhere near the final table. Once I start moving data from the staging table to the final table, I'm going to want it all to be logged, anyway.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 14, 2010 at 7:50 pm
Jeff Moden (11/14/2010)
I haven't looked at the article yet but BOL states that minimal logging will only work on a table with a clustered index if and only if the table starts out empty. A second load on the table will use minimal logging only if you truncate the table first... trace flag or no trace flag, TABLOCKX or no TABLOCKX... period. 🙂
You really do need to read that article. It's a major change in 2008, and a huge performance win in many cases.
November 14, 2010 at 7:53 pm
Jeff Moden (11/14/2010)
Heh... OK... Now I get to pry the shoe out of my mouth. :blush: It looks like TF 610 does, in fact, help with converting fully logged inserts to minimally logged inserts.
Yes, it does. I spent quite some time on the script I posted to make that point.
Oddly enough, it still doesn't matter for anything I do because I never do bulk inserts directly to the final table.
You *never* use INSERT...SELECT with several hundred rows or more?
Once I start moving data from the staging table to the final table, I'm going to want it all to be logged, anyway.
I'm interested to know why you need full logging when bulk loading the final table.
November 15, 2010 at 5:48 am
Thanks for the time. I will pull the plan for you and test your script in my environment.
November 15, 2010 at 6:16 am
shump66 (11/15/2010)
Thanks for the time. I will pull the plan for you and test your script in my environment.
I'll look forward to it.
November 15, 2010 at 6:30 am
Jeff Moden (11/14/2010)
Heh... OK... Now I get to pry the shoe out of my mouth. :blush: It looks like TF 610 does, in fact, help with converting fully logged inserts to minimally logged inserts. Oddly enough, it still doesn't matter for anything I do because I never do bulk inserts directly to the final table. I always put them into a new or newly truncated staging table first so I can validate the data, etc, before it goes anywhere near the final table. Once I start moving data from the staging table to the final table, I'm going to want it all to be logged, anyway.
We work with very large data sets. At times it is necessary to re-create base tables to match a client's new extract; we have tables in house that start at 100M rows and go past 10B rows. If I have a table, say 50M rows and 10 columns, with a clustered index, I can use INSERT...SELECT into a new table with the same clustered index and it takes about 2-4 minutes, with the log file growing to about 1 GB. The issue I have is that I need to break a 1.8B row table up into batches, as we are changing the clustered index, and I would like to INSERT...SELECT in the correct order to take advantage of minimal logging with TF 610.
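A sketch of the shape of one such batch, with Trace Flag 610 on; all object and column names here (dbo.OldTable, dbo.NewTable, NewKey, Payload) are hypothetical placeholders:
-- One batch of the ordered reload described above (hypothetical names)
DECLARE @BatchStart int = 1, @BatchEnd int = 50000000;
DBCC TRACEON (610);
INSERT dbo.NewTable (NewKey, Payload)  -- target already has the new clustered index
SELECT S.NewKey, S.Payload
FROM dbo.OldTable S
WHERE S.NewKey >= @BatchStart
AND S.NewKey < @BatchEnd
ORDER BY S.NewKey                      -- insert in new clustered-key order
OPTION (RECOMPILE);
DBCC TRACEOFF (610);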
Scott
November 15, 2010 at 11:57 am
Hi Paul
Here is some of the info you asked for. As you can see from the log size, it is acting as if it is fully logged.
As before
Thanks for your help
Scott
****Before any data insert (DB just contains source data)
DataBaseName TotalSize_MB SpaceUsed_MB FreeSpaceinDB_MB LogSize_MB
----------------- ------------ ------------ ---------------- -----------
SPHTest 4288 4284 4 1
****After First Run With Type A
DataBaseName TotalSize_MB SpaceUsed_MB FreeSpaceinDB_MB LogSize_MB
----------------- ------------ ------------ ---------------- -----------
SPHTest 5488 5452 36 426
****After Second Run With Type B
DataBaseName TotalSize_MB SpaceUsed_MB FreeSpaceinDB_MB LogSize_MB
---------------- ------------ ------------ ---------------- -----------
SPHTest 6688 6626 62 5551
test Operation Context Log Record Fixed Length Log Record Length
---- ------------------------------- ------------------------------- ----------------------- -----------------
B LOP_INSERT_ROWS LCX_CLUSTERED 62 4216
B LOP_INSERT_ROWS LCX_CLUSTERED 62 204
B LOP_INSERT_ROWS LCX_CLUSTERED 62 204
B LOP_INSERT_ROWS LCX_CLUSTERED 62 204
B LOP_INSERT_ROWS LCX_CLUSTERED 62 204
B LOP_INSERT_ROWS LCX_CLUSTERED 62 204
B LOP_INSERT_ROWS LCX_CLUSTERED 62 204
B LOP_INSERT_ROWS LCX_CLUSTERED 62 204
B LOP_INSERT_ROWS LCX_CLUSTERED 62 204
B LOP_INSERT_ROWS LCX_CLUSTERED 62 204
--Here is the code as is
CHECKPOINT;
GO
DBCC TRACEON (610)
GO
INSERT INTO SPHTest.dbo.OutPutTable WITH (TABLOCKX)
(
IDRowNum
,SomeInt
,ExpRow
,SomeVarchar
,Somechar
,SomeCSV
,SomeMoney
,SomeNumeric
,SomeDate
,SomeHex12
)
SELECT
IDRowNum
,SomeInt
,ExpRow
,SomeVarchar
,Somechar
,SomeCSV
,SomeMoney
,SomeNumeric
,SomeDate
,SomeHex12
FROM
SPHTest.dbo.SAMPLEDATA
WHERE
ExpRow = 'B'
ORDER BY
ExpRow
OPTION (RECOMPILE);
DBCC TRACEOFF (610)
GO
November 15, 2010 at 1:36 pm
Here are the results of running your script on my server; I have some logging going on.
I changed the script to insert a few more rows.
But the log file is over 8 GB after the fact? Is this correct?
Before Run
DataBaseName TotalSize_MB SpaceUsed_MB FreeSpaceinDB_MB LogSize_MB
----------------- ------------ ------------ ---------------- -----------
SPHTest 54 1 53 1
After I created the sample data and shrank the log
DataBaseName TotalSize_MB SpaceUsed_MB FreeSpaceinDB_MB LogSize_MB
---------------- ------------ ------------ ---------------- -----------
SPHTest 117 117 0 1
After I ran type 'A'; note the size of the log file: minimal logging
DataBaseName TotalSize_MB SpaceUsed_MB FreeSpaceinDB_MB LogSize_MB
--------------- ------------ ------------ ---------------- -----------
SPHTest 3517 3442 75 151
After I ran type 'B'; note the size of the log file: this cannot be minimal logging
DataBaseName TotalSize_MB SpaceUsed_MB FreeSpaceinDB_MB LogSize_MB
-------------- ------------ ------------ ---------------- -----------
SPHTest 6917 6795 122 8976
After a full run of your script
DataBaseName TotalSize_MB SpaceUsed_MB FreeSpaceinDB_MB LogSize_MB
----------------- ------------ ------------ ---------------- -----------
SPHTest 12054 11982 72 8976
--Server Version
Microsoft SQL Server 2008 (SP1) - 10.0.2723.0 (X64)
Jul 9 2009 23:46:07
Copyright (c) 1988-2008 Microsoft Corporation
Developer Edition (64-bit) on Windows NT 6.0 <X64> (Build 6002: Service Pack 2)
--Database Setup
SET COMPATIBILITY_LEVEL = 100
SET ANSI_NULL_DEFAULT ON
SET ANSI_NULLS OFF
SET ANSI_PADDING OFF
SET ANSI_WARNINGS OFF
SET ARITHABORT OFF
SET AUTO_CLOSE OFF
SET AUTO_CREATE_STATISTICS ON
SET AUTO_SHRINK OFF
SET AUTO_UPDATE_STATISTICS ON
SET CURSOR_CLOSE_ON_COMMIT OFF
SET CURSOR_DEFAULT GLOBAL
SET CONCAT_NULL_YIELDS_NULL OFF
SET NUMERIC_ROUNDABORT OFF
SET QUOTED_IDENTIFIER OFF
SET RECURSIVE_TRIGGERS OFF
SET DISABLE_BROKER
SET AUTO_UPDATE_STATISTICS_ASYNC OFF
SET DATE_CORRELATION_OPTIMIZATION OFF
SET TRUSTWORTHY OFF
SET ALLOW_SNAPSHOT_ISOLATION OFF
SET PARAMETERIZATION SIMPLE
SET READ_COMMITTED_SNAPSHOT OFF
SET HONOR_BROKER_PRIORITY OFF
SET READ_WRITE
SET RECOVERY SIMPLE
SET MULTI_USER
SET PAGE_VERIFY TORN_PAGE_DETECTION
SET DB_CHAINING OFF
(10600000 row(s) affected)
exp_row row_count
------- --------------------
A 3000000
B 3000000
C 3000000
D 1600000
(4 row(s) affected)
Operation Checkpoint Begin Checkpoint End
------------------------------- ------------------------ ------------------------
LOP_BEGIN_CKPT 2010/11/15 14:38:52:930 NULL
LOP_END_CKPT NULL 2010/11/15 14:38:59:530
(2 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(3000000 row(s) affected)
test Operation Context Log Record Fixed Length Log Record Length
---- ------------------------------- ------------------------------- ----------------------- -----------------
A LOP_MODIFY_ROW LCX_PFS 62 92
A LOP_MODIFY_ROW LCX_PFS 62 92
A LOP_MODIFY_ROW LCX_PFS 62 92
A LOP_MODIFY_ROW LCX_PFS 62 92
A LOP_MODIFY_ROW LCX_PFS 62 92
A LOP_MODIFY_ROW LCX_PFS 62 92
A LOP_MODIFY_ROW LCX_PFS 62 92
A LOP_MODIFY_ROW LCX_PFS 62 92
A LOP_MODIFY_ROW LCX_PFS 62 92
A LOP_MODIFY_ROW LCX_PFS 62 92
(10 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(3000000 row(s) affected)
test Operation Context Log Record Fixed Length Log Record Length
---- ------------------------------- ------------------------------- ----------------------- -----------------
B LOP_INSERT_ROWS LCX_CLUSTERED 62 1108
B LOP_INSERT_ROWS LCX_CLUSTERED 62 1108
B LOP_INSERT_ROWS LCX_CLUSTERED 62 1108
B LOP_INSERT_ROWS LCX_CLUSTERED 62 1100
B LOP_MODIFY_ROW LCX_PFS 62 92
B LOP_MODIFY_ROW LCX_PFS 62 92
B LOP_MODIFY_ROW LCX_PFS 62 92
B LOP_MODIFY_ROW LCX_PFS 62 92
B LOP_MODIFY_ROW LCX_PFS 62 92
B LOP_MODIFY_ROW LCX_PFS 62 92
(10 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(3000000 row(s) affected)
test Operation Context Log Record Fixed Length Log Record Length
---- ------------------------------- ------------------------------- ----------------------- -----------------
C LOP_INSERT_ROWS LCX_CLUSTERED 62 1104
C LOP_INSERT_ROWS LCX_CLUSTERED 62 1104
C LOP_INSERT_ROWS LCX_CLUSTERED 62 1104
C LOP_INSERT_ROWS LCX_CLUSTERED 62 1104
C LOP_INSERT_ROWS LCX_CLUSTERED 62 1104
C LOP_INSERT_ROWS LCX_CLUSTERED 62 1104
C LOP_INSERT_ROWS LCX_CLUSTERED 62 1104
C LOP_INSERT_ROWS LCX_CLUSTERED 62 1104
C LOP_INSERT_ROWS LCX_CLUSTERED 62 1104
C LOP_INSERT_ROWS LCX_CLUSTERED 62 1104
(10 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(1600000 row(s) affected)
test Operation Context Log Record Fixed Length Log Record Length
---- ------------------------------- ------------------------------- ----------------------- -----------------
D LOP_INSERT_ROWS LCX_CLUSTERED 62 1108
D LOP_INSERT_ROWS LCX_CLUSTERED 62 1108
D LOP_INSERT_ROWS LCX_CLUSTERED 62 1108
D LOP_INSERT_ROWS LCX_CLUSTERED 62 1108
D LOP_INSERT_ROWS LCX_CLUSTERED 62 1100
D LOP_MODIFY_ROW LCX_PFS 62 92
D LOP_MODIFY_ROW LCX_PFS 62 92
D LOP_MODIFY_ROW LCX_PFS 62 92
D LOP_MODIFY_ROW LCX_PFS 62 92
D LOP_MODIFY_ROW LCX_PFS 62 92
(10 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
November 15, 2010 at 11:41 pm
Scott,
The good news is that you are seeing minimal logging, but read on.
Minimal logging means that SQL Server does not log the individual row changes; it logs changes to pages instead. The crucial point is that SQL Server performs many fewer logging operations when using minimal logging. Minimal logging does not necessarily mean that less log space will be used overall (it might use more).
When changing one row at a time, SQL Server has to build the log record for that change (including any compensating actions needed to undo it), write the log record, wait for the storage system to report that the log record is safely stored, and then make the data change. With minimal logging, SQL Server can make a whole page of changes at once.
The primary advantage of minimal logging then, is that the number of logged operations is roughly proportional to the number of pages written, rather than the number of rows. It is often more efficient to log a smaller number of (potentially slightly larger) records, than log a larger number of perhaps slightly smaller records.
So, under minimal logging, operations that change a large number of rows will complete faster than under full logging. Minimal logging therefore optimizes for speed - though it might end up using more log space, under some conditions. For a small number of rows, minimal logging might actually be slower - that is the reason that test D in my script does not use minimal logging.
Inserting into an empty clustered table allows more minimal-logging optimizations than when the same table already contains records. This is the reason that test A generates many fewer log operations, compared to test B. For an empty clustered table, the structure of the clustered index can be created with the pages. When data already exists, we can't just create the clustered structure as we go, we have to link it in to the existing structure.
I modified my script to show the differences between full and minimally logged behaviour:
Test A (empty clustered table)
Minimal Logging: 35 log records; 23,800 bytes used; 43,876 bytes reserved
Full Logging: 464 log records; 3,220,716 bytes used; 52,158 bytes reserved
Test B (non-empty clustered table)
Minimal Logging: 74 log records; 321,064 bytes used; 11,521,977 bytes reserved
Full Logging: 3069 log records; 3,689,584 bytes used; 970,684 bytes reserved
As you can see, minimal logging generates many fewer records, which use less log space. Notice though that more log space is reserved under minimal logging. This log reservation is used to guarantee there will be enough log space if the server needs to roll the changes back. As Sunil mentions on his blog, the algorithm that reserves space in the log is deliberately very conservative, and reserves space as if we were using full logging.
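Numbers like those above can be produced by aggregating the sys.fn_dblog output; here is a sketch of that kind of summary query (it assumes the dbo.ToLoad demo table from my earlier script):
-- Summarize log records, space used, and space reserved for the target index
SELECT log_records = COUNT_BIG(*),
       bytes_used = SUM(CONVERT(bigint, L.[Log Record Length])),
       bytes_reserved = SUM(CONVERT(bigint, L.[Log Reserve]))
FROM sys.fn_dblog(NULL, NULL) L
WHERE L.AllocUnitName = N'dbo.ToLoad.c';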
Paul