Table to table copy performance mystery - 5 billion rows.

  • I have a cross reference table as follows:

    [SecDetailDirKey] [bigint] NOT NULL,

    [ImpLogDirKey] [int] NOT NULL,

    [AggrTranID] [bigint] NOT NULL,

    [AggrTranIDSource] [varchar](30) NOT NULL,

    [Disabled] [bit] NOT NULL

    The clustered index is: [AggrTranID] ASC, [ImpLogDirKey] ASC, [SecDetailDirKey] ASC

    The table is partitioned into five equal-sized (by row count) partitions on the AggrTranID column, roughly as sketched below.
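
    For reference, the partitioning looks roughly like this (a minimal sketch; the function/scheme names and boundary values are illustrative, chosen so each range holds about 1.054 billion rows):

    CREATE PARTITION FUNCTION pfAggrTran (bigint)
        AS RANGE RIGHT FOR VALUES (1054000000, 2108000000, 3162000000, 4216000000); -- four boundaries = five partitions

    CREATE PARTITION SCHEME psAggrTran
        AS PARTITION pfAggrTran ALL TO ([PRIMARY]); -- single filegroup for simplicity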

    I have five copy scripts that I run in SSMS; each copies data from another table and inserts it into this one. The copy jobs are aligned on partitions, meaning the data copied by any one job belongs to only one of the partitions. Each script copies about 1.054 billion rows in chunks of about 8.5 million rows, in clustered index order, along the lines of the sketch below.
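
    The scripts follow this general pattern (a simplified sketch; the actual table names, range bounds, and chunk logic differ):

    DECLARE @ChunkStart   bigint = 0,          -- lower AggrTranID bound of this partition
            @PartitionEnd bigint = 1054000000, -- upper bound of this partition
            @ChunkSize    bigint = 8500000;    -- ~8.5 million rows per chunk

    WHILE @ChunkStart < @PartitionEnd
    BEGIN
        INSERT INTO dbo.XrefTarget (SecDetailDirKey, ImpLogDirKey, AggrTranID, AggrTranIDSource, [Disabled])
        SELECT SecDetailDirKey, ImpLogDirKey, AggrTranID, AggrTranIDSource, [Disabled]
        FROM dbo.XrefSource
        WHERE AggrTranID >= @ChunkStart
          AND AggrTranID <  @ChunkStart + @ChunkSize
        ORDER BY AggrTranID, ImpLogDirKey, SecDetailDirKey; -- clustered index order

        SET @ChunkStart += @ChunkSize;
    END;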

    What I don't understand is why the scripts run at different rates. The scripts that populate the top three partitions run at a sustained rate of about 50K rows per second, with a total run time of around 6 hours. The scripts that populate the first two partitions run at a sustained rate of 20K rows per second, with a total run time of around 14 hours.

    Why doesn't each script run at about the same rate?

  • You have to do this wholesale copy job on a regular basis??? :blink:

    If so, consider putting that table in a database of its own and use synonyms to point to it for your "normal" stuff. Then, for the "copy job", you could just restore the database, which should take a lot less effort and time. If you're really clever, you'd restore the database to alternating databases so that you could keep the old copy online while you're restoring the new copy. Once the new copy is done restoring, have a proc that repoints the synonyms, roughly as sketched below. Total "downtime" should be sub-second that way.
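
    A minimal sketch of that repoint proc (the database, synonym, and table names are made up; DROP SYNONYM IF EXISTS needs SQL Server 2016+):

    CREATE OR ALTER PROCEDURE dbo.RepointXrefSynonym
        @TargetDb sysname -- e.g. the 'A' or 'B' copy of the restored database
    AS
    BEGIN
        DECLARE @sql nvarchar(max) =
              N'DROP SYNONYM IF EXISTS dbo.XrefTable; '
            + N'CREATE SYNONYM dbo.XrefTable FOR ' + QUOTENAME(@TargetDb) + N'.dbo.XrefTable;';
        EXEC (@sql); -- the swap itself is metadata-only and effectively instant
    END;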

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No. It is a one-time data conversion process. The BIGINT columns in the target table are INT columns in the source table.

  • Aside from comparing the execution plans generated by the two scripts, the following can be used to return usage stats (pages inserted/updated, last update time, row counts, etc.) at the object/partition level. This will confirm for you which partitions are getting touched.

    declare @object_name varchar(180) = '%', @schema_name varchar(180) = '%';

    set transaction isolation level read uncommitted;

    select *
    from
    (
        select db_name() as db_name,
               schema_name(o.schema_id) as schema_name,
               o.name as object_name,
               isnull(i.name, 'HEAP') as index_name,
               i.type_desc as index_type,
               p.partition_number,
               p.data_compression_desc,
               sum(p.rows) as row_count,
               cast(((sum(ps.in_row_used_page_count + ps.row_overflow_used_page_count + ps.lob_reserved_page_count)) * 8192.0)
                    / (1024 * 1024 * 1024) as numeric(12,1)) as size_gb,
               cast((sum(ps.in_row_used_page_count + ps.row_overflow_used_page_count + ps.lob_reserved_page_count)
                    / ((select sum(size) from sys.database_files where type_desc = 'ROWS') * 1.0)) * 100.0 as numeric(9,1)) as pct_db_size,
               min(o.create_date) as create_date,
               sum(su.user_seeks + su.user_scans + su.user_lookups + su.system_seeks + su.system_scans + su.system_lookups) as read_count,
               max(su.last_user_seek) as last_user_seek,
               sum(su.user_updates + su.system_updates) as update_count,
               max(su.last_user_update) as last_user_update
        from sys.dm_db_partition_stats ps
        join sys.partitions p
            on ps.partition_id = p.partition_id
        join sys.objects o
            on o.object_id = p.object_id
            and o.is_ms_shipped = 0
            and o.name like @object_name
            and schema_name(o.schema_id) like @schema_name
        join sys.indexes i
            on p.index_id = i.index_id
            and p.object_id = i.object_id
        left join sys.dm_db_index_usage_stats su
            on su.object_id = i.object_id
            and su.index_id = i.index_id
        group by
            schema_name(o.schema_id),
            o.name,
            i.name,
            i.type_desc,
            p.partition_number,
            p.data_compression_desc
    ) x
    order by
        db_name, schema_name, object_name, index_name;
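
    For example, to limit the output to just the cross reference table, change the first line of the script (the table name here is made up):

    declare @object_name varchar(180) = 'MyXrefTable', @schema_name varchar(180) = 'dbo';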

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I had thought about switching. I guess that will be my project for the weekend - load each partition as a separate table and then bring them all together into a partitioned table.
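
    A rough sketch of that load-then-switch step, assuming a staging table dbo.Stage_P1 with the same columns, clustered index, and filegroup as the target (the names and boundary value are illustrative):

    -- The staging table needs a trusted CHECK constraint matching the partition's range.
    ALTER TABLE dbo.Stage_P1 WITH CHECK
        ADD CONSTRAINT CK_Stage_P1_Range
        CHECK (AggrTranID >= 0 AND AggrTranID < 1054000000);

    -- Metadata-only operation: the loaded rows become partition 1 of the target.
    ALTER TABLE dbo.Stage_P1 SWITCH TO dbo.XrefTarget PARTITION 1;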

