Table to table copy performance mystery - 5 billion rows.

  • I have a cross reference table as follows:

    [SecDetailDirKey] [bigint] NOT NULL,

    [ImpLogDirKey] [int] NOT NULL,

    [AggrTranID] [bigint] NOT NULL,

    [AggrTranIDSource] [varchar](30) NOT NULL,

    [Disabled] [bit] NOT NULL

    The clustered index is: [AggrTranID] ASC, [ImpLogDirKey] ASC, [SecDetailDirKey] ASC

    The table is partitioned into five equal-sized (by row count) partitions on the AggrTranID column, roughly as sketched below.
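
    For reference, the partitioning looks roughly like this (a minimal sketch; the function/scheme names and boundary values are illustrative, chosen so each range holds about 1.054 billion rows):

    CREATE PARTITION FUNCTION pfAggrTran (bigint)
        AS RANGE RIGHT FOR VALUES (1054000000, 2108000000, 3162000000, 4216000000); -- four boundaries = five partitions

    CREATE PARTITION SCHEME psAggrTran
        AS PARTITION pfAggrTran ALL TO ([PRIMARY]); -- single filegroup for simplicity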

    I have five copy scripts that I run in SSMS; each copies data from another table and inserts it into this one. The copy jobs are aligned on partitions, meaning the data copied by any one job belongs to only one of the partitions. Each script copies about 1.054 billion rows in chunks of about 8.5 million rows, in clustered index order, along the lines of the sketch below.
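
    The scripts follow this general pattern (a simplified sketch; the actual table names, range bounds, and chunk logic differ):

    DECLARE @ChunkStart   bigint = 0,          -- lower AggrTranID bound of this partition
            @PartitionEnd bigint = 1054000000, -- upper bound of this partition
            @ChunkSize    bigint = 8500000;    -- ~8.5 million rows per chunk

    WHILE @ChunkStart < @PartitionEnd
    BEGIN
        INSERT INTO dbo.XrefTarget (SecDetailDirKey, ImpLogDirKey, AggrTranID, AggrTranIDSource, [Disabled])
        SELECT SecDetailDirKey, ImpLogDirKey, AggrTranID, AggrTranIDSource, [Disabled]
        FROM dbo.XrefSource
        WHERE AggrTranID >= @ChunkStart
          AND AggrTranID <  @ChunkStart + @ChunkSize
        ORDER BY AggrTranID, ImpLogDirKey, SecDetailDirKey; -- clustered index order

        SET @ChunkStart += @ChunkSize;
    END;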

    What I don't understand is why the scripts run at different rates. The scripts that populate the top three partitions run at a sustained rate of about 50K rows per second, with a total run time of around 6 hours. The scripts that populate the first two partitions run at a sustained rate of 20K rows per second, with a total run time of around 14 hours.

    Why doesn't each script run at about the same rate?

  • You have to do this wholesale copy job on a regular basis??? :blink:

    If so, consider putting that table in a database of its own and use synonyms to point to it for your "normal" stuff. Then, for the "copy job", you could just restore the database, which should take a lot less effort and time. If you're really clever, you'd restore the database to alternating databases so that you could keep the old copy online while you're restoring the new copy. Once the new copy is done restoring, have a proc that repoints the synonyms, roughly as sketched below. Total "downtime" should be sub-second that way.
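
    A minimal sketch of that repoint proc (the database, synonym, and table names are made up; DROP SYNONYM IF EXISTS needs SQL Server 2016+):

    CREATE OR ALTER PROCEDURE dbo.RepointXrefSynonym
        @TargetDb sysname -- e.g. the 'A' or 'B' copy of the restored database
    AS
    BEGIN
        DECLARE @sql nvarchar(max) =
              N'DROP SYNONYM IF EXISTS dbo.XrefTable; '
            + N'CREATE SYNONYM dbo.XrefTable FOR ' + QUOTENAME(@TargetDb) + N'.dbo.XrefTable;';
        EXEC (@sql); -- the swap itself is metadata-only and effectively instant
    END;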

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No. It is a one-time data conversion process. The BIGINT columns in the target table are INT columns in the source table.

  • Aside from comparing the execution plans generated by the two scripts, the following can be used to return usage stats (pages inserted/updated, last update time, row counts, etc.) at the object/partition level. This will confirm for you which partitions are getting touched.

    declare @object_name varchar(180) = '%', @schema_name varchar(180) = '%';

    set transaction isolation level read uncommitted;

    select *
    from
    (
        select db_name() as db_name,
               schema_name(o.schema_id) as schema_name,
               o.name as object_name,
               isnull(i.name, 'HEAP') as index_name,
               i.type_desc as index_type,
               p.partition_number,
               p.data_compression_desc,
               sum(p.rows) as row_count,
               cast(((sum(ps.in_row_used_page_count + ps.row_overflow_used_page_count + ps.lob_reserved_page_count)) * 8192.0)
                    / (1024 * 1024 * 1024) as numeric(12,1)) as size_gb,
               cast((sum(ps.in_row_used_page_count + ps.row_overflow_used_page_count + ps.lob_reserved_page_count)
                    / ((select sum(size) from sys.database_files where type_desc = 'ROWS') * 1.0)) * 100.0 as numeric(9,1)) as pct_db_size,
               min(o.create_date) as create_date,
               sum(su.user_seeks + su.user_scans + su.user_lookups + su.system_seeks + su.system_scans + su.system_lookups) as read_count,
               max(su.last_user_seek) as last_user_seek,
               sum(su.user_updates + su.system_updates) as update_count,
               max(su.last_user_update) as last_user_update
        from sys.dm_db_partition_stats ps
        join sys.partitions p
            on ps.partition_id = p.partition_id
        join sys.objects o
            on o.object_id = p.object_id
            and o.is_ms_shipped = 0
            and o.name like @object_name
            and schema_name(o.schema_id) like @schema_name
        join sys.indexes i
            on p.index_id = i.index_id
            and p.object_id = i.object_id
        left join sys.dm_db_index_usage_stats su
            on su.object_id = i.object_id
            and su.index_id = i.index_id
        group by
            schema_name(o.schema_id),
            o.name,
            i.name,
            i.type_desc,
            p.partition_number,
            p.data_compression_desc
    ) x
    order by
        db_name, schema_name, object_name, index_name;
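
    For example, to limit the output to just the cross reference table, change the first line of the script (the table name here is made up):

    declare @object_name varchar(180) = 'MyXrefTable', @schema_name varchar(180) = 'dbo';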

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I had thought about switching. I guess that will be my project for the weekend - load each partition as a separate table and then bring them all together into a partitioned table.
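
    A rough sketch of that load-then-switch step, assuming a staging table dbo.Stage_P1 with the same columns, clustered index, and filegroup as the target (the names and boundary value are illustrative):

    -- The staging table needs a trusted CHECK constraint matching the partition's range.
    ALTER TABLE dbo.Stage_P1 WITH CHECK
        ADD CONSTRAINT CK_Stage_P1_Range
        CHECK (AggrTranID >= 0 AND AggrTranID < 1054000000);

    -- Metadata-only operation: the loaded rows become partition 1 of the target.
    ALTER TABLE dbo.Stage_P1 SWITCH TO dbo.XrefTarget PARTITION 1;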

