Huge table archive from one server to another

  • Hi All,

    I have a huge table with 200 GB of data space and 450 GB of index space; the compression type is columnstore.

    Currently, I am using a PowerShell script with a SqlBulkCopy object to copy the data, with a batch size of 10,000.

    As the destination table grows, the copy is becoming slower, and the current script now takes almost 3 hours to copy approximately 6,000,000 rows.

    Could you please suggest better options if any?

    Thank you.


    More details required: the DDL for the source table and the destination table please, including ALL indexes (and triggers on the destination if applicable).

    And please do post the PowerShell script - at least the part that is doing the retrieval and bulk insert.

    Regarding the destination - are you dropping its indexes and recreating them afterwards, or leaving them as they are?

    Is the destination table being truncated every time, or is the copy just adding to it?

    And is the destination DB in full, bulk-logged or simple recovery mode?

    And... 10,000 is way, way too low normally - and if the destination table is also a columnstore you want to make it AT LEAST 102,400 rows per batch.
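
    On that point: batches smaller than 102,400 rows land in the columnstore's delta store instead of being compressed straight into rowgroups, which is usually what makes small-batch bulk copies slow. If it helps, here is a minimal sketch (the connection string and table name are placeholders, and it assumes SQL Server 2016+ for this DMV) to check where your loaded rows are ending up:

    # placeholders - point this at the destination server/DB and put the real table name below
    $conn = new-object System.Data.SqlClient.SqlConnection ("Connection details")
    $conn.Open()

    $cmd = $conn.CreateCommand()
    $cmd.CommandText = "SELECT state_desc, COUNT(*) AS rowgroups, SUM(total_rows) AS total_rows " +
                       "FROM sys.dm_db_column_store_row_group_physical_stats " +
                       "WHERE object_id = OBJECT_ID(@table) GROUP BY state_desc;"
    $null = $cmd.Parameters.AddWithValue("@table", "dbo.YourDestinationTable")

    $reader = $cmd.ExecuteReader()
    while ($reader.Read()) {
        # OPEN / CLOSED = delta store, COMPRESSED = proper columnstore rowgroups
        "{0,-12} rowgroups={1,6} rows={2,12}" -f $reader["state_desc"], $reader["rowgroups"], $reader["total_rows"]
    }
    $reader.Close()
    $conn.Close()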

     

  • Hi Frederico,

    Thank you for your response.

    The source table has 5 indexes:

    > Two nonclustered indexes on the Date field

    > One columnstore index

    > Two nonclustered indexes for unique key constraints (one of them is on the identity column)

    The destination table has only 3 indexes:

    > One columnstore index

    > Two nonclustered indexes for unique key constraints (one of them is on the identity column)

    There are no triggers on either table.

    I am appending the rows to the destination table.

    The destination DB recovery model is "Simple".

    The relevant part of the script is here:

    $DestinationConnection = new-object System.Data.SqlClient.SqlConnection ("Connection details")
    $DestinationConnection.Open()
    $sqlBulkCopy = new-object System.Data.SqlClient.SqlBulkCopy($DestinationConnection, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity, $null)
    $sqlBulkCopy.DestinationTableName = $TargetTableName
    $sqlBulkCopy.BulkCopyTimeout = 0
    $sqlBulkCopy.BatchSize = 10000
    while($dataReader.Read())   # looping through a set of dates
    {
    $sqlcommand = new-object System.Data.SqlClient.SqlCommand
    $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
    $reader = $sqlCommand.ExecuteReader()
    $sqlbulkcopy.WriteToServer($reader)
    ......
    }

     

    Destination Table has only 3 Indexes, One Column store Index

    Is this a clustered or a nonclustered columnstore?

    And how many rows are on that table? You did mention that figure of roughly 6,000,000 rows being copied.

    And how many rows are in each date block on the source table? Depending on this, it may be better to load each block of dates into a staging table before loading it into the final table with a straight INSERT INTO ... SELECT (as that insert will be done in parallel) - see the sketch at the end of this post.

    I have one or two suggestions for that PowerShell script that are likely to improve things, but I would like to know the above details first.

    And can you tell us the spec of the destination server - CPUs and memory (both on the server itself and allocated to the SQL instance), as well as the MAXDOP setting?
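
    As a rough sketch of that staging idea (the staging table name is made up, and the parallel INSERT ... SELECT needs the TABLOCK hint and SQL Server 2016 or later), each date block would be bulk copied into an index-free staging heap and then moved into the final table with one set-based insert:

    # inside the per-date loop, after $reader = $sqlCommand.ExecuteReader() as in your script
    $bulk = new-object System.Data.SqlClient.SqlBulkCopy($DestinationConnection, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity -bor [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock, $null)
    $bulk.DestinationTableName = "dbo.Staging_Archive"    # heap with no indexes - made-up name
    $bulk.BulkCopyTimeout = 0
    $bulk.BatchSize = 0                                    # the whole date block in one go
    $bulk.WriteToServer($reader)

    # move the block into the final table with a single set-based insert, then clear the staging table
    # (if the final table has an identity column this needs SET IDENTITY_INSERT ON/OFF and an explicit column list)
    $move = $DestinationConnection.CreateCommand()
    $move.CommandTimeout = 0
    $move.CommandText = "INSERT INTO $TargetTableName WITH (TABLOCK) SELECT * FROM dbo.Staging_Archive; " +
                        "TRUNCATE TABLE dbo.Staging_Archive;"
    $null = $move.ExecuteNonQuery()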

     

    The columnstore index is clustered on both the source and the destination.

    The source table has close to 600,000,000 rows and the destination table has approximately 160,000,000 rows.

    Each date has approximately 6,000,000 rows in the source table.

    From the SQL Server instance: the destination server has 2 processors, 16,383 MB of memory, and MAXDOP is 0 (zero).

    For the server (box) itself, I don't have the specs at the moment.

    Thanks

     

    See below.

    The cases below are based on the code you supplied and will need to be adjusted.

    See all the comments in the code.

    As your destination server is rather weak, using the maximum rowgroup size below may be too much - so do try values between 102,400 and 1,048,576 to see which ones behave better, in increments of 50k or 100k rows.

    But do note that one of your major constraints may be the fact that, with only 2 processors on the server, columnstore processing may be limited - and I do hope that the script is not running on this (destination) server.


    # define the bulk copy options
    # UseInternalTransaction - ensures that each batch is committed on its own instead of a single commit at the end -
    # it should be used with some kind of check to prevent duplicated data being reloaded if the run fails
    # half way through processing a date block (one way to do this is shown in the consolidated sketch at the end of this post)

    [System.Data.SqlClient.SqlBulkCopyOptions]$bulkoptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity -bor [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction

    # case 1 - insert all in one go - 6 million rows isn't that much and equates to about 6 columnstore rowgroups -- a columnstore rowgroup holds a maximum of 1,048,576 rows
    # with UseInternalTransaction option turned on

    $sqlBulkCopy=new-object System.Data.sqlclient.sqlbulkcopy($DestinationConnection, $bulkoptions, $null)
    $sqlBulkCopy.DestinationTableName=$TargetTableName
    $sqlBulkCopy.BulkCopyTimeout=0
    $sqlBulkCopy.BatchSize=0
    while($dataReader.Read())   # looping through a set of dates
    {
    $sqlcommand= new-object System.data.sqlclient.sqlcommand
    $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
    $reader=$sqlCommand.ExecuteReader()
    $sqlbulkcopy.writetoserver($reader)
    ......
    }

    # case 2 - insert in rowgroup-sized batches (1,048,576 rows)

    $sqlBulkCopy=new-object System.Data.sqlclient.sqlbulkcopy($DestinationConnection, $bulkoptions, $null)
    $sqlBulkCopy.DestinationTableName=$TargetTableName
    $sqlBulkCopy.BulkCopyTimeout=0
    $sqlBulkCopy.BatchSize=1048576
    while($dataReader.Read())   # looping through a set of dates
    {
    $sqlcommand= new-object System.data.sqlclient.sqlcommand
    $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
    $reader=$sqlCommand.ExecuteReader()
    $sqlbulkcopy.writetoserver($reader)
    ......
    }

    # case 3 and 4
    # same as case 1 and 2 but don't use internal transaction
    [System.Data.SqlClient.SqlBulkCopyOptions]$bulkoptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity

    ... repeat code from case 1 and 2


    # case 5 and 6
    # same code as cases 1 and 2 but with streaming enabled - in a 50k-row transfer it gives 20% better performance - mileage may vary
    [System.Data.SqlClient.SqlBulkCopyOptions]$bulkoptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity -bor [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction

    and add the following option:

    $sqlBulkCopy.EnableStreaming=$true
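
    Purely as an illustration of how these pieces fit together (the $dataReader loop, the TABLE / 'Datefromloop' placeholders and $SourceConnection are stand-ins for your own code, and the per-date DELETE is only one possible way to implement the "check" mentioned above), a consolidated version of case 2 with streaming and the internal transaction could look like this:

    [System.Data.SqlClient.SqlBulkCopyOptions]$bulkoptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity -bor [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction

    $sqlBulkCopy = new-object System.Data.SqlClient.SqlBulkCopy($DestinationConnection, $bulkoptions, $null)
    $sqlBulkCopy.DestinationTableName = $TargetTableName
    $sqlBulkCopy.BulkCopyTimeout = 0
    $sqlBulkCopy.BatchSize = 1048576          # one compressed rowgroup per batch
    $sqlBulkCopy.EnableStreaming = $true      # stream from the reader instead of buffering it

    while($dataReader.Read())   # looping through a set of dates, as in your script
    {
        # remove anything already loaded for this date so a restarted run cannot duplicate data
        $cleanup = $DestinationConnection.CreateCommand()
        $cleanup.CommandTimeout = 0
        $cleanup.CommandText = "DELETE FROM $TargetTableName WHERE DATE = 'Datefromloop'"
        $null = $cleanup.ExecuteNonQuery()

        $sqlcommand = new-object System.Data.SqlClient.SqlCommand
        $sqlcommand.Connection = $SourceConnection   # placeholder - the source connection from your full script
        $sqlcommand.CommandTimeout = 0
        $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
        $reader = $sqlCommand.ExecuteReader()
        $sqlBulkCopy.WriteToServer($reader)
        $reader.Close()
    }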

    Copying a columnstore in row-based batches makes very little sense, if any.

    It's a columnstore, so it has to be copied in the way that matches its definition - by column.

    No matter how many rows you're copying, the columnstore index needs to be rebuilt as a whole.

    You may wish to kill the columnstore index on the target table, copy the data chunk by chunk, and at the end recreate the columnstore index in one go.
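
    A minimal sketch of that pattern (the index name and MAXDOP value are placeholders, and note that rebuilding the columnstore over 160M+ rows is a heavy one-off operation, so test the timings first):

    # sketch only - run against the destination, reusing the connection/table from the script above
    $cmd = $DestinationConnection.CreateCommand()
    $cmd.CommandTimeout = 0

    # 1. drop the clustered columnstore index (the table reverts to a rowstore heap)
    $cmd.CommandText = "DROP INDEX CCI_Archive ON $TargetTableName;"
    $null = $cmd.ExecuteNonQuery()

    # 2. ... bulk copy all the date blocks into the heap, chunk by chunk, as in the existing script ...

    # 3. recreate the clustered columnstore index once, over all of the loaded data
    $cmd.CommandText = "CREATE CLUSTERED COLUMNSTORE INDEX CCI_Archive ON $TargetTableName WITH (MAXDOP = 2);"
    $null = $cmd.ExecuteNonQuery()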

     

    _____________
    Code for TallyGenerator

  • Hi Frederico,

    I have tried case 2 (inserting in rowgroup-sized batches of 1,048,576 rows) and could see a drastic change in copy speed: approximately 3,000,000 records copied in 15 minutes, where it previously took almost 90 minutes.

    I will verify the rest of the options. Thank you for all your support.

    Regards,

    Chittam

  • Hi Sergiy,

    I have tried the script with a batch size of 4,000,000 records, writing to a staging table without any indexes, but it is slower than writing to the destination table directly; I could not understand the reason.

    Thank you for your thoughts.

    Chittam


  • Madrid14278 wrote:

    We manage SQL Server data in a growing environment where our One of the most popular archiving techniques with data that includes date and can be scaled early as separate feeds from different tables or databases.

    Ok... so how do you do that? 😉  Or is this a preamble to some product spam?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Madrid14278 wrote:

    We manage SQL Server data in a growing environment where our One of the most popular archiving techniques with data that includes date and can be scaled early as separate feeds from different tables or databases.

    Ok... so how do you do that? 😉  Or is this a preamble to some product spam?

    spam as expected

     
