Huge table archive from one server to another

  • Hi All,

    I have a huge table with 200 GB of data space and 450 GB of index space; the compression type is columnstore.

    Currently, I am using a PowerShell script with a SqlBulkCopy object to copy the data, with a batch size of 10,000.

    As the destination table grows, the copy is becoming slower, and the current script now takes almost 3 hours to copy approximately 6,000,000 rows.

    Could you please suggest better options if any?

    Thank you.


    More details required: the DDL for the source table and the destination table please, including ALL indexes (and triggers on the destination if applicable).

    And please do post the PowerShell script - at least the part that is doing the retrieval and bulk insert.

    Regarding the destination - are you dropping its indexes and recreating them afterwards, or leaving them as they are?

    Is the destination table being truncated every time, or is the copy just adding to it?

    And is the destination DB in full, bulk-logged or simple recovery mode?

    And... 10,000 is way, way too low normally - and if the destination table is also a columnstore you want to make it AT LEAST 102,400 rows per batch.
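
    On that point: batches smaller than 102,400 rows land in the columnstore's delta store instead of being compressed straight into rowgroups, which is usually what makes small-batch bulk copies slow. If it helps, here is a minimal sketch (the connection string and table name are placeholders, and it assumes SQL Server 2016+ for this DMV) to check where your loaded rows are ending up:

    # placeholders - point this at the destination server/DB and put the real table name below
    $conn = new-object System.Data.SqlClient.SqlConnection ("Connection details")
    $conn.Open()

    $cmd = $conn.CreateCommand()
    $cmd.CommandText = "SELECT state_desc, COUNT(*) AS rowgroups, SUM(total_rows) AS total_rows " +
                       "FROM sys.dm_db_column_store_row_group_physical_stats " +
                       "WHERE object_id = OBJECT_ID(@table) GROUP BY state_desc;"
    $null = $cmd.Parameters.AddWithValue("@table", "dbo.YourDestinationTable")

    $reader = $cmd.ExecuteReader()
    while ($reader.Read()) {
        # OPEN / CLOSED = delta store, COMPRESSED = proper columnstore rowgroups
        "{0,-12} rowgroups={1,6} rows={2,12}" -f $reader["state_desc"], $reader["rowgroups"], $reader["total_rows"]
    }
    $reader.Close()
    $conn.Close()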

     

  • Hi Frederico,

    Thank you for your response.

    The source table has 5 indexes:

    > Two nonclustered indexes on the Date field

    > One columnstore index

    > Two nonclustered indexes for unique key constraints (one of them is on the identity column)

    The destination table has only 3 indexes:

    > One columnstore index

    > Two nonclustered indexes for unique key constraints (one of them is on the identity column)

    There are no triggers on either table.

    I am appending the rows to the destination table.

    The destination DB recovery model is "Simple".

    The relevant part of the script is here:

    $DestinationConnection = new-object System.Data.SqlClient.SqlConnection ("Connection details")
    $DestinationConnection.Open()
    $sqlBulkCopy = new-object System.Data.SqlClient.SqlBulkCopy($DestinationConnection, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity, $null)
    $sqlBulkCopy.DestinationTableName = $TargetTableName
    $sqlBulkCopy.BulkCopyTimeout = 0
    $sqlBulkCopy.BatchSize = 10000
    while($dataReader.Read())   # looping through a set of dates
    {
    $sqlcommand = new-object System.Data.SqlClient.SqlCommand
    $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
    $reader = $sqlCommand.ExecuteReader()
    $sqlbulkcopy.WriteToServer($reader)
    ......
    }

     

    Destination Table has only 3 Indexes, One Column store Index

    Is this a clustered or a nonclustered columnstore?

    And how many rows are on that table? You did mention that figure of roughly 6,000,000 rows being copied.

    And how many rows are in each date block on the source table? Depending on this, it may be better to load each block of dates into a staging table before loading it into the final table with a straight INSERT INTO ... SELECT (as that insert will be done in parallel) - see the sketch at the end of this post.

    I have one or two suggestions for that PowerShell script that are likely to improve things, but I would like to know the above details first.

    And can you tell us the spec of the destination server - CPUs and memory (both on the server itself and allocated to the SQL instance), as well as the MAXDOP setting?
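
    As a rough sketch of that staging idea (the staging table name is made up, and the parallel INSERT ... SELECT needs the TABLOCK hint and SQL Server 2016 or later), each date block would be bulk copied into an index-free staging heap and then moved into the final table with one set-based insert:

    # inside the per-date loop, after $reader = $sqlCommand.ExecuteReader() as in your script
    $bulk = new-object System.Data.SqlClient.SqlBulkCopy($DestinationConnection, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity -bor [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock, $null)
    $bulk.DestinationTableName = "dbo.Staging_Archive"    # heap with no indexes - made-up name
    $bulk.BulkCopyTimeout = 0
    $bulk.BatchSize = 0                                    # the whole date block in one go
    $bulk.WriteToServer($reader)

    # move the block into the final table with a single set-based insert, then clear the staging table
    # (if the final table has an identity column this needs SET IDENTITY_INSERT ON/OFF and an explicit column list)
    $move = $DestinationConnection.CreateCommand()
    $move.CommandTimeout = 0
    $move.CommandText = "INSERT INTO $TargetTableName WITH (TABLOCK) SELECT * FROM dbo.Staging_Archive; " +
                        "TRUNCATE TABLE dbo.Staging_Archive;"
    $null = $move.ExecuteNonQuery()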

     

    The columnstore index is clustered on both the source and the destination.

    The source table has close to 600,000,000 rows and the destination table has approximately 160,000,000 rows.

    Each date has approximately 6,000,000 rows in the source table.

    From the SQL Server instance: the destination server has 2 processors, 16,383 MB of memory, and MAXDOP is 0 (zero).

    For the server (box) itself, I don't have the specs at the moment.

    Thanks

     

    See below.

    The cases below are based on the code you supplied and will need to be adjusted.

    See all the comments in the code.

    As your destination server is rather weak, using the maximum rowgroup size below may be too much - so do try values between 102,400 and 1,048,576 to see which ones behave better, in increments of 50k or 100k rows.

    But do note that one of your major constraints may be the fact that, with only 2 processors on the server, columnstore processing may be limited - and I do hope that the script is not running on this (destination) server.


    # define the bulk copy options
    # UseInternalTransaction - ensures that each batch is committed on its own instead of a single commit at the end -
    # it should be used with some kind of check to prevent duplicated data being reloaded if the run fails
    # half way through processing a date block (one way to do this is shown in the consolidated sketch at the end of this post)

    [System.Data.SqlClient.SqlBulkCopyOptions]$bulkoptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity -bor [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction

    # case 1 - insert all in one go - 6 million rows isn't that much and equates to about 6 columnstore rowgroups -- a columnstore rowgroup holds a maximum of 1,048,576 rows
    # with UseInternalTransaction option turned on

    $sqlBulkCopy=new-object System.Data.sqlclient.sqlbulkcopy($DestinationConnection, $bulkoptions, $null)
    $sqlBulkCopy.DestinationTableName=$TargetTableName
    $sqlBulkCopy.BulkCopyTimeout=0
    $sqlBulkCopy.BatchSize=0
    while($dataReader.Read())   # looping through a set of dates
    {
    $sqlcommand= new-object System.data.sqlclient.sqlcommand
    $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
    $reader=$sqlCommand.ExecuteReader()
    $sqlbulkcopy.writetoserver($reader)
    ......
    }

    # case 2 - insert in rowgroup-sized batches (1,048,576 rows)

    $sqlBulkCopy=new-object System.Data.sqlclient.sqlbulkcopy($DestinationConnection, $bulkoptions, $null)
    $sqlBulkCopy.DestinationTableName=$TargetTableName
    $sqlBulkCopy.BulkCopyTimeout=0
    $sqlBulkCopy.BatchSize=1048576
    while($dataReader.Read())   # looping through a set of dates
    {
    $sqlcommand= new-object System.data.sqlclient.sqlcommand
    $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
    $reader=$sqlCommand.ExecuteReader()
    $sqlbulkcopy.writetoserver($reader)
    ......
    }

    # case 3 and 4
    # same as case 1 and 2 but don't use internal transaction
    [System.Data.SqlClient.SqlBulkCopyOptions]$bulkoptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity

    ... repeat code from case 1 and 2


    # case 5 and 6
    # same code as cases 1 and 2 but with streaming enabled - in a 50k-row transfer it gives 20% better performance - mileage may vary
    [System.Data.SqlClient.SqlBulkCopyOptions]$bulkoptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity -bor [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction

    and add the following option:

    $sqlBulkCopy.EnableStreaming=$true
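
    Purely as an illustration of how these pieces fit together (the $dataReader loop, the TABLE / 'Datefromloop' placeholders and $SourceConnection are stand-ins for your own code, and the per-date DELETE is only one possible way to implement the "check" mentioned above), a consolidated version of case 2 with streaming and the internal transaction could look like this:

    [System.Data.SqlClient.SqlBulkCopyOptions]$bulkoptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity -bor [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction

    $sqlBulkCopy = new-object System.Data.SqlClient.SqlBulkCopy($DestinationConnection, $bulkoptions, $null)
    $sqlBulkCopy.DestinationTableName = $TargetTableName
    $sqlBulkCopy.BulkCopyTimeout = 0
    $sqlBulkCopy.BatchSize = 1048576          # one compressed rowgroup per batch
    $sqlBulkCopy.EnableStreaming = $true      # stream from the reader instead of buffering it

    while($dataReader.Read())   # looping through a set of dates, as in your script
    {
        # remove anything already loaded for this date so a restarted run cannot duplicate data
        $cleanup = $DestinationConnection.CreateCommand()
        $cleanup.CommandTimeout = 0
        $cleanup.CommandText = "DELETE FROM $TargetTableName WHERE DATE = 'Datefromloop'"
        $null = $cleanup.ExecuteNonQuery()

        $sqlcommand = new-object System.Data.SqlClient.SqlCommand
        $sqlcommand.Connection = $SourceConnection   # placeholder - the source connection from your full script
        $sqlcommand.CommandTimeout = 0
        $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
        $reader = $sqlCommand.ExecuteReader()
        $sqlBulkCopy.WriteToServer($reader)
        $reader.Close()
    }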

    Copying a columnstore in row-based batches makes very little sense, if any.

    It's a columnstore, so it has to be copied in the way that matches its definition - by column.

    No matter how many rows you're copying, the columnstore index needs to be rebuilt as a whole.

    You may wish to kill the columnstore index on the target table, copy the data chunk by chunk, and at the end recreate the columnstore index in one go.
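
    A minimal sketch of that pattern (the index name and MAXDOP value are placeholders, and note that rebuilding the columnstore over 160M+ rows is a heavy one-off operation, so test the timings first):

    # sketch only - run against the destination, reusing the connection/table from the script above
    $cmd = $DestinationConnection.CreateCommand()
    $cmd.CommandTimeout = 0

    # 1. drop the clustered columnstore index (the table reverts to a rowstore heap)
    $cmd.CommandText = "DROP INDEX CCI_Archive ON $TargetTableName;"
    $null = $cmd.ExecuteNonQuery()

    # 2. ... bulk copy all the date blocks into the heap, chunk by chunk, as in the existing script ...

    # 3. recreate the clustered columnstore index once, over all of the loaded data
    $cmd.CommandText = "CREATE CLUSTERED COLUMNSTORE INDEX CCI_Archive ON $TargetTableName WITH (MAXDOP = 2);"
    $null = $cmd.ExecuteNonQuery()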

     

    _____________
    Code for TallyGenerator

  • Hi Frederico,

    I have tried case 2 (inserting in rowgroup-sized batches of 1,048,576 rows) and could see a drastic change in copy speed: approximately 3,000,000 records copied in 15 minutes, where it previously took almost 90 minutes.

    I will verify the rest of the options. Thank you for all your support.

    Regards,

    Chittam

  • Hi Sergiy,

    I have tried the script with a batch size of 4,000,000 records, writing to a staging table without any indexes, but it is slower than writing to the destination table directly; I could not understand the reason.

    Thank you for your thoughts.

    Chittam


  • Madrid14278 wrote:

    We manage SQL Server data in a growing environment where our One of the most popular archiving techniques with data that includes date and can be scaled early as separate feeds from different tables or databases.

    Ok... so how do you do that? 😉  Or is this a preamble to some product spam?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Madrid14278 wrote:

    We manage SQL Server data in a growing environment where our One of the most popular archiving techniques with data that includes date and can be scaled early as separate feeds from different tables or databases.

    Ok... so how do you do that? 😉  Or is this a preamble to some product spam?

    spam as expected

     
