June 5, 2021 at 7:39 am
Hi All,
I have a huge table with 200 GB of data space and 450 GB of index space; the compression type is columnstore.
Currently, I am using a PowerShell script with a SqlBulkCopy object to copy the data with a batch size of 10,000.
As the destination table grows, the copy is getting slower; the current script takes almost 3 hours to copy approximately 6,000,000 rows.
Could you please suggest better options if any?
Thank you.
June 5, 2021 at 9:30 am
More details required - DDL for the source table and destination table please, including ALL indexes (and triggers on the destination if applicable).
And please do post the PowerShell script - at least the part that is doing the retrieval and bulk insert.
Regarding the destination - are you dropping its indexes and recreating them, or leaving them as is?
Is the destination table being truncated every time, or is it just being added to?
And is the destination DB in full, bulk-logged or simple recovery mode?
And... 10,000 is normally way too low - and if the destination table is also a columnstore you want to make it AT LEAST 102,400 rows.
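For reference, that is just the BatchSize property on the SqlBulkCopy object - a minimal sketch, assuming the connection and table-name variables already exist (the names here are placeholders):
$sqlBulkCopy = New-Object System.Data.SqlClient.SqlBulkCopy($DestinationConnection)   # $DestinationConnection is a placeholder
$sqlBulkCopy.DestinationTableName = $TargetTableName                                  # placeholder table name
$sqlBulkCopy.BatchSize = 102400   # at least one compressible columnstore rowgroup per batch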
June 5, 2021 at 9:59 am
Hi Frederico,
Thank you for your response.
The source table has 5 indexes:
> Two non-clustered on a date field
> One columnstore index
> Two non-clustered indexes for unique key constraints (one of them is for the identity column).
The destination table has only 3 indexes:
> One columnstore index
> Two non-clustered indexes for unique key constraints (one of them is for the identity column).
There are no triggers on either table.
I am appending the rows to the destination table.
The destination DB recovery model is "Simple".
The relevant part of the script is here:
$DestinationConnection = New-Object System.Data.SqlClient.SqlConnection ("Connection details")
$DestinationConnection.Open()
$sqlBulkCopy = New-Object System.Data.SqlClient.SqlBulkCopy($DestinationConnection, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity)
$sqlBulkCopy.DestinationTableName = $TargetTableName
$sqlBulkCopy.BulkCopyTimeout = 0
$sqlBulkCopy.BatchSize = 10000
while($dataReader.Read())   # looping through a set of dates
{
    $sqlcommand = New-Object System.Data.SqlClient.SqlCommand   # runs against the source connection (opened earlier, not shown)
    $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
    $reader = $sqlcommand.ExecuteReader()
    $sqlBulkCopy.WriteToServer($reader)
    ......
}
June 5, 2021 at 11:04 am
Chittam wrote: The destination table has only 3 indexes: One columnstore index
Is this a clustered or a non-clustered columnstore?
And how many rows are in that table? You did mention that the source table had 6.000.000 rows.
And how many rows are in each date block on the source table? Depending on this, it may be better to load each block of dates into a staging table before loading into the final table with a straight INSERT INTO (as that will be done in parallel) - see the sketch at the end of this post.
I have 1 or 2 suggestions for that PowerShell that are likely to improve things, but I would like to know the above details first.
And can you tell us the spec of the destination server - CPUs and memory (both on the server itself and allocated to the SQL instance), as well as the MAXDOP setting?
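The staging route would look roughly like this - a sketch only, where dbo.Staging_ByDate and dbo.FinalTable are placeholder names and the bulk copy variables are the ones from your script:
# bulk copy one date block into an empty heap (no indexes) instead of the final table
$sqlBulkCopy.DestinationTableName = 'dbo.Staging_ByDate'
$sqlBulkCopy.WriteToServer($reader)
# then move it with a single set-based insert - TABLOCK lets SQL Server use a parallel,
# minimally logged insert into the columnstore table (simple recovery helps here)
$moveCmd = $DestinationConnection.CreateCommand()
$moveCmd.CommandTimeout = 0
$moveCmd.CommandText = "INSERT INTO dbo.FinalTable WITH (TABLOCK) SELECT * FROM dbo.Staging_ByDate; TRUNCATE TABLE dbo.Staging_ByDate;"
$moveCmd.ExecuteNonQuery() | Out-Null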
June 5, 2021 at 11:37 am
The columnstore index is clustered on both the source and the destination.
The source table has close to 600.000.000 rows and the destination table has approx. 160.000.000 rows.
Each date has approx. 6.000.000 rows in the source table.
From the SQL Server instance: the destination server has 2 processors and 16383 MB of memory, and MAXDOP is 0 (zero).
For the server (box) itself, I don't have the specs at the moment.
Thanks
June 5, 2021 at 2:05 pm
See below.
The cases below are based on the code you supplied and will need to be adjusted.
See all comments below.
As your destination server is rather weak, using the maximum rowgroup size below may be too much - so do try values between 102,400 and 1,048,576 to see which behave better, in increments of 50k or 100k rows (there is a quick timing sketch after the cases).
But do note that one of your major constraints may be that, with only 2 processors on the server, columnstore processing will be limited - and I do hope that the script is not running on this (destination) server.
# define bulkcopy options
# UseInternalTransaction - ensures that each batch is committed instead of a single commit at the end -
# should be used with some kind of check to prevent duplicated data being reloaded if it fails
# half way through processing a date block
[System.Data.SqlClient.SqlBulkCopyOptions]$bulkoptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity -bor [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction
# case 1 - insert all in one go - 6 million rows isn't that much and equates to 6 columnstore rowgroups (a rowgroup holds a maximum of 1,048,576 rows)
# with the UseInternalTransaction option turned on
$sqlBulkCopy = New-Object System.Data.SqlClient.SqlBulkCopy($DestinationConnection, $bulkoptions)
$sqlBulkCopy.DestinationTableName = $TargetTableName
$sqlBulkCopy.BulkCopyTimeout = 0
$sqlBulkCopy.BatchSize = 0
while($dataReader.Read())   # looping through a set of dates
{
    $sqlcommand = New-Object System.Data.SqlClient.SqlCommand   # runs against the source connection, as in your script
    $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
    $reader = $sqlcommand.ExecuteReader()
    $sqlBulkCopy.WriteToServer($reader)
    ......
}
# case 2 - insert at rowgroup size (1,048,576 rows)
$sqlBulkCopy = New-Object System.Data.SqlClient.SqlBulkCopy($DestinationConnection, $bulkoptions)
$sqlBulkCopy.DestinationTableName = $TargetTableName
$sqlBulkCopy.BulkCopyTimeout = 0
$sqlBulkCopy.BatchSize = 1048576
while($dataReader.Read())   # looping through a set of dates
{
    $sqlcommand = New-Object System.Data.SqlClient.SqlCommand   # runs against the source connection, as in your script
    $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
    $reader = $sqlcommand.ExecuteReader()
    $sqlBulkCopy.WriteToServer($reader)
    ......
}
# cases 3 and 4
# same as cases 1 and 2 but without the internal transaction
[System.Data.SqlClient.SqlBulkCopyOptions]$bulkoptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity
... repeat code from cases 1 and 2
# cases 5 and 6
# same code as cases 1 and 2 but with streaming enabled - in a 50k row transfer it gave 20% better performance - mileage may vary
[System.Data.SqlClient.SqlBulkCopyOptions]$bulkoptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity -bor [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction
# and add the following option
$sqlBulkCopy.EnableStreaming = $true
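To compare the batch sizes, a quick timing harness along these lines should do - a sketch only, where $SourceConnection is a placeholder for your source connection and each candidate size should be run against a different date block (or the destination reset between runs) so the timings stay comparable:
foreach ($batchSize in 102400, 262144, 524288, 1048576)
{
    $sqlBulkCopy.BatchSize = $batchSize
    $sqlcommand = $SourceConnection.CreateCommand()   # $SourceConnection is a placeholder
    $sqlcommand.CommandText = "SELECT * FROM TABLE with (TABLOCK) WHERE DATE = 'Datefromloop'"
    $reader = $sqlcommand.ExecuteReader()
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    $sqlBulkCopy.WriteToServer($reader)
    $sw.Stop()
    $reader.Close()
    Write-Host "BatchSize $batchSize took $($sw.Elapsed.TotalSeconds) seconds"
}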
June 6, 2021 at 1:17 am
Copying a columnstore in row-based batches makes very little sense, if any.
It's a columnstore, so it has to be copied in the way that matches its definition - by column.
No matter how many rows you're copying, the columnstore index needs to be rebuilt as a whole.
You may wish to kill the columnstore index on the target table, copy the data chunk by chunk, and at the end recreate the columnstore index in one go.
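Roughly like this - a sketch only, with placeholder index/table names (cci_TargetTable, dbo.TargetTable) and reusing the open destination connection; the unique-key indexes could be handled the same way:
$ddl = $DestinationConnection.CreateCommand()
$ddl.CommandTimeout = 0
# drop the clustered columnstore - the target table becomes a heap for the duration of the load
$ddl.CommandText = "DROP INDEX cci_TargetTable ON dbo.TargetTable;"
$ddl.ExecuteNonQuery() | Out-Null
# ... bulk copy all the date chunks into the heap here ...
# recreate the clustered columnstore once, at the end
$ddl.CommandText = "CREATE CLUSTERED COLUMNSTORE INDEX cci_TargetTable ON dbo.TargetTable;"
$ddl.ExecuteNonQuery() | Out-Null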
_____________
Code for TallyGenerator
June 6, 2021 at 5:09 pm
Hi Frederico,
I have tried case 2 - insert at rowgroup size (1,048,576 rows) - and I can see a drastic change in copy speed: approx. 3.000.000 records copied in 15 minutes, where it previously took almost 90 minutes.
I will try the rest of the options. Thank you for all your support.
Regards,
Chittam
June 6, 2021 at 5:14 pm
Hi Sergiy,
I have tried the script with a batch size of 4.000.000 records, writing to a staging table without any indexes, but it is slower than writing to the destination table; I could not understand the reason.
Thank you for your thoughts.
Chittam
July 1, 2021 at 7:22 pm
Madrid14278 wrote: We manage SQL Server data in a growing environment where our One of the most popular archiving techniques with data that includes date and can be scaled early as separate feeds from different tables or databases.
Ok... so how do you do that? 😉 Or is this a preamble to some product spam?
--Jeff Moden
Change is inevitable... Change for the better is not.
July 2, 2021 at 10:48 pm
Madrid14278 wrote: We manage SQL Server data in a growing environment where our One of the most popular archiving techniques with data that includes date and can be scaled early as separate feeds from different tables or databases.
Ok... so how do you do that? 😉 Or is this a preamble to some product spam?
spam as expected