Alter Huge Table

  • Hello,

    I need to alter a table with 125,000,000 rows and a size of 900 GB.

    One column needs to be changed from Varchar(4000) to Varchar(max).

    My plan was to copy the table with dbatools, then rename it and swap the tables.

    I have tested it with dbatools:

    Copy-DbaDbTableData -SqlInstance S*** -Database DB -Table TB_1 -DestinationTable TB_2

    However, the copy only includes the primary index and not the two additional ones. The creation of the two missing indexes would take a long time.

    I also thought about adding a varchar(max) column to the table and copying the data within the table.

    Is there a better way to alter the table?

    Thank you and best regards

    TOM

     

  • SQL Server will have to rewrite the whole table, because the internal storage of varchar(n) differs from varchar(max)!

    1. You could also prepare the modification via SSMS and script the operation.

    It will probably create an intermediate table (with the new definitions), copy the data, and only afterward add the clustered index and all non-clustered indexes.

    You could modify the script so it first adds the clustered index and only after that copies the data (see the sketch after this list).

    The big advantage of this is that SSMS scripts everything needed to handle the full operation and keeps all other definitions of the object.

    2. Another option is to add a new column (varchar(max)), copy the data of the old column to the new column, drop the old column, and rename the new column.

    This operation will also need vast I/O and write capacity, but if the column is nullable without a default bound to it, you avoid an extra table scan.

    3. While you're at it, maybe implement compression for the object.

    Prepare your options!

    Given the size of the table, you need to prepare for downtime, data space, and log space!
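
    A minimal sketch of that reordered option-1 script, with hypothetical names (TB_1 as the current table; replace the column list and index definitions with the real ones from your scripted DDL):

    -- 1) new table with the varchar(max) definition, no indexes yet
    CREATE TABLE dbo.Tmp_TB_1 (
        ID      int NOT NULL,
        COL_MAX varchar(max) NULL   -- formerly varchar(4000)
    );
    GO
    -- 2) clustered index FIRST, so the rows are written in final order only once
    ALTER TABLE dbo.Tmp_TB_1
        ADD CONSTRAINT PK_Tmp_TB_1 PRIMARY KEY CLUSTERED (ID);
    GO
    -- 3) copy the data; TABLOCK helps the insert qualify for minimal logging
    INSERT INTO dbo.Tmp_TB_1 WITH (TABLOCK) (ID, COL_MAX)
    SELECT ID, COL_MAX
    FROM dbo.TB_1;
    GO
    -- 4) create the nonclustered indexes afterward, then swap the names
    EXEC sp_rename 'dbo.TB_1', 'TB_1_OLD';
    EXEC sp_rename 'dbo.Tmp_TB_1', 'TB_1';
    GO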

     

     

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Hello Johan,

    Thanks for your help.

    I tried again to create a new column with VARCHAR(max) and to copy the data from the old column to the new column via a loop.

    Unfortunately the copy process takes a long time:

    For 1,000,000 rows the server needs 190 seconds.

    For 8,050,000 rows the server needs 2,940 seconds (49 min) [2,738 rows/sec].

    With 124,000,000 rows this would be 12.5 h.

    If I copy the whole table with a 50,000 batch size via dbatools I get:

    RowsCopied : 124443368

    Elapsed : 05:26:25

    Fortunately, the maintenance window is very large.

    However, I wanted to keep the time as short as possible because the product team still needs to test the application afterwards.

    I will therefore test copying the table completely again.

    But this time I will create the indexes beforehand and test the copy with dbatools again.

    Enclosed is the query I used, which I found on the net:

    DBCC DROPCLEANBUFFERS 

    SET STATISTICS IO ON

    DECLARE @id_control INT
    DECLARE @batchSize INT
    DECLARE @results INT

    SET @results = 1
    SET @batchSize = 50000
    SET @id_control = 0

    WHILE (@results > 0)
    BEGIN
        -- copy the old column into the new one, one batch at a time
        UPDATE TOP (@batchSize) tab
        SET tab.ORG_XML_new = tab.ORG_XML
        FROM TB_ORG_XML tab
        WHERE tab.ORG_XML_new IS NULL

        -- very important: obtain the latest rowcount to avoid infinite loops
        SET @results = @@ROWCOUNT
        PRINT @id_control
        PRINT CURRENT_TIMESTAMP
        -- next batch
        SET @id_control = @id_control + @batchSize
    END

    --ALTER TABLE [dbo].[TB_ORG_XML0]
    --DROP COLUMN [ORG_XML_new];


    --ALTER TABLE [dbo].[TB_ORG_XML]
    --ADD [ORG_XML_new] [varchar](MAX) NULL



     

    Best Regards

    BAM 🙂

  • can you post the full DDL for the table - including indexes and the data_compression clause? (you may need to change your SSMS settings for generating scripts so all of these are included - dbatools also has an option to output them all)

    if the table does not have data_compression = page, do you know why that is the case?

    doing a straight insert into newtable with (tablock) select from oldtable may be an option.

    with a bit of SSIS/PowerShell you can also set it up so that you do parallel inserts into the new table, which should make it faster as well.

    and can you advise us of the server spec (cores / RAM / disk type (SSD/SAN))?

    1. Of course, updating the existing table will cause splits / fragmentation.
    2. My guess: your primary key is non-clustered (the dbatools scenario), so the insert goes into a heap (quickest). Keep in mind it may need to add a clustered index (rewriting the whole table) and add the extra indexes afterward.

      You need to account for that time too, I fear.

    3. It all depends on the table's usage pattern!

      If insert-only, prepare a new table (exactly like the current one) and copy over the data, left-joining on the primary key columns (or the clustered key if that is unique) and keeping rows where the target key column is null (or using NOT EXISTS).

      Then, right before you switch table names, perform a last copy in a new transaction (with a table lock on both tables) and rename both tables (see the sketch after this list).

    4. The I/O subsystem will also be challenged.

    Test your strategies and determine the window of action(s) and downtime needed.
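
    A minimal sketch of the final sync and swap from point 3, assuming a unique clustered key named ORG_XML_ID (the table and column names are taken from later in this thread; adjust the column list to the real DDL):

    BEGIN TRANSACTION;

    -- last delta copy; exclusive table locks keep writers out during the swap
    INSERT INTO dbo.TB_ORG_XML_NEW WITH (TABLOCKX)
           (ORG_XML_ID, PLA_B_ID, ORDERING, ORG_XML,
            CHANGED_AT, CHANGED_BY, CREATED_AT, CREATED_WHO, DELETED)
    SELECT src.ORG_XML_ID, src.PLA_B_ID, src.ORDERING, src.ORG_XML,
           src.CHANGED_AT, src.CHANGED_BY, src.CREATED_AT, src.CREATED_WHO, src.DELETED
    FROM dbo.TB_ORG_XML AS src WITH (TABLOCKX)
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.TB_ORG_XML_NEW AS tgt
                      WHERE tgt.ORG_XML_ID = src.ORG_XML_ID);

    -- swap the names
    EXEC sp_rename 'dbo.TB_ORG_XML', 'TB_ORG_XML_OLD';
    EXEC sp_rename 'dbo.TB_ORG_XML_NEW', 'TB_ORG_XML';

    COMMIT TRANSACTION;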

     

    Johan


  • Hello frederico, hello Johan

    thank you very much for your help 🙂

     

    if the table does not have data_compression = page, do you know why that is the case?

    -> No data compression is set anywhere in the database so far.

    Unfortunately, I was not present when the instance was set up and do not know why the option was not used.

    Would compression in this case speed up the process?

     

    Server specs are:

    Xeon Platinum 8176 CPU @ 2.10 GHz

    14 sockets / 14 vCPUs

    64 GB RAM

    Running in VMware

     

    I scripted the table with constraints, data compression, foreign keys, full-text indexes, indexes, primary key, triggers, and unique keys.

    The ORG_XML_ID column has values from (MIN) -23038806 to (MAX) 128412292.

    Sorry, it was careless of me not to have provided the table definition right away.

    /****** Object:  Table [dbo].[TB_ORG_XML_NEW]    ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[TB_ORG_XML_NEW](
    [ORG_XML_ID] [int] NOT NULL,
    [PLA_B_ID] [int] NOT NULL,
    [ORDERING] [int] NOT NULL,
    [ORG_XML] [varchar](4000) NULL,
    [CHANGED_AT] [varchar](20) NULL,
    [CHANGED_BY] [varchar](50) NULL,
    [CREATED_AT] [varchar](20) NULL,
    [CREATED_WHO] [varchar](50) NULL,
    [DELETED] [char](1) NOT NULL,
    [ORG_XML_new] [varchar](max) NULL,
    CONSTRAINT [PK_TB_ORG_XML_NEW] PRIMARY KEY CLUSTERED
    (
    [ORG_XML_ID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO
    SET ANSI_PADDING ON
    GO
    /****** Object: Index [_dta_index_TB_ORG_XML_NEW_13_1138103095__K2_K9] ******/
    CREATE NONCLUSTERED INDEX [_dta_index_TB_ORG_XML_NEW_13_1138103095__K2_K9] ON [dbo].[TB_ORG_XML_NEW]
    (
    [PLA_B_ID] ASC,
    [DELETED] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    SET ANSI_PADDING ON
    GO
    /****** Object: Index [_dta_index_TB_ORG_XML_NEW_DB_CLEANER] ******/
    CREATE NONCLUSTERED INDEX [_dta_index_TB_ORG_XML_NEW_DB_CLEANER] ON [dbo].[TB_ORG_XML_NEW]
    (
    [CREATED_AT] ASC,
    [DELETED] ASC,
    [PLA_B_ID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    ALTER TABLE [dbo].[TB_ORG_XML_NEW] ADD CONSTRAINT [DF_TB_ORG_XML_NEW_DELETED] DEFAULT ('0') FOR [DELETED]
    GO
    /****** Object: Trigger [dbo].[TRIGGER_DELETE_SIC] ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TRIGGER [dbo].[TRIGGER_DELETE_SIC_New]
    ON [dbo].[TB_ORG_XML_NEW]
    AFTER INSERT
    AS
    BEGIN
    DECLARE @PLA_ID as int
    DECLARE @ORDERING as int
    SELECT @PLA_ID=PLA_B_ID, @ORDERING=ORDERING FROM INSERTED
    IF @PLA_ID>0 AND @ORDERING=1
    EXEC DELETE_SIC_PLA @PLA_ID
    END
    GO
    ALTER TABLE [dbo].[TB_ORG_XML_NEW] ENABLE TRIGGER [TRIGGER_DELETE_SIC_New]
    GO
    /****** Object: Statistic [_dta_stat_1138103095_1_3_9_2] ******/
    CREATE STATISTICS [_dta_stat_1138103095_1_3_9_2] ON [dbo].[TB_ORG_XML_NEW]([ORG_XML_ID], [ORDERING], [DELETED], [PLA_B_ID])
    GO
    /****** Object: Statistic [_dta_stat_1138103095_3_2_9] ******/
    CREATE STATISTICS [_dta_stat_1138103095_3_2_9] ON [dbo].[TB_ORG_XML_NEW]([ORDERING], [PLA_B_ID], [DELETED])
    GO
    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'TB_ORG_XML_NEW', @level2type=N'COLUMN',@level2name=N'ORG_XML_ID'
    GO
    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'TB_ORG_XML_NEW', @level2type=N'COLUMN',@level2name=N'PLA_B_ID'
    GO
    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'TB_ORG_XML_NEW', @level2type=N'COLUMN',@level2name=N'ORDERING'
    GO
    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'TB_ORG_XML_NEW', @level2type=N'COLUMN',@level2name=N'ORG_XML'
    GO
    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'TB_ORG_XML_NEW', @level2type=N'COLUMN',@level2name=N'CHANGED_AT'
    GO
    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'TB_ORG_XML_NEW', @level2type=N'COLUMN',@level2name=N'CHANGED_BY'
    GO
    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'TB_ORG_XML_NEW', @level2type=N'COLUMN',@level2name=N'CREATED_AT'
    GO
    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'TB_ORG_XML_NEW', @level2type=N'COLUMN',@level2name=N'CREATED_WHO'
    GO
    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'TB_ORG_XML_NEW', @level2type=N'COLUMN',@level2name=N'DELETED'
    GO
    The dbatools command used for the copy:

    Copy-DbaDbTableData -SqlInstance SQL -Database E2 -Table TB_ORG_XML -DestinationTable TB_ORG_XML_NEW

    I have a maintenance window of 2 days for the change, during which production is stopped.

    However, after the change the users have to test the application and check that everything still works.

    The application maintainers have to go to the production plant for this.

    Therefore, it would be good to know whether I can estimate the remaining duration during the changeover.

    Copying the table with the column change works well with dbatools in a reasonable time frame. However, I don't know the application that uses the database very well, and I am a little worried that I will break the application by swapping tables.

    I actually assumed that the easiest and fastest way would be to copy the data into a new column within the table.

    Thank you and Best Regards

    BAM

     

  • adding page compression is likely to reduce the size of that table to around 90 GB - the overall result is that querying the table will require less IO and less memory for day-to-day queries.

    for the copy from the old to the new table - again, less IO writing the data - and with that server spec the extra CPU required should not negatively impact the transfer.

    you could give it a go with a sample volume:

    create 2 versions of the table (DDL below) - with the ORG_XML column as varchar(max), and without both the old and the new column - one without compression and one with. DO NOT create the other indexes nor the trigger.

    then do the following twice for each table and get the timings for each execution,

    and then with SSMS look at the resulting size of both tables.

    truncate table [dbo].[TB_ORG_XML_NEW_COMP]

    insert into [dbo].[TB_ORG_XML_NEW_COMP] with (tablock)
    select top (1000000) * -- 1 million rows -- after doing the first 2 runs increase this to 10 million and run again
    from [dbo].[TB_ORG_XML] -- I assume this is the old table name
    order by [ORG_XML_ID]

    CREATE TABLE [dbo].[TB_ORG_XML_NEW_COMP](
    [ORG_XML_ID] [int] NOT NULL,
    [PLA_B_ID] [int] NOT NULL,
    [ORDERING] [int] NOT NULL,
    [ORG_XML] [varchar](max) NULL,
    [CHANGED_AT] [varchar](20) NULL,
    [CHANGED_BY] [varchar](50) NULL,
    [CREATED_AT] [varchar](20) NULL,
    [CREATED_WHO] [varchar](50) NULL,
    [DELETED] [char](1) NOT NULL,
    CONSTRAINT [PK_TB_ORG_XML_NEW_COMP] PRIMARY KEY CLUSTERED
    (
    [ORG_XML_ID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [PRIMARY]
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
  • I'll assume that when you refer to "primary index" on the original table you mean "clustering index".  That is critical for copying the data (whereas a (nonclustered) primary key is meaningless for copying the table data).

    Here are the steps (numbered to simplify commenting on them):

    (1) Create the new table structure, including the clustered index only; specify DATA_COMPRESSION = PAGE on the new index if EXEC sys.sp_estimate_data_compression_savings indicates it is worth it to do so (extremely likely; a sample call follows the steps below).

    (2) EXEC sys.sp_tableoption 'your_table_name_here', 'large value types out of row', 1

    (3A) Copy data from the old table to the new table in batches, in clustering key order, based on a clus key seek and scan.  From a size and later-performance standpoint, it's best to COMPRESS the varchar(max) column(s); however, that will dramatically slow down the copying process, so I'd suggest doing it separately later.  You also need trigger(s) or some other mechanism to handle INSERTs and UPDATEs of the max column(s) if you COMPRESS them.

    (3B) General structure for copy code below.

    (4) Create the nonclustered index(es) on the new table.

    (5) Copy any final rows from the original table to the new table with a TABLOCKX.

    (6) Rename the old table to your_table_original (or whatever) and rename the new table to the original table name.
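
    For step (1), a sample call (sys.sp_estimate_data_compression_savings reports current vs. estimated compressed size per index and partition; the table name is taken from earlier in the thread):

    EXEC sys.sp_estimate_data_compression_savings
        @schema_name      = N'dbo',
        @object_name      = N'TB_ORG_XML',
        @index_id         = NULL,     -- all indexes
        @partition_number = NULL,     -- all partitions
        @data_compression = N'PAGE';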

    --(3B)
    SET ANSI_NULLS ON;
    SET ANSI_PADDING ON;
    SET ANSI_WARNINGS ON;
    SET ARITHABORT ON;
    SET CONCAT_NULL_YIELDS_NULL ON;
    SET NOCOUNT OFF;
    SET NUMERIC_ROUNDABORT OFF;
    SET QUOTED_IDENTIFIER ON;
    SET XACT_ABORT ON;

    DECLARE @batch_copy_count int;
    DECLARE @batch_rows_actually_copied int;
    DECLARE @last_clus_key_already_copied int;
    DECLARE @total_rows_copied int;
    DECLARE @total_rows_to_copy_in_this_run int;

    SET @last_clus_key_already_copied = -1; /* start at beginning of data */
    SET @batch_copy_count = 10000; /* copy this number of rows with each SELECT */
    SET @total_rows_to_copy_in_this_run = 100000; /* copy this number of rows total before stopping execution */
    SET @batch_rows_actually_copied = 0;
    SET @total_rows_copied = 0;

    PRINT 'Start time = ' + CONVERT(varchar(30), GETDATE(), 0) + '.';

    WHILE 1 = 1
    BEGIN
        INSERT INTO dbo.new_table WITH (TABLOCK)
        SELECT *
        FROM (
            SELECT TOP (@batch_copy_count) *
            FROM dbo.your_table
            WHERE clus_key > @last_clus_key_already_copied
            ORDER BY clus_key
        ) AS batch;
        SET @batch_rows_actually_copied = @@ROWCOUNT;
        IF @batch_rows_actually_copied = 0
            BREAK;
        /* advance the watermark so the next batch starts after the rows just copied */
        SELECT @last_clus_key_already_copied = MAX(clus_key) FROM dbo.new_table;
        SET @total_rows_copied = @total_rows_copied + @batch_rows_actually_copied;
        PRINT 'Total Rows Copied = ' + CAST(@total_rows_copied AS varchar(10));
        IF @total_rows_copied >= @total_rows_to_copy_in_this_run
            BREAK;
        IF @total_rows_copied % 100000 = 0
        BEGIN
            CHECKPOINT;
        END; --IF
        IF @total_rows_to_copy_in_this_run <= 1000000
            WAITFOR DELAY '00:00:00.006';
        ELSE IF @total_rows_to_copy_in_this_run <= 2000000
            WAITFOR DELAY '00:00:00.012';
        ELSE
            WAITFOR DELAY '00:00:00.024';
    END --WHILE

    CHECKPOINT;

    SELECT 'Last clus key value copied = ' + CAST(MAX(clus_key) AS varchar(10)) + '.'
    FROM dbo.new_table

    PRINT '';
    PRINT 'End time = ' + CONVERT(varchar(30), GETDATE(), 0) + '.';

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • If the database is in FULL recovery, would it help to run t-log backups frequently during the process, to prevent a huge t-log?

  • You could also consider using a brand new filegroup, with at least two files, for the new/replacement table, which will reduce any interference with other filegroups.  Then again, you're stuck with an additional filegroup, which you may not want.
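
    A minimal sketch of that approach; the database name E2 comes from the dbatools command earlier in the thread, while the file names, paths, and sizes are hypothetical (size the files to fit the data):

    ALTER DATABASE [E2] ADD FILEGROUP [FG_ORG_XML];
    ALTER DATABASE [E2] ADD FILE
        (NAME = N'E2_org_xml_1', FILENAME = N'D:\Data\E2_org_xml_1.ndf', SIZE = 200GB)
        TO FILEGROUP [FG_ORG_XML];
    ALTER DATABASE [E2] ADD FILE
        (NAME = N'E2_org_xml_2', FILENAME = N'E:\Data\E2_org_xml_2.ndf', SIZE = 200GB)
        TO FILEGROUP [FG_ORG_XML];
    -- then create the replacement table ON [FG_ORG_XML] ... TEXTIMAGE_ON [FG_ORG_XML]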


  • homebrew01 wrote:

    If the database is in FULL recovery, would it help to run t-log backups frequently during the process, to prevent a huge t-log?

    If copying to a new table - then switching to the bulk-logged recovery model and following the steps for a minimally logged operation could help with performance.  Something to test at least (a sketch below)...
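
    A minimal sketch of that sequence (database name from earlier in the thread, backup paths hypothetical); take a log backup right before and right after the bulk-logged window so point-in-time recovery is only lost for the window itself:

    BACKUP LOG [E2] TO DISK = N'D:\Backup\E2_pre_load.trn';
    ALTER DATABASE [E2] SET RECOVERY BULK_LOGGED;

    -- ... perform the minimally logged load here (TABLOCK insert into the new table) ...

    ALTER DATABASE [E2] SET RECOVERY FULL;
    BACKUP LOG [E2] TO DISK = N'D:\Backup\E2_post_load.trn';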

    A bit off topic - but something that needs to be addressed:

    CREATE TRIGGER [dbo].[TRIGGER_DELETE_SIC_New]
    ON [dbo].[TB_ORG_XML_NEW]
    AFTER INSERT
    AS
    BEGIN
    DECLARE @PLA_ID as int
    DECLARE @ORDERING as int
    SELECT @PLA_ID=PLA_B_ID, @ORDERING=ORDERING FROM INSERTED
    IF @PLA_ID>0 AND @ORDERING=1
    EXEC DELETE_SIC_PLA @PLA_ID
    END
    GO
    ALTER TABLE [dbo].[TB_ORG_XML_NEW] ENABLE TRIGGER [TRIGGER_DELETE_SIC_New]
    GO

    This trigger is not written correctly - it assumes a single row is inserted at a time.  If an insert statement inserts more than one row, the stored procedure will be called for only one of those rows, and you have no way of determining which row SQL Server is going to pick.

    If you do create a new table - then you really don't want to include that trigger on the new table until after the table has been fully populated, and ideally you should fix the trigger before moving forward with making any changes (a multi-row-safe sketch below).
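
    For illustration, a multi-row-safe sketch; since DELETE_SIC_PLA is a stored procedure that has to be invoked once per qualifying row, a cursor over INSERTED is one straightforward way to do it:

    CREATE OR ALTER TRIGGER [dbo].[TRIGGER_DELETE_SIC_New]
    ON [dbo].[TB_ORG_XML_NEW]
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @PLA_ID int;
        -- walk ALL qualifying inserted rows, not just one arbitrary row
        DECLARE pla_cur CURSOR LOCAL FAST_FORWARD FOR
            SELECT PLA_B_ID
            FROM INSERTED
            WHERE PLA_B_ID > 0
              AND ORDERING = 1;
        OPEN pla_cur;
        FETCH NEXT FROM pla_cur INTO @PLA_ID;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            EXEC DELETE_SIC_PLA @PLA_ID;
            FETCH NEXT FROM pla_cur INTO @PLA_ID;
        END;
        CLOSE pla_cur;
        DEALLOCATE pla_cur;
    END;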

    I would also recommend renaming your indexes and statistics - the DTA-provided names are rather useless.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • homebrew01 wrote:

    If the database is in FULL recovery, would it help to run t-log backups frequently during the process, to prevent a huge t-log?

    if the insert is done in one go with tablock (or using dbatools) then it's minimally logged if the table only has the clustered index, as I advised - so no need for extra t-log backups.

  • frederico_fonseca wrote:

    homebrew01 wrote:

    If the database is in FULL recovery, would it help to run t-log backups frequently during the process, to prevent a huge t-log?

    if the insert is done in one go with tablock (or using dbatools) then it's minimally logged if the table only has the clustered index, as I advised - so no need for extra t-log backups.

    No... not if you're in the FULL Recovery Model.  EVERYTHING is fully logged in the FULL recovery model even with TABLOCK.

    To do as you say with Minimal Logging, the database needs to be in the BULK LOGGED Recovery Model (or SIMPLE, but that has some nasty ramifications).  If there's an IDENTITY column involved, then be forewarned that there's STILL a bug in SQL Server that will load the entire table into TempDB to sort ALL the rows if you need to use SET IDENTITY_INSERT ON, and you would for this case.

    Also, changing from VARCHAR(4000) to VARCHAR(MAX) opens up the opportunity to greatly reduce the size of the Clustered Index itself, IF and ONLY IF the LOBs are forced out of row using sp_tableoption.  There's also the issue of having to default the LOB column to a single space and make the column NOT NULL to be sure.  That will do a huge amount to prevent physical fragmentation due to "Trapped Short Rows" and logical fragmentation when the LOB is updated from NULL to anything, because MS messed up and made the damned LOB pointer "ExpAnsive" itself.  The "space" default causes the LOB pointer to materialize during the initial row creation (INSERT), and that's NOT "ExpAnsive" and never would be.  It would always be there, and it's only 16 bytes wide when you do it that way (a sketch below).
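
    A minimal sketch of that setup, assuming the replacement table keeps the XML in a varchar(max) column named ORG_XML (the constraint name is hypothetical):

    -- push the varchar(max) values off the in-row pages
    EXEC sys.sp_tableoption 'dbo.TB_ORG_XML_NEW', 'large value types out of row', 1;

    -- backfill NULLs, then default to a single space and forbid NULLs,
    -- so the 16-byte LOB pointer materializes at INSERT time and never "expands"
    UPDATE dbo.TB_ORG_XML_NEW SET ORG_XML = ' ' WHERE ORG_XML IS NULL;
    ALTER TABLE dbo.TB_ORG_XML_NEW
        ADD CONSTRAINT DF_TB_ORG_XML_NEW_ORG_XML DEFAULT (' ') FOR ORG_XML;
    ALTER TABLE dbo.TB_ORG_XML_NEW
        ALTER COLUMN ORG_XML varchar(max) NOT NULL;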

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • BAM wrote:

    Hello Johan,

    Thanks for your help.

    I tried again to create a new column with VARCHAR(max) and to copy the data from the old column to the new column via a loop. Unfortunately the copy process takes a long time: For 1,000,000 rows the server needs 190 seconds. For 8,050,000 rows the server needs 2,940 seconds (49 min) [2,738 rows/sec]. With 124,000,000 rows this would be 12.5 h.

    If I copy the whole table with a 50,000 batch size via dbatools I get: RowsCopied : 124443368 Elapsed : 05:26:25

    Fortunately, the maintenance window is very large. However, I wanted to keep the time as short as possible because the product team still needs to test the application afterwards.

    I will therefore test copying the table completely again, but this time create the indexes beforehand and test the copy with dbatools again.


    Using a loop to break things up into sections will cause every section except the first to be FULLY LOGGED, which takes as much as 3 times longer than minimal logging.

    And I agree with Johan... consider making modifications to take advantage of using COMPRESS, which does a GZIP on the column (a small example below).  If you're not using stored procedures with the front-end, that will require you to change the front-end code instead of stored procedures.
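
    For illustration, a sketch with a hypothetical varbinary(max) column to hold the gzipped bytes (COMPRESS/DECOMPRESS are available from SQL Server 2016 on):

    -- hypothetical column for the compressed XML
    ALTER TABLE dbo.TB_ORG_XML_NEW ADD ORG_XML_GZ varbinary(max) NULL;

    -- compress on the way in ...
    UPDATE dbo.TB_ORG_XML_NEW
    SET ORG_XML_GZ = COMPRESS(ORG_XML);

    -- ... and decompress on the way out
    SELECT CAST(DECOMPRESS(ORG_XML_GZ) AS varchar(max)) AS ORG_XML
    FROM dbo.TB_ORG_XML_NEW;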

    Also, now would be a good time to consider "partitioning", especially if the older data is not modified after a month or two.  That would also allow you to do the transfer to the new table using segmented minimal logging AND, if the "old" partitions are not updated, they could be set to READ ONLY, have a "final" backup taken of each of their filegroups, and never have to be backed up again, either.

    And, like I said above, forcing the LOBs out of row the right way has some HUGE performance advantages.  Partitioning will slow things down a bit, but the out-of-row thing will more than make up for that, especially when it comes to scans and range scans of the Clustered Index.

    --Jeff Moden



  • And, NO... don't create the Non-Clustered Indexes beforehand.  I've never seen it work like MS says it "could" work.  Having Non-Clustered indexes will make it so that the transfer is NOT Minimally Logged.  You should resign yourself to ONLY having the Clustered Index on the empty table before you start the transfer.

    You should also look up what the requirements are for "Minimal Logging".

    --Jeff Moden


