Change column smallint to int (300 million records)

  • At this point I think Igor and Sergiy are tired and quoting me sometimes when they think they're quoting each other. 🙂

    RedGate's smart rename function came up with this for renaming the Line_number column to Line_Number_Archive

    /*

    Script created by SQL Prompt version 6.5.0.326 from Red Gate Software Ltd at 2/9/2016 6:13:39 AM

    Run this script on cmsperformance to perform the Smart Rename refactoring.

    Please back up your database before running this script.

    */

    -- Summary for the smart rename:

    --

    -- Action:

    -- Drop index [index_FILE_RECORD_Status_ID_Time] from [dbo].[FILE_RECORD]

    -- Alter table [dbo].[FILE_RECORD]

    -- Create index [index_FILE_RECORD_Status_ID_Time] on [dbo].[FILE_RECORD]

    --

    -- Warnings:

    -- Medium: Objects need to be created using one or more non-standard filegroups. These filegroups may need to be created manually: CM_Index

    SET NUMERIC_ROUNDABORT OFF

    GO

    SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON

    GO

    SET XACT_ABORT ON

    GO

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

    GO

    BEGIN TRANSACTION

    GO

    IF @@ERROR <> 0 SET NOEXEC ON

    GO

    PRINT N'Dropping index [index_FILE_RECORD_Status_ID_Time] from [dbo].[FILE_RECORD]'

    GO

    DROP INDEX [index_FILE_RECORD_Status_ID_Time] ON [dbo].[FILE_RECORD]

    GO

    IF @@ERROR <> 0 SET NOEXEC ON

    GO

    PRINT N'Altering [dbo].[FILE_RECORD]'

    GO

    IF @@ERROR <> 0 SET NOEXEC ON

    GO

    EXEC sp_rename N'[dbo].[FILE_RECORD].[LINE_NUMBER]', N'LINE_NUMBER_ARCHIVE', N'COLUMN'

    GO

    IF @@ERROR <> 0 SET NOEXEC ON

    GO

    PRINT N'Creating index [index_FILE_RECORD_Status_ID_Time] on [dbo].[FILE_RECORD]'

    GO

    CREATE NONCLUSTERED INDEX [index_FILE_RECORD_Status_ID_Time] ON [dbo].[FILE_RECORD]
    ([FILE_RECORD_STATUS], [FILE_DESCRIPTOR_ID], [MESSAGE_ID], [FILE_RECORD_ID], [CREATED_DATE_TIME])
    INCLUDE ([ACCOUNT_DATA_ID], [CHECKSUM], [CONCRETE_TYPE], [FILE_ID], [LAST_MOD_DATE_TIME], [LAST_MODIFIER], [LINE_NUMBER_ARCHIVE], [LOGICAL_RECORD_DESCRIPTOR_REFERENCE_ID], [MESSAGE_DIRECTION], [MESSAGE_ERRORERROR_ID], [RAW_CONTENT], [RECORD_CONTEXT], [TRANS_SEQ_NUM])
    WITH (FILLFACTOR = 90) ON [CM_Index]

    GO

    IF @@ERROR <> 0 SET NOEXEC ON

    GO

    COMMIT TRANSACTION

    GO

    IF @@ERROR <> 0 SET NOEXEC ON

    GO

    DECLARE @Success AS BIT

    SET @Success = 1

    SET NOEXEC OFF

    IF (@Success = 1) PRINT 'The database update succeeded'

    ELSE BEGIN

    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION

    PRINT 'The database update failed'

    END

    GO

  • Sergiy (2/8/2016)


    Igor Micev (2/5/2016)


    This is not good thinking. What if you have many schema-bound objects on that table?

    The advice from ScottPletcher is just fine. Changing from smallint to int should not be a serious problem, except perhaps for the time required to finish it all.

    You obviously don't know what you are talking about.

    You never tried it on a big table, so for you it "should not be a serious problem".

    Increasing the capacity of a column in a table means a split on every page occupied by the table.

    Which means effectively doubling the space needed to accommodate the table.

    With corresponding requirements on the log file.

    As for "maybe the time required to finish it all" - hours and hours of the system being offline.

    Absurd to make a blanket claim about "every page". Increasing from a smallint to a full int requires only 4 extra bytes per row (as I understand it, SQL leaves the old column in place and simply adds the new one). Unquestionably many pages will already have enough free space in them to hold the new value.
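Whether existing pages can actually absorb the extra bytes is checkable up front. A minimal sketch, assuming the [dbo].[FILE_RECORD] table from this thread (run in the target database):

```sql
-- Sketch: measure how full the pages are before widening the column.
-- SAMPLED mode reads a sample of pages; use 'DETAILED' for exact numbers.
SELECT ips.index_id,
       ips.index_level,
       ips.page_count,
       ips.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.FILE_RECORD'), NULL, NULL, N'SAMPLED') AS ips;
-- With FILLFACTOR = 90 (as on this table's indexes), leaf pages typically start
-- with roughly 10% free space, far more than 4 extra bytes per row requires.
```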

    Moreover, changing the schema of a table is very common. But dropping a production table is extraordinarily risky, and almost never done:

    What about permissions on the table?

    What about extended attributes on the table?

    What about constraints and indexes? You have to ensure you get every index back into its original filegroup.

    Just to find out that after 20 hours the system has run out of space and started reversing the transaction.

    Which will take almost twice as long.

    By the end of the week, when the system is back online, your stuff is already packed into a box and ready for you to carry it away.

    Scott expects something to happen; that's why he included item 4 in his plan, which restates my plan in shorter terms.

    If the task fails, it's a major problem either way. But much more likely to happen when you've dropped the table.

    As for dependencies - it's not that hard to find them and script dropping and recreating.

    Actually, SSMS can do it all for you.

    Open the table in Design view, alter the column type and do not save but hit the "Generate Change Script" button - here is the script you can start from.

    Lol, yeah, right, if you rely solely on that, start packing that box now!

    Any procedure, any view or function, any script in SSIS which mentions this column needs to be revised not to generate run time errors or performance issues due to implicit data type conversions caused by the new data type used for the column.

    Perhaps. Or perhaps, where possible, the code was written more resiliently and gets its data types from the metadata.

    Changing schema of a live database should not be taken lightly.

    Of course not. Neither should it be traumatic. It's going to happen fairly often, after all.
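For finding those column dependencies in T-SQL rather than through the SSMS designer, something like this sketch works (object and column names taken from this thread; verify against your own schema):

```sql
-- Sketch: list objects that reference the LINE_NUMBER column of dbo.FILE_RECORD.
SELECT OBJECT_SCHEMA_NAME(d.referencing_id) AS referencing_schema,
       OBJECT_NAME(d.referencing_id)        AS referencing_object
FROM sys.sql_expression_dependencies AS d
WHERE d.referenced_id = OBJECT_ID(N'dbo.FILE_RECORD')
  AND d.referenced_minor_id =
      COLUMNPROPERTY(OBJECT_ID(N'dbo.FILE_RECORD'), N'LINE_NUMBER', 'ColumnId');
```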

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Indianrock (2/9/2016)


    At this point I think Igor and Sergiy are tired and quoting me sometimes when they think they're quoting each other. 🙂

    RedGate's smart rename function came up with this for renaming the Line_number column to Line_Number_Archive (script quoted above)

    Yes, definitely. This is the solution: not creating another table, but altering the existing one. RedGate is smart enough to use an archive column and then do the renaming.

    Thank you.

    Igor Micev,My blog: www.igormicev.com

  • Indianrock (2/9/2016)


    Thanks for all of the feedback. I'll check with development to see how large they expect line numbers to get (I guess the @cap variable was just for documentation purposes).

    @cap is how many positive numbers (including zero) smallint can accommodate.
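In other words (a small illustration of the @cap figure referenced above):

```sql
-- smallint spans -32768..32767, so 32768 distinct non-negative values fit.
DECLARE @cap int = 32768;
SELECT @cap                      AS cap,
       CAST(32767 AS smallint)  AS smallint_max,
       CAST(-32768 AS smallint) AS smallint_min;
```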

    Taken from BOL.

    _____________
    Code for TallyGenerator

  • Indianrock (2/9/2016)


    At this point I think Igor and Sergiy are tired and quoting me sometimes when they think they're quoting each other. 🙂

    RedGate's smart rename function came up with this for renaming the Line_number column to Line_Number_Archive

    How did it go on your QA database?

    _____________
    Code for TallyGenerator

  • ScottPletcher (2/9/2016)


    Absurd to make a blanket claim about "every page". Increasing from a smallint to a full int requires only 4 extra bytes per row (as I understand it, SQL leaves the old column in place and simply adds the new one). Unquestionably many pages will already have enough free space in them to hold the new value.

    "Absurd", "Unquestionably" - that's the lexicon of a wise man, sure.

    Moreover, changing the schema of a table is very common. But dropping a production table is extraordinarily risky, and almost never done:

    What about permissions on the table?

    What about extended attributes on the table?

    What about constraints and indexes? You have to ensure you get every index back into its original filegroup.

    I strongly recommend you read the documentation for the standard tools provided with SQL Server.

    https://msdn.microsoft.com/query/dev10.query?appId=Dev10IDEF1&l=EN-US&k=k(VS.TOOLSOPTIONSPAGES.SQL_SERVER_OBJECT_EXPLORER.SCRIPTING)&rd=true

    You might discover broad new horizons.

    If the task fails, it's a major problem either way. But much more likely to happen when you've dropped the table.

    What makes you believe in that?

    How exactly does dropping a table complicate rolling back a transaction?

    As for dependencies - it's not that hard to find them and script dropping and recreating.

    Actually, SSMS can do it all for you.

    Open the table in Design view, alter the column type and do not save but hit the "Generate Change Script" button - here is the script you can start from.

    Lol, yeah, right, if you rely solely on that, start packing that box now!

    Did I ever say to rely solely on that?

    Perhaps. Or perhaps, where possible, the code was written more resiliently and gets its data types from the metadata.

    Lol, yeah, right, if you rely solely on that, start packing that box now!

    Changing schema of a live database should not be taken lightly.

    Of course not. Neither should it be traumatic. It's going to happen fairly often, after all.

    Fairly often? Can you define "fairly often"?

    Changing schema in a live database is like changing a ceiling height in a populated apartment block.

    Should not be traumatic, sure.

    _____________
    Code for TallyGenerator

  • Sergiy (2/9/2016)


    ScottPletcher (2/9/2016)


    Absurd to make a blanket claim about "every page". Increasing from a smallint to a full int requires only 4 extra bytes per row (as I understand it, SQL leaves the old column in place and simply adds the new one). Unquestionably many pages will already have enough free space in them to hold the new value.

    "Absurd", "Unquestionably" - that's the lexicon of a wise man, sure.

    Could you stop the useless ad hominem long enough to address the technical issue? How would adding 4 bytes to each row force every page to be split?

    Changing schema of a live database should not be taken lightly.

    Of course not. Neither should it be traumatic. It's going to happen fairly often, after all.

    Fairly often? Can you define "fairly often"?

    Changing schema in a live database is like changing a ceiling height in a populated apartment block.

    Should not be traumatic, sure.

    I've never seen a shop before that was afraid to ALTER a table to add or modify a column and always preferred to drop the table instead. That's far more chances for errors. It sounds more like an Oracle approach than a SQL Server approach.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (2/9/2016)


    I've never seen a shop that was afraid to ALTER a table to add/modify a column before, but always preferred to drop the table. Far more chances for errors. Sounds more like an Oracle approach than a SQL Server approach.

    That only means your experience is fairly limited.

    Millions of people have never seen snow.

    So what?

    _____________
    Code for TallyGenerator

  • I'm still waiting for management and QA to decide what they want to do, but all of the suggestions here have been passed along. So, no test of anything in QA yet except the initial direct "change data type of existing column" approach, which ran for over an hour in QA before we killed it. At that point the log had reached 65% full in QA.

    I think one of the main objections in prod would be doubling the size of tempdb to accommodate this 460GB table. At a minimum we'd have to restore a fresh copy of prod in QA and boost tempdb there so the direct ALTER TABLE ... ALTER COLUMN approach could be properly tested, even if it took days.

    But I'm also waiting to see if Sergiy's code-based fix using negative values in the small integer column would give them enough flexibility.
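For reference, the negative-value idea wraps line numbers above 32767 into smallint's negative range, roughly doubling capacity without a schema change. A hypothetical round-trip sketch (the exact encoding in Sergiy's fix may differ):

```sql
-- Hypothetical encoding: logical line numbers 0..65535 stored in a smallint
-- by wrapping values above 32767 into the negative range.
DECLARE @logical int = 40000;

DECLARE @stored smallint =
    CAST(CASE WHEN @logical <= 32767 THEN @logical
              ELSE @logical - 65536 END AS smallint);  -- stores -25536

DECLARE @recovered int =
    CASE WHEN @stored >= 0 THEN @stored
         ELSE @stored + 65536 END;                     -- recovers 40000

SELECT @stored AS stored_value, @recovered AS recovered_line_number;
```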

  • Indianrock (2/9/2016)


    But I'm also waiting to see if Sergiy's code-based fix using negative values in the small integer column would give them enough flexibility.

    Or you may wish to create a separate table for INT LINE_NUMBER.

    And populate it with a query like this:

    SELECT [FILE_RECORD_ID],

    ROW_NUMBER() OVER (PARTITION BY [FILE_ID] ORDER BY [FILE_RECORD_ID]) AS [LINE_NUMBER]

    FROM [dbo].[FILE_RECORD]

    _____________
    Code for TallyGenerator

  • Sergiy (2/9/2016)


    Indianrock (2/9/2016)


    But I'm also waiting to see if Sergiy's code-based fix using negative values in the small integer column would give them enough flexibility.

    Or you may wish to create a separate table for INT LINE_NUMBER.

    And populate it with a query like this:

    SELECT [FILE_RECORD_ID],

    ROW_NUMBER() OVER (PARTITION BY [FILE_ID] ORDER BY [FILE_RECORD_ID]) AS [LINE_NUMBER]

    FROM [dbo].[FILE_RECORD]

    But you must make sure you enforce the order of records when inserting by FILE_ID, LINE_NUMBER.
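Putting the two pieces together, a sketch of the side-table approach (the table and constraint names here are made up for illustration):

```sql
-- Hypothetical side table holding an int line number keyed to FILE_RECORD.
CREATE TABLE dbo.FILE_RECORD_LINE
(
    FILE_RECORD_ID int NOT NULL
        CONSTRAINT PK_FILE_RECORD_LINE PRIMARY KEY CLUSTERED,
    LINE_NUMBER    int NOT NULL
);

INSERT INTO dbo.FILE_RECORD_LINE (FILE_RECORD_ID, LINE_NUMBER)
SELECT FILE_RECORD_ID,
       ROW_NUMBER() OVER (PARTITION BY FILE_ID ORDER BY FILE_RECORD_ID)
FROM dbo.FILE_RECORD;
```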

    _____________
    Code for TallyGenerator

  • So the entire 460GB is in the main part of the table, none in overflow space? Often with a row that wide you'd be able to offload a decent chunk of it to LOB storage. Yeah, that will be a bigger problem.

    You definitely want to pre-allocate enough log space to handle the entire ALTER/INSERT and make sure that you don't have too many VLFs. The INSERTs have to be logged as well, at least minimally.
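A sketch of that pre-allocation, assuming the cmsperformance database named in the earlier script (the log file's logical name below is hypothetical; look it up in sys.database_files first):

```sql
-- Check VLF count: DBCC LOGINFO returns one row per VLF;
-- thousands of rows suggest too many VLFs.
DBCC LOGINFO;

-- Look up the log file's logical name, then pre-grow it in one large step.
SELECT name, type_desc, size FROM sys.database_files;

ALTER DATABASE cmsperformance
MODIFY FILE (NAME = cmsperformance_log, SIZE = 200GB);
```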

    You might also try adding a new column, UPDATE'ing the existing values into it, and finally renaming the columns so the new one replaces the old one. Naturally, check first to see how much free space you have in the table to hold the new column.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I don't say Sergiy's approach is not going to work, but it's riskier. You can simply do it without dropping the table.

    -- Steps:

    -- 1. Script the Drop and Create statements for the dependencies of column Line_number

    -- 2. Drop the constraints and indexes on column Line_number

    -- 3.

    ALTER TABLE [dbo].[FILE_RECORD]

    ADD Line_number_archive int NULL;

    -- 4.

    DECLARE @flag bit;

    Step4:

    SET @flag = 0;

    WHILE 1 = 1

    BEGIN

    UPDATE TOP (100000) t -- tune the chunk size so that updates go fast enough

    SET Line_number_archive = Line_number

    FROM [dbo].[FILE_RECORD] t

    WHERE t.Line_number_archive IS NULL AND Line_number IS NOT NULL;

    IF @@ROWCOUNT = 0

    BEGIN

    BREAK;

    END;

    END;

    BEGIN TRY

    BEGIN TRAN;

    ALTER TABLE dbo.[FILE_RECORD] DISABLE TRIGGER ALL;

    ALTER TABLE dbo.[FILE_RECORD] DROP COLUMN Line_number;

    EXEC sp_rename 'dbo.FILE_RECORD.Line_number_archive', 'Line_number', 'COLUMN';

    ALTER TABLE dbo.[FILE_RECORD] ENABLE TRIGGER ALL;

    COMMIT;

    END TRY

    BEGIN CATCH

    SET @flag = 1;

    IF @@TRANCOUNT > 0 ROLLBACK;

    END CATCH;

    IF @flag = 1

    BEGIN

    GOTO Step4;

    END;

    --5. Run the create/enable statements from step 1

    Additionally, consider the NULLs: if the column is defined NOT NULL, then it's OK as is.

    Igor Micev,My blog: www.igormicev.com

  • Scott, the table is not in the primary filegroup, and the database has about 20 data files spread over numerous NetApp LUNs, if that's what you mean.

    CREATE TABLE [dbo].[FILE_RECORD](

    [FILE_RECORD_ID] [int] IDENTITY(1,1) NOT NULL,

    [LOGICAL_RECORD_DESCRIPTOR_REFERENCE_ID] [varchar](50) NOT NULL,

    [RAW_CONTENT] [varchar](3000) NOT NULL,

    [CREATED_DATE_TIME] [datetime] NOT NULL,

    [LINE_NUMBER] [smallint] NULL,

    [MESSAGE_ID] [int] NULL,

    [FILE_ID] [int] NULL,

    [CONCRETE_TYPE] [varchar](150) NOT NULL,

    [LAST_MODIFIER] [int] NOT NULL,

    [LAST_MOD_DATE_TIME] [datetime] NOT NULL,

    [TRANS_SEQ_NUM] [smallint] NOT NULL,

    [MESSAGE_DIRECTION] [char](8) NOT NULL,

    [MESSAGE_ERRORERROR_ID] [int] NULL,

    [ACCOUNT_DATA_ID] [int] NULL,

    [RECORD_CONTEXT] [varchar](10) NOT NULL,

    [FILE_DESCRIPTOR_ID] [int] NULL,

    [FILE_RECORD_STATUS] [varchar](50) NULL,

    [CHECKSUM] [char](48) NULL,

    CONSTRAINT [FILE_RECORD_PK] PRIMARY KEY CLUSTERED

    (

    [FILE_RECORD_ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [CM_Data]

    ) ON [CM_Data]


  • Igor Micev (2/9/2016)


    I don't say Sergiy approach is not going to work, but it's more risky. Simply you can do it without dropping the table.

    -- Steps:

    -- 1. Script the Drop and Create statements for the dependencies of column Line_Number

    -- 2. Drop the constraints and indexes on column Line_number

    Igor, 3300 updates of 100k rows each will take a rather long time to execute.

    And all that time the constraints and indexes will be missing, affecting current functionality in who knows what way.

    I'd say this is risky.

    --5. Run the create/enable statements from step 1

    This step can fail too, you know.

    It must be placed inside a transaction, so all the changes can be reversed if some of them cannot be completed.

    Look how it's done in SSMS.

    _____________
    Code for TallyGenerator

Viewing 15 posts - 16 through 30 (of 45 total)