Change column from smallint to int (300 million records)

  • Unless you expect your database to remain small, the lesson seems to be: make your data types large enough for the next 10 years, not for the next 10 days.

  • Indianrock (2/9/2016)


    Unless you expect your database to remain small, the lesson seems to be: make your data types large enough for the next 10 years, not for the next 10 days.

    Lessons from history teach us that they don't teach us anything.

    🙂

    I've yet to find a commercial software house which includes scalability tests in its pre-production release routine...

    _____________
    Code for TallyGenerator

  • Sergiy (2/9/2016)


    Igor Micev (2/9/2016)


    I don't say Sergiy's approach is not going to work, but it's riskier. You can simply do it without dropping the table.

    -- Steps:

    -- 1. Script the DROP and CREATE statements for the dependencies of column Line_Number

    -- 2. Drop the constraints and indexes on column Line_Number

    Igor, 3,300 updates of 100k rows each will take a rather long time to execute.

    And all this time the constraints and indexes will be missing, affecting current functionality in who knows what way.

    I'd say this is risky.

    --5. Run the create/enable statements from step 1

    This step can fail too, you know.

    It must be placed inside a transaction, so all the changes can be reversed if some of them cannot be completed.

    Look how it's done in SSMS.

    The table is operational. You can add an index on the two columns for faster work. It's not just a matter of running the query; it requires extra work, of course, and so do your steps.

    Igor Micev, My blog: www.igormicev.com

  • Igor Micev (2/9/2016)


    Put the db in simple recovery model

    That alone would be enough to see your stuff packed in a box.

    Igor, all your suggestions may be good for a sandbox database in a DEV environment.

    But they are absolutely unacceptable for a live database in PROD.

    Especially for a big live database in PROD.

    _____________
    Code for TallyGenerator

  • Igor Micev (2/9/2016)


    The table is operational.

    Yeah, right.

    I once disabled an index which was a duplicate of another one and was only taking up space without any benefit (or so I thought).

    Hours later it turned out the Invoicing module failed because one of the functions within it was querying the table with a hint WITH (INDEX ...) which, of course, was pointing at exactly that index.

    Who would have thought?

    😉

    _____________
    Code for TallyGenerator

  • Sergiy (2/9/2016)


    Igor Micev (2/9/2016)


    The table is operational.

    Yeah, right.

    I once disabled an index which was a duplicate of another one and was only taking up space without any benefit (or so I thought).

    Hours later it turned out the Invoicing module failed because one of the functions within it was querying the table with a hint WITH (INDEX ...) which, of course, was pointing at exactly that index.

    Who would have thought?

    😉

    Then, won't you meet that problem again?

    Igor Micev, My blog: www.igormicev.com

  • Igor Micev (2/9/2016)


    Then, won't you meet that problem again?

    With my approach?

    No.

    Read it again:

    1. Create a new table with the desired design.

    2. Start populating the new table with portions of data from the existing table.

    Choose an appropriate chunk size, so you do not overwhelm your transaction log, and take TRN backups in between the INSERT statements.

    It may take many hours or even days - it does not matter, as the original table is still operational.

    Make sure that any changes made to the data in the original table get copied to the new one.

    Where do you see anything about altering the existing table?

    While the data is being copied you have plenty of time to prepare and test all the scripts for indexes, privileges, dependent objects, etc.

    When everything is ready - lock the whole table in a transaction, copy the last bit of freshly changed data, drop the old table, rename the new one and run the prepared script for the dependencies.

    If everything went fine - COMMIT.

    Otherwise - ROLLBACK.

    Either way - it's done within seconds (as you modify only system catalogues), with no noticeable interruption to normal operation.

    You end up either with the updated column or (in case of a failure) with the same two tables, old and new, and the background process which keeps synchronising the data between them.

    Fix the errors in your script and repeat the attempt at an appropriate moment.

    _____________
    Code for TallyGenerator
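
    A minimal T-SQL sketch of the final swap step described above, assuming hypothetical names dbo.BigTable (the original) and dbo.BigTable_New (the pre-populated copy), and that a background process has already kept the copy in sync; the sync predicate and the prepared dependency script are placeholders, not the poster's actual code:

    BEGIN TRY
        BEGIN TRANSACTION;

        -- Take an exclusive table lock so no further changes arrive during the swap.
        SELECT TOP (0) * FROM dbo.BigTable WITH (TABLOCKX, HOLDLOCK);

        -- Copy the last bit of freshly changed data (placeholder predicate).
        INSERT INTO dbo.BigTable_New (ID, Line_Number /* , other columns */)
        SELECT s.ID, s.Line_Number /* , other columns */
        FROM dbo.BigTable AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.BigTable_New AS t WHERE t.ID = s.ID);

        -- Metadata-only changes: drop the old table, rename the new one,
        -- then run the prepared script for indexes, constraints and privileges.
        DROP TABLE dbo.BigTable;
        EXEC sp_rename N'dbo.BigTable_New', N'BigTable', 'OBJECT';
        -- EXEC (@PreparedDependencyScript);

        COMMIT;
    END TRY
    BEGIN CATCH
        -- Both tables and the sync process are still in place; fix the script and retry later.
        IF @@TRANCOUNT > 0
            ROLLBACK;
        RAISERROR('Table swap failed and was rolled back.', 16, 1);
    END CATCH;

    Because only the last-minute rows and the system catalogue changes happen inside the transaction, the exclusive lock is held for seconds rather than for the duration of the full copy.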

  • Sergiy (2/9/2016)


    Igor Micev (2/9/2016)


    Then, won't you meet that problem again?

    With my approach?

    No.

    Read it again:

    1. Create a new table with the desired design.

    2. Start populating the new table with portions of data from the existing table.

    Choose an appropriate chunk size, so you do not overwhelm your transaction log, and take TRN backups in between the INSERT statements.

    It may take many hours or even days - it does not matter, as the original table is still operational.

    Make sure that any changes made to the data in the original table get copied to the new one.

    Where do you see anything about altering the existing table?

    While the data is being copied you have plenty of time to prepare and test all the scripts for indexes, privileges, dependent objects, etc.

    When everything is ready - lock the whole table in a transaction, copy the last bit of freshly changed data, drop the old table, rename the new one and run the prepared script for the dependencies.

    If everything went fine - COMMIT.

    Otherwise - ROLLBACK.

    Either way - it's done within seconds (as you modify only system catalogues), with no noticeable interruption to normal operation.

    You end up either with the updated column or (in case of a failure) with the same two tables, old and new, and the background process which keeps synchronising the data between them.

    Fix the errors in your script and repeat the attempt at an appropriate moment.

    Not so different from my approach. I do a column drop and rename, and you do a table drop and rename. The prepared dependencies are then created within a transaction, all steps together. In case of failure the table is still alive.

    Igor Micev, My blog: www.igormicev.com

  • Igor Micev (2/9/2016)


    Not so different from my approach. I do a column drop and rename, and you do a table drop and rename. The prepared dependencies are then created within a transaction, all steps together. In case of failure the table is still alive.

    Totally different.

    Whole world away.

    Altering a column changes every record in the table which is supposed to stay operational.

    Every data page and most index pages get affected by "drop and rename column".

    Massive data change.

    And you need to do it within a transaction, so if anything goes wrong you can roll back to the original state.

    Which means - the table is not accessible for the whole time the 0.5 TB table is being rewritten.

    With my approach all the data gets altered outside the transaction, by an asynchronous process which makes it ready when it's ready.

    The transaction only covers "last minute" data changes in the original table plus system catalogue alterations.

    Renaming a table does not change a single data page in it, so it usually gets done in no time.

    See the difference now?

    _____________
    Code for TallyGenerator

  • Sergiy (2/9/2016)


    Igor Micev (2/9/2016)


    Not so different from my approach. I do a column drop and rename, and you do a table drop and rename. The prepared dependencies are then created within a transaction, all steps together. In case of failure the table is still alive.

    Totally different.

    Whole world away.

    Altering a column changes every record in the table which is supposed to stay operational.

    Every data page and most index pages get affected by "drop and rename column".

    Massive data change.

    And you need to do it within a transaction, so if anything goes wrong you can roll back to the original state.

    Which means - the table is not accessible for the whole time the 0.5 TB table is being rewritten.

    With my approach all the data gets altered outside the transaction, by an asynchronous process which makes it ready when it's ready.

    The transaction only covers "last minute" data changes in the original table plus system catalogue alterations.

    Renaming a table does not change a single data page in it, so it usually gets done in no time.

    See the difference now?

    Correct.

    Igor Micev, My blog: www.igormicev.com

  • Here is a quickly baked test script:

    DROP TABLE TestCalendar;

    -- Original table with the smallint column.
    CREATE TABLE [dbo].[TestCalendar](
        [ID] [bigint] NOT NULL,
        [N_Date] [datetime] NOT NULL,
        [WeekDay] [smallint] NULL,
        PRIMARY KEY (ID) WITH FILLFACTOR = 100
    );

    -- New table with the int column, to be populated in the background and swapped in.
    CREATE TABLE [dbo].[Tmp_TestCalendar](
        [ID] [bigint] NOT NULL,
        [N_Date] [datetime] NOT NULL,
        [WeekDay] [int] NULL,
        PRIMARY KEY (ID) WITH FILLFACTOR = 100
    );

    SET NOCOUNT ON;

    -- Initial population of the original table.
    INSERT INTO dbo.TestCalendar ( ID, N_Date, WeekDay )
    SELECT tg.N ID, DATEADD(dd, N, 0) N_Date, CONVERT(SMALLINT, N % 7 + 1) WeekDay
    FROM Service.dbo.TallyGenerator(0, 50000, NULL, 1) tg
    ORDER BY ID;

    -- Copy the data into the new table in chunks of 1000 rows.
    WHILE EXISTS (SELECT * FROM dbo.TestCalendar T1
                  WHERE NOT EXISTS (SELECT * FROM dbo.[Tmp_TestCalendar] T2
                                    WHERE T2.ID = T1.ID))
    BEGIN
        INSERT INTO dbo.Tmp_TestCalendar ( ID, N_Date, WeekDay )
        SELECT TOP 1000 ID, N_Date, WeekDay
        FROM dbo.TestCalendar T1
        WHERE NOT EXISTS (SELECT * FROM dbo.[Tmp_TestCalendar] T2
                          WHERE T2.ID = T1.ID)
        ORDER BY ID;
    END

    DBCC SHOWCONTIG (TestCalendar) WITH ALL_INDEXES--, TABLERESULTS
    DBCC SHOWCONTIG (Tmp_TestCalendar) WITH ALL_INDEXES--, TABLERESULTS

    EXEC sys.sp_spaceused @objname = N'TestCalendar', @updateusage = 'true';

    -- In-place change: add an int column, copy the values over, drop the old column, rename.
    ALTER TABLE [dbo].TestCalendar ADD WeekDay_int int NULL;

    EXEC sys.sp_spaceused @objname = N'TestCalendar', @updateusage = 'true';
    GO

    UPDATE [dbo].TestCalendar
    SET WeekDay_int = WeekDay;

    EXEC sys.sp_spaceused @objname = N'TestCalendar', @updateusage = 'true';

    ALTER TABLE [dbo].TestCalendar DROP COLUMN [WeekDay];
    EXEC sp_rename 'dbo.TestCalendar.WeekDay_int', 'WeekDay', 'COLUMN';

    EXEC sys.sp_spaceused @objname = N'TestCalendar', @updateusage = 'true';

    DBCC SHOWCONTIG (TestCalendar) WITH ALL_INDEXES--, TABLERESULTS

    -- Table swap: drop the original and rename the pre-populated copy.
    DROP TABLE dbo.TestCalendar;
    EXECUTE sp_rename N'dbo.Tmp_TestCalendar', N'TestCalendar', 'OBJECT';

    EXEC sys.sp_spaceused @objname = N'TestCalendar', @updateusage = 'true';

    DBCC SHOWCONTIG (TestCalendar) WITH ALL_INDEXES--, TABLERESULTS

    And here is what SHOWCONTIG returns:

    Initial population of TestCalendar:

    DBCC SHOWCONTIG scanning 'TestCalendar' table...
    Table: 'TestCalendar' (997578592); index ID: 1, database ID: 2
    TABLE level scan performed.
    - Pages Scanned................................: 168
    - Extents Scanned..............................: 23
    - Extent Switches..............................: 22
    - Avg. Pages per Extent........................: 7.3
    - Scan Density [Best Count:Actual Count].......: 91.30% [21:23]
    - Logical Scan Fragmentation ..................: 1.79%
    - Extent Scan Fragmentation ...................: 4.35%
    - Avg. Bytes Free per Page.....................: 60.1
    - Avg. Page Density (full).....................: 99.26%

    "Sequentially" populated Tmp_TestCalendar:

    DBCC SHOWCONTIG scanning 'Tmp_TestCalendar' table...
    Table: 'Tmp_TestCalendar' (1061578820); index ID: 1, database ID: 2
    TABLE level scan performed.
    - Pages Scanned................................: 180
    - Extents Scanned..............................: 27
    - Extent Switches..............................: 26
    - Avg. Pages per Extent........................: 6.7
    - Scan Density [Best Count:Actual Count].......: 85.19% [23:27]
    - Logical Scan Fragmentation ..................: 2.78%
    - Extent Scan Fragmentation ...................: 11.11%
    - Avg. Bytes Free per Page.....................: 40.3
    - Avg. Page Density (full).....................: 99.50%

    The number of pages is slightly bigger, as we replaced a 2-byte smallint with a 4-byte int.

    Now - after altering the column in TestCalendar:

    DBCC SHOWCONTIG scanning 'TestCalendar' table...
    Table: 'TestCalendar' (997578592); index ID: 1, database ID: 2
    TABLE level scan performed.
    - Pages Scanned................................: 335
    - Extents Scanned..............................: 43
    - Extent Switches..............................: 334
    - Avg. Pages per Extent........................: 7.8
    - Scan Density [Best Count:Actual Count].......: 12.54% [42:335]
    - Logical Scan Fragmentation ..................: 99.70%
    - Extent Scan Fragmentation ...................: 4.65%
    - Avg. Bytes Free per Page.....................: 3469.0
    - Avg. Page Density (full).....................: 57.14%

    As promised - the number of pages doubled.

    Fragmentation - through the roof.

    Scan density - miserable.

    There must have been a lot of data manipulation along the way, I suppose.

    And we ended up with a mess in the table that you have to tidy up afterwards.

    Rebuilding the indexes on a big table would require as many resources as the whole previous exercise.

    And now - after renaming Tmp_TestCalendar to TestCalendar:

    DBCC SHOWCONTIG scanning 'TestCalendar' table...
    Table: 'TestCalendar' (1061578820); index ID: 1, database ID: 2
    TABLE level scan performed.
    - Pages Scanned................................: 180
    - Extents Scanned..............................: 27
    - Extent Switches..............................: 26
    - Avg. Pages per Extent........................: 6.7
    - Scan Density [Best Count:Actual Count].......: 85.19% [23:27]
    - Logical Scan Fragmentation ..................: 2.78%
    - Extent Scan Fragmentation ...................: 11.11%
    - Avg. Bytes Free per Page.....................: 40.3
    - Avg. Page Density (full).....................: 99.50%

    Nothing has changed compared to Tmp_TestCalendar.

    Not a single byte of the data got relocated.

    We've got perfectly allocated and contiguous data and index pages.

    Which way would you suggest for a half-terabyte table?

    _____________
    Code for TallyGenerator
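
    For reference, on SQL Server 2005 and later the same page counts and fragmentation figures can be read from sys.dm_db_index_physical_stats instead of the deprecated DBCC SHOWCONTIG used above; a minimal sketch against the test table:

    SELECT OBJECT_NAME(ips.object_id)          AS table_name,
           ips.index_id,
           ips.page_count,
           ips.avg_fragmentation_in_percent,   -- roughly the logical scan fragmentation
           ips.avg_page_space_used_in_percent  -- roughly the average page density
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.TestCalendar'),
                                        NULL, NULL, 'DETAILED') AS ips
    WHERE ips.index_level = 0;  -- leaf level only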

  • If you create a new column and copy the existing data to that column, you don't have to UPDATE every row in a single transaction. And you don't have to drop the table, with all the huge risks involved in that.

    You can batch the UPDATEs based on the clustering key. It's easier if that key is always ascending, but workable even if it's not. A sketch of the batching follows after this post.

    SQL DBA, SQL Server MVP (07, 08, 09). "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
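
    A minimal sketch of that batched UPDATE, assuming a hypothetical table dbo.BigTable clustered on an ascending integer ID, with a new int column Line_Number_int added alongside the old smallint Line_Number (the names are illustrative, not from the thread):

    DECLARE @LastID    bigint = 0,
            @MaxID     bigint,
            @BatchSize int    = 100000;

    SELECT @MaxID = MAX(ID) FROM dbo.BigTable;

    WHILE @LastID < @MaxID
    BEGIN
        -- Each batch is its own small transaction over one contiguous clustered-key range.
        UPDATE dbo.BigTable
        SET    Line_Number_int = Line_Number
        WHERE  ID >  @LastID
           AND ID <= @LastID + @BatchSize;

        SET @LastID = @LastID + @BatchSize;

        -- Optional: pause, or take a log backup, between batches to keep the log small.
        -- WAITFOR DELAY '00:00:01';
    END;

    Because each range update commits on its own, readers and writers are only blocked for the duration of one small batch rather than for the whole table.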

  • Yes, this is a method I'm using to gradually purge old, unneeded data from various tables (in this case deleting one row at a time):

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    SET NOCOUNT ON;
    ---SET DEADLOCK_PRIORITY 10;

    USE COLLxxxxxxxx;
    GO

    DECLARE @errmessage VARCHAR(150);
    DECLARE @msg VARCHAR(300);
    DECLARE @logpercentage BIGINT;
    DECLARE @ErrMsg NVARCHAR(4000) ,
            @ErrSeverity INT;
    DECLARE @rowcount BIGINT;
    DECLARE @counter INT;

    SET @errmessage = 'Starting at ' + CAST(GETDATE() AS VARCHAR(20));
    RAISERROR(@errmessage, 10, 1) WITH NOWAIT;

    ---- Begin PurgeProdFile_Record.sql 1 hr 10 min perfqa

    -- Build a work table of the keys to be purged.
    IF OBJECT_ID('tempdb..#tempdataFR') IS NOT NULL
        DROP TABLE #tempdataFR;

    CREATE TABLE #tempdataFR
    (
        ID INT IDENTITY(1, 1) PRIMARY KEY ,
        File_Record_ID INT
    );

    INSERT INTO #tempdataFR
    SELECT TOP ( 2000000 )
           FILE_RECORD_ID
    FROM COLLxxxxxxxx.dbo.FILE_RECORD
    WHERE FILE_RECORD.FILE_RECORD_ID IN (
          SELECT FILE_RECORD.FILE_RECORD_ID
          FROM FILE_RECORD
               INNER JOIN MESSAGE ON FILE_RECORD.MESSAGE_ID = MESSAGE.MESSAGE_ID
               INNER JOIN [FILE] ON [FILE].FILE_ID = FILE_RECORD.FILE_ID
          WHERE FILE_RECORD.CONCRETE_TYPE = 'Fdi.Po.ClientFileRecord'
                AND [FILE].CONCRETE_TYPE = 'Fdi.Po.ClientFile'
                AND MESSAGE.CONCRETE_TYPE = 'Fdi.Po.ClientUploadMessage'
                AND FILE_RECORD.MESSAGE_DIRECTION = 'OUTGOING'
                AND FILE_TRANSFORMATION_COMPLETION_STATE = 'EXPORTED'
                AND FILE_RECORD.CREATED_DATE_TIME < GETDATE() - 365 );

    SET @rowcount = @@ROWCOUNT;
    SET @errmessage = 'temp table for File_Record has '
                      + CAST(@rowcount AS VARCHAR(20)) + ' at ' + CAST(GETDATE() AS VARCHAR(20));
    RAISERROR(@errmessage, 10, 1) WITH NOWAIT;

    CREATE INDEX IDX_tempdataFRIdcolumn ON #tempdataFR(File_Record_ID);

    SELECT @counter = MAX(ID)
    FROM #tempdataFR;

    SET @rowcount = 1;

    -- Delete one keyed row per iteration, working backwards through the work table.
    WHILE @rowcount > 0
          AND @counter > 0
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION;

            DELETE TOP ( 1 )
            FROM COLLxxxxxxxx.dbo.FILE_RECORD
            WHERE FILE_RECORD.FILE_RECORD_ID IN ( SELECT File_Record_ID
                                                  FROM #tempdataFR
                                                  WHERE [ID] = @counter );

            SET @rowcount = @@ROWCOUNT;
            SET @counter = @counter - 1;

            IF ( @counter % 10000 ) = 0
            BEGIN
                SET @errmessage = 'Counter down to '
                                  + CAST(@counter AS VARCHAR(20))
                                  + ' in File Record at '
                                  + CAST(GETDATE() AS VARCHAR(20));
                RAISERROR(@errmessage, 10, 1) WITH NOWAIT;
            END;

            COMMIT;

            -- Throttle if the transaction log is filling up.
            SELECT @logpercentage = cntr_value
            FROM sys.dm_os_performance_counters
            WHERE instance_name = 'COLLxxxxxxxx'
                  AND counter_name = 'Percent Log Used';

            IF @logpercentage > 30
            BEGIN
                SET @msg = ' log more than 30 percent full, waiting 5 min';
                RAISERROR(@msg, 10, 1) WITH NOWAIT;
                WAITFOR DELAY '00:05:00';
            END;
        END TRY
        BEGIN CATCH
            -- There was an error
            IF @@TRANCOUNT > 0
                ROLLBACK;

            -- Raise an error with the details of the exception
            SELECT @ErrMsg = ERROR_MESSAGE() ,
                   @ErrSeverity = ERROR_SEVERITY();
            RAISERROR(@ErrMsg, @ErrSeverity, 1);

            --DECLARE @emailmessage VARCHAR(350)
            -- SET @emailmessage='Purge script failed: ' + @errmsg
            --EXECUTE msdb.dbo.sp_notify_operator @name=N'TestOperator',@body=@emailmessage
        END CATCH;

        --------------WAITFOR DELAY '00:00:05';
    END;

    IF OBJECT_ID('tempdb..#tempdataFR') IS NOT NULL
        DROP TABLE #tempdataFR;

    --- End PurgeProdFile_Record.sql

  • Deleting a single row at a time is probably not the smartest idea.

    The process by itself creates so much overhead that deleting 10 or even 100 rows at a time may not affect the performance of each cycle significantly.

    Estimate how many rows you have per page and delete in batches of the number of rows that would fit in an extent.

    Another thing - for the best effectiveness you need to follow the order defined by the clustered index.

    Removing a contiguous block of, say, 100 rows from the tail of the table would be insanely faster than 100 scattered single-row removals.

    So, when you choose the records for the #temp table and when you do the actual deletion, make sure you force the appropriate order of records in the TOP (N) queries (see the sketch after this post).

    _____________
    Code for TallyGenerator
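
    A minimal sketch of that advice, assuming the clustered index on FILE_RECORD is on FILE_RECORD_ID and reusing the #tempdataFR work table from the script above; the block size of 100 is only a guess at roughly an extent's worth of rows:

    DECLARE @BatchSize int = 100;
    DECLARE @Keys TABLE (File_Record_ID int PRIMARY KEY);

    WHILE 1 = 1
    BEGIN
        DELETE FROM @Keys;

        -- Take the next block of keys from the tail of the clustered index order.
        ;WITH nextBlock AS (
            SELECT TOP (@BatchSize) File_Record_ID
            FROM   #tempdataFR
            ORDER BY File_Record_ID DESC
        )
        DELETE FROM nextBlock
        OUTPUT deleted.File_Record_ID INTO @Keys;

        IF @@ROWCOUNT = 0
            BREAK;

        -- Remove the matching rows from the base table as one contiguous block.
        DELETE fr
        FROM COLLxxxxxxxx.dbo.FILE_RECORD AS fr
             INNER JOIN @Keys AS k ON k.File_Record_ID = fr.FILE_RECORD_ID;
    END;

    The block of keys is consumed from the work table as it is processed; each DELETE remains a short transaction, so blocking stays limited while far fewer statements are issued than with single-row deletes.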

  • Good idea on ordering the temp table. I may experiment with slightly larger deletions, but management's main concern is no blocking/locking.
