Best Practice To Change Datatypes

  • I have a table containing 16M rows of data (a fact table in a data warehouse) that I'd like to make some modifications to. The existing columns are generally bigint for the FKs and float for the facts. Most of the FKs don't need anywhere near the storage of a bigint, since the number of possible entries is so low... I'd therefore like to reduce these datatypes to a suitable size for each column.

    I'd like to think that doing this will not only reduce the size of the table and any non-clustered indexes built on those columns, but will also reduce the I/O cost of user queries, since the table will be more compact in terms of page count.
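    As a rough check of what each FK column could shrink to, something along these lines shows the range of values actually in use (the table and column names below are just placeholders for my real ones):

    -- Check the actual range of values in a candidate FK column
    SELECT MIN(ProductKey)            AS MinKey,
           MAX(ProductKey)            AS MaxKey,
           COUNT(DISTINCT ProductKey) AS DistinctKeys
    FROM dbo.FactSales;
    -- smallint covers -32,768 to 32,767 and int roughly +/- 2.1 billion,
    -- so any column whose maximum sits below those limits can use the smaller type.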

    With a table of this size, what would be considered a good approach to safely altering the datatypes with minimal downtime for users?

    Any advice on this would be gratefully received!

    davas

  • What is your loading strategy? If you have a strategy where you do a destructive full load, i.e. dropping all of the tables, then rebuilding and reloading, I would change the datatypes at that point.

    If you do a full refresh at any point rather than a rebuild, i.e. truncating the tables and then reloading, I would make the change at that point.

    I'm sure there are other strategies, but as we do a full load every night on our warehouses I would build it in as a one-off rather than alter a table that has millions of rows in place.

    Failing that, you could create a job and schedule it out of hours - I would also change the recovery model to keep the log size down, along the lines of the sketch below.
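    A rough sketch of the recovery-model switch around the one-off change (the database name and backup path are placeholders):

    -- Switch to bulk-logged so the rebuild / SELECT INTO is minimally logged
    ALTER DATABASE MyWarehouse SET RECOVERY BULK_LOGGED;

    -- ... run the one-off datatype change here ...

    -- Switch back to full recovery and take a log backup straight away;
    -- point-in-time restore isn't possible across the bulk-logged interval
    ALTER DATABASE MyWarehouse SET RECOVERY FULL;
    BACKUP LOG MyWarehouse TO DISK = N'E:\Backups\MyWarehouse_log.trn';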

    Andy

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • We only load a relatively small number of records each night, so they are appended to the existing table. I had thought about reloading into a new table and then renaming the tables as required.

    If I did a SELECT INTO with suitable cast conversions, would that work efficiently?

    For example,

    SELECT CONVERT(int, BIGINTCOLUMN1) AS COLUMNNAME
    INTO TableNameTemp
    FROM TableNameOriginal
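    i.e. the full swap I had in mind would be roughly as follows - the extra column and all the names are just placeholders, and I realise I'd need to recreate indexes, constraints and permissions on the new table before renaming:

    -- Build the new table with the smaller datatypes
    SELECT CONVERT(int, BIGINTCOLUMN1)      AS COLUMNNAME,
           CONVERT(smallint, BIGINTCOLUMN2) AS COLUMNNAME2
    INTO dbo.TableNameTemp
    FROM dbo.TableNameOriginal;

    -- Recreate the clustered and non-clustered indexes on dbo.TableNameTemp here

    -- Swap the tables over (short outage while the renames run)
    BEGIN TRANSACTION;
        EXEC sp_rename 'dbo.TableNameOriginal', 'TableNameOld';
        EXEC sp_rename 'dbo.TableNameTemp', 'TableNameOriginal';
    COMMIT TRANSACTION;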

    Thanks

  • Yes, it would certainly do the trick for you - colleagues of mine have used this method successfully in the past, and SELECT INTO is minimally logged.

    My personal preference would be to schedule a job, but that's not to say that's the "correct" way - it just lends itself better to our loading strategies 😉
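    For what it's worth, setting that up as a SQL Server Agent job is only a handful of calls into msdb - the job name, database, command, date and time below are purely illustrative:

    USE msdb;
    -- Create the job with a single T-SQL step
    EXEC dbo.sp_add_job @job_name = N'Shrink fact table datatypes';
    EXEC dbo.sp_add_jobstep @job_name = N'Shrink fact table datatypes',
         @step_name = N'Rebuild fact table',
         @subsystem = N'TSQL',
         @database_name = N'MyWarehouse',
         @command = N'EXEC dbo.usp_RebuildFactTable;';
    -- One-off run at 02:00 on the chosen date (freq_type 1 = run once)
    EXEC dbo.sp_add_jobschedule @job_name = N'Shrink fact table datatypes',
         @name = N'One-off overnight run',
         @freq_type = 1,
         @active_start_date = 20250101,
         @active_start_time = 020000;
    EXEC dbo.sp_add_jobserver @job_name = N'Shrink fact table datatypes',
         @server_name = N'(local)';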


  • Thanks for that!

    A scheduled job would certainly be most suitable, but I think for a change like this I would rather be on hand to know what is going on. We are an international business with users all over the world, so unfortunately there's not really a time when we won't affect someone.

    Newbie question here... Our production DB is set to Full Recovery whereas my test DB is Simple. If I run the SELECT INTO on Live, will it still fill the log files? You said it is a minimally logged operation, but I thought some operations will always create log entries?

    Thanks again,

    davas

    As far as I am aware, yes - under Full Recovery the SELECT INTO will be fully logged; minimal logging only applies under the Simple or Bulk-Logged recovery models:

    http://msdn.microsoft.com/en-us/library/ms191244.aspx
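    A quick way to check what each database is actually set to (this is the standard catalog view, nothing specific to your setup):

    -- Check the current recovery model of every database on the instance
    SELECT name, recovery_model_desc
    FROM sys.databases;
    -- SELECT INTO is only minimally logged under SIMPLE or BULK_LOGGED;
    -- under FULL recovery every inserted row is written to the log.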

