drive space issues trying to index 2.4TB table

  • OK, this landed on my desk this morning: a user created a 2.4TB table without a clustered key. He now wants a clustered key on the table but is unable to do so because he's running out of disk space. We're unable to allocate him more disk space and have cleaned up about all we can clean up.

    Here's what I'm working with:

    SQL Server 2008 R2

    Data Drive: 5.3TB (200GB free space)

    DB unallocated space: .5TB

    Tbl Size: 2.4TB

    Rows: 1,268,636,579

    TempDb Drive: 115GB

    My initial thought was to create a nonclustered index and then recommend he consult with someone before doing something like this in the future. The more likely scenario would be to create a new table with a clustered key and then migrate the data over in small batches.
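
    A rough sketch of that batch-move approach, assuming a hypothetical heap dbo.BigTable with an ever-increasing bigint ID column (table, columns, and batch size are invented here and would need to match the real schema):

    -- Replacement table, created with the clustered key already in place.
    CREATE TABLE dbo.BigTable_New
    (
        ID       bigint       NOT NULL,
        LoadDate datetime     NOT NULL,
        Payload  varchar(200) NULL,
        CONSTRAINT PK_BigTable_New PRIMARY KEY CLUSTERED (ID)
    );
    GO

    -- Move rows in small ID ranges; delete each range from the heap as it lands
    -- so the freed pages can be reused by the new table.
    DECLARE @BatchSize bigint = 100000,
            @MinID bigint, @MaxID bigint;

    SELECT @MinID = MIN(ID), @MaxID = MAX(ID) FROM dbo.BigTable;

    WHILE @MinID <= @MaxID
    BEGIN
        BEGIN TRANSACTION;

        INSERT INTO dbo.BigTable_New (ID, LoadDate, Payload)
        SELECT ID, LoadDate, Payload
        FROM dbo.BigTable
        WHERE ID >= @MinID AND ID < @MinID + @BatchSize;

        -- TABLOCK lets the heap deallocate its emptied pages.
        DELETE FROM dbo.BigTable WITH (TABLOCK)
        WHERE ID >= @MinID AND ID < @MinID + @BatchSize;

        COMMIT TRANSACTION;

        SET @MinID = @MinID + @BatchSize;
    END;

    Small transactions keep each log hit manageable, but if the DB stays in full recovery the log still needs regular backups while this runs.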

    Any thoughts/ideas/approaches on an alternative would be appreciated.

    _____________________________________________________________________
    - Nate

    @nate_hughes
  • No way for the user to go through the original table population procedure again?

    Changing to Bulk Logged recovery mode might help with the disk space required, although not sure if that helps with going Heap -> Clustered table.
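
    For what it's worth, building an index (including a clustered index on a heap) is one of the minimally logged operations under bulk-logged recovery, so it does cut the log space needed -- it just doesn't reduce the data space for the new index itself. A sketch, with made-up database/table/column names and backup path:

    -- Minimise logging for the index build, then go back to full recovery.
    ALTER DATABASE BigDb SET RECOVERY BULK_LOGGED;
    GO

    CREATE CLUSTERED INDEX CIX_BigTable_ID
        ON dbo.BigTable (ID)
        WITH (SORT_IN_TEMPDB = OFF);   -- the tempdb drive here is only 115GB
    GO

    ALTER DATABASE BigDb SET RECOVERY FULL;
    GO
    -- Take a log backup afterwards; point-in-time restore isn't possible across the bulk-logged window.
    BACKUP LOG BigDb TO DISK = 'X:\Backups\BigDb_log.trn';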

  • To create a clustered index on a 2.4 TB table, you need 2.4 TB for the old structure (the heap), the same amount of space for the new index, and around 20% of that for sort space, either in the user database or in tempDB. Then you need space to log the operation; if the DB is in full recovery, that's the full size of the index plus around 10%.

    If you need the cluster (and all tables should generally have a clustered index), I'd recommend the 'new table and move' option, but that could still require an extra 2.4 TB or so.
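
    Plugging the numbers from the first post into that breakdown (rounded, and assuming the new clustered index comes out roughly the same size as the heap it replaces):

        old heap, kept until the build commits ........ ~2.4 TB
        new clustered index ........................... ~2.4 TB
        sort space, ~20% of the index ................. ~0.5 TB (the tempdb drive is only 115GB)
        log if the DB is in full recovery, index + 10% . ~2.6 TB

    Against that, there's 0.5 TB unallocated in the database plus 200GB free on the drive, so an in-place CREATE CLUSTERED INDEX is out by a wide margin.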

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Gazareth and Gail for the quick response!

    Gazareth (5/11/2012)


    No way for the user to go through the original table population procedure again?

    Nope. The raw data was deleted off this server once he manipulated and loaded it into the table. We could go and collect all the data again for him but we're talking weeks of processing time which he doesn't have.

    GilaMonster (5/11/2012)


    To create a clustered index on a 2.4 TB table, you need 2.4 TB for the old structure (the heap), the same amount of space for the new index, and around 20% of that for sort space, either in the user database or in tempDB. Then you need space to log the operation; if the DB is in full recovery, that's the full size of the index plus around 10%.

    If you need the cluster (and all tables should generally have a clustered index), I'd recommend the 'new table and move' option, but that could still require an extra 2.4 TB or so.

    Thanks for the detailed breakdown; I kind of figured I would need at least the same amount of free space as the existing table.

    Going to recommend the nonclustered index for the short term, and if performance is an issue we'll try migrating the data over to a new table.

    _____________________________________________________________________
    - Nate

    @nate_hughes
  • If you have the storage available for the raw data on the server, then you should be able to BCP out the data from the table, truncate the table, add the clustered index, and BCP the data back in.
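
    A sketch of that sequence, assuming native-format export files can go somewhere with roughly 2.4 TB of room (server, database, table, file path, and key column are all invented here):

    -- 1) Export the heap (run from a command prompt):
    --      bcp BigDb.dbo.BigTable out D:\Staging\BigTable.dat -n -S MYSERVER -T

    -- 2) Empty the table; truncate deallocates the 2.4 TB almost instantly and is minimally logged.
    TRUNCATE TABLE dbo.BigTable;

    -- 3) Build the clustered index while the table is empty -- no sort space or second copy needed.
    CREATE CLUSTERED INDEX CIX_BigTable_ID ON dbo.BigTable (ID);

    -- 4) Reload; -b commits in batches to keep the log in check, TABLOCK cuts locking overhead:
    --      bcp BigDb.dbo.BigTable in D:\Staging\BigTable.dat -n -S MYSERVER -T -b 100000 -h "TABLOCK"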

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • RP_DBA (5/11/2012)


    OK, this landed on my desk this morning: a user created a 2.4TB table without a clustered key. He now wants a clustered key on the table but is unable to do so because he's running out of disk space. We're unable to allocate him more disk space and have cleaned up about all we can clean up.

    Here's what I'm working with:

    SQL Server 2008 R2

    Data Drive: 5.3TB (200GB free space)

    DB unallocated space: .5TB

    Tbl Size: 2.4TB

    Rows: 1,268,636,579

    TempDb Drive: 115GB

    My initial thought was to create a nonclustered index and then recommend he consult with someone before doing something like this in the future. The more likely scenario would be to create a new table with a clustered key and then migrate the data over in small batches.

    Any thoughts/ideas/approaches on an alternative would be appreciated.

    If you have the Enterprise Edition, you might be able to use Partitioned Tables to make the index management a whole lot easier. If you have the Standard Edition, you might be able to pull off a similar miracle with a partitioned view. Both will take a while to build because you'll need to split off "sub tables", which means moving smaller amounts of duplicated data and then deleting that data from the big table, but it can work.
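
    A minimal sketch of the partitioned-table flavour of that, for Enterprise Edition (partition function/scheme, column names, and boundary dates are all invented for illustration):

    -- Range-partition the replacement table by a date column so each "sub table"
    -- (partition) can be loaded as a smaller, separate chunk.
    CREATE PARTITION FUNCTION pf_LoadDate (datetime)
    AS RANGE RIGHT FOR VALUES ('2011-01-01', '2011-07-01', '2012-01-01');
    GO

    CREATE PARTITION SCHEME ps_LoadDate
    AS PARTITION pf_LoadDate ALL TO ([PRIMARY]);   -- or map to separate filegroups if more disks ever show up
    GO

    CREATE TABLE dbo.BigTable_Part
    (
        ID       bigint       NOT NULL,
        LoadDate datetime     NOT NULL,
        Payload  varchar(200) NULL,
        CONSTRAINT PK_BigTable_Part
            PRIMARY KEY CLUSTERED (ID, LoadDate)   -- a unique clustered key on a partition scheme must include the partitioning column
    ) ON ps_LoadDate (LoadDate);
    GO

    -- Then move one date range at a time from the heap into dbo.BigTable_Part
    -- and delete that range from the heap, same as the batch-move idea earlier in the thread.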

    As a side bar, if it's absolutely necessary for this user to have a 2.4 TB database, some additions to infrastructure in the form of additional hard disk space would definitely go a long way. The tricks with partitioning will serve you well for only so long.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
