Compress or NOT to Compress data before loading data

  • Environment:

    SQL Server 2008 Enterprise Edition 64 bit (32GB RAM)

    Database containing tables which are partitioned by a DateTime field.

    200 partitoned files which I cycle through

    No updates to the data once it is done loading.

    Data used by reports only

    Purge data where date inserted is older than 20 days

    Scenario:

    Through out the dayight, we receive zipped files (4GB to 60GB) containing millions of rows which gets loaded into those tables via a SSIS package I have built. I am dumping each file into its own separate partition. At the end of the loading process, I then compress the partitioned data (index and table) to save space. The process works but it seems to take a really long time.

    Question:

    Would it help to setup the compression on the partiton data (index and table) before I start loading the file? Would the load time go up? Is there a better way to do this?

    Thanks in advance.

  • I think that the answer here is "it depends". By adding the compression ahead of time, you will end up using more CPU during the load than you would without the compression. At the same time, if you do not set up compression ahead of time then you will be writing more data to disk. So the question here is do you feel like your server is more IO bound or more CPU bound? For most of the servers in our system, we are FAR more IO bound than we are CPU bound, so adding compression definitely does not hurt the load times and in many cases helps. But if you finding CPU to be a bottleneck, I would say do not compress until the data is loaded.

  • Definitely compress first. Minimally-logged bulk insert into a compressed table (heap or clustered) is supported.

    Inserting compressed data will be much faster than inserting the (presumably much larger) uncompressed data and then compressing it in-place. Compressing in-place is also a fully-logged operation.

    Be asbolutely sure that you meet the conditions for minimally-logged operations.

  • When say...'Insert Compressed Data' did you mean inserting data into a compressed table or did you mean compressing the data in the flat file before inserting it into the table?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply