Data Compression

  • Hi Guys,

We have a database that is 2 TB in size, with 2 tables that are partitioned (monthly partitions), on SQL Server 2008. I want to make some space available on the server for other projects... please advise me on the best way to compress the data. The tables have 2 billion records... Please suggest approaches to free up some space. Thanks a lot in advance for the suggestions.

  • You need to take care of the following things

1. Are the tables partitioned? If not, it would be great if you could partition them.

2. Based on your querying needs, you could compress the data in the older partitions and keep it uncompressed in the latest ones (see the sketch below).
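A minimal sketch of what that could look like, assuming a hypothetical partitioned table dbo.BigTable whose partitions 1 and 2 hold the older months (Enterprise Edition only):

-- rebuild only the older partitions with page compression
ALTER TABLE dbo.BigTable
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = PAGE);

ALTER TABLE dbo.BigTable
REBUILD PARTITION = 2 WITH (DATA_COMPRESSION = PAGE);

Nonclustered indexes are compressed separately, per partition, with ALTER INDEX ... REBUILD PARTITION = n WITH (DATA_COMPRESSION = PAGE).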

    Cheers

    V

    http://www.sqlsimplified.com

Yes, the tables are partitioned... so joshhh, you think page-level compression should work fine... anything else I can do to free up some space? I guess a database shrink should free up unused space as well... thanks for the response.

1. Yes, you will have to shrink the data files in order to release that space from the database. (Shrink might require considerable downtime.)

2. Check if defrag is required. If so, defrag and then shrink.

3. Archive non-required data.

All this will give you some space, but how much is required, and how long it can be sustained given that the existing database is going to keep growing, is a bigger question.
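On point 2, one way to check whether a defrag is even warranted is the sys.dm_db_index_physical_stats DMV; a sketch, with dbo.BigTable as a stand-in for your table:

SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.partition_number,
       ips.avg_fragmentation_in_percent,
       ips.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.BigTable'), NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id;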

    V

  • See Creating Compressed Tables and Indexes in Books Online.

    See this blog entry by the SQL Server Storage Engine Team for more information.

    SQL Server provides a facility to estimate the space savings from row or page compression before you commit to anything. See sp_estimate_data_compression_savings in Books Online.
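For example, a sketch against a hypothetical table dbo.BigTable:

EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'BigTable',
    @index_id         = NULL,   -- all indexes
    @partition_number = NULL,   -- all partitions
    @data_compression = N'PAGE';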

    Compression is a complex topic - there's no way to cover it comprehensively in a single post.

    You must be running Enterprise Edition or equivalent to use data compression. Carefully assess the risks and benefits before implementing compression. If you don't have the skills in-house to make this assessment, I would encourage you to seek expert advice.
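If you are unsure which edition you are running, a quick check:

SELECT SERVERPROPERTY('Edition') AS edition,
       SERVERPROPERTY('EngineEdition') AS engine_edition; -- 3 = Enterprise-class engine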

    Paul

  • joshhhhhh


Yes, you will have to shrink the data files in order to release that space from the database. (Shrink might require considerable downtime.)

It is generally unwise to shrink a database. See this blog entry by Paul Randal.

    In particular there is no point shrinking the database if it is just going to grow again.

    If you do end up shrinking one or more files, be aware that it is a fully-logged operation.

    Shrinking never requires the database to go off-line - it is a fully on-line operation.

    Again, I stress that shrinking is almost always the wrong thing to do.

    joshhhhhh


Check if defrag is required. If so, defrag and then shrink.

    That is exactly backwards 😛

    Shrinking will undo the good work done by defragmenting. If you absolutely must perform a shrink operation, do it first, and then defragment. It really makes no sense to do it the other way around.

Don't defragment just for the sake of it either - you will typically not gain very much space by doing so. Only defragment structures that are regularly scanned in logical order; fragmentation does not affect singleton selects, for example.

    Paul

  • Hi Paul

Thanks for your inputs. Here's a small example I have put together:

-- create a db
USE master;
GO

IF EXISTS (SELECT 1 FROM sys.databases WHERE name = 'Page')
    DROP DATABASE Page;
GO

CREATE DATABASE Page
ON PRIMARY
( NAME = 'Page_Data',
  FILENAME = 'D:\Work\POC\DatabaseEngine\Data\Page_Data.mdf',
  SIZE = 10MB,
  MAXSIZE = 20MB,
  FILEGROWTH = 1MB)
LOG ON
( NAME = 'Page_Log',
  FILENAME = 'D:\Work\POC\DatabaseEngine\Data\Page_Log.ldf',
  SIZE = 1MB,
  MAXSIZE = 100MB,
  FILEGROWTH = 1MB);
GO

USE Page; -- without this the table would be created in master
GO

-- create a table
CREATE TABLE Page (c1 int IDENTITY(1,1), c2 varchar(900));
CREATE CLUSTERED INDEX idx_Page ON Page (c1);

-- populate the table: 8 rows of ~900 bytes fill one 8KB page,
-- so 2000 iterations of 8 inserts give roughly 2000 full pages
DECLARE @i int;
SET @i = 1;
WHILE (@i <= 2000)
BEGIN
    INSERT INTO Page VALUES (REPLICATE('a',900));
    INSERT INTO Page VALUES (REPLICATE('a',900));
    INSERT INTO Page VALUES (REPLICATE('a',900));
    INSERT INTO Page VALUES (REPLICATE('a',900));
    INSERT INTO Page VALUES (REPLICATE('a',900));
    INSERT INTO Page VALUES (REPLICATE('a',900));
    INSERT INTO Page VALUES (REPLICATE('a',900));
    INSERT INTO Page VALUES (REPLICATE('a',900));
    SET @i = @i + 1;
END

-- check how many pages are allocated
DBCC IND (Page, Page, 1);

-- check the space used by this table and note it down
EXEC sp_spaceused Page;

-- since each page holds 8 rows, delete every row whose key is not a
-- multiple of 8; this leaves every page almost empty (heavy internal fragmentation)
DELETE FROM Page WHERE c1 % 8 <> 0;

-- check the space used by the table: note that it has not gone down
EXEC sp_spaceused Page;

-- let's now shrink the data file first (the target size is in MB; 5 is a
-- deliberately low target so the shrink actually has something to do)
DBCC SHRINKFILE (N'Page_Data', 5);

-- let's see what we gained: very little, because the mostly-empty pages
-- are still allocated to the table

-- I will now do a defrag; for me it removed 1747 of the total 2009 pages
DBCC INDEXDEFRAG (Page, Page, idx_Page);

-- let's see the space used by the table: at least for me it reduced significantly
EXEC sp_spaceused Page;

-- at this stage you may or may not want to shrink, depending on whether
-- you want this space released from the database

-- let's see what shrink does now: it does release space from the data file
DBCC SHRINKFILE (N'Page_Data', 5);

Last but not least, I did mention:

    "All this will provide you some space, but how much is required and how long can it sustain assuming that the existing database is going to expand, is a bigger question" 🙂

  • Josh,

    DBCC INDEXDEFRAG is another deprecated feature, and will be removed from the next version of SQL Server. You should use ALTER INDEX REORGANIZE instead.
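For example, the equivalent of the INDEXDEFRAG call in your script would be:

ALTER INDEX idx_Page ON Page REORGANIZE;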

    Your example, as I am sure you are aware, is extremely artificial - and probably represents close to the maximum space savings possible. You will recall that I said that defragmentation typically did not result in significant space savings. I was very careful not to make an absolute statement about it.

    That said, I fully stand by my original comments - nothing in your script changes a thing.

    Can I assume that you accept the other points in my post without quibble?

    Paul

  • Paul,

In that case I should also stress that I mentioned

    "Check if defrag is required"

    Cheers

    V

  • joshhhhhh (2/20/2010)


In that case I should also stress that I mentioned "Check if defrag is required"

    Josh/V,

    I really don't see how that is relevant. 😛

    Paul

It's one thing to check for fragmentation, but that's a performance thing, not a space thing. If this is Enterprise Edition, compression should help you. However, what you likely need to do for other projects is add space. Playing space games is a good way to get yourself into trouble.

    A few thoughts:

1. Make sure you have enough space for this server to run, which includes free space in the database to allow for maintenance and data operations (see the query sketch after this list).

    2. You ought to have backups on a separate drive, but even if you do want to co-locate with data, you could move copies of these immediately to another server and reduce what you need to keep live on this one.

    3. If you add projects/dbs, you might need more tempdb space, so account for that.
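On point 1, one standard catalog query shows allocated versus used space per file of the current database:

SELECT name AS logical_file,
       size / 128.0 AS allocated_mb,
       FILEPROPERTY(name, 'SpaceUsed') / 128.0 AS used_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128.0 AS free_mb
FROM sys.database_files;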

I guess I have to do page-level compression and apply it to each partition one by one... how about sparse columns? Can they be applied to an existing table?

Thanks for all the responses and helpful tips, guys.

  • Ghanta (2/20/2010)


I guess I have to do page-level compression and apply it to each partition one by one... how about sparse columns? Can they be applied to an existing table?

    Do you have a lot of columns with mostly NULL values?

See Using Sparse Columns in Books Online for more information.

    The following code demonstrates converting an existing column to SPARSE, and also shows how to use a column_set column.

-- Drop the temp table if it already exists
IF OBJECT_ID(N'tempdb..#Test', N'U') IS NOT NULL DROP TABLE #Test;
GO

-- Create a test table with two NULLable columns
CREATE TABLE #Test (A INT IDENTITY PRIMARY KEY NOT NULL, B INT NULL, C INT NULL);
GO

-- Add some sample data
INSERT #Test (B, C)
VALUES (NULL, 1),
       (NULL, NULL),
       (1, NULL),
       (NULL, 2),
       (NULL, NULL),
       (2, NULL);
GO

-- Add a column set
ALTER TABLE #Test ADD special_columns XML COLUMN_SET FOR ALL_SPARSE_COLUMNS;

-- Change the NULLable columns to sparse
ALTER TABLE #Test ALTER COLUMN B INT SPARSE NULL;
ALTER TABLE #Test ALTER COLUMN C INT SPARSE NULL;
GO

-- Show the data
SELECT * FROM #Test; -- Just columns A and special_columns
SELECT A, B, C, special_columns FROM #Test; -- All columns
GO

-- Clean up
DROP TABLE #Test;

    Paul

Thanks for the response... I am sure I can find a few columns that have mostly NULL values, and for those the sparse property is a good idea. Also, what kind of downtime should I expect on tables of 2 billion records if I alter columns to SPARSE and also apply page-level compression? They are partitioned by month. I wanted to learn from others' experience on this... thanks for the help.
