February 19, 2010 at 12:39 pm
Hi Guys,
We have a 2 TB database in SQL Server 2008 with 2 tables that are partitioned (monthly partitions). I want to make some space available on the server for other projects... please advise me on the best way to compress the data. The tables have 2 billion records... Please advise me on the approaches to free up some space. Thanks a lot in advance for the suggestions.
February 19, 2010 at 12:56 pm
You need to take care of the following things:
1. Is the table partitioned? If not, it would be great if you could partition it.
2. Based on the querying needs, you could compress the data in the older partitions and keep it uncompressed for the latest ones (see the sketch below).
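As a rough illustration (the table name and partition number below are hypothetical placeholders, not anything from this thread), page compression can be applied to a single partition with a partition-level rebuild, leaving the newer partitions untouched:
-- Hypothetical example: page-compress only partition 1 of a partitioned table.
-- Replace the table name and partition number with your own.
ALTER TABLE dbo.MyPartitionedTable
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = PAGE);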
Cheers
V
February 19, 2010 at 8:36 pm
yes, the tables are partitioned... so joshhh you think page-level compression should work fine... anything else I can do to free up some space? I guess a database shrink should free up unused space as well... thanks for the response.
February 19, 2010 at 10:03 pm
1. Yes, you will have to shrink the data files in order to release that space from the database. (Shrink might require considerable downtime.)
2. Check if defrag is required. If so, do the defrag and then shrink.
3. Archive data that is no longer required.
All this will give you some space, but how much is required, and how long it can last given that the existing database is going to keep growing, is a bigger question.
V
February 19, 2010 at 11:08 pm
See Creating Compressed Tables and Indexes in Books Online.
See this blog entry by the SQL Server Storage Engine Team for more information.
SQL Server provides a facility to estimate the space savings from row or page compression before you commit to anything. See sp_estimate_data_compression_savings in Books Online.
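As a minimal sketch (the schema and table names below are placeholders, not from this thread), passing NULL for the index and partition arguments estimates PAGE compression savings across all indexes and all partitions:
-- Estimate PAGE compression savings for a hypothetical table.
-- NULL for @index_id and @partition_number means all indexes / all partitions.
EXEC sp_estimate_data_compression_savings
    @schema_name = N'dbo',
    @object_name = N'MyPartitionedTable',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = N'PAGE';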
Compression is a complex topic - there's no way to cover it comprehensively in a single post.
You must be running Enterprise Edition or equivalent to use data compression. Carefully assess the risks and benefits before implementing compression. If you don't have the skills in-house to make this assessment, I would encourage you to seek expert advice.
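A quick way to confirm which edition you are running (a trivial check, nothing specific to this thread):
-- Returns something like 'Enterprise Edition (64-bit)'
SELECT SERVERPROPERTY('Edition') AS edition;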
Paul
February 19, 2010 at 11:18 pm
joshhhhhh
Yes, you will have to shrink the data files in order to release that space from the database. (Shrink might require considerable downtime.)
It is generally unwise to shrink a database. See this blog entry by Paul Randal.
In particular there is no point shrinking the database if it is just going to grow again.
If you do end up shrinking one or more files, be aware that it is a fully-logged operation.
Shrinking never requires the database to go off-line - it is a fully on-line operation.
Again, I stress that shrinking is almost always the wrong thing to do.
joshhhhhh
Check if defrag is required. If so, do the defrag and then shrink.
That is exactly backwards 😛
Shrinking will undo the good work done by defragmenting. If you absolutely must perform a shrink operation, do it first, and then defragment. It really makes no sense to do it the other way around.
Don't defragment for the sake of it either - you will typically not gain very much space just by doing this. Only defragment structures that are regularly scanned in logical order. Fragmentation does not affect singleton selects for example.
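If you do want to check whether a defrag is worthwhile, here is a minimal sketch (run it in the database in question; the column choices are just illustrative) that shows leaf-level fragmentation and page fullness per index:
-- SAMPLED mode is needed to populate avg_page_space_used_in_percent.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       ips.index_id,
       ips.partition_number,
       ips.avg_fragmentation_in_percent,
       ips.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
WHERE ips.index_level = 0;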
Paul
February 20, 2010 at 1:02 am
Hi Paul
Thanks for your inputs... it's a small example I have put down:
-- create a db
USE master;
GO
IF EXISTS (SELECT 1 FROM sys.databases WHERE name = 'Page')
DROP DATABASE Page;
GO
CREATE DATABASE Page
ON PRIMARY
( NAME='Page_Data',
FILENAME = 'D:\Work\POC\DatabaseEngine\Data\Page_Data.mdf',
SIZE=10MB,
MAXSIZE=20MB,
FILEGROWTH=1MB)
LOG ON
( NAME='Page_Log',
FILENAME = 'D:\Work\POC\DatabaseEngine\Data\Page_Log.ldf',
SIZE=1MB,
MAXSIZE=100MB,
FILEGROWTH=1MB);
GO
USE Page;
GO
-- create a table in the new database
CREATE TABLE Page (c1 int identity(1,1), c2 varchar(900))
CREATE CLUSTERED INDEX idx_Page ON Page (c1)
-- populate this table
DECLARE @i int;
SET @i = 1;
WHILE (@i<=2000)
BEGIN
INSERT INTO Page VALUES (REPLICATE('a',900))
INSERT INTO Page VALUES (REPLICATE('a',900))
INSERT INTO Page VALUES (REPLICATE('a',900))
INSERT INTO Page VALUES (REPLICATE('a',900))
INSERT INTO Page VALUES (REPLICATE('a',900))
INSERT INTO Page VALUES (REPLICATE('a',900))
INSERT INTO Page VALUES (REPLICATE('a',900))
INSERT INTO Page VALUES (REPLICATE('a',900))
SET @i = @i + 1;
END
-- check how many pages are allocated
DBCC IND ('Page', 'Page', 1)
-- check the space used by this table... note it down
EXEC sp_spaceused Page
-- since each page holds 8 rows, delete every row whose c1 is not a multiple of 8. This leaves each page almost empty (heavily fragmented)
DELETE FROM Page WHERE c1 % 8 <> 0
-- check the space used by this table. Note that it hasn't reduced
EXEC sp_spaceused Page
-- let's now shrink the data file first
DBCC SHRINKFILE (N'Page_Data', 20)
-- let's see what we gained by checking the space used again. It hasn't changed
EXEC sp_spaceused Page
-- I will now do a defrag. For me it removed 1747 of the total 2009 pages
DBCC INDEXDEFRAG (Page, Page, idx_Page)
-- let's see the space used by the table. At least for me, the table size reduced significantly
EXEC sp_spaceused Page
-- at this stage you may or may not want to shrink, depending on whether or not you want this space to be released from the database
-- let's see what shrink does... it does release space from the data file
DBCC SHRINKFILE (N'Page_Data', 20)
Last but not least, I did mention:
"All this will give you some space, but how much is required, and how long it can last given that the existing database is going to keep growing, is a bigger question." 🙂
February 20, 2010 at 3:54 am
Josh,
DBCC INDEXDEFRAG is another deprecated feature, and will be removed from the next version of SQL Server. You should use ALTER INDEX REORGANIZE instead.
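Using the names from your script, the equivalent would be something along these lines:
-- Reorganize the clustered index instead of using the deprecated DBCC INDEXDEFRAG
ALTER INDEX idx_Page ON dbo.Page REORGANIZE;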
Your example, as I am sure you are aware, is extremely artificial - and probably represents close to the maximum space savings possible. You will recall that I said that defragmentation typically did not result in significant space savings. I was very careful not to make an absolute statement about it.
That said, I fully stand by my original comments - nothing in your script changes a thing.
Can I assume that you accept the other points in my post without quibble?
Paul
February 20, 2010 at 6:34 am
Paul,
In that case I should also point out that I mentioned:
"Check if defrag is required"
Cheers
V
February 20, 2010 at 6:48 am
joshhhhhh (2/20/2010)
In that case I should also point out that I mentioned "Check if defrag is required"
Josh/V,
I really don't see how that is relevant. 😛
Paul
February 20, 2010 at 10:30 am
It's one thing to check for fragmentation, but that's a performance thing, not a space thing. If this is Enterprise Edition, compression should help you. However, what you likely need to do for other projects is add space. Playing space games is a good way to get yourself into trouble.
A few thoughts:
1. Make sure you have enough space for this server to run, which includes free space in the database to allow for maintenance and data operations.
2. You ought to have backups on a separate drive, but even if you do want to co-locate with data, you could move copies of these immediately to another server and reduce what you need to keep live on this one.
3. If you add projects/dbs, you might need more tempdb space, so account for that.
February 20, 2010 at 8:28 pm
I guess I have to do page-level compression and do it on each partition one by one... how about sparse columns? Can they be applied to an existing table?
February 20, 2010 at 8:28 pm
thanks for all the responses and helpful tips, guys.
February 20, 2010 at 10:37 pm
Ghanta (2/20/2010)
I guess I have to do page-level compression and do it on each partition one by one... how about sparse columns? Can they be applied to an existing table?
Do you have a lot of columns with mostly NULL values?
See Using Sparse Columns in Books Online for more information.
The following code demonstrates converting an existing column to SPARSE, and also shows how to use a column_set column.
-- Drop the temp table if it already exists
IF OBJECT_ID(N'tempdb..#Test', N'U') IS NOT NULL DROP TABLE #Test;
GO
-- Create a test table with two NULLable columns
CREATE TABLE #Test (A INT IDENTITY PRIMARY KEY NOT NULL, B INT NULL, C INT NULL);
GO
-- Add some sample data
INSERT #Test
(B, C)
VALUES (NULL, 1),
(NULL, NULL),
(1, NULL),
(NULL, 2),
(NULL, NULL),
(2, NULL);
GO
-- Add a column set
ALTER TABLE #Test ADD special_columns XML COLUMN_SET FOR ALL_SPARSE_COLUMNS;
-- Change the NULLable columns to sparse
ALTER TABLE #Test ALTER COLUMN B INT SPARSE NULL;
ALTER TABLE #Test ALTER COLUMN C INT SPARSE NULL;
GO
-- Show the data
SELECT * FROM #Test; -- Just columns A and special_columns
SELECT A, B, C, special_columns FROM #Test; -- All columns
GO
-- Clean up
DROP TABLE #Test;
Paul
February 22, 2010 at 11:49 am
Thanks for the response... I am sure I can find a few columns that have mostly NULL values, and for those the sparse property is a good idea. Also, how about downtime for tables with 2 billion records if I want to alter columns to sparse and also do page-level compression? They are partitioned by month. I wanted to learn from others' experience on this... thanks for the help.