Clustered Index leaves a lot of empty space

Question

Clustered Index leaves a lot of empty space

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101103 More actions · Answer 1

Jeff Moden wrote:

It was not "obvious" that you were speaking of VARCHAR because they are NOT affected by row compression at all. Only CHAR is affected by row compression and you clearly stated CHAR.
The ratio of 25 GB to 1 Tb works out to 0.025, which is 2.5%. If we use Microsoft's definition of a maximum of 8060 bytes of user definable space and being generous on the Fill Factor of 97%, the leaves 241 bytes for expansion. If you have 24 rows in the table, that allows you enough room to expand each of the rows by only 10 bytes. 11 Bytes per row will cause a page split. That's without compression.
With row compression, you might get lucky with a 2:1 compression, which would give you 48 rows per page. If your "Expansive" update was on CHAR (which is now expansive) or VARCHAR, your 10 byte update per row is still going to be 10 bytes per row. 10 *48 isn't going to fit into 241 bytes and so you're guaranteed to have page splits. The good part is you can lower the Fill Factor more on the compressed table, which might now only be a half TB if you're lucky. Ah... but wait a minute... the compression has made a whole lot of things that used to be fixed width become variable width. What used to be a mere 10 byte update might be a whole lot more. On top of all that, you've also slowed down your queries even if the data resides in memory.
Could it all be worked out? Sure... well... maybe. What is the actual trade off? How badly will those "only once every 3 months {updates}" affect the table and where? Is it going to cause the most recent data (which is typically also the most read data) to madly fragment?
The only time that it's safe to recommend compression without doing a whole lot of work is if you can guarantee that you're only doing Inserts on an ever-increasing keyed index and you can also guarantee that there will be no updates. I say it's "safe" but not really. You have to do use the proc that estimates the savings and then also ask your self how many times the table will be read for large reports both large and small and try to figure out if it's going to be worth the performance hit, especially and ironically if you're using SSDs, which don't take much of a hit on read-aheads as spinning rust did in the past... that even takes away one of the justifications for compression.
Again, my point isn't to avoid page compression. My point is that people should stop recommending it to neophytes (all forums have neophytes that are just lurkers) without mentioning the other potentially serious problems like a crazy increase in fragmentation that basically destroys the memory savings and the fact a monster index will have to be rebuilt where it may not have been fragmented at all without compression.
We need to find out what the OP means by "This table only gets updated quarterly...". We can't even recommend partitioning of the data and compressing the "non-updated" data because we don't know where the "updated" data is going to be done.
Speaking of the OP, I hope none of this scares him off. The non-compressed solution I provided to prevent the huge size increase and tremendous waste in the form of free-space works a treat for me and it will for him, as well. Once he tells us what the quarterly updates consist of, then we can make other suggestions and recommendations, including page compression (or not).

"With row compression, you might get lucky with a 2:1 compression". That wouldn't be lucky, that would be truly miraculous. 2:1 from just row compression??

I explicitly stated page compression, so, yes, varchar could be affected, and quite significantly. When you say, "row compression won't affect varchar at all", I agree, which is why I explicitly never mentioned row compression. That will of course always be done as part of page compression, and it can be helpful, but it's not going to give you the big payback of page compression.

As to growth issue, why on earth would you assume *every* row is expanding? That's extraordinarily unlikely, esp. as rows age.

Also, the fillfactor would be 98%, not 97%. Not every row will leave exactly just 2% of space, so you need to account for an extra half percent or so (usually less but be safe).

You state, "A monster index will have to be rebuilt where it may not have been fragmented at all without compression." What is the basis for that claim? If you're expanding varchar columns, you've vastly more likely to fragment an uncompressed table than a page-compressed one, esp. if the varchar value is common among the rows already on that page.

In conclusion, if OP wants to post more details, particularly full DDL, yes, we can make far more specific recommendations. But, absent that, we can make only general recommendations, and my general recommendation for these types of tables is, and will remain, to strongly consider page compressing them.

Yes, you're correct about neophytes, I tend to think, and write, like a DBA (and I don't just mean by being boring 🙂 ).

OP: if you're not familiar with page compression, please ask any qs you may have about it. MS provides a useful function to help pre-determine how much space you would likely save using it:

EXEC sys.sp_estimate_data_compression_savings 'dbo', 'your_table_name', NULL, NULL, 'PAGE';

This reply was modified 2 years, 3 months ago by ScottPletcher.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 2

ScottPletcher wrote:

You state, "A monster index will have to be rebuilt where it may not have been fragmented at all without compression." What is the basis for that claim?

Same as what I've said many times now, Scott... The documentation. A lot of columns that were previously fixed width suddenly become variable width and updates from things like NULL to a date or a time or both and a whole lot of numeric values will become expansive during an update.

ScottPletcher wrote:

If you're expanding varchar columns, you've vastly more likely to fragment an uncompressed table than a page-compressed one, esp. if the varchar value is common among the rows already on that page.

Heh... ok... you've convinced me... I'll setup a couple of tests because I believe the opposite is true and I know you won't provide even one. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 3

@haahof,

I'd like to setup a test table to do the some testing on for all of this. With in mind...

Could you tell us what is meant by "This table only gets updated quarterly". When you say "updated", what does that actually mean? Are you just adding new rows, changing values in new rows after they've been inserted, changing values in old rows , or some combination? If, by "updated", you mean that you're only inserting new rows at the logical end of the clustered index, I don't need to know much more than that. If that's not what you're doing, then I could really use some help as follows...

Also, could you provide the CREATE TABLE statement and the definition of the Clustered Index you added?

Last, but not least, would it be possible for you to provide a couple of rows of information so I have a better model of what the test data should look like. Obviously, no proprietary, PHI, or PII should be included.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101103 More actions · Answer 4

Jeff Moden wrote:

ScottPletcher wrote:
You state, "A monster index will have to be rebuilt where it may not have been fragmented at all without compression." What is the basis for that claim?
Same as what I've said many times now, Scott... The documentation. A lot of columns that were previously fixed width suddenly become variable width and updates from things like NULL to a date or a time or both and a whole lot of numeric values will become expansive during an update.
ScottPletcher wrote:
If you're expanding varchar columns, you've vastly more likely to fragment an uncompressed table than a page-compressed one, esp. if the varchar value is common among the rows already on that page.
Heh... ok... you've convinced me... I'll setup a couple of tests because I believe the opposite is true and I know you won't provide even one. 😉

And as I previously said, if you have big issues with lots of 3-month old, fixed-length columns going from NULL to a value (rather odd), use a placeholder value rather than NULL; you can make that completely transparent to the app by using a view and a trigger (you can avoid the trigger by having the app insert the placeholder value, but I prefer to make it totally transparent to the app).

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 5

ScottPletcher wrote:

Jeff Moden wrote:
ScottPletcher wrote:
You state, "A monster index will have to be rebuilt where it may not have been fragmented at all without compression." What is the basis for that claim?
Same as what I've said many times now, Scott... The documentation. A lot of columns that were previously fixed width suddenly become variable width and updates from things like NULL to a date or a time or both and a whole lot of numeric values will become expansive during an update.
ScottPletcher wrote:
If you're expanding varchar columns, you've vastly more likely to fragment an uncompressed table than a page-compressed one, esp. if the varchar value is common among the rows already on that page.
Heh... ok... you've convinced me... I'll setup a couple of tests because I believe the opposite is true and I know you won't provide even one. 😉
And as I previously said, if you have big issues with lots of 3-month old, fixed-length columns going from NULL to a value (rather odd), use a placeholder value rather than NULL; you can make that completely transparent to the app by using a view and a trigger (you can avoid the trigger by having the app insert the placeholder value, but I prefer to make it totally transparent to the app).

At this point, you should show us with demonstrable code, Scott.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101103 More actions · Answer 6

Here's a sample table and the corresponding view, with one non-nullable datetime and one non-nullable int column that have placeholder values that return NULL when querying the table.

For now, I'll leave the corresponding trigger(s) as an interesting exercise to do.

USE tempdb;

DROP VIEW IF EXISTS dbo.tablename;
DROP TABLE IF EXISTS dbo.tablename_raw;
GO
CREATE TABLE dbo.tablename ( id int identity(1, 1) not null, 
    datetime1 datetime not null, int1 int not null )
INSERT INTO dbo.tablename VALUES('19010101', -2147483648), /*placeholder values, should show NULL when SELECTed*/    
    (GETDATE()-1, 22), /*standard row values*/    
    (GETDATE(), -2147483648) /*mixed row*/

EXEC sys.sp_rename 'dbo.tablename', 'tablename_raw'
GO
CREATE VIEW dbo.tablename
AS
SELECT id, NULLIF(datetime1, '19010101') AS datetime1, NULLIF(int1, -2147483648) AS int1
FROM dbo.tablename_raw
GO

SELECT * FROM dbo.tablename /*standard SELECT against table (which is now a view, but that is transparent to users)*/
SELECT * FROM dbo.tablename_raw /*non-standard SELECT available to developers to see actual contents of table*/

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

ScottPletcher SSC Guru Points: 101103 More actions · Answer 7

And here's the trigger. The app INSERTs / UPDATEs NULL values, and SELECTs return NULL values, but a NULL never actually appears in the table.

Again, I've never had this type of issue be significant when doing page compression, but if it does for a 3TB table, then it would be worth doing something like this to allow you to fully compress the data.

USE tempdb;
DROP VIEW IF EXISTS dbo.tablename;
DROP TABLE IF EXISTS dbo.tablename_raw;
GO
CREATE TABLE dbo.tablename (
    id int IDENTITY(1, 1) NOT NULL
        CONSTRAINT tablename__PK PRIMARY KEY,
    int1 int NOT NULL,
    varchar1 varchar(30) NULL,
    datetime1 datetime NOT NULL,
    datetime2_using_placeholder datetime NULL,
    int2_using_placeholder int NULL,
    char2_using_placeholder char(4) NULL
    )
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE schema_id = 1 AND name = 'tablename')
    EXEC sys.sp_rename 'dbo.tablename', 'tablename_raw'
GO
CREATE VIEW dbo.tablename
AS
SELECT
    id, int1, varchar1, datetime1,
    NULLIF(datetime2_using_placeholder, '19010101') AS datetime2_using_placeholder, 
    NULLIF(int2_using_placeholder, -2147483648) AS int2_using_placeholder,
    NULLIF(char2_using_placeholder, '~~~~') AS char2_using_placeholder
FROM dbo.tablename_raw
/*end of view*/GO
CREATE TRIGGER dbo.tablename_raw__TRG_INS_UPD
ON dbo.tablename_raw
AFTER INSERT, UPDATE
AS
SET NOCOUNT ON;
UPDATE tr
SET 
    datetime2_using_placeholder = ISNULL(tr.datetime2_using_placeholder, '19010101'),
    int2_using_placeholder = ISNULL(tr.int2_using_placeholder, -2147483648),
    char2_using_placeholder = ISNULL(tr.char2_using_placeholder, '~~~~')
FROM dbo.tablename_raw tr
INNER JOIN inserted i ON i.id = tr.id
WHERE
    i.datetime2_using_placeholder IS NULL OR
    i.int2_using_placeholder IS NULL OR
    i.char2_using_placeholder IS NULL
/*end of trigger*/GO

INSERT INTO dbo.tablename_raw VALUES(1, 'A', GETDATE()-5, NULL, 17, 'ABCD')
INSERT INTO dbo.tablename_raw VALUES(2, 'B', GETDATE()-4, GETDATE()-4, NULL, 'BCDE')
INSERT INTO dbo.tablename_raw VALUES(3, 'C', GETDATE()-3, NULL, 11, NULL)
SELECT * FROM dbo.tablename;

UPDATE dbo.tablename_raw
SET int2_using_placeholder = NULL
WHERE int2_using_placeholder = 11 AND int1 = 3
GO

SELECT * FROM dbo.tablename;
SELECT * FROM dbo.tablename_raw;
GO
DROP VIEW IF EXISTS dbo.tablename;
DROP TABLE IF EXISTS dbo.tablename_raw;
/*end of script*/

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

ScottPletcher SSC Guru Points: 101103 More actions · Answer 8

Of course that's just the general technique. You'd want to use random values in a range to represent NULLs -- such as any date between, say, Jan 01, 1901 and Dec 31, 1979 -- rather than a single value. But that's relatively easy to adjust.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 9

ScottPletcher wrote:

Here's a sample table and the corresponding view, with one non-nullable datetime and one non-nullable int column that have placeholder values that return NULL when querying the table.

For now, I'll leave the corresponding trigger(s) as an interesting exercise to do.

USE tempdb;

DROP VIEW IF EXISTS dbo.tablename;
DROP TABLE IF EXISTS dbo.tablename_raw;
GO
CREATE TABLE dbo.tablename ( id int identity(1, 1) not null, 
    datetime1 datetime not null, int1 int not null )
INSERT INTO dbo.tablename VALUES('19010101', -2147483648), /*placeholder values, should show NULL when SELECTed*/    
    (GETDATE()-1, 22), /*standard row values*/    
    (GETDATE(), -2147483648) /*mixed row*/

EXEC sys.sp_rename 'dbo.tablename', 'tablename_raw'
GO
CREATE VIEW dbo.tablename
AS
SELECT id, NULLIF(datetime1, '19010101') AS datetime1, NULLIF(int1, -2147483648) AS int1
FROM dbo.tablename_raw
GO

SELECT * FROM dbo.tablename /*standard SELECT against table (which is now a view, but that is transparent to users)*/
SELECT * FROM dbo.tablename_raw /*non-standard SELECT available to developers to see actual contents of table*/

I don't understand what that has to do with this tread, Scott. Are you sure you posted to the correct thread?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)