September 18, 2017 at 6:19 am
Hello to all
I was running some queries to check whether I could get any advantage from using sparse columns, but I am confused by my results. Here is what I did:
1 - Determine the percentage of NULL values per column
ColumnName | TableName | NullValues | NumberOfRows | DataType | SuggestedNULL% | TotalNull% |
ActionLogNumber | Job | 185567 | 185567 | varchar | 0.6 | 1.00 |
DetailOutersOver | Job | 132086 | 185567 | int | 0.64 | 0.71 |
DetailOutersShort | Job | 184011 | 185567 | int | 0.64 | 0.99 |
GrnNumber | Job | 185567 | 185567 | varchar | 0.6 | 1.00 |
GrnRefusedDesc | Job | 185567 | 185567 | varchar | 0.6 | 1.00 |
GrnRefusedReason | Job | 185567 | 185567 | varchar | 0.6 | 1.00 |
OuterCount | Job | 124926 | 185567 | int | 0.64 | 0.67 |
ProofOfDelivery | Job | 177240 | 185567 | int | 0.64 | 0.96 |
RoyaltyCodeDesc | Job | 149383 | 185567 | varchar | 0.6 | 0.81 |
TotalOutersOver | Job | 132038 | 185567 | int | 0.64 | 0.71 |
TotalOutersShort | Job | 125175 | 185567 | int | 0.64 | 0.67 |
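For reference, the NULL ratio for a single candidate column can be computed with a query along these lines (a sketch assuming the [Job] table and column names from the table above; COUNT(*) counts all rows, while COUNT(column) counts only non-NULL values):

```sql
-- Sketch: NULL ratio for one candidate column of [Job].
-- COUNT(*) counts all rows; COUNT(col) ignores NULLs.
SELECT COUNT(*) - COUNT([TotalOutersShort])            AS NullValues,
       COUNT(*)                                        AS NumberOfRows,
       1.0 * (COUNT(*) - COUNT([TotalOutersShort]))
           / COUNT(*)                                  AS TotalNullPct
FROM [Job];
```

Repeating (or dynamically generating) this per column produces the figures in the table above.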
2 - Following this link, Use Sparse Columns, I determined the candidate columns to make sparse
3 - Apply the script
ALTER TABLE [Job] ALTER COLUMN [TotalOutersShort] int SPARSE NULL
So at this point I was expecting a lower page count for the table, using this query:
SELECT SUM(used_page_count)
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('job');
but it actually went up. Furthermore, if I query the table before and after applying SPARSE with "SET STATISTICS PROFILE ON", I get some strange results:
BEFORE
EstimateIO = 7.020162; AvgRowSize = 603
AFTER
EstimateIO = 5.557199; AvgRowSize = 603
Am I doing something wrong?
Thanks in advance
Henrry
September 18, 2017 at 6:33 am
Sparse columns only pay off when a very high proportion of rows hold a NULL value; those percentages are too low (that is, if they are 0.64% and not 64%) to benefit from a sparse column.
If the percentage of NULL values is that low, you will actually use more space storing data in sparse columns, as you have seen, due to the overhead sparse columns add to manage the sparse bits and so on.
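One way to see that overhead directly (a sketch, assuming the [Job] table from the post) is to compare the average record size and page count before and after the ALTER using sys.dm_db_index_physical_stats:

```sql
-- Sketch: physical stats for [Job]; run before and after the ALTER.
-- A non-NULL value in a sparse int column costs ~8 bytes instead of 4.
SELECT index_id,
       avg_record_size_in_bytes,
       page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('Job'), NULL, NULL, 'DETAILED');
```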
September 18, 2017 at 6:41 am
Hi Anthony
Thanks for the reply. But let's look at the case of the TotalOutersShort column:
Null Values = 125175
Total Rows = 185567
So 125175 / 185567 = 0.67, or 67%,
which is over the recommendation for an int column (0.64, or 64%).
Thanks
September 18, 2017 at 7:48 am
I probably wouldn't consider sparse without 95%+ of the values being NULL, and without a good number of the columns in the table being mostly NULL. It's a feature with a specific use. It also means that the table can't be row- or page-compressed.
The 64% for int isn't a recommendation; the page isn't saying that you should switch to sparse at that NULL percentage. Since sparse makes rows that do have data larger, altering the table likely caused a bunch of page splits, hence the increased page count.
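If page splits from the ALTER are inflating the count, rebuilding the table (a sketch; ALTER TABLE ... REBUILD compacts a heap or clustered table) should give a truer picture of the post-change size:

```sql
-- Sketch: rebuild [Job] to undo fragmentation from the in-place ALTER,
-- then re-check the page count.
ALTER TABLE [Job] REBUILD;

SELECT SUM(used_page_count)
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('Job');
```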
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 18, 2017 at 7:51 am
henrrypires - Monday, September 18, 2017 6:41 AM
Hi Anthony
Thanks for the reply. But let's look at the case of the TotalOutersShort column:
Null Values = 125175
Total Rows = 185567
So 125175 / 185567 = 0.67, or 67%,
which is over the recommendation for an int column (0.64, or 64%).
Thanks
That will teach me to actually read the post properly before jumping in.
When you make a column sparse, SQL Server actually copies the data into a new column and then removes the old column, because sparse is a storage-layer option, not a logical table option; that operation requires extra space to perform. If you captured the page count before the cleanup happened (it's a background process we don't control), you may have picked up some not-yet-deallocated pages in that result.
Do you have any indexes on TotalOutersShort, or any indexes that cover that column? If so, that index will also grow from 4 bytes per row to 8. If it's a single-column index, you would be best off looking at a filtered index that excludes the NULL rows; it gets tricky with a covering index, because filtering out NULL rows may exclude rows that shouldn't be filtered.
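For the single-column case, a filtered index along these lines keeps the NULL rows out of the index entirely (a sketch; the index name is hypothetical):

```sql
-- Sketch: filtered index on TotalOutersShort that skips the NULL rows.
-- IX_Job_TotalOutersShort is a hypothetical name.
CREATE NONCLUSTERED INDEX IX_Job_TotalOutersShort
    ON [Job] ([TotalOutersShort])
    WHERE [TotalOutersShort] IS NOT NULL;
```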
September 18, 2017 at 8:07 am
anthony.green - Monday, September 18, 2017 7:51 AM
Hi Anthony
Indeed, the problem was the index. I removed all the indexes on this table, and with the sparse column it now contains 230 fewer pages.