July 4, 2018 at 7:27 am
Dear all,
We have a BI solution.
In this solution we have a staging area, a data mart, and a cube.
We are thinking of implementing data compression.
We will use columnstore indexes, and I would like to ask: do we gain anything by applying data compression (page or row level) to all the columns, including the ones that will be covered by the columnstore index?
Can someone help with this?
Thank you,
Pedro
July 4, 2018 at 12:42 pm
Thank you for your reply. We have a BI application that takes information from views that exist in an Oracle DB. We get this information into our staging area using SSIS, then we use SSIS again to move the information from staging into our data mart. Then we process our cube, which is tabular. Our Oracle data source will start to send us millions of rows of information. One of the steps we are taking for performance is using columnstore indexes.
My question: I believe that every table that suffers lots of inserts and deletes, and not many updates, is a great candidate to be compressed at row or page level, not only from a storage perspective but also from a performance perspective. But we will also use columnstore indexes. On tables that have a columnstore index, do we still gain from having data compression on all their fields, even on the columns that are part of the columnstore index? Or should I not compress tables (at row or page level) if they have a columnstore index?
July 4, 2018 at 12:43 pm
Thank you for your support.
July 4, 2018 at 1:34 pm
river1 - Wednesday, July 4, 2018 12:42 PM
Thank you for your reply. We have a BI application that takes information from views that exist in an Oracle DB. We get this information into our staging area using SSIS, then we use SSIS again to move the information from staging into our data mart. Then we process our cube, which is tabular. Our Oracle data source will start to send us millions of rows of information. One of the steps we are taking for performance is using columnstore indexes. My question: I believe that every table that suffers lots of inserts and deletes, and not many updates, is a great candidate to be compressed at row or page level, not only from a storage perspective but also from a performance perspective. But we will also use columnstore indexes. On tables that have a columnstore index, do we still gain from having data compression on all their fields, even on the columns that are part of the columnstore index? Or should I not compress tables (at row or page level) if they have a columnstore index?
I guess my take on this would be to do the aggregations for your data mart in the native environment for the raw data and then store only the results you need in your data mart. In this case, that would mean writing code in Oracle and then reading just those results into SQL Server.
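A minimal sketch of that on the Oracle side (the view, table, and column names here are just placeholders, not from the actual system):

-- Oracle: pre-aggregate in the source so SSIS pulls summary rows, not raw detail
CREATE OR REPLACE VIEW sales_daily_agg AS
SELECT sale_date,
       product_id,
       SUM(quantity) AS total_quantity,
       SUM(amount)   AS total_amount
FROM   sales_detail
GROUP  BY sale_date, product_id;

SSIS would then read from sales_daily_agg instead of the detail view, shrinking what lands in the data mart.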
--Jeff Moden
Change is inevitable... Change for the better is not.
July 4, 2018 at 1:56 pm
river1 - Wednesday, July 4, 2018 7:27 AM
Dear all,
We have a BI solution.
In this solution we have a staging area, a data mart, and a cube.
We are thinking of implementing data compression.
We will use columnstore indexes, and I would like to ask: do we gain anything by applying data compression (page or row level) to all the columns, including the ones that will be covered by the columnstore index?
Can someone help with this?
Thank you,
Pedro
Data compression will most likely be beneficial as the compressed pages are also compressed in memory, which means that more data pages will fit into memory. The columnstore indexes use columnstore compression and are not affected.
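As a sketch (table and index names are placeholders), the rowstore staging tables and the columnstore fact table would be handled separately, along these lines:

-- Page compression on a rowstore staging table
ALTER TABLE dbo.StagingSales
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- The fact table gets columnstore compression from the index itself
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;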
😎
July 4, 2018 at 3:21 pm
Thank you very much. Let me just add another point: we are using a tabular model and we expect to have millions and millions of rows in the fact tables. I don't know whether this reinforces the need for page compression even more, aside from the fact that we will have columnstore indexes.
July 4, 2018 at 5:30 pm
Tabular and columnstore use compression by default, and you can't really change it the same way you do for normal data. For columnstore you can choose one of COLUMNSTORE or COLUMNSTORE_ARCHIVE; read the documentation to figure out what each one does.
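For example, switching an existing clustered columnstore index between the two options (index and table names are placeholders):

-- Archive compression: smaller, but slower to read; suits cold data
ALTER INDEX CCI_FactSales ON dbo.FactSales
REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

-- Back to the default columnstore compression
ALTER INDEX CCI_FactSales ON dbo.FactSales
REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE);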
What you may consider for these columns is to look at their cardinality and data type, and possibly split a column into 2-3 or even more parts.
For example, if you have a datetime column with the time portion populated, splitting it into 2 columns, one for the date part and the other for the time part, will significantly improve its compression in tabular/columnstore mode, as in the sketch below.
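A minimal version of that split (table and column names are placeholders; it could equally be done in the staging view or in SSIS):

-- Split a datetime into date and time columns to lower per-column cardinality
SELECT CAST(SaleDateTime AS date)    AS SaleDate,
       CAST(SaleDateTime AS time(0)) AS SaleTime
FROM   dbo.StagingSales;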
See https://www.sqlbi.com/articles/optimizing-high-cardinality-columns-in-vertipaq/ for further examples and explanation.
For tabular processing there are other things you can do to improve compression. Some are in the article above; others are related to the segment processing size. But this comes at a cost in processing time, so it needs to be tested to determine the best trade-off.
July 9, 2018 at 2:09 pm
Sorry, I did not understand that last answer.
I got this reply in my company:
"On a object you can setup eithercolumnstore or page/row compression. Therefore you cannot technically have thesame table stored in CCI and at the same time use page/row compression."
I thought that on one object (table) I could set both columnstore and page compression...
July 9, 2018 at 4:34 pm
Hi,
Some options for compression are still available with columnstore tables. See https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/data-compression?view=sql-server-2017#using-columnstore-and-columnstore-archive-compression
The stored procedure sp_estimate_data_compression_savings may also be useful for estimating what compression will do across different tables.
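For example, to estimate page-compression savings for a single table (the object name is a placeholder):

-- NULL for @index_id and @partition_number covers all indexes and partitions
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactSales',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';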
July 9, 2018 at 8:10 pm
I'm about a week from compressing a 1.4 terabyte database to about 400GB and a 1 terabyte database to slightly less than that... and I'm not using CCI and I'm not using row or page compression. It's called a "data cleanup", which no one seems to actually do anymore. 😉 Heh... even crap code is going to run about twice as fast.
--Jeff Moden
Change is inevitable... Change for the better is not.