Summary table for roll-up

  • Hi, I'm working with a large table (70 million rows, 100+ columns and growing) used in reports, which are slowing down as the table grows. The data in the table is at the lowest granularity, so all calculations happen in real time and take minutes to complete (which means very slow report generation). I'm not sure why it was designed the way it is (I've only used Tableau for 2-3 weeks), but I'm looking for a way to avoid real-time calculations that take minutes. My gut was to create a summary table and use it as a source. I tested it out: aggregating the data to a higher granularity brought the size down to 1 million rows, and with the new source the report query time is down to 1 second. All good for the base report, but it works only at the granularity chosen for the summary table, and summing up to a higher level (rolling up) double-counts records that fall into more than one category.

    Example:

    'YEAR' - 'RETAIL CHAIN' - 'STORE' - 'STORE BRANCH' - 'LOCATION' - COUNT(Distinct EmployeeID)

    In the summary table this returns the correct distinct number of employees per Store Branch (the lowest necessary level), but rolling up to Store or to Retail Chain inflates the counts, since employees can work at more than one Store Branch or Store. For example, the distinct number of employees for only Stores 1, 2, and 3 can't be obtained by summing the counts because of possible overlap.

    How can I aggregate the data in advance in a way that allows accurate roll-up and drill-down? (There are lower levels of granularity that are definitely not necessary.)

    Thanks!

    SN

  • Unfortunately, distinct counts are not associative, so you need to compute all levels of the aggregation at the same time. When creating your summary table, use the ROLLUP option.

    insert into #StoreSummary
        ([Year],[RETAIL CHAIN],[STORE],[STORE BRANCH],[Employee Count])
    select [Year],[RETAIL CHAIN],[STORE],[STORE BRANCH],count(distinct [EmployeeID]) AS [Employee Count]
    from #StoreHist
    group by ROLLUP([Year],[RETAIL CHAIN],[STORE],[STORE BRANCH])

    You may also want to use the GROUPING function to alter the output and get more descriptive results for the higher-level summaries.
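    As a sketch of that idea (column names taken from the example above; the labels such as 'All Stores' are purely illustrative), GROUPING returns 1 for the rows that ROLLUP aggregated away, so you can replace the resulting NULLs with readable labels:

    ```sql
    -- GROUPING([col]) = 1 on the rollup rows where [col] was aggregated away
    select
        case when GROUPING([Year]) = 1 then 'All Years' else cast([Year] as varchar(4)) end as [Year],
        case when GROUPING([RETAIL CHAIN]) = 1 then 'All Chains' else [RETAIL CHAIN] end as [RETAIL CHAIN],
        case when GROUPING([STORE]) = 1 then 'All Stores' else [STORE] end as [STORE],
        case when GROUPING([STORE BRANCH]) = 1 then 'All Branches' else [STORE BRANCH] end as [STORE BRANCH],
        count(distinct [EmployeeID]) as [Employee Count]
    from #StoreHist
    group by ROLLUP([Year],[RETAIL CHAIN],[STORE],[STORE BRANCH])
    ```

    Without this, the subtotal rows simply show NULL in the aggregated columns, which is ambiguous if the underlying data itself can contain NULLs.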

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • So, you have this flat, denormalized table containing nearly 100 million rows and 100+ columns. The problem is you need to report from it: performing rollups at various levels, aggregating computed columns, and doing it all at runtime rather quickly. That's practically a textbook application for Analysis Services (which comes bundled with SQL Server Enterprise Edition) or a similar OLAP tool. But this will involve a learning curve and some retrofitting of existing reports.

    Otherwise, using the existing table and SQL as is, you can implement page compression, reducing page reads by 50% - 70%, which can potentially cut your runtime processing in half or better.
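    A minimal sketch of that, assuming the table is named dbo.StoreHist (the real name will differ); SQL Server can also estimate the savings before you commit to the rebuild:

    ```sql
    -- Estimate the space savings first (system procedure, read-only)
    EXEC sp_estimate_data_compression_savings
        @schema_name = 'dbo',
        @object_name = 'StoreHist',
        @index_id = NULL,
        @partition_number = NULL,
        @data_compression = 'PAGE';

    -- Then apply page compression to the table
    ALTER TABLE dbo.StoreHist
    REBUILD WITH (DATA_COMPRESSION = PAGE);
    ```

    The rebuild is an offline, resource-intensive operation on a table this size, so it's worth scheduling it during a maintenance window.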

    Other options are indexed views to support specific reporting use cases, or, if most of your queries reference only a small subset of columns, covering indexes.
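    A hypothetical sketch of the indexed-view option, using the column names from the example (view and index names are made up). Note that COUNT(DISTINCT ...) is not allowed in an indexed view, so this only pre-aggregates row counts per branch; distinct employee counts would still have to come from a summary table:

    ```sql
    -- Indexed views require SCHEMABINDING, two-part names,
    -- and COUNT_BIG(*) when a GROUP BY is present
    CREATE VIEW dbo.vStoreBranchRows
    WITH SCHEMABINDING
    AS
    SELECT [Year], [RETAIL CHAIN], [STORE], [STORE BRANCH],
           COUNT_BIG(*) AS [Row Count]
    FROM dbo.StoreHist
    GROUP BY [Year], [RETAIL CHAIN], [STORE], [STORE BRANCH];
    GO
    -- Materializing the view requires a unique clustered index
    CREATE UNIQUE CLUSTERED INDEX IX_vStoreBranchRows
    ON dbo.vStoreBranchRows ([Year], [RETAIL CHAIN], [STORE], [STORE BRANCH]);
    ```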

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (10/4/2016)


    So, you have this flat, denormalized table containing nearly 100 million rows and 100+ columns. The problem is you need to report from it: performing rollups at various levels, aggregating computed columns, and doing it all at runtime rather quickly. That's practically a textbook application for Analysis Services (which comes bundled with SQL Server Enterprise Edition) or a similar OLAP tool. But this will involve a learning curve and some retrofitting of existing reports.

    Otherwise, using the existing table and SQL as is, you can implement page compression, reducing page reads by 50% - 70%, which can potentially cut your runtime processing in half or better.

    Other options are indexed views to support specific reporting use cases, or, if most of your queries reference only a small subset of columns, covering indexes.

    SSAS involves a very steep learning curve. I tried to teach myself SSAS when I was much further along in my SQL knowledge than the OP currently is, and it was still a complete and utter disaster. It was only after we had purchased a third-party SSAS solution and I had had time to work with it that I knew enough to create my own SSAS solution.

    I don't think that SSAS is a viable option in this case, unless you are volunteering to go set it up for them.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • @drew.allen thanks for the great suggestion! ROLLUP definitely helps with some levels of aggregation, but I don't think it can be used to pre-aggregate data for all subsets, can it? Example: the distinct number of employees for Stores 1, 2, and 3 in a retail chain with 10 stores, or any random combination a user could choose to filter on - summing up the numbers will still inflate the total if there is any overlap.

    Is there a way to do that? Or is this a lost cause?

  • SvetNas (10/4/2016)


    @drew.allen thanks for the great suggestion! ROLLUP definitely helps with some levels of aggregation, but I don't think it can be used to pre-aggregate data for all subsets, can it? Example: the distinct number of employees for Stores 1, 2, and 3 in a retail chain with 10 stores, or any random combination a user could choose to filter on - summing up the numbers will still inflate the total if there is any overlap.

    Is there a way to do that? Or is this a lost cause?

    ROLLUP uses the hierarchy specified by the order of the expressions in the ROLLUP clause. If you want all possible levels of aggregation, you should use CUBE instead.
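    As a sketch (same column names as the earlier example), the only change from the ROLLUP version is the keyword; CUBE emits one group for every combination of the four columns, i.e. 2^4 = 16 grouping sets instead of ROLLUP's 5:

    ```sql
    -- CUBE pre-aggregates every combination of the grouping columns,
    -- not just the top-down hierarchy that ROLLUP produces
    insert into #StoreSummary
        ([Year],[RETAIL CHAIN],[STORE],[STORE BRANCH],[Employee Count])
    select [Year],[RETAIL CHAIN],[STORE],[STORE BRANCH],count(distinct [EmployeeID]) AS [Employee Count]
    from #StoreHist
    group by CUBE([Year],[RETAIL CHAIN],[STORE],[STORE BRANCH])
    ```

    If only some of the combinations are actually needed, GROUPING SETS lets you list them explicitly and avoid computing the rest.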

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Eric M Russell (10/4/2016)


    So, you have this flat denormalized table containing nearly 100 million rows and 100+ columns. The problem is you need to report from it; performing rollups at various levels, aggregation on computed columns, and this all needs to be done at runtime rather quickly. That's practically a text book application for Analysis Services(which comes bundled with SQL Server Enterprise Edition) or a similar OLAP tool. But this will involve a learning curve and some retrofitting to existing reports.

    Otherwise, using the existing table and SQL as is, you can implement page compression, reducing page reads by 50% - 70%, which can potentially cut your runtime processing in half or better.

    Other options are indexed views to support specific reporting case usages, or if there is a situation where most of your queries only reference a small subset of columns, then consider covering indexes.

    Eric, thanks for responding! I followed your advice and added page-level compression to the table. There is a slight improvement in some of the smaller report queries, but not in the ones that perform the most calculations and take minutes to complete. The table has a nonclustered columnstore index on the columns used in joins and filters.

  • drew.allen (10/4/2016)


    SvetNas (10/4/2016)


    @drew.allen thanks for the great suggestion! ROLLUP definitely helps with some levels of aggregation, but I don't think it can be used to pre-aggregate data for all subsets, can it? Example: the distinct number of employees for Stores 1, 2, and 3 in a retail chain with 10 stores, or any random combination a user could choose to filter on - summing up the numbers will still inflate the total if there is any overlap.

    Is there a way to do that? Or is this a lost cause?

    ROLLUP uses the hierarchy specified by the order of the expressions in the ROLLUP clause. If you want all possible levels of aggregation, you should use CUBE instead.

    Drew

    I'll use this as part of the solution. Thanks Drew!
