Rollup/Aggregate tables and maintenance

  • Hello,

    We have numerous customers, each with their own database, and in those databases we have a table called jobs. This basically contains all deliveries and collections (packs, cost, charge, weight and so on) for our customer and their customers.

    What we want to do is create an aggregate table on our cloud server called jobs_aggregate (or whatever) which will show us each customer's total jobs, packs, weight, cost and charge. Using this cloud server we can create an Android app that will let the managers of our various customers see at a glance the totals above for their particular depot.

    What's the best way to implement this? We can create a local aggregate table, or even an aggregate database, and then have that sync with our cloud server, but what's the best way, with the least overhead, to maintain the local aggregate database? Do we use triggers, so that every time a job is added, deleted or changed the aggregate is updated, or are there any other ways we could consider?

    Thank you in advance
    Paul.
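    The trigger approach described above can be sketched in miniature. This is illustrative only: SQLite (via Python) stands in for SQL Server, and the column names (customer, packs, weight, cost, charge) are assumptions, not the actual jobs schema.

```python
import sqlite3

# Illustrative schema only: the column names are assumptions,
# not the actual jobs table from the post.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE jobs (
    id       INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    packs    INTEGER NOT NULL,
    weight   REAL NOT NULL,
    cost     REAL NOT NULL,
    charge   REAL NOT NULL
);

CREATE TABLE jobs_aggregate (
    customer     TEXT PRIMARY KEY,
    total_jobs   INTEGER NOT NULL DEFAULT 0,
    total_packs  INTEGER NOT NULL DEFAULT 0,
    total_weight REAL    NOT NULL DEFAULT 0,
    total_cost   REAL    NOT NULL DEFAULT 0,
    total_charge REAL    NOT NULL DEFAULT 0
);

-- Only the changed row's deltas are applied; the unchanged 99.9% of rows
-- are never touched. (An AFTER UPDATE trigger would follow the same pattern.)
CREATE TRIGGER jobs_ai AFTER INSERT ON jobs BEGIN
    INSERT OR IGNORE INTO jobs_aggregate (customer) VALUES (NEW.customer);
    UPDATE jobs_aggregate SET
        total_jobs   = total_jobs   + 1,
        total_packs  = total_packs  + NEW.packs,
        total_weight = total_weight + NEW.weight,
        total_cost   = total_cost   + NEW.cost,
        total_charge = total_charge + NEW.charge
    WHERE customer = NEW.customer;
END;

CREATE TRIGGER jobs_ad AFTER DELETE ON jobs BEGIN
    UPDATE jobs_aggregate SET
        total_jobs   = total_jobs   - 1,
        total_packs  = total_packs  - OLD.packs,
        total_weight = total_weight - OLD.weight,
        total_cost   = total_cost   - OLD.cost,
        total_charge = total_charge - OLD.charge
    WHERE customer = OLD.customer;
END;
""")

con.executemany(
    "INSERT INTO jobs (customer, packs, weight, cost, charge) VALUES (?,?,?,?,?)",
    [("acme", 3, 10.0, 5.0, 8.0), ("acme", 2, 4.0, 2.0, 3.0), ("globex", 1, 1.5, 1.0, 2.0)],
)
con.execute("DELETE FROM jobs WHERE id = 2")  # trigger subtracts that row's deltas

totals = con.execute(
    "SELECT customer, total_jobs, total_packs FROM jobs_aggregate ORDER BY customer"
).fetchall()
print(totals)  # [('acme', 1, 3), ('globex', 1, 1)]
```

    On SQL Server the same shape would be AFTER INSERT/UPDATE/DELETE triggers writing into the aggregate table, which then syncs to the cloud copy. The trade-off, as the replies below note, is that every write to jobs now carries the trigger's cost.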

  • paul 69259 - Wednesday, June 27, 2018 9:40 AM


    Just curious, but why wouldn't you be able to use a query or a view to get the aggregated data?

    Sue
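    Sue's question points at the simplest option: compute the totals on demand with a view, so there is nothing to keep in sync. A minimal sketch, with SQLite (via Python) standing in for SQL Server and an assumed jobs schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY, customer TEXT NOT NULL,
    packs INTEGER NOT NULL, weight REAL NOT NULL,
    cost REAL NOT NULL, charge REAL NOT NULL
);
-- The view recomputes the rollup each time it is queried.
CREATE VIEW jobs_aggregate AS
SELECT customer,
       COUNT(*)    AS total_jobs,
       SUM(packs)  AS total_packs,
       SUM(weight) AS total_weight,
       SUM(cost)   AS total_cost,
       SUM(charge) AS total_charge
FROM jobs
GROUP BY customer;
""")

con.executemany(
    "INSERT INTO jobs (customer, packs, weight, cost, charge) VALUES (?,?,?,?,?)",
    [("acme", 3, 10.0, 5.0, 8.0), ("acme", 2, 4.0, 2.0, 3.0)],
)
row = con.execute(
    "SELECT total_jobs, total_packs FROM jobs_aggregate WHERE customer = 'acme'"
).fetchone()
print(row)  # (2, 5)
```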

  • Hi Sue,

    If you have a jobs table with a million records in it, a view that aggregates that table will have to load the million rows into memory, which is very resource-hungry even if 99.9% of those records don't change.

    We would only update an aggregate table when a jobs record actually changes, so it uses fewer resources, and we then have a fixed table with unique GUIDs that we can sync to another location (our cloud server) if required. A view would constantly change.
    Thank you
    Paul.

  • Do indexed views work on the cloud?  If they do, then that will likely have all of the features you seek for this.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • paul 69259 - Thursday, June 28, 2018 2:36 AM


    A million records isn't much these days. I've seen aggregate queries over millions of rows run in under 10 seconds, so I think it really depends. But sometimes storing those aggregates elsewhere and trying to keep things in sync can actually use more resources, which is why I was asking.
    I'm not sure how the view would be constantly changing unless the underlying tables are changing, but maybe that is part of the issue. If the view and underlying tables would not be constantly changing, then Jeff offered a great suggestion.

    Sue

  • paul 69259 - Thursday, June 28, 2018 2:36 AM


    "It Depends".  If someone makes the mistake of filtering on a column that is calculated by the view then, yes, you'll have performance issues even in the presence of proper indexing because the view will have to fully materialize all of the rows before any filters can come into play and the underlying tables or indexes will almost certainly suffer full scans at the best.

    Indexed views can be nasty fast for SELECTs but they will slow down inserts and updates not to mention that they will take up extra disk and memory space because they live in a fully materialized state just as surely as if you did a SELECT * from your current view and save the result in a table.

    My recommendation is to check to see if any calculated columns in the view are being used to filter on.  I'll also suggest that a properly written and parameterized stored procedure or Inline Table Valued Function might serve you much better than a view.

    And, yes... I agree with Sue... a million rows is a pretty small thing to be having performance problems with in light of the crazy fast speeds of today's machines and the storage.

    Most potential for performance will be found in the code and how the code is used.  Unfortunately, we've not seen a lick of your code and can only continue with speculation that may or may not actually be pertinent.  Please look under "Helpful Links" in my signature line below and read the article at the second link for how to properly post a performance issue to get the best help.

    --Jeff Moden



    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
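    Jeff's suggestion of a parameterized stored procedure or inline table-valued function can be illustrated in miniature: the depot filter is applied to a base-table column before aggregation, so only that customer's rows are read and nothing has to be materialized first. A sketch with SQLite (via Python) standing in for T-SQL; the schema and the depot_totals helper are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE jobs (customer TEXT, packs INTEGER, charge REAL)")
con.executemany(
    "INSERT INTO jobs VALUES (?,?,?)",
    [("acme", 3, 8.0), ("acme", 2, 3.0), ("globex", 1, 2.0)],
)

def depot_totals(con, customer):
    """Analog of a parameterized procedure/iTVF: the customer filter hits the
    base table directly, so the aggregate only scans that depot's rows."""
    return con.execute(
        "SELECT COUNT(*), SUM(packs), SUM(charge) FROM jobs WHERE customer = ?",
        (customer,),
    ).fetchone()

acme = depot_totals(con, "acme")
print(acme)  # (2, 5, 11.0)
```

    With an index on customer, this kind of parameterized query seeks straight to the relevant rows rather than aggregating the whole table and filtering afterwards, which is the pitfall Jeff describes with filtering on a view's calculated column.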

  • Sue_H - Thursday, June 28, 2018 1:54 PM


    Hi Sue,

    Yes, the records are constantly changing throughout the day with the delivery and collection details of the jobs.

    Thank you
    Paul.

  • Jeff Moden - Thursday, June 28, 2018 2:43 PM


    Hi Jeff,

    We have no code as such yet. It's an idea my boss has, and we are trying to work out the best way of implementing it.

    I haven't really done anything with stored procedures or inline table-valued functions. I will look into them.

    Thank you
    Paul.

    I created all my ETL from scratch in Python. I used an Azure DB to store the master scheduler and job logs. It worked out fine in that case. Everything goes to an entirely separate database that's not attached to the data warehouse or the data marts. I think we are still on the lowest possible scale for that Azure DB.

    It works great, and instead of an Android app we use Power BI to create reports off that table, refreshed on a schedule.

  • paul 69259 - Friday, June 29, 2018 1:59 AM


    Thanks, but data changing does not mean the view is constantly changing. Those are two very different things.
    It seems that if the data is constantly changing, the table holding the aggregates would be constantly updating, so there may be contention considerations to keep in mind.

    Sue
