June 27, 2018 at 9:40 am
Hello,
We have numerous customers, each with their own database, and in those databases we have a table called jobs. This basically contains all deliveries and collections, packs, cost, charge, weight, etc. for our customers and their customers.
What we want to do is create an aggregate table on our cloud server called jobs_aggregate (or whatever) which will show us each customer's total jobs, packs, weight, cost and charge. Using this cloud server we can create an Android app that will let the managers of our various customers see those totals at a glance for their particular depot.
What's the best way to implement this? We can create a local aggregate table, or even an aggregate database, and then have that sync with our cloud server, but what's the way with the least overhead to maintain the local aggregate database? Do we use triggers, so that every time a job is added, deleted or changed the aggregate is updated, or are there any other ways we could consider doing this?
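For illustration, here is a minimal sketch of the trigger approach being asked about. No schema was posted, so every object and column name here (dbo.jobs, dbo.jobs_aggregate, customer_id, packs, weight, cost, charge) is an assumption, not the actual design:

    -- Maintains dbo.jobs_aggregate from dbo.jobs by recomputing totals only
    -- for the customers touched by each statement. All names are assumed.
    CREATE TRIGGER dbo.trg_jobs_maintain_aggregate
    ON dbo.jobs
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Throw away the old totals for just the affected customers...
        DELETE agg
        FROM dbo.jobs_aggregate AS agg
        WHERE agg.customer_id IN (SELECT customer_id FROM inserted
                                  UNION
                                  SELECT customer_id FROM deleted);

        -- ...and recompute them from the base table. A customer whose last
        -- job was deleted simply gets no new row.
        INSERT dbo.jobs_aggregate
               (customer_id, total_jobs, total_packs, total_weight, total_cost, total_charge)
        SELECT  j.customer_id,
                COUNT(*), SUM(j.packs), SUM(j.weight), SUM(j.cost), SUM(j.charge)
        FROM    dbo.jobs AS j
        WHERE   j.customer_id IN (SELECT customer_id FROM inserted
                                  UNION
                                  SELECT customer_id FROM deleted)
        GROUP BY j.customer_id;
    END;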
Thank you in advance
Paul.
June 27, 2018 at 11:50 am
paul 69259 - Wednesday, June 27, 2018 9:40 AM (quoted above)
Just curious, but why wouldn't you be able to use a query or view to get the aggregated data?
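For illustration, the plain-view version of this could be as simple as the following, with table and column names assumed as before:

    -- On-demand aggregation: nothing extra to keep in sync.
    CREATE VIEW dbo.v_jobs_totals
    AS
    SELECT  customer_id,
            COUNT(*)    AS total_jobs,
            SUM(packs)  AS total_packs,
            SUM(weight) AS total_weight,
            SUM(cost)   AS total_cost,
            SUM(charge) AS total_charge
    FROM    dbo.jobs
    GROUP BY customer_id;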
Sue
June 28, 2018 at 2:36 am
Hi Sue,
If you have a jobs table that has a million records in it, a view that aggregates that table will have to load the million rows into memory, which is very resource-hungry even if 99.9% of those records don't change.
We would only update an aggregate table when a jobs record actually changes, so it uses fewer resources, and we then have a fixed table with unique GUIDs that we can sync to another location (our cloud server) if required. A view would constantly change.
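One possible shape for such a local aggregate table, sketched with assumed names and types: a GUID key for matching rows across servers, and a rowversion column a sync job could use to pick up only the rows that changed since its last run:

    CREATE TABLE dbo.jobs_aggregate
    (
        aggregate_id  UNIQUEIDENTIFIER NOT NULL
                      CONSTRAINT PK_jobs_aggregate PRIMARY KEY
                      CONSTRAINT DF_jobs_aggregate_id DEFAULT NEWID(),
        customer_id   INT            NOT NULL UNIQUE,  -- type assumed
        total_jobs    BIGINT         NOT NULL,
        total_packs   BIGINT         NOT NULL,
        total_weight  DECIMAL(18, 3) NOT NULL,
        total_cost    DECIMAL(18, 2) NOT NULL,
        total_charge  DECIMAL(18, 2) NOT NULL,
        row_ver       ROWVERSION                       -- bumped automatically on every change
    );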
Thank you
Paul.
June 28, 2018 at 12:49 pm
Do indexed views work on the cloud? If they do, they will likely have all of the features you seek for this.
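For illustration, a minimal sketch of such an indexed view, assuming the same hypothetical dbo.jobs schema as above. Once the clustered index exists, SQL Server maintains the totals itself as the base table changes:

    CREATE VIEW dbo.v_jobs_aggregate
    WITH SCHEMABINDING
    AS
    SELECT  customer_id,
            COUNT_BIG(*)           AS total_jobs,   -- COUNT_BIG(*) is required in an indexed view
            SUM(ISNULL(packs, 0))  AS total_packs,  -- ISNULL keeps the SUMs non-nullable,
            SUM(ISNULL(weight, 0)) AS total_weight, -- which indexed views also require
            SUM(ISNULL(cost, 0))   AS total_cost,
            SUM(ISNULL(charge, 0)) AS total_charge
    FROM    dbo.jobs
    GROUP BY customer_id;
    GO
    -- This index is what materializes the view and keeps it maintained.
    CREATE UNIQUE CLUSTERED INDEX IX_v_jobs_aggregate
        ON dbo.v_jobs_aggregate (customer_id);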
--Jeff Moden
Change is inevitable... Change for the better is not.
June 28, 2018 at 1:54 pm
paul 69259 - Thursday, June 28, 2018 2:36 AM (quoted above)
A million records isn't much these days. I've seen aggregates over millions of rows run in under 10 seconds, so I think it really depends. But sometimes having those stored elsewhere and trying to keep things in sync can actually use more resources, which is why I was asking.
Not sure how the view would be constantly changing unless the underlying tables are changing, but maybe that is part of the issue. If the view and underlying tables would not be constantly changing, then Jeff offered a great suggestion.
Sue
June 28, 2018 at 2:43 pm
paul 69259 - Thursday, June 28, 2018 2:36 AM (quoted above)
"It Depends". If someone makes the mistake of filtering on a column that is calculated by the view then, yes, you'll have performance issues even in the presence of proper indexing because the view will have to fully materialize all of the rows before any filters can come into play and the underlying tables or indexes will almost certainly suffer full scans at the best.
Indexed views can be nasty fast for SELECTs, but they will slow down inserts and updates, not to mention that they take up extra disk and memory space because they live in a fully materialized state, just as surely as if you did a SELECT * from your current view and saved the result in a table.
My recommendation is to check whether any calculated columns in the view are being used to filter on. I'll also suggest that a properly written, parameterized stored procedure or inline table-valued function might serve you much better than a view; there's a sketch of the latter just below.
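For illustration, a sketch of the inline table-valued function route, again with assumed names. Because the customer filter is a parameter inside the function, the optimizer can restrict dbo.jobs before aggregating instead of materializing totals for every customer:

    CREATE FUNCTION dbo.fn_customer_totals (@customer_id INT)
    RETURNS TABLE
    AS
    RETURN
        SELECT  j.customer_id,
                COUNT(*)      AS total_jobs,
                SUM(j.packs)  AS total_packs,
                SUM(j.weight) AS total_weight,
                SUM(j.cost)   AS total_cost,
                SUM(j.charge) AS total_charge
        FROM    dbo.jobs AS j
        WHERE   j.customer_id = @customer_id
        GROUP BY j.customer_id;
    GO
    -- Usage, e.g.: SELECT * FROM dbo.fn_customer_totals(42);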
And, yes... I agree with Sue... a million rows is a pretty small thing to be having performance problems with, in light of the crazy-fast speeds of today's machines and storage.
Most of the potential for performance will be found in the code and how the code is used. Unfortunately, we've not seen a lick of your code and can only continue with speculation that may or may not actually be pertinent. Please look under "Helpful Links" in my signature line below and read the article at the second link for how to properly post a performance issue to get the best help.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 29, 2018 at 1:59 am
Sue_H - Thursday, June 28, 2018 1:54 PM (quoted above)
Hi Sue,
Yes, the records are constantly changing throughout the day with the delivery and collection details of the jobs.
Thank you
Paul.
June 29, 2018 at 2:03 am
Jeff Moden - Thursday, June 28, 2018 2:43 PM (quoted above)
Hi Jeff,
We have no code as such yet. It's an idea my boss has, and we are trying to work out the best way of implementing it.
I haven't really done anything with stored procedures or inline table-valued functions. I will look into them.
Thank you
Paul.
June 29, 2018 at 5:54 am
I created all my ETL from scratch in Python. I used Azure DB to store the master scheduler and job logs. Worked out fine in that case. Everything goes to an entirely separate database that's not attached to the data warehouse or the data marts. I think we are still on the lowest possible scale for that Azure DB.
It works great, and instead of an Android app we use Power BI to create reports off that table, refreshed on a schedule.
June 29, 2018 at 6:22 am
paul 69259 - Friday, June 29, 2018 1:59 AM (quoted above)
Thanks, but data changing does not mean the view is constantly changing. Those are two very different things.
It seems that if the data is constantly changing, the table holding the aggregates would be constantly updating, so there may be contention considerations to keep in mind.
Sue