September 5, 2017 at 9:19 am
So we have a situation where we have to load data to a DW during business hours at times. How does one do this with minimal impact?
I was thinking of something like load to temp DB, then rename old DB to archive it, then rename new temp DB to become the production DW DB. But how does one actually go about implementing that, at the technical level? Or is there other strategies that would work?
September 5, 2017 at 12:49 pm
I'm afraid I have more questions than answers at this point:
- if this temporary database you're rebuilding things in is in the same instance, then wouldn't that also be utilizing the same resources, thus impacting production environment? Maybe even more so, because you'd have 2 copies of the same data competing for RAM
- to do the database renaming, you'd have to kick out all users from the live database, so you'd have to coordinate that
- were you planning it this way because you have to do a complete rebuild of some dimension and/or fact tables? typically you would have a delta process that only adds or updates the records that need changing, if possible, to improve performance
- how long does your process take to do the load?
September 5, 2017 at 1:00 pm
how about swapping objects in and out of a schema? doing a bunch of transfers is extremely fast, but like Chris said, you are consuming resources on production to do the work.
if you load data into say, a Staging Schema, you can then swap them around.
ALTER SCHEMA DEVELOPER TRANSFER dbo.FactTable
ALTER SCHEMA dboTRANSFER Staging.FactTable
ALTER SCHEMA Staging TRANSFER Developer.FactTable
Lowell
September 5, 2017 at 1:07 pm
Chris Harshman - Tuesday, September 5, 2017 12:49 PMI'm afraid I have more questions than answers at this point:
- if this temporary database you're rebuilding things in is in the same instance, then wouldn't that also be utilizing the same resources, thus impacting production environment? Maybe even more so, because you'd have 2 copies of the same data competing for RAM
- to do the database renaming, you'd have to kick out all users from the live database, so you'd have to coordinate that
- were you planning it this way because you have to do a complete rebuild of some dimension and/or fact tables? typically you would have a delta process that only adds or updates the records that need changing, if possible, to improve performance
- how long does your process take to do the load?
Ah, I should have been more clear. The issue isn't a performance concern, but rather an inconsistent data set as it's loading, because yes, as you guessed, it does a complete rebuild rather than delta. I agree delta would be best, but the people familiar with the data tell me it's not really possible for various reasons.
The process takes about 2-3 hours.
I don't think kicking out the users would be a deal breaker -- ideally it'd be coordinated to happen at some low use time, like at lunch.
September 5, 2017 at 1:09 pm
Lowell - Tuesday, September 5, 2017 1:00 PMhow about swapping objects in and out of a schema? doing a bunch of transfers is extremely fast, but like Chris said, you are consuming resources on production to do the work.
if you load data into say, a Staging Schema, you can then swap them around.
ALTER SCHEMA DEVELOPER TRANSFER dbo.FactTable
ALTER SCHEMA dboTRANSFER Staging.FactTable
ALTER SCHEMA Staging TRANSFER Developer.FactTable
Oh, that's better than my idea of swapping around an entire database. Yes, I like it! Performance isn't a concern right now, so I'm not worried about that.
Any ideas on how this behaves with people using the data? It's read-only, so it's not like we have to contend with update locks, but I'm not sure what happens if you transfer a schema that's being read from and such.
September 5, 2017 at 1:23 pm
Melllev - Tuesday, September 5, 2017 1:09 PMPerformance isn't a concern right now, so I'm not worried about that.
You have a process that takes 2-3 hours to run. You have users that you want to continue using an old copy of the data until a new copy can be created. What do you think the server will be doing during those 2-3 hours? It won't be sleeping... It's going to be nasty busy.
Performance always matters. It's second only to accuracy/data integrity and it's a very close second.
Shifting gears to the problem at hand, I agree that having two sets of identical objects is the way to go as long as you have a set of synonyms that you can almost instantly repoint to the newly rebuilt data tables when the new data is ready. Next time around, load the now offline tables and then flop the synonyms again when the new data is ready.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 5, 2017 at 2:03 pm
Using either the ALTER SCHEMA TRANSFER or SYNONYM methods, blocking would occur if someone is actively querying the current live table and you're trying to repoint to the new table.
September 5, 2017 at 2:43 pm
I have to use this kind of logic to maintain a suite of tables that are consistent as a group, ie all data s of 8am, and I can hot swap the schemas when the next load is ready. so the data within a given schema is consistent ro it's related tables ...no partially loaded table(s) that has keys that don't exist in other related tables.
my swaps all occur in a transaction, and all together, so they might wait for someone to finish querying; like you, my data is effectively read only for end users.
it works well for me, but like Jeff said, it costs double the disk space for two versions.
Lowell
September 5, 2017 at 3:03 pm
Chris Harshman - Tuesday, September 5, 2017 2:03 PMUsing either the ALTER SCHEMA TRANSFER or SYNONYM methods, blocking would occur if someone is actively querying the current live table and you're trying to repoint to the new table.
That's not a fault, though. It's a feature. You don't want either to happen if someone's in the middle of using the tables.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 6, 2017 at 6:42 am
Jeff Moden - Tuesday, September 5, 2017 3:03 PMChris Harshman - Tuesday, September 5, 2017 2:03 PMUsing either the ALTER SCHEMA TRANSFER or SYNONYM methods, blocking would occur if someone is actively querying the current live table and you're trying to repoint to the new table.That's not a fault, though. It's a feature. You don't want either to happen if someone's in the middle of using the tables.
True, if the physical table behind the logical name were to suddenly change in the middle of a query and the database engine allowed it, the results would be mostly meaningless.
September 6, 2017 at 12:26 pm
Chris Harshman - Wednesday, September 6, 2017 6:42 AMJeff Moden - Tuesday, September 5, 2017 3:03 PMChris Harshman - Tuesday, September 5, 2017 2:03 PMUsing either the ALTER SCHEMA TRANSFER or SYNONYM methods, blocking would occur if someone is actively querying the current live table and you're trying to repoint to the new table.That's not a fault, though. It's a feature. You don't want either to happen if someone's in the middle of using the tables.
True, if the physical table behind the logical name were to suddenly change in the middle of a query and the database engine allowed it, the results would be mostly meaningless.
I would expect that to be a fairly rare occurrence and would require both tables to be brought up to speed at some point. What I was referring to is if someone is mid-query into a table... the synonym would not change until the query(ies) that were using it had all completed.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 8, 2017 at 9:29 pm
I take it that putting a layer between the user and the data warehouse is not an option? If it is an option, why not go the data mart road and have more flexibility in the future to change the business rules of the data warehouse without directly impacting the user who would then be querying a secondary subset of the data in an entirely separate database (known as the data mart)? This would allow you to better break this big problem into smaller manageable chunks.
Outside of major changes like that, I agree with Jeff. Seems like the logical approach.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply