December 16, 2017 at 2:56 pm
I have a Stored Procedure which does some ETL "stuff"
At the end of the procedure I want to replace a CLIENTS table with the CLIENTS_STAGING table.
This process happens a number of times per day and the CLIENTS table needs to be available as much as possible.
How would one approach this?
Delete & Rename, Truncate & Insert?
Below is my present effort.
Cheers,
Julian
TRUNCATE TABLE CLIENTS
INSERT INTO CLIENTS
(
[zzpclient_id]
,... etc
,[Spiegel100TOT]
,[Spiegel-TOT]
,[Spiegel+TOT]
)
SELECT
[zzpclient_id]
,... etc
,[Spiegel100TOT]
,[Spiegel-TOT]
,[Spiegel+TOT]
FROM CLIENTS_STAGING
December 16, 2017 at 7:04 pm
Neither.
If the desired table name is "Clients", then DO NOT create a table named "Clients". Instead, create a VIEW called "Clients" and have two "staging" tables. Call one of the staging tables (for example) "ClientStaging1" and the other "ClientStaging2". Decide which table you'll initially populate and point the new view at that table. Let's say you started off with the view pointing at ClientStaging1. The users are querying the Clients view, and that's actually querying the content of the ClientStaging1 table.
The time comes for an update. Truncate the other table, ClientStaging2, and populate it as if it were the only staging table as you did before. Once you have successfully populated ClientStaging2, simply alter the view to point at ClientStaging2 instead of ClientStaging1. The "flop over" is measured in the low milliseconds.
The next time you need to update the data, simply reverse the process. Populate ClientStaging1 and, if successful, repoint the view to it.
If something goes wrong with the population of whichever staging table you're trying to populate, simply don't repoint the view at it and the users will have no clue that something went haywire, except that the data may not be up to date. Still, they can probably continue to function on the old data, which is still available.
As a side benefit to this method, since you'll have both the previous data and the current data, you could do queries to see what recently changed.
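A minimal sketch of the idea (the table and column names here are only illustrative, not from the original post):

```sql
-- Start with the view pointing at the first staging table.
CREATE VIEW dbo.Clients
AS
SELECT ClientID, ClientName
FROM dbo.ClientStaging1;
GO
-- After ClientStaging2 has been loaded successfully, the "flop over"
-- is a single metadata-only ALTER VIEW:
ALTER VIEW dbo.Clients
AS
SELECT ClientID, ClientName
FROM dbo.ClientStaging2;
GO
```

Because ALTER VIEW is only a metadata change, the switch takes milliseconds regardless of the size of the tables.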
--Jeff Moden
Change is inevitable... Change for the better is not.
December 16, 2017 at 10:31 pm
Here's another option that differs from Jeff's good suggestion...
It's a bit more convoluted than Jeff's suggestion but I offer it up because, in a situation like this, you probably don't want any indexes on the load/stage table... and... you probably DO want indexes on the read copy of the table. If that's not the case, go with Jeff's solution.
This solution involves a wee bit of trickery known as partition switching. The following is an example...
USE DatabaseName;
GO
/* =================================================================================================
Using the partition switch method...
================================================================================================= */
IF OBJECT_ID('dbo.Client', 'U') IS NOT NULL
DROP TABLE dbo.Client;
GO
CREATE TABLE dbo.Client (
ClientID INT NOT NULL IDENTITY(1,1),
ClientName VARCHAR(30) NOT NULL,
CONSTRAINT pk_Client PRIMARY KEY CLUSTERED(ClientID)
WITH(FILLFACTOR = 100)
);
GO
IF OBJECT_ID('dbo.Client_Stage', 'U') IS NOT NULL
DROP TABLE dbo.Client_Stage;
GO
CREATE TABLE dbo.Client_Stage (
ClientID INT NOT NULL IDENTITY(1,1),
ClientName VARCHAR(30) NOT NULL
-- this will be a "load only" table, so we don't want any indexes
);
GO
/* ================================================================================================
Here's how it works...
- The Client_Stage table gets loaded from whatever sources it gets loaded from.
- Just before doing the switch, add any Client indexes to Client_Stage. (The partition switch requires that both tables have exactly the same indexes.)
(This will be the only portion of the procedure that takes time or is verbosely logged.)
- In a single transaction, truncate dbo.Client, switch from dbo.Client_Stage to dbo.Client, and drop any indexes that were added to dbo.Client_Stage
so that it's ready to take in new data.
Note: the partition switch itself is merely a metadata change, so the whole thing takes less than a millisecond and
dbo.Client is never offline or unavailable.
================================================================================================= */
-- start by adding some data to the dbo.Client_Stage table...
INSERT dbo.Client_Stage (ClientName)
SELECT TOP (1000000)
LEFT(ao1.name + ao2.name, 20)
FROM
sys.all_objects ao1
CROSS JOIN sys.all_objects ao2;
-- just a quick sanity check to verify that 1,000,000 rows have been added...
SELECT COUNT(1) FROM dbo.Client_Stage;
-- add a pk to dbo.Client_Stage so that it matches dbo.Client...
ALTER TABLE dbo.Client_Stage
ADD CONSTRAINT pk_ClientStage
PRIMARY KEY CLUSTERED(ClientID)
WITH(FILLFACTOR = 100);
BEGIN TRY
BEGIN TRANSACTION
TRUNCATE TABLE dbo.Client; --<< we're assuming that dbo.Client already has data in it.
ALTER TABLE dbo.Client_Stage SWITCH TO dbo.Client;
ALTER TABLE dbo.Client_Stage DROP CONSTRAINT pk_ClientStage;
COMMIT TRANSACTION
END TRY
BEGIN CATCH
IF @@trancount > 0 ROLLBACK TRANSACTION;
DECLARE @msg nvarchar(2048) = error_message();
RAISERROR (@msg, 16, 1);
END CATCH;
SELECT COUNT(1) FROM dbo.Client;
SELECT COUNT(1) FROM dbo.Client_Stage;
Note: the above assumes that you want dbo.Client_Stage to be a clean slate after each switch. If that's not the case and subsequent loads are appended to the existing data, that can be accomplished with a fairly minor change.
USE DatabaseName;
GO
/* =================================================================================================
Using the partition switch method...
================================================================================================= */
IF OBJECT_ID('dbo.Client', 'U') IS NOT NULL
DROP TABLE dbo.Client;
GO
CREATE TABLE dbo.Client (
ClientID INT NOT NULL IDENTITY(1,1),
ClientName VARCHAR(30) NOT NULL,
CONSTRAINT pk_Client PRIMARY KEY CLUSTERED(ClientID)
WITH(FILLFACTOR = 100)
);
GO
IF OBJECT_ID('dbo.Client_Stage', 'U') IS NOT NULL
DROP TABLE dbo.Client_Stage;
GO
CREATE TABLE dbo.Client_Stage (
ClientID INT NOT NULL IDENTITY(1,1),
ClientName VARCHAR(30) NOT NULL
-- this will be a "load only" table, so we don't want any indexes
);
GO
/* ================================================================================================
Here's how it works...
- The Client_Stage table gets loaded from whatever sources it gets loaded from.
- Use SELECT INTO syntax to create a Client_PreSwitch table that will act as an intermediary
between Client_Stage and Client.
- Add any Client indexes to Client_PreSwitch. (The partition switch requires that both tables have exactly the same indexes.)
- In a single transaction, truncate dbo.Client, switch from dbo.Client_PreSwitch to dbo.Client, and drop dbo.Client_PreSwitch.
Note: the partition switch itself is merely a metadata change, so the whole thing takes less than a millisecond and
dbo.Client is never offline or unavailable.
================================================================================================= */
-- start by adding some data to the dbo.Client_Stage table...
INSERT dbo.Client_Stage (ClientName)
SELECT TOP (1000000)
LEFT(ao1.name + ao2.name, 20)
FROM
sys.all_objects ao1
CROSS JOIN sys.all_objects ao2;
-- just a quick sanity check to verify that 1,000,000 rows have been added...
SELECT COUNT(1) FROM dbo.Client_Stage;
SELECT
cs.ClientID,
cs.ClientName
INTO dbo.Client_PreSwitch
FROM
dbo.Client_Stage cs;
-- add a pk to dbo.Client_PreSwitch so that it matches dbo.Client...
ALTER TABLE dbo.Client_PreSwitch
ADD CONSTRAINT pk_ClientPreSwitch
PRIMARY KEY CLUSTERED(ClientID)
WITH(FILLFACTOR = 100);
BEGIN TRY
BEGIN TRANSACTION
TRUNCATE TABLE dbo.Client; --<< we're assuming that dbo.Client already has data in it.
ALTER TABLE dbo.Client_PreSwitch SWITCH TO dbo.Client;
DROP TABLE dbo.Client_PreSwitch;
COMMIT TRANSACTION
END TRY
BEGIN CATCH
IF @@trancount > 0 ROLLBACK TRANSACTION;
DECLARE @msg nvarchar(2048) = error_message();
RAISERROR (@msg, 16, 1);
END CATCH;
SELECT COUNT(1) FROM dbo.Client;
SELECT COUNT(1) FROM dbo.Client_Stage;
Even if the first assumption is correct and you do want staging to be a clean slate after each switch, you could also use a variation on this theme (using the "preswitch" table) if you aren't able to (or simply don't want to) add indexes to the staging table.
You'd just want to truncate the staging table in the same explicit transaction as the SELECT INTO of the preswitch table.
HTH,
Jason
December 17, 2017 at 8:20 am
Just to cover all the bases, both methods will appear to be nearly instantaneous to the end users. Both methods will also be relatively slow and require every action to be fully logged if you're required to keep the database in the FULL Recovery Model during the load. Also remember that index rebuilds are fully logged in the FULL Recovery Model.
For each of the two methods, you can take advantage of "Minimal Logging" if an excursion to the Bulk Logged Recovery Model is allowed (or the database is in the Simple Recovery Model) and it can be done without the possibly unwanted creation of the previously noted "free space". You can do that (regardless of which method you've chosen to use) by having the Clustered Index already in place and following the steps required (Tablock, insert in same order as the Clustered Index, and possibly an OPTION(RECOMPILE) if variables are part of it all).
As for the Non-Clustered Indexes, simply disable them and rebuild them after the load OR explore the capabilities of Trace Flag 610. Here's an older link ( https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx ) on the subject that is still very relevant today. It covers loading tables in a Minimally Logged fashion (although it doesn't cover the two-table method I frequently use) and it covers the SWITCH method that Jason speaks of (don't forget that the Clustered Index can already exist on empty tables and still achieve Minimal Logging even without Trace Flag 610, if you "follow the rules", which are easy).
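For example, the disable/rebuild step for the Non-Clustered Indexes might look like this (the index name is hypothetical):

```sql
-- Disable the nonclustered index before the load so it isn't maintained row by row.
ALTER INDEX ix_Client_ClientName ON dbo.Client DISABLE;
-- ... perform the minimally logged load here (WITH (TABLOCK), rows inserted
-- in clustered-key order, appropriate recovery model) ...
-- Rebuild once, after the load, instead of maintaining it during the load.
ALTER INDEX ix_Client_ClientName ON dbo.Client REBUILD;
```

A disabled index keeps its definition but no data, so the rebuild recreates it from scratch in one pass.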
There are many ways to solve this problem of "whole table replacement". The unfortunate part is that you're usually going to run into one problem or another. You'll either end up using twice the size of the table in MDF file space, causing some possibly substantial blocking for the users, or the data won't be available to the users during the load. Some of the methods can be troublesome in that, if the load fails, no data will be available to anyone and I, of course, strongly recommend avoiding those particular methods.
As the ancient knight said in the cave of the Holy Grail in "Indiana Jones and the Last Crusade", "Choose wisely". 😉
The SWITCH method that Jason posted will work very well, as will the two-table method with the repointing of a view (or synonym), with the latter having the advantage that the previous and current loads are both available for comparison if needed. If you can take advantage of Minimal Logging, both can be 2 or 3 times faster than a straight load (even with the Clustered Index in place, and sometimes more, depending on what's happening). You could also do a trick with an additional temporary filegroup that would allow the SWITCH method to do its thing without the expansion of the MDF file for large loads, if that's something you want to avoid.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 17, 2017 at 12:57 pm
Thank you Jeff and Jason,
I will try Jeff's suggestion first.
Does the approach below mean that most code in the StoredProcedure needs to be 'dynamic'?
Which is some rework and I find the code not so readable.
DECLARE @Staging1 as datetime;
DECLARE @Staging2 as datetime;
DECLARE @NR AS CHAR(1);
DECLARE @STAGING_TABLE VARCHAR(50)
SET @Staging1 = (SELECT TOP 1 A.DATEEDIT FROM Clients_STAGING1 A);
SET @Staging2 = (SELECT TOP 1 B.DATEEDIT FROM Clients_STAGING2 B);
IF @Staging1 < @Staging2 BEGIN SET @NR=1 END ELSE BEGIN SET @NR=2 END;
SET @STAGING_TABLE = '[dbo].[Clients_STAGING'+@NR+']'
PRINT @staging_table
-- rewrite stored procedure for use with @Staging_Table:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Clients_STAGING]') AND type in (N'U'))
DROP TABLE @STAGING_TABLE;
CREATE TABLE @STAGING_TABLE(
[zzpclient_id] [int] NULL,
December 17, 2017 at 8:46 pm
JJR333 - Sunday, December 17, 2017 12:57 PM
Thank you Jeff and Jason,
I will try Jeff's suggestion first.
Does the approach below mean that most code in the StoredProcedure needs to be 'dynamic'?
Which is some rework and I find the code not so readable.
DECLARE @Staging1 as datetime;
DECLARE @Staging2 as datetime;
DECLARE @NR AS CHAR(1);
DECLARE @STAGING_TABLE VARCHAR(50)
SET @Staging1 = (SELECT TOP 1 A.DATEEDIT FROM Clients_STAGING1 A);
SET @Staging2 = (SELECT TOP 1 B.DATEEDIT FROM Clients_STAGING2 B);
IF @Staging1 < @Staging2 BEGIN SET @NR=1 END ELSE BEGIN SET @NR=2 END;
SET @STAGING_TABLE = '[dbo].[Clients_STAGING'+@NR+']'
PRINT @staging_table
-- rewrite stored procedure for use with @Staging_Table:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Clients_STAGING]') AND type in (N'U'))
DROP TABLE @STAGING_TABLE;
CREATE TABLE @STAGING_TABLE(
[zzpclient_id] [int] NULL,
I believe you may have missed the point of my post. You wouldn't have to change any stored procedures that refer to the "Clients" table.
1. You need to create two tables that are identical in every way. Call one "ClientsStaging1" and call the other "ClientsStaging2". They should both be identical to what you currently have as the "Clients" table. If you have named constraints, you will need to change those names on one of the tables because constraints must be uniquely named within a database.
2. Populate "ClientsStaging1" as you would normally have populated your original staging table.
3. Rename your old "Clients" table to "ClientsOld" to get it out of the way without dropping it.
4. Create a view of the "ClientsStaging1" table and call it "Clients", which is the same as your original table name, so you need not make any changes to your stored procedures, etc. Once that view is in place, you'll be able to use it as if you were looking at the old "Clients" table, except it will be looking at the new "ClientsStaging1" table.
5. Also create a view called "Clients_Staging" and point it at the new "ClientsStaging2" table. This will also keep you from having to change the procs that were using the "Clients_Staging" table before.
6. When the time comes to reload the staging table, reload it through the "Clients_Staging" view. The only thing that you may need to change in the proc is the TRUNCATE statement. It will need to be dynamic to truncate the new "ClientsStaging2" table for now and "ClientsStaging1" table during the next load iteration.
7. Once the "ClientsStaging2" table has been loaded successfully, dynamically alter the new "Clients" view to point at the freshly loaded "ClientsStaging2" table and you're done.
8. During the next load iteration, truncate and load the "ClientsStaging1" table, alter the new "Clients" view to point at it, and you're done.
9. Repeat the reversal in Step 8 above between the two new staging tables for each load iteration.
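A rough sketch of the dynamic parts of steps 6 and 7 (object names follow the post; using OBJECT_DEFINITION to detect the current target is just one assumed approach):

```sql
-- Figure out which staging table the Clients view currently points at,
-- then truncate, reload, and repoint to the other one.
DECLARE @Current sysname =
    CASE WHEN OBJECT_DEFINITION(OBJECT_ID(N'dbo.Clients'))
              LIKE N'%ClientsStaging1%'
         THEN N'ClientsStaging1' ELSE N'ClientsStaging2' END;
DECLARE @Next sysname =
    CASE @Current WHEN N'ClientsStaging1'
         THEN N'ClientsStaging2' ELSE N'ClientsStaging1' END;

EXEC (N'TRUNCATE TABLE dbo.' + @Next + N';');
-- ... reload the table named in @Next here ...
EXEC (N'ALTER VIEW dbo.Clients AS SELECT * FROM dbo.' + @Next + N';');
```

Only the TRUNCATE and ALTER VIEW need to be dynamic; the load itself can go through the "Clients_Staging" view described in steps 5 and 6.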
--Jeff Moden
Change is inevitable... Change for the better is not.
December 18, 2017 at 4:47 am
Thanks for your patience Jeff.
I am fairly new to SQL. Having trouble getting my head around some concepts regarding Views. I see now that they can be updated (which updates the underlying table), as can CTEs, as you pointed out in another post.
I understand switching the view to one of the staging tables after a bulk insert of a CSV file (adding columns, shaping the data, etc.).
Having trouble with point 5 and 6. 😉
How about:
1. Copy data from Staging1 to Staging2
2. Point the Client view to Staging2
3. Bulk insert the CSV file, add columns, etc. into Staging1
4. Point the Client view back to Staging1
J.
December 18, 2017 at 6:08 am
JJR333 - Monday, December 18, 2017 4:47 AM
Thanks for your patience Jeff.
I am fairly new to SQL. Having trouble getting my head around some concepts regarding Views. I see now that they can be Updated (updates the underlying table), as can CTE's, as you pointed out in another post.
Switching the View to one of the Staging tables, after a bulk insert of a csv file (& adding columns, shape the data, etc) I get.
Having trouble with point 5 and 6. 😉
How about:
1. Copy data from Staging1 to Staging2
2. Point the Client view to Staging2
3. Bulk insert the CSV file, add columns, etc. into Staging1
4. Point the Client view back to Staging1
J.
There's no need to copy data between the two staging tables. Just switch the view back and forth to the staging table that has the latest good load. It'll save on time, resources (reads, writes, CPU time), and log file space.
{Edit was for a spelling correction}
--Jeff Moden
Change is inevitable... Change for the better is not.
December 18, 2017 at 6:30 am
Is there any reason not to just use sp_rename to do this? Keep both copies of the table, similar to what Jeff is suggesting, but without a view or having to figure out which table is current.
1. Always populate Client_staging
2. Rename Client to Client_Temp
3. Rename Client_staging to Client
4. Rename Client_temp to Client_staging
That seems simpler unless there's some nasty overhead with renaming?
December 18, 2017 at 9:30 am
ZZartin - Monday, December 18, 2017 6:30 AM
Is there any reason not to just use sp_rename to do this? Keep both copies of the table, similar to what Jeff is suggesting, but without a view or having to figure out which table is current.
1. Always populate Client_staging
2. Rename Client to Client_Temp
3. Rename Client_staging to Client
4. Rename Client_temp to Client_staging
That seems simpler unless there's some nasty overhead with renaming?
I would think that it would be much simpler to use a synonym. I believe it only requires updating the synonym to the correct table.
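For illustration, the synonym version might look something like this (names assumed; note that synonyms can't be altered, only dropped and recreated):

```sql
-- Initially point the synonym at the first staging table.
CREATE SYNONYM dbo.Clients FOR dbo.ClientsStaging1;
GO
-- After the other table has been loaded, repoint in one short transaction.
BEGIN TRANSACTION;
DROP SYNONYM dbo.Clients;
CREATE SYNONYM dbo.Clients FOR dbo.ClientsStaging2;
COMMIT TRANSACTION;
```

As with the view, this is a metadata-only change, so code referring to dbo.Clients never has to know which staging table is behind it.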
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 18, 2017 at 10:31 am
drew.allen - Monday, December 18, 2017 9:30 AM
ZZartin - Monday, December 18, 2017 6:30 AM
Is there any reason not to just use sp_rename to do this? Keep both copies of the table, similar to what Jeff is suggesting, but without a view or having to figure out which table is current.
1. Always populate Client_staging
2. Rename Client to Client_Temp
3. Rename Client_staging to Client
4. Rename Client_temp to Client_staging
That seems simpler unless there's some nasty overhead with renaming?
I would think that it would be much simpler to use a synonym. I believe it only requires updating the synonym to the correct table.
Drew
Hmm... but how do you determine which table is current, so you know what to update the synonym (or view) to point at? It seemed like that was causing some confusion.
December 18, 2017 at 6:21 pm
ZZartin - Monday, December 18, 2017 6:30 AM
Is there any reason not to just use sp_rename to do this? Keep both copies of the table, similar to what Jeff is suggesting, but without a view or having to figure out which table is current.
1. Always populate Client_staging
2. Rename Client to Client_Temp
3. Rename Client_staging to Client
4. Rename Client_temp to Client_staging
That seems simpler unless there's some nasty overhead with renaming?
That'll work. It'll take slightly longer than altering a View, but it has the advantage that you don't have to track which table to point the view at, and you don't have to change any of the other code.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 20, 2017 at 4:49 am
All seems to be working now. Thanks all of you for your help!
On a virtual machine on my laptop the old procedure (TRUNCATE and INSERT) took about 26 milliseconds.
The 'rename' procedure takes about 3 milliseconds.
Cheers,
Julian
/*########################################################################################################
###
### 4 RENAME STAGING TABLE TO BEWONERS
###
### 1. Always populate Client_staging
### 2. Rename Client to Client_Temp
### 3. Rename Client_staging to Client
### 4. Rename Client_temp to Client_staging
###
######################################################################################################## */
DECLARE @Stopwatch DATETIME
SET @Stopwatch = GETDATE()
SET @MYSQL = 'sp_rename ''Bewoners'', ''Bewoners_TEMP'';'
EXEC (@MYSQL);
SET @MYSQL = 'sp_rename ''Bewoners_STAGING'', ''Bewoners'';'
EXEC (@MYSQL);
SET @MYSQL = 'sp_rename ''Bewoners_TEMP'', ''Bewoners_STAGING'';'
EXEC (@MYSQL);
PRINT 'Elapsed time renaming Bewoners: ' + CONVERT(VARCHAR(11), DATEDIFF(MS, @Stopwatch, GETDATE())) + ' milliseconds'
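For what it's worth, sp_rename can also be executed directly, so the dynamic-SQL wrapper above isn't strictly required; a sketch using the same table names:

```sql
-- Three renames swap the live table and the freshly loaded staging table.
EXEC sp_rename 'Bewoners', 'Bewoners_TEMP';
EXEC sp_rename 'Bewoners_STAGING', 'Bewoners';
EXEC sp_rename 'Bewoners_TEMP', 'Bewoners_STAGING';
```

Keep in mind that sp_rename does not rename constraints or indexes on the tables, so if those are named, the names will travel with the renamed tables.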
November 19, 2019 at 12:41 pm
Hello Jeff,
I have decided to go down your route, because it gives more flexibility in my opinion, and to avoid problems with redefining the constraints.
Q: how do you "know" which staging table (Clients_Staging_1 or Clients_Staging_2) the view is pointing at?
I suppose I could use a lookup table (columns: DateTime, StagingTableNumber), or is there a better way? Eg, extract the number from the definition of the View....
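The latter idea might look something like this (view name assumed; the square brackets escape the underscores, which are otherwise LIKE wildcards):

```sql
-- Read the view's current definition...
SELECT OBJECT_DEFINITION(OBJECT_ID(N'dbo.Clients')) AS ViewDefinition;
-- ...or derive the staging-table number from it directly.
SELECT CASE
           WHEN OBJECT_DEFINITION(OBJECT_ID(N'dbo.Clients'))
                LIKE N'%Clients[_]Staging[_]1%'
           THEN 1 ELSE 2
       END AS CurrentStagingTable;
```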
Thanks for your help.
J.
November 19, 2019 at 2:15 pm
So, just to be sure, you're not going to use the code in the "All seems to be working now" post above?
--Jeff Moden
Change is inevitable... Change for the better is not.