I work for a small non-profit with an Admin office and several remote locations over a large geographic area. The database is a "home-grown" HR system, an "electronic file cabinet" to reduce the paper employee files they need to keep. The plan was to have the master copy of the database at the Admin office, and each remote location would have its own copy of the database on SQL Server Express. They wanted to be able to, for example, file an accident report at the remote location and submit it electronically to the Admin office over the internet. They would use a desktop application to create the accident report as a record in the local database. Then they click "Sync to Admin", which sends the accident report data and retrieves any updates from the Admin database since their last sync operation.
Since we are limited to SQL Server Web Edition, Replication is not an option, so I decided to try my own solution. How hard can it be to design an Update Log? It's like a tape recorder of changes to data over time. You can "record" all the changes in the Admin database and "play" the changes into the local database at each remote location. It doesn't need to be efficient because it is not a large database and there would not be more than 2 or 3 users at a time.
In this article I thought I would share some of the thought process that went into this project. The remote update part was "bench tested" but never put into production, but for about 2 1/2 years updates to the master copy of the database at the Admin office have been going through this update log.
I would be interested in hearing from the SSC community on whether this is a good solution, or if anyone has implemented anything similar. You can run the script to try out a simple version of the log.
My Solution
The solution includes an Update Log table where every row in the log represents a change to a field in a table in the database. When the application writes changes and additions to the database, it instead creates rows in the Update Log table, which are then applied in a separate procedure. When changes need to be transmitted to a remote department the Update Log rows are sent and applied there.
My SQL coding habits standards: Column names use an abbreviation of the table name as a prefix. For example in the Update Log table the prefix is "Log" and the fields are "LogId", "LogDateTime", etc. Every table has an integer ID as the key with identity-style sequential assignment. In this article I often refer to this as the "row ID." The row ID field is named with the table prefix plus "Id" like "LogId."
This is the script to create the table for the Update Log.
CREATE TABLE dbo.UpdateChangeLog ( LogId INT IDENTITY(1, 1) NOT NULL , LogDateTime DATETIME NULL , LogTableName VARCHAR(255) NULL , LogRowIdField VARCHAR(128) NULL , LogUpdateField VARCHAR(128) NULL , LogRowIdValue INT NULL , LogNewValue VARCHAR(MAX) NULL , LogFieldType INT NULL , LogDeptAllOrSystem INT NULL , LogOldValue VARCHAR(MAX) NULL , LogProcessedTime DATETIME NULL , LogSessionId INT NULL, ) ON [PRIMARY];
The main fields in this table define the data to update. The field to update is specified by varchar and integer fields: the name of the table, the row ID for the row in the table, the name of the field to be updated, the data type, and the old and new values for the field. These fields are used to build SELECT, INSERT and UPDATE statements during the update process.
The other fields in the table record when the data was changed, when the change was applied to the database, and the ID of the source department submitting the change.
The solution also includes an Update Change Log Errors table to save rows with errors detected in a TRY ... CATCH pair when processing the log. It has the same fields as the Update Log table, with an additional field for error messages.
The Current Update Status table saves Log ID of the last Update Log row that was applied. There is a row for each department. This is used to determine where to start the next update.
CREATE TABLE dbo.CurrentUpdateStatus ( CusId INT IDENTITY(1, 1) NOT NULL , CusDepartmentId INT NULL , CusLastUpdate DATETIME NULL , CusCurrentLogId INT NULL ) ON [PRIMARY];
This script inserts a row for Department = 1 to start at the beginning of the log.
INSERT CurrentUpdateStatus (CusDepartmentId, CusLastUpdate, CusCurrentLogId) VALUES (1, GETDATE(), 0)
The Next Row ID table is used to keep the next row ID for each table in the database. By knowing the "Identity" value in advance, all the change log entries for a row will have the same Row ID.
CREATE TABLE dbo.NextRowId ( NidTableName VARCHAR(255) NULL , NidCurrentId INT NULL ) ON [PRIMARY];
To avoid conflicts, I used a transaction inside a stored procedure to assign the next row ID:
CREATE PROCEDURE dbo.GetNextRowId @Table VARCHAR(255) AS BEGIN DECLARE @result INT = 0; BEGIN TRY BEGIN TRANSACTION; UPDATE NextRowId SET @result = NidCurrentId = NidCurrentId + 1 WHERE NidTableName = @Table; COMMIT TRANSACTION; END TRY BEGIN CATCH ROLLBACK TRANSACTION; END CATCH; RETURN @result; END;
Next is the stored procedure that applies the changes in the Update Log to the database. The procedure uses a cursor to process each log table row in sequence. Using the fields in the update log it builds and executes dynamic SQL to apply the updates.
Here is the code from the stored procedure that processes the updates:
DECLARE @LastRowId INT; DECLARE @UpdateOK TINYINT; DECLARE @DepartmentId INT = 1; DECLARE @LogId INT; DECLARE @Table VARCHAR(128); DECLARE @Field VARCHAR(128); DECLARE @IdField VARCHAR(128); DECLARE @IdValue INT; DECLARE @Value VARCHAR(MAX); DECLARE @OldValue VARCHAR(MAX); DECLARE @Type INT; DECLARE @sql NVARCHAR(MAX); -- Get the last row processed for the department SELECT @LastRowId = CusCurrentLogId FROM CurrentUpdateStatus WHERE CusDepartmentId = @DepartmentId; UPDATE CurrentUpdateStatus SET CusLastUpdate = GETDATE() WHERE CusDepartmentId = @DepartmentId; -- Cursor definition to get each row of the Update Change Log DECLARE UpdCursor CURSOR FOR SELECT LogId , LogTableName , LogUpdateField , LogRowIdField , LogRowIdValue , LogNewValue , LogFieldType , LogOldValue FROM UpdateChangeLog WHERE LogId > @LastRowId ORDER BY LogId; OPEN UpdCursor; FETCH NEXT FROM UpdCursor INTO @LogId, @Table, @Field, @IdField, @IdValue, @Value, @Type, @OldValue; WHILE @@FETCH_STATUS = 0 BEGIN -- Check if the row with the current ID exists in the table DECLARE @nRows INT; DECLARE @parms NVARCHAR(50); SET @sql = N'SELECT @nRows1 = COUNT(*) FROM ' + @Table + ' WHERE ' + @IdField + ' = ' + CONVERT(VARCHAR(10), @IdValue); EXEC sys.sp_executesql @sql , @parms = N'@nRows1 int OUTPUT' , @nRows1 = @nRows OUTPUT; IF ( @nRows = 0 ) -- If no row found BEGIN -- Insert a new record with row ID and updated field SET @sql = 'INSERT ' + @Table + '(' + @IdField + ', ' + @Field + ') Values (' + CONVERT(VARCHAR(10), @IdValue) + ', '; IF ( @Type = 1 ) -- if numeric SET @sql = @sql + @Value + '); '; ELSE SET @sql = @sql + '''' + @Value + '''); '; EXEC sys.sp_executesql @sql; END; ELSE -- Else, Did not just add a new row, just update the field BEGIN SET @sql = 'UPDATE ' + @Table + ' Set ' + @Field + ' = @value WHERE ' + @IdField + ' = ' + CONVERT(VARCHAR(10), @IdValue); EXEC sys.sp_executesql @sql , N'@value VarChar(max)' , @Value; END; -- Update the status to mark that this log row has been processed for the department UPDATE CurrentUpdateStatus SET CusCurrentLogId = @LogId WHERE CusDepartmentId = @DepartmentId; FETCH NEXT FROM UpdCursor INTO @LogId, @Table, @Field, @IdField, @IdValue, @Value, @Type, @OldValue; END; -- WHILE CLOSE UpdCursor; DEALLOCATE UpdCursor;
In the application the code that inserts rows into the Update Log was written in C#. I won't go into the details here, but this script will insert some rows into the Update Log that, when processed, create 3 new rows in a theoretical Employee table.
INSERT UpdateChangeLog ( LogDateTime, LogTableName, LogRowIdField, LogUpdateField, LogRowIdValue, LogNewValue, LogFieldType, LogDeptAllOrSystem, LogOldValue ) VALUES ( GETDATE(), 'Employee', 'EmpId', 'EmpFirstName', 1, 'Abraham', 2, 1, '' ), ( GETDATE(), 'Employee', 'EmpId', 'EmpLastName', 1, 'Lincoln', 2, 1, '' ), ( GETDATE(), 'Employee', 'EmpId', 'EmpFirstName', 2, 'Millard', 2, 1, '' ), ( GETDATE(), 'Employee', 'EmpId', 'EmpLastName', 2, 'Fillmore', 2, 1, '' ), ( GETDATE(), 'Employee', 'EmpId', 'EmpFirstName', 3, 'Theodore', 2, 1, '' ), ( GETDATE(), 'Employee', 'EmpId', 'EmpLastName', 3, 'Roosevelt', 2, 1, '' );
This script will create the theoretical Employee table for the Update Log rows inserted above:
CREATE TABLE dbo.Employee ( EmpId INT NOT NULL , EmpFirstName VARCHAR(50) NULL , EmpLastName VARCHAR(50) NULL, ) ON [PRIMARY];
If you run the sample update procedure it will insert three new employees.
EmpId | EmpFirstName | EmpLastName |
1 | Abraham | Lincoln |
2 | Millard | Fillmore |
3 | Theodore | Roosevelt |
As the cursor processes each row, the values in the update log table are used to build dynamic SQL. So for the first Update Log row we build a select statement to check whether the row ID is in the table, which for the first row will be:
SELECT @nRows1 = COUNT(*) FROM Employee WHERE EmpId = 1
The value of @nRows1 is returned to an ouput parameter of the sys.sp_executesql command. Since the row ID of 1 is not in the table, it builds an insert statement to insert a new record:
INSERT Employee(EmpId, EmpFirstName) Values (1, 'Abraham');
When it processes the second row the record also has row ID = 1 which was inserted by the previous iteration. Since it already exists, it will build and execute the SQL code to update the record with the last name:
UPDATE Employee Set EmpLastName = @value WHERE EmpId = 1
The parameter @value is supplied when we execute sys.sp_executesql.
Simple?
That's the basic update process, which seems simple. However what we need is more complicated. The "official" row ID is in the Next Row ID table in the database copy at the Admin office. When they add the accident report record at a remote department we need to assign a unique row ID that does not conflict.
My solution for this was to assign a temporary negative row ID for records inserted remotely. Then the rows are assigned a "real" (positive) ID when the log is synchronized to the main copy of the database at the Admin department. So in the Get Next Row ID procedure, I added logic to the stored procedure to assign a negative ID at a remote location.
Select @isRemote = DbiIsRemoteDatabase From DataBaseInstanceInformation IF (@isRemote = 1) BEGIN BEGIN TRY BEGIN TRANSACTION Update NextRowId Set @result = NidCurrentId = NidCurrentId - 1 Where NidTableName = 'RemoteID' COMMIT TRANSACTION END TRY BEGIN CATCH ROLLBACK TRANSACTION END CATCH END ELSE -- It's the master database.
The next ID for a remote location is assigned as if it's the next ID from the Next Row ID table for a table called "RemoteID." The Next Row ID row for the "RemoteID" "table" is initialized to zero and the code above will assign successive negative numbers.
When the Update Log entries are applied to the master database at Admin from a remote location, it uses the negative row ID as a signal to assign a "permanent" ID. If a row ID value in the log is negative, a new ID is requested for the table using the GetNextRowId procedure. Then an update is built and executed that changes the Update Log. Every log row with the temporary ID is updated to the new ID. Now the insert and all the updates in the Update Log will have the new row ID.
Here is the code that I added to the stored procedure to update the log with the new ID:
IF (@IdValue < 0) BEGIN Set @OldIdValue = @IdValue Exec @IdValue = GetNextRowId @Table Set @sql = N'UPDATE UpdateChangeLog Set LogRowIdValue = ' + CONVERT(varchar(10), @IdValue) + ' Where LogRowIdValue = ' + CONVERT(varchar(10), @OldIdValue) + ' And LogTableName = ''' + @Table + '''' exec sp_executesql @sql END
More Complications
Leaving aside the advisability of changing the data in the update log, there is another problem with this technique. When the accident report is created at the remote location it is assigned a negative row ID, say -77. When it is imported into the master database at Admin it is assigned a permanent row ID like 122. This is what the Admin database and the copies at all the other departments will use to identify that accident report. When updates are sent back to the remote department that sent the accident report, how will the update procedure know that the ID has changed and that 122 is the same accident report as -77?
My solution to this was to file another special entry in the Update Log table that is a signal back to the remote location. In these rows the changed field and the Row ID field are the same, and the old value is the negative row ID. These rows are the only rows where the changed field is the row ID field, so only the remote location that inserted the record needs to recognize that update.
When that special Update Log item is processed at the remote department, the Row ID in the table is updated from the negative value to the permanent value assigned by the master database at Admin. This is the code I added to the stored procedure to do the update of the row ID:
IF (@IsRemoteDatabase > 0 AND @IdField = @Field) BEGIN Set @sql = N'UPDATE ' + @Table + ' Set ' + @IdField + ' = ' + CONVERT(varchar(10), @IdValue) + ' WHERE ' + @IdField + ' = ' + CONVERT(varchar(10), @OldValue) exec sp_executesql @sql END
There is another problem here. When Admin sends its data changes back to the remote department that created the accident report, it includes the same accident report with row ID 122. If we insert that record we will have a duplicate record and/or a duplicate ID. Therefore any changes that originate from that location need to be ignored when they are sent back (with the new permanent row ID) from the database at Admin. This requires a slight change to the cursor definition when applying changes at a remote location to ignore log entries originating from the department processing the log:
DECLARE UpdCursor CURSOR FOR SELECT LogId , LogTableName , LogUpdateField , LogRowIdField , LogRowIdValue , LogNewValue , LogFieldType , LogOldValue FROM UpdateChangeLog WHERE LogId > @LastRowId AND LogDeptAllOrSystem <> @DepartmentId ORDER BY LogId;
There are other problems with this approach that were never handled. It does not maintain integrity if the temporary ID is in a foreign key relationship. There is a slight possibility that the same negative number could be used by different departments at the same time which could result in the second row's fields being "updated" onto the first instead of being inserted as a new row. There is also the possibility of concurrency errors if two departments change the same record at the same time.
Conclusion
Obviously the Transaction Log in SQL Server is a lot more than just a recorder of data changes. It is the most important part of maintaining data integrity and facilitates restoration of the database to the point of failure.
There is a lot more that the Transaction Log does that I did not implement in my limited solution:
- Commands can be grouped together in a transaction so that everything is completed as a unit, or rolled back if any part of it fails.
- The Transaction Log can be backed up separately from the database.
- Transactions can be rolled forward to a specific point in time on a restored copy.
You can see how a seemingly simple idea can quickly become very complex. A "simple"data recorder that just records changes and plays them back sounds simple, but if you want to copy everything accurately, avoid duplicates and maintain integrity you will need to do more. I'm not saying you shouldn't design your own Transaction Log, but you should consider all possibilities and be prepared to code for a lot of issues that you may not have considered.
I would be interested in hearing if anyone else has tried to do something similar, or if anyone has better ideas about how this could be implemented.