Importing New Data Without Emptying Existing Tables
The case study I am about to describe has evolved over the past two years
and is still changing as I learn new techniques.
Problem
Soon after I was given the position of SQL Programmer at bigdough.com, I was put in charge of
all the DTS packages on our SQL Servers.
Over time I discovered the following problems with our main DTS package, which imports stock
ownership data into our databases.
First, it seemed to be designed to be run manually, with no fail-safes or checks to verify
that everything ran correctly.
Second, it emptied the tables every night before pumping the new data into them.
These were real problems because our customer base outside the United States was growing, and
those customers would complain that they couldn't see any data.
I remember distinctly getting a complaint from a customer in Japan.
Also during this time period we got a new DBA who wanted all our DTS packages to be as dynamic
as possible, so they could accommodate changes to the existing databases without having to be
modified every time.
Then, when we switched data providers, we had to change our import process to handle either a
full update or a daily update, depending on what we received on a given day; the previous
provider had simply sent a full update every night.
Business Challenge
The challenge for me was to devise a way to keep the data available at all times while
importing the new data, to detect whether a full or daily update had been received and run the
appropriate data pumps, to put in sufficient fail-safes so that bad data would not get
imported, and to make the process automatic, including notification to pagers upon failure.
Solution
Keeping The Data Available
The first thing I tackled was keeping the data available at all times.
I adjusted the DTS package to create tables identical to the existing tables, adding "_New" to
the end of each table name.
This piece had to be dynamic in the sense that if new indexes were added to an original table,
the DTS package needed to handle the change without being modified.
I went to the system tables, found where the information about tables, primary keys, and
indexes is stored, and created a stored procedure that scripts a table with or without its
primary key and indexes.
The SP can also script the primary key and/or the indexes without the table.
The SP, sp_CreateAndExecTableScript, is available in the script section and can be viewed at
http://www.sqlservercentral.com/scripts/contributions/543.asp.
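To give a flavor of where that information lives, here is a minimal sketch (not the SP itself)
of a query against the SQL Server 2000 system tables that lists a table's indexes and their
key columns; the table name Holdings is just a placeholder:

    -- Minimal sketch: list each index on one table along with its key columns,
    -- which is the raw material needed to build CREATE INDEX / PRIMARY KEY scripts.
    SELECT  o.name AS TableName,
            i.name AS IndexName,
            CASE WHEN i.indid = 1 THEN 'CLUSTERED' ELSE 'NONCLUSTERED' END AS IndexType,
            CASE WHEN (i.status & 2048) <> 0 THEN 'PRIMARY KEY' ELSE '' END AS KeyType,
            c.name AS ColumnName,
            k.keyno AS KeyPosition
    FROM    sysobjects o
            JOIN sysindexes i   ON i.id = o.id
            JOIN sysindexkeys k ON k.id = i.id AND k.indid = i.indid
            JOIN syscolumns c   ON c.id = k.id AND c.colid = k.colid
    WHERE   o.name = 'Holdings'                   -- placeholder table name
            AND i.indid BETWEEN 1 AND 254         -- skip the heap and text/image rows
            AND INDEXPROPERTY(o.id, i.name, 'IsStatistics') = 0  -- skip auto-created statistics
    ORDER BY i.name, k.keyno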
With this SP I create most of the tables with all their indexes, except for the ones that hold
nearly a million rows.
The data pump tasks run faster without the indexes, and creating the indexes after the data is
in place takes about 2.5 minutes.
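For those large tables the indexes are simply built in a separate SQL task once the data pump
finishes; a sketch of that step, with placeholder table and column names, looks like this:

    -- Placeholder names: build the primary key and indexes only after the bulk load completes.
    ALTER TABLE Holdings_New
        ADD CONSTRAINT PK_Holdings_New PRIMARY KEY CLUSTERED (HoldingID)

    CREATE NONCLUSTERED INDEX IX_Holdings_New_StockID
        ON Holdings_New (StockID)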
After the data is imported into the new tables I run the stored procedure sp_rename multiple
times: first to rename the existing tables with the extension "_Old", then to rename all the
tables with the extension "_New" to the original table names.
All these rename commands are wrapped in a BEGIN/COMMIT TRANSACTION, with a ROLLBACK issued if
any of the rename commands fails.
This is an all-or-nothing transaction, which avoids having old data in one table live at the
same time as new data in a different table.
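In outline the swap looks something like the following minimal sketch; the table names are
placeholders, and the real task repeats the pattern for every imported table:

    -- Swap the freshly loaded tables into place in one all-or-nothing transaction.
    BEGIN TRANSACTION

    EXEC sp_rename 'Holdings', 'Holdings_Old'
    IF @@ERROR <> 0 GOTO SwapFailed
    EXEC sp_rename 'Holdings_New', 'Holdings'
    IF @@ERROR <> 0 GOTO SwapFailed

    EXEC sp_rename 'Stocks', 'Stocks_Old'
    IF @@ERROR <> 0 GOTO SwapFailed
    EXEC sp_rename 'Stocks_New', 'Stocks'
    IF @@ERROR <> 0 GOTO SwapFailed

    COMMIT TRANSACTION
    RETURN

    SwapFailed:
    -- One rename failed, so undo them all and fail the task;
    -- the old tables stay live and the failure e-mail goes out.
    ROLLBACK TRANSACTION
    RAISERROR ('Table swap failed; previous data left in place.', 16, 1)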
If this step fails, a failure e-mail is immediately sent to my cell phone, which is always
beside my bed.
The phone beeps until I look at the e-mail.
In testing we have found that the data is always available.
Anyone logged on during the import and/or table swap experiences only a slight slowdown on the
site, and even that was reduced when we upgraded from SQL Server 7.0 to SQL Server 2000.
Importing Full or Daily Update
When we changed data providers we had to modify our DTS package so it could detect whether we
had received a full or a daily update and run the appropriate data pumps and other SQL tasks.
All files associated with the daily update have the word daily in front of their names; those
for a full update don't.
The DTS package has a data pump for each file in a full update and for each file in a daily
update.
To control which data pumps get executed I created two SQL Execute Tasks, one called 1st
Gateway Daily Update and the other called 1st Gateway Full Update.
The only code in each task is a remark stating its purpose.
None of the data pumps for the daily update can execute until the task called 1st Gateway
Daily Update succeeds, and none of the data pumps for the full update can execute until 1st
Gateway Full Update succeeds.
I then created an ActiveX Script Task with a VBScript to detect which files were received.
If the files for a daily update are received, the VBScript marks the SQL Execute Task called
1st Gateway Full Update as completed.
The DTS package interprets this as a failure, so none of the data pumps for the full update
are executed.
Likewise, the task called 1st Gateway Daily Update is marked as completed when a full update
is received, preventing those data pumps from executing.
There are other tasks that must run after the appropriate data pumps execute, but only once
the needed data pumps have succeeded.
To make sure they run after the data pumps I created two more SQL Execute Tasks similar to
those described above, simply changing "1st" to "2nd".
2nd Gateway Daily Update can't execute until all the daily update data pumps succeed, and 2nd
Gateway Full Update can't execute until the full update data pumps succeed.
None of the tasks that follow can execute until both 2nd gateway tasks have completed.
So, if a daily update is received, the ActiveX Script Task marks both full update gateways as
completed.
This keeps all the full update data pumps from executing and ensures that the tasks which must
run after the daily data pumps succeed actually wait for those data pumps.
The DTS package can now handle either a full or a daily update.
I used this method to avoid a failure entry for every data pump that didn't find a file;
otherwise, half of the data pumps would fail and be logged as failures in the error log on
every run.
This method reduces the number of expected failures from about ten to two, one for each
gateway task marked as completed.
Here is a shortened version of the VBScript that does what I described in
this section:
    '**********************************************************************
    '  Visual Basic ActiveX Script
    '************************************************************************
    Function Main()
        ' Specify the OLE DB provider.
        set DATABASENAME = CreateObject("ADODB.Connection")
        DATABASENAME.Provider = "sqloledb"

        ' Specify connection string on Open method.
        ProvStr = "Server=SERVERNAME;Database=DATABASENAME;UID=SQLSERVERLOGIN;PWD=PASSWORD"
        DATABASENAME.Open ProvStr

        'Create temporary table
        query = "CREATE TABLE #fileexists (doesexist smallint, fileisdir smallint, direxist smallint, txtfilename varchar(45) NULL default 'unknown')"
        DATABASENAME.Execute(query)

        query = "INSERT INTO #fileexists (doesexist, fileisdir, direxist) EXEC master..xp_fileexist 'D:\TextFilesForImport\dailyupdate.txt' UPDATE #fileexists SET txtfilename = 'dailyupdate.txt' WHERE txtfilename = 'unknown'"
        DATABASENAME.Execute(query)

        query = "INSERT INTO #fileexists (doesexist, fileisdir, direxist) EXEC master..xp_fileexist 'D:\TextFilesForImport\fullupdate.txt' UPDATE #fileexists SET txtfilename = 'fullupdate.txt' WHERE txtfilename = 'unknown'"
        DATABASENAME.Execute(query)

        'Start finding UpdateType
        '1 indicates Full Update. 0 or 2 indicates at least one file missing. 3 indicates Daily Update.
        UpdateTypeQuery = "SELECT doesexist AS UpdateType FROM #fileexists WHERE txtfilename = 'fullupdate.txt'"
        set LocalImportType = DATABASENAME.Execute(UpdateTypeQuery)
        DTSGlobalVariables("UpdateType").Value = LocalImportType("UpdateType")

        'Enters this If block when fullupdate.txt is not found.
        If DTSGlobalVariables("UpdateType").Value = 0 Then
            UpdateTypeQuery = "SELECT doesexist + 2 AS UpdateType FROM #fileexists WHERE txtfilename = 'dailyupdate.txt'"
            set LocalImportType = DATABASENAME.Execute(UpdateTypeQuery)
            DTSGlobalVariables("UpdateType").Value = LocalImportType("UpdateType")
        End If
        'End finding UpdateType

        query = "DROP TABLE #fileexists"
        DATABASENAME.Execute(query)

        Dim oPackage
        Set oPackage = DTSGlobalVariables.Parent

        'This If handles daily updates. It marks the Gateway Full Update steps as completed.
        If DTSGlobalVariables("UpdateType").Value = 3 Then
            'Mark 1st Gateway Full Update step as completed
            oPackage.Steps("DTSStep_DTSExecuteSQLTask_4").ExecutionStatus = DTSStepExecStat_Completed
            'Mark 2nd Gateway Full Update step as completed
            oPackage.Steps("DTSStep_DTSActiveScriptTask_7").ExecutionStatus = DTSStepExecStat_Completed
        End If

        'This If handles full updates. It marks the Gateway Daily Update steps as completed.
        If DTSGlobalVariables("UpdateType").Value = 1 Then
            'Mark 1st Gateway Daily Update step as completed.
            oPackage.Steps("DTSStep_DTSExecuteSQLTask_1").ExecutionStatus = DTSStepExecStat_Completed
            'Mark 2nd Gateway Daily Update step as completed.
            oPackage.Steps("DTSStep_DTSActiveScriptTask_6").ExecutionStatus = DTSStepExecStat_Completed
        End If

        Main = DTSTaskExecResult_Success
    End Function
Avoiding False Failure Alerts
The above solution created a new problem: because of the tasks that get marked as completed,
the job that executes the DTS package always completes reporting failure.
To avoid this I modified the job to always report success, even upon failure, which eliminates
the false failure alert.
Unfortunately, the job will then no longer alert me when there really is a failure.
To cover that, I created a table where the DTS package logs when it started and when it
finished, and a separate job that counts how many times the package logged a start for the day
and how many times it logged a finish for the same day.
If the numbers aren't the same, an e-mail gets sent to my cell phone to wake me up, and I
groggily power up my computer to investigate the problem.
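In outline, the logging table and the check run by the separate job look something like the
sketch below; the table name, column names, and date format are placeholders of mine rather
than the exact production objects:

    -- Placeholder logging table written to by the package.
    CREATE TABLE ImportRunLog (
        RunDate   datetime    NOT NULL DEFAULT (GETDATE()),
        RunStatus varchar(10) NOT NULL   -- 'Start' or 'Finish'
    )

    -- The first task in the package runs:  INSERT INTO ImportRunLog (RunStatus) VALUES ('Start')
    -- The last task in the package runs:   INSERT INTO ImportRunLog (RunStatus) VALUES ('Finish')

    -- The separate watchdog job compares today's starts and finishes and fails
    -- (triggering the e-mail alert) when they don't match.
    IF (SELECT COUNT(*) FROM ImportRunLog
        WHERE RunStatus = 'Start'
          AND CONVERT(varchar(8), RunDate, 112) = CONVERT(varchar(8), GETDATE(), 112))
       <>
       (SELECT COUNT(*) FROM ImportRunLog
        WHERE RunStatus = 'Finish'
          AND CONVERT(varchar(8), RunDate, 112) = CONVERT(varchar(8), GETDATE(), 112))
    BEGIN
        RAISERROR ('Import package logged a start today but no matching finish.', 16, 1)
    END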
Fail Safes
We discovered a problem one day when all the expected data was not on our production servers,
and we wondered why we hadn't received an e-mail alerting us to a problem.
Upon investigation we discovered that we had received all the required files in the proper
format; the trouble was that the vendor had sent one or two of the files with less than half
the normal number of rows.
Nothing in the DTS package was designed to handle this, since the few rows received were
correct, and so the data pumps moved them into their respective tables.
The fail-safe I designed for this eventuality was an SQL Execute Task that counts the number
of rows in each table.
If a row count falls below a minimum level, the task sends an e-mail to my cell phone and
stops the job so that no more tasks are executed.
I placed this task after the data pump tasks that load the new tables and before the task that
renames all the tables to make the new data available on our web site.
Thus if there are not enough rows in any of the tables, the new data will not be made live.
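The row-count check amounts to a test along these lines; the table names and thresholds below
are placeholders for illustration, not our production values:

    -- Placeholder tables and minimum row counts: fail before the table swap if anything looks short.
    DECLARE @HoldingsRows int, @StocksRows int

    SELECT @HoldingsRows = COUNT(*) FROM Holdings_New
    SELECT @StocksRows   = COUNT(*) FROM Stocks_New

    IF @HoldingsRows < 800000 OR @StocksRows < 5000
    BEGIN
        -- The production task also sends the alert e-mail (for example via xp_sendmail) at this point.
        -- Failing the task here halts the package before the rename step,
        -- so yesterday's complete data stays live on the site.
        RAISERROR ('Row counts below minimum; import aborted before table swap.', 16, 1)
    END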
Another problem we faced was that when no new files were available, the import DTS package
would still run and send an e-mail telling us it had succeeded.
It had succeeded in importing yesterday's data, which was not acceptable.
I developed the following VBScript and placed it in an ActiveX Script task:
    '**********************************************************************
    '  Visual Basic ActiveX Script
    '************************************************************************
    'This task checks the file creation date for all the text
    'files in the zip file and if not today then it creates a directory
    'and moves the file to that directory and then stops the
    'DTS Package.
    Function Main()
        Dim TodayDate
        Dim FullUpdateDate
        Dim DailyUpdateDate

        TodayDate = Date

        Dim objFSO, objFolder, objFile, colFiles, objFSO2
        Set objFSO2 = CreateObject("Scripting.FileSystemObject")
        Set objFSO = CreateObject("Scripting.FileSystemObject")
        Set objFolder = objFSO.GetFolder(DTSGlobalVariables("gvFilePath").Value)
        Set colFiles = objFolder.Files

        'delete folder WrongDate if exists
        If objFSO.FolderExists(objFolder & "\WrongDate") Then
            objFSO.DeleteFolder(objFolder & "\WrongDate")
        End If

        'begin checking dates on LionShares text files and move old files to WrongDate folder
        If colFiles.Count > 0 Then
            For Each objFile in colFiles
                'checks dates for all files
                If UCase(objFile.Name) = "DAILYUPDATE.TXT" or UCase(objFile.Name) = "FULLUPDATE.TXT" _
                   or UCase(objFile.Name) = "FUNDS.TXT" or UCase(objFile.Name) = "HOLDINGS.TXT" _
                   or UCase(objFile.Name) = "STOCKS.TXT" or UCase(objFile.Name) = "ANALYTICS.TXT" _
                   or UCase(objFile.Name) = "FUNDANALYTICS.TXT" Then

                    DailyUpdateDate = TRIM(LEFT(objFile.DateLastModified, instr(objFile.DateLastModified, " ")))
                    'the below line is for debugging purposes.
                    'MsgBox "!" & DailyUpdateDate & "!" & TodayDate & "!"

                    If TRIM(TodayDate) <> DailyUpdateDate Then
                        If objFSO.FolderExists(objFolder & "\WrongDate") Then
                            objFile.Move(objFolder & "\WrongDate\")
                        Else
                            objFSO.CreateFolder(objFolder & "\WrongDate")
                            objFile.Move(objFolder & "\WrongDate\")
                        End If
                    End If
                End If
            Next
        End If
        'end checking dates on LionShares zip files and move old files to WrongDate folder

        Set objFSO = nothing
        Set objFolder = nothing
        Set colFiles = nothing

        Main = DTSTaskExecResult_Success
    End Function
This script compares the date of each file with today's date, and if the dates are not the
same the text file is moved to a different directory.
Now we can just let the DTS package run.
If no files are left where the data pump tasks can find them, the entire package will halt,
and the process described in the section called Avoiding False Failure Alerts will send an
e-mail to my cell phone, since the package will have logged a start time but no finish time.
The reason we let the package continue at this point is that we don't know yet whether we are
receiving a daily or a full update.
Since we only ever receive one or the other, the files for the update not received today will
have the wrong date and will be moved to the WrongDate folder before the task that determines
whether we have a full or daily update executes.
Benefits
We now have a DTS package that does not adversely affect our customers and that lets everyone
who monitors it sleep, knowing that if something really goes wrong a wake-up e-mail will be
sent to our cell phones and pagers to alert us to the problem.
We have fewer problems with the import, we keep incomplete or bad data off our web site, and
we have eliminated complaints that there is no data to be seen in the areas of our web site
affected by this DTS package.
Summary
In this case study I have explained how to keep data available while new data is being
imported and described some of the fail-safes that help automate the import process.
I have also described one method for handling full and daily updates in the same DTS package
while minimizing the number of failures recorded in the error logs.