When trying to implement data from an external source, very
rarely do you receive the data in perfect condition. Sometimes you receive the data in all caps or where fields like
social security number are in the improper format (like 232221113). In this article well discuss some of the
strategies to clean up your data using Data Transformation Services (DTS).
In this example, were going to use a sample file that is comma delimited using the text qualifier or
double-quotes. In DTS, your destination
doesnt have to always be SQL Server.
Indeed, your destination can be any OLE DB data source including a text
file. In our example, our source will
be a text file and our destination will be a text file. First add a text file source as the file you
downloaded and the text file properties should look like the following:
Next, add another connection as a text file
(destination). DTS will take care of
creating the destination file for you so you can specify a file that doesnt
exist yet as shown below.
Create a Transform Data Task between the two
connections. This will create an arrow
between the two connections.
Double-click on the arrow and this will bring up the Transform Data Task
properties. Accept the defaults on the
first tab and proceed to the destination tab.
When you go to this tab, youll be presented with the below screen where
youll need to click Execute. This will
create the destination file.
Go then to the Transformation tab. In this example, we will transform the social security number,
which is represented as123456789 to 123-45-6789. To do this, well delete the last transformation, which is the
arrow to the bottom of the below screen.
Click the transformation once and the hit the delete key or select the
Delete button.
Click and drag the word Col011 from the source to the
destination. After you release, the
below screen will popup. Select the
ActiveX Script method of
transformation.
Select properties as you enter the next screen. This will take you to the screen where you
can modify the transformation. By
default, youll see the below type of transformation:
DTSDestination("Col011") = DTSSource("Col011")
This basically writes the data exactly as is from the
source to the destination. We will
modify this using the below script to place dashes between in the social
security number. We use the Left() function to gather the first three digits
then we concatenate the string using the & operator with the dash. We then take the middle two digits with the
Mid() function then the last 4 digits with the Right() function.
'Copy each source column to the destination column Function Main() DTSDestination("Col011") = LEFT(DTSSource("Col011"),3)_ &"-"&MID(DTSSource("Col011"),4,2)&"-"&RIGHT(DTSSource("Col011"),4) Main = DTSTransformStat_OK End Function
You can test the data by clicking the Test button. This will transform the data to a sample
temporary text file. You can then click
View Results to see the results of the file as shown below.
Although this is a simplistic example, we can use this same
type of code for some more complex scrubbing of your data. In the next article in this series on
scrubbing your data we will work on the upper case strings in the text file.