November 8, 2006 at 11:16 am
Hello,
I've just started tinkering with Integration Services, and I had a question.
There is a network share that contains about sixty text files at any given time. Each file has a file name that includes the date in mm/dd/yyyy format. These are fixed-width files, and each one holds about two million records. We have reporting needs for this data, we will need to write ad hoc queries against it, and I imagine the reports will have parameters and groupings based on date.
I've touched on the ForEach Loop Container and how to set up basic Data Flows. Would it be possible to set up a package that could do the following:
1. Find the appropriate text files to loop through by parsing the date from the file name and checking that value against one or more date parameters.
2. Send the result set to an indexed container (maybe a record set container).
3. Have that container serve as the data source for a report.
4. Make the container available as a 'linked table' so that a user could query against the 'record set' (from SS Management Studio), and send date parameter values to it.
The point would be to integrate the data so that it could be queried and used in reports where it 'lives', without having to make a second copy of the data in the form of a SQL Server table.
Does this make any sense?
Also, when configuring a fixed-width Flat File Source Connection Manager, is there a way to replace unknown characters with valid characters (like spaces)? In the files that I am working with, the last part of the string is numeric. Some numbers have a hyphen at the end to represent a negative value. No characters follow the positive numbers, so the last column shows two black squares for each positive number. Where the hyphen appears, the second black square is pushed into the beginning of the next row.
Thank you for your help!
CSDunn
November 9, 2006 at 3:39 am
For the first part of your question, you need to use a variable enumerator, which you populate using a Script Task.
I wrote the script task below to do a similar job. It reads a date from a control file created at the end of an extract and returns only the files in a folder that fit the criteria. You should be able to take it apart and get an idea of how to go about it. Basically, it populates an ArrayList that is passed to the package via a variable, and that variable is used as the enumerator (the Files variable is declared as type Object in the package).
If you want to go completely mental, you can check out the following link I found. Maybe a bit over the top.
http://msdn2.microsoft.com/en-US/library/ms136120.aspx
For the second part, about the data format, you could do this in several ways. You could use a Derived Column transformation to trim, replace, etc. on each column before you load it into a table. Another approach would be to load the data into a staging table and use T-SQL to cleanse it. Personally, I think the second approach is easier to maintain.
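Something along these lines would handle the trailing hyphen with the second approach, assuming the raw value is loaded into a varchar column in the staging table first; the names dbo.StagingImport and TotalRaw are just placeholders for whatever your own staging table uses:

-- Strip any stray CR/LF characters that came in with the fixed-width rows,
-- then trim the whitespace (placeholder table/column names, adjust to your schema).
UPDATE dbo.StagingImport
SET TotalRaw = LTRIM(RTRIM(REPLACE(REPLACE(TotalRaw, CHAR(13), ''), CHAR(10), '')));

-- Turn trailing-hyphen values such as '001234-' into proper negative numbers.
SELECT CASE
           WHEN RIGHT(TotalRaw, 1) = '-'
               THEN -1 * CONVERT(decimal(18, 2), LEFT(TotalRaw, LEN(TotalRaw) - 1))
           ELSE CONVERT(decimal(18, 2), TotalRaw)
       END AS Total
FROM dbo.StagingImport;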
Hope this helps.
Regards
Daniel
'=======================================================
' Microsoft SQL Server Integration Services Script Task
' Write scripts using Microsoft Visual Basic
' The ScriptMain class is the entry point of the Script Task.
Imports System
Imports System.Data
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Collections
Imports System.IO
Public Class ScriptMain

    Public Sub Main()
        ' Build the list of files to process and hand it back to the package
        ' through the Files variable (declared as type Object in the package).
        Dts.Variables("Files").Value = GetFiles(Dts.Variables("Folder").Value.ToString(), getETLDateFromFile(Dts.Variables("Folder").Value.ToString()))
        Dts.TaskResult = Dts.Results.Success
    End Sub

    ' Returns every file in the folder matching the FilePattern variable
    ' whose file-name date is on or before the ETL date.
    Function GetFiles(ByVal sFolderPath As String, ByVal ETLDate As DateTime) As ArrayList
        Dim sFilename As String
        GetFiles = New ArrayList
        For Each sFilename In Directory.GetFiles(sFolderPath, Dts.Variables("FilePattern").Value.ToString())
            If GetFileDate(sFilename) <= ETLDate Then
                GetFiles.Add(sFilename)
            End If
        Next
    End Function
    ' Parses the yyyymmdd date embedded in the file name.
    ' Note: this assumes the 8-digit date starts at the first "2" in the path
    ' (i.e. years 2000 and later).
    Function GetFileDate(ByVal sFilename As String) As DateTime
        Dim sTemp As String
        Dim y As Int16
        Dim m As Int16
        Dim d As Int16
        sTemp = sFilename.Substring(sFilename.IndexOf("2"), 8)
        y = Convert.ToInt16(sTemp.Substring(0, 4))
        m = Convert.ToInt16(sTemp.Substring(4, 2))
        d = Convert.ToInt16(sTemp.Substring(6, 2))
        GetFileDate = Convert.ToDateTime(DateSerial(y, m, d))
        'DEBUG
        'Dts.Variables("FileDate").Value = Convert.ToDateTime(DateSerial(y, m, d))
    End Function
    ' Reads the ETL date (yyyymmdd) from the ETLControl.csv file written at
    ' the end of the extract; falls back to 1900-01-01 if the file is missing.
    Private Function getETLDateFromFile(ByVal sFolder As String) As DateTime
        Dim filePath As String
        Dim TextFileStream As System.IO.TextReader
        Dim FileContents As String
        Dim d As Int16
        Dim m As Int16
        Dim y As Int16
        Try
            If sFolder.EndsWith("\") Then
                filePath = sFolder + "ETLControl.csv"
            Else
                filePath = sFolder + "\ETLControl.csv"
            End If
            'Load the text file into the stream
            TextFileStream = System.IO.File.OpenText(filePath)
            'Read to the end of the file into a String variable
            FileContents = TextFileStream.ReadToEnd
            'Close the Stream object
            TextFileStream.Close()
            y = Convert.ToInt16(FileContents.Substring(0, 4))
            m = Convert.ToInt16(FileContents.Substring(4, 2))
            d = Convert.ToInt16(FileContents.Substring(6, 2))
            getETLDateFromFile = Convert.ToDateTime(DateSerial(y, m, d))
        Catch
            getETLDateFromFile = Convert.ToDateTime(DateSerial(1900, 1, 1))
        End Try
        'DEBUG
        'Dts.Variables("ETLDate").Value = Convert.ToDateTime(DateSerial(y, m, d))
    End Function
End Class
November 9, 2006 at 4:10 am
Daniel,
Thanks for this contribution, I really appreciate it. The script files are pretty easy to decipher.
I wound up putting the text files into an MS Excel format, and that made it easier to work with that last column. I set up a calculated column in the spreadsheets to make the 'Totals' negative where needed, then brought it all into a SQL Server staging table before it went into the production table.
Thanks again
CSDunn