Query Multiple Text Files

  • Hello,

    I've just started tinkering with Integration Services, and I had a question.

There exists a network share that contains about sixty text files at any given time. Each of these files has a file name that includes the date in mm/dd/yyyy format. These are fixed-width files, each with about two million records. We have reporting needs for this data, we will need to write ad hoc queries against it, and I imagine the reports will have parameters and groupings based on date.

I've touched on the ForEach Loop Container, and how to set up basic Data Flows. Would it be possible to set up a package that could do the following:

    1. Find the appropriate text files to loop through by parsing the date from the file name and checking that value against one or more date parameters.

2. Send the result set to an indexed container (maybe a Recordset destination).

3. Have that container serve as the data source for a report.

    4. Make the container available as a 'linked table' so that a user could query against the 'record set' (from SS Management Studio), and send date parameter values to it.

The point would be to integrate the data so that it could be queried and used in reports where it 'lives', without having to make a second copy of the data in the form of a SQL Server table.

    Does this make any sense?

    Also, when configuring a fixed width flat file source Connection Manager, is there a way to replace unknown characters with valid characters (like spaces)? In the files that I am working with, the last part of the string is numeric. Some numbers contain a hyphen at the end of the number to represent a negative number. There are no characters that follow the positive numbers, so the last column contains two black squares for each positive number. Where the hyphen appears, the second black square is forced into the beginning of the next row.

    Thank you for your help!

    CSDunn

For the first part of your question, you need to use a Foreach From Variable enumerator, which you populate using a Script Task.

I wrote the script task below to do a similar job. It reads a date from a control file created at the end of an extract and returns only the files in a folder which fit the criteria. You should be able to take it apart and get an idea of how to go about it. Basically it populates an ArrayList which is passed to the package via a variable, and that variable is the enumerator (the Files variable is declared as type Object in the package).

If you want to go completely mental you can check out the following link I found. Maybe a bit over the top.

    http://msdn2.microsoft.com/en-US/library/ms136120.aspx

For the second part, about the data format, you could do this in several ways. You could use a Derived Column transformation to trim, replace etc. on each column before you load it into a table. Another approach would be to load the data into a staging table and use T-SQL to cleanse it. Personally I think the second approach is easier to maintain.
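Either way, the cleansing rule for that last column boils down to: strip the trailing filler characters, and if the value ends in a hyphen, treat it as negative. A quick sketch of that rule (in Python just to illustrate; `parse_trailing_sign` and the filler byte are my guesses, and the same logic translates to a Derived Column expression or T-SQL):

```python
def parse_trailing_sign(raw):
    """Convert a fixed-width numeric field where a trailing hyphen
    means negative, e.g. '00123-' -> -123 and '00123' -> 123."""
    # Strip whitespace and trailing filler bytes first.
    # '\x00' and '\xdb' are guesses at the "black square" characters;
    # adjust to whatever bytes your files actually contain.
    value = raw.strip().rstrip("\x00\xdb")
    if value.endswith("-"):
        return -int(value[:-1])
    return int(value)

print(parse_trailing_sign("00123-"))  # -123
print(parse_trailing_sign("00045"))   # 45
```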

    Hope this helps.

    Regards

    Daniel

'=======================================================
' Microsoft SQL Server Integration Services Script Task
' Write scripts using Microsoft Visual Basic
' The ScriptMain class is the entry point of the Script Task.
'=======================================================

Imports System
Imports System.Data
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Collections
Imports System.IO

Public Class ScriptMain

    Public Sub Main()
        ' Build the file list and hand it to the package via the "Files"
        ' variable, which the Foreach Loop uses as its enumerator.
        Dts.Variables("Files").Value = GetFiles( _
            Dts.Variables("Folder").Value.ToString(), _
            getETLDateFromFile(Dts.Variables("Folder").Value.ToString()))
        Dts.TaskResult = Dts.Results.Success
    End Sub

    ' Return every file in the folder (matching the FilePattern variable)
    ' whose embedded date is on or before the ETL date.
    Function GetFiles(ByVal sFolderPath As String, ByVal ETLDate As DateTime) As ArrayList
        Dim sFilename As String
        GetFiles = New ArrayList
        For Each sFilename In Directory.GetFiles(sFolderPath, Dts.Variables("FilePattern").Value.ToString())
            If GetFileDate(sFilename) <= ETLDate Then
                GetFiles.Add(sFilename)
            End If
        Next
    End Function

    ' Parse a yyyymmdd date out of the file name. Note: this assumes the
    ' date is the first part of the path that starts with a "2".
    Function GetFileDate(ByVal sFilename As String) As DateTime
        Dim sTemp As String
        Dim y As Int16
        Dim m As Int16
        Dim d As Int16
        sTemp = sFilename.Substring(sFilename.IndexOf("2"), 8)
        y = Convert.ToInt16(sTemp.Substring(0, 4))
        m = Convert.ToInt16(sTemp.Substring(4, 2))
        d = Convert.ToInt16(sTemp.Substring(6, 2))
        GetFileDate = Convert.ToDateTime(DateSerial(y, m, d))
        'DEBUG
        'Dts.Variables("FileDate").Value = Convert.ToDateTime(DateSerial(y, m, d))
    End Function

    ' Read the ETL date (yyyymmdd) from the ETLControl.csv control file;
    ' fall back to 1900-01-01 if the file is missing or unreadable.
    Private Function getETLDateFromFile(ByVal sFolder As String) As DateTime
        Dim filePath As String
        Dim TextFileStream As System.IO.TextReader
        Dim FileContents As String
        Dim d As Int16
        Dim m As Int16
        Dim y As Int16
        Try
            If sFolder.EndsWith("\") Then
                filePath = sFolder + "ETLControl.csv"
            Else
                filePath = sFolder + "\ETLControl.csv"
            End If
            'Load the text file into the stream
            TextFileStream = System.IO.File.OpenText(filePath)
            'Read to the end of the file into a String variable
            FileContents = TextFileStream.ReadToEnd
            'Close the Stream object
            TextFileStream.Close()
            y = Convert.ToInt16(FileContents.Substring(0, 4))
            m = Convert.ToInt16(FileContents.Substring(4, 2))
            d = Convert.ToInt16(FileContents.Substring(6, 2))
            getETLDateFromFile = Convert.ToDateTime(DateSerial(y, m, d))
        Catch
            getETLDateFromFile = Convert.ToDateTime(DateSerial(1900, 1, 1))
        End Try
        'DEBUG
        'Dts.Variables("ETLDate").Value = Convert.ToDateTime(DateSerial(y, m, d))
    End Function

End Class
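In case it helps to see the filtering logic on its own, the core of the script above is just "parse a yyyymmdd stamp out of each file name and keep the files on or before a cutoff date". A minimal sketch of that in Python (the file names and `pick_files`/`file_date` helpers are my own invention for illustration; a regex is used instead of the IndexOf("2") trick, which is safer if the path contains other digits):

```python
import re
from datetime import date

def file_date(filename):
    """Pull a yyyymmdd date out of a file name like 'extract_20070315.txt'."""
    m = re.search(r"(\d{4})(\d{2})(\d{2})", filename)
    if not m:
        raise ValueError("no yyyymmdd stamp in " + filename)
    y, mo, d = (int(g) for g in m.groups())
    return date(y, mo, d)

def pick_files(filenames, cutoff):
    """Keep only the files whose embedded date is on or before the cutoff."""
    return [f for f in filenames if file_date(f) <= cutoff]

files = ["extract_20070301.txt", "extract_20070315.txt", "extract_20070401.txt"]
print(pick_files(files, date(2007, 3, 31)))
# ['extract_20070301.txt', 'extract_20070315.txt']
```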

  • Daniel,

    Thanks for this contribution, I really appreciate it. The script files are pretty easy to decipher.

I wound up putting the text files into an MS Excel format, and that made it easier to work with that last column. I set up a calculated column in the spreadsheets to make the 'Totals' negative where needed, then brought it all into a SQL Server staging table before it went into the production table.

    Thanks again

    CSDunn
