Dynamic DTS text file import

  • Hello, I am new to SQL and DTS.

    I am trying to find a way to import a text file into multiple SQL tables; the text file has different row layouts and field lengths. I want to read each row of the text file and inspect the first few bytes to see which columns and tables need to be updated or added. There will also be multiple rows from the text file that need to be read before outputting the data to the tables. Can anyone shed some light on this for me?

    Thank You

    Paul Combellack

  • I would import this to one table with just one field, long enough to accommodate all the record types you may have. Then I would use the Data Driven Query task with the table just imported as the source and anything as the destination (you do not have to use the destination). In the DDQ task's script, after the logic that decides what to do with each record, I would use lookups to update the tables I want.
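    In T-SQL terms (the DDQ script itself would be ActiveX script, so this is only a rough sketch with placeholder names), the per-record decision boils down to branching on the leading bytes of the single imported field:

    -- StagingTable / RowData are placeholder names for the one-field import.
    SELECT LEFT(RowData, 4) AS RecordType,               -- leading bytes identify the layout
           SUBSTRING(RowData, 5, LEN(RowData)) AS Payload -- everything after the type code
    FROM StagingTable

    Each RecordType branch would then run its own lookup or update against the real tables.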

  • We have many similar situations where we need to read in a file with variable-length records and then write them to many different tables.

    We have used the Transform Data task together with ActiveX script.

    With scripting you can examine the initial few bytes of each record, and the constructs within DTS scripting let you skip an insert for a row, build up a single record from many source records, or use multiple transformations to direct rows to different destinations.


    Thanks Jeet

  • I would take the route that ipTomN stated.

    However, the question you need to ask before starting work on this is: How do you establish the entity relationships for the data provided?

    In other words, if each line holds a different type of data and the "identifier" is in the first few bytes, then what indicates the entity relationships between the lines?

    If the data is sequential, then you may have to do some extra work to establish the data relationships involved.

    Sequential file example:

    0000MemberID
    1234Name
    2345Address
    3456Phone
    4567Amount
    5678BillDate
    00001000
    1234Doe, John
    2345123 Oak St. Madison, WI 40404
    3456555-555-4564
    4567128.00
    567820040101
    00001003
    1234Lincoln, Abe
    2345145 President St. Chicago, IL 43333
    3456555-555-1313
    4567158.12
    567820031201

    Notice that each line has no entity key information - the file data layout assumes sequential reading of the data to establish the data relationships. The data descriptors are the first four characters in this example. The first six lines are for documentation purposes and will likely not be present in a real-world data extract.

    One approach is to load each line of data into a raw processing table that has an identity key. DTS automatically reads and loads text data sequentially, so the identity values preserve the file order. Once the data is loaded, it is a matter of creating a stored proc to parse the data into a meaningful data set by using the identity key of the raw processing table and the data identifiers embedded in the data.

    The other, likely more complicated, approach is to parse the data via script before or during the import. I don't like this approach as much because the process is usually much slower and the coding is usually much more involved.

    To import this data, create a DTS package where the text file import is delimited by a character that is very rarely used, such as ~ or `, or a combination of characters that rarely occurs. For the destination, create a table that has two fields: one is a bigint IDENTITY key field; the other is a varchar of whatever size you think would capture the longest line of the data. Each line of text is then placed into the varchar field.

    TABLE Example:

    CREATE TABLE [RawTable] (
        [RecID] [bigint] IDENTITY (1, 1) NOT NULL,
        [RowData] [varchar] (800) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        CONSTRAINT [PK_RawTable] PRIMARY KEY CLUSTERED ([RecID]) ON [PRIMARY]
    ) ON [PRIMARY]
    GO
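    As a side note, if you want to test the same load outside DTS, a BULK INSERT through a view that exposes only the varchar column should give the same result, since the view keeps the IDENTITY column out of the column mapping (the file path here is just a placeholder):

    -- Hypothetical sketch: load each raw line into RowData from T-SQL.
    CREATE VIEW dbo.RawTableLoad AS
        SELECT RowData FROM dbo.RawTable
    GO
    BULK INSERT dbo.RawTableLoad
        FROM 'C:\data\extract.txt'       -- placeholder path
        WITH (FIELDTERMINATOR = '~',     -- the rarely used character from above
              ROWTERMINATOR = '\n')
    GO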

    Once you have the DTS package working and the data loading as expected, you can use code similar to the following to parse the data in any way you see fit:

    SELECT Right(A.RowData, 4) AS MemberID,
           Right(B.RowData, Len(B.RowData) - 4) AS MemberName,
           Right(C.RowData, Len(C.RowData) - 4) AS Address,
           Right(D.RowData, Len(D.RowData) - 4) AS Phone,
           Right(E.RowData, Len(E.RowData) - 4) AS Amount,
           Right(F.RowData, Len(F.RowData) - 4) AS BillDate
    FROM RawTable A
         INNER JOIN RawTable B ON B.RecID = A.RecID + 1
         INNER JOIN RawTable C ON C.RecID = A.RecID + 2
         INNER JOIN RawTable D ON D.RecID = A.RecID + 3
         INNER JOIN RawTable E ON E.RecID = A.RecID + 4
         INNER JOIN RawTable F ON F.RecID = A.RecID + 5
    WHERE Left(A.RowData, 4) = '0000'
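    Once that SELECT returns what you expect, the same query can feed a destination table directly; Member here is only a hypothetical target, so adjust the names and types to your schema:

    -- Hypothetical destination for the parsed rows.
    INSERT INTO Member (MemberID, MemberName, Address, Phone, Amount, BillDate)
    SELECT Right(A.RowData, 4),
           Right(B.RowData, Len(B.RowData) - 4),
           Right(C.RowData, Len(C.RowData) - 4),
           Right(D.RowData, Len(D.RowData) - 4),
           Right(E.RowData, Len(E.RowData) - 4),
           Right(F.RowData, Len(F.RowData) - 4)
    FROM RawTable A
         INNER JOIN RawTable B ON B.RecID = A.RecID + 1
         INNER JOIN RawTable C ON C.RecID = A.RecID + 2
         INNER JOIN RawTable D ON D.RecID = A.RecID + 3
         INNER JOIN RawTable E ON E.RecID = A.RecID + 4
         INNER JOIN RawTable F ON F.RecID = A.RecID + 5
    WHERE Left(A.RowData, 4) = '0000'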

    Of course, this assumes the data will always be in the same sequence with the same number of fields.

    If this is not the case, then a situation similar to the following example may come into play:

    0000MemberID
    1234Name
    2345Address
    3456Phone
    4567Amount BillDate
    00001000
    1234Doe, John
    2345123 Oak St. Madison, WI 40404
    3456555-555-4564
    4567128.00 20040101
    4567143.00 20040201
    4567121.00 20040301
    00001003
    1234Lincoln, Abe
    2345145 President St. Chicago, IL 43333
    3456555-555-1313
    4567158.12 20031201
    4567112.12 20040101
    4567110.12 20040201
    456748.12 20040301

    In this case, the use of a cursor will likely have to come into play. I don't have time to cover the details of how to do that, but it could be done.
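    To make the idea concrete, here is a minimal cursor sketch along those lines, assuming the RawTable layout above; the #Bills table and the parsing offsets are only placeholders:

    -- Walk the raw rows in file order; remember the current member on each
    -- '0000' header line and emit one row per repeating '4567' detail line.
    CREATE TABLE #Bills (MemberID varchar(10), Amount varchar(20), BillDate varchar(8))

    DECLARE @RowData varchar(800), @MemberID varchar(10)

    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT RowData FROM RawTable ORDER BY RecID
    OPEN cur
    FETCH NEXT FROM cur INTO @RowData
    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF LEFT(@RowData, 4) = '0000'
            SET @MemberID = SUBSTRING(@RowData, 5, 10)   -- new member block begins
        ELSE IF LEFT(@RowData, 4) = '4567'
            INSERT INTO #Bills (MemberID, Amount, BillDate)
            VALUES (@MemberID,
                    -- amount runs from position 5 up to the space
                    LEFT(SUBSTRING(@RowData, 5, 800), CHARINDEX(' ', @RowData + ' ') - 5),
                    -- the date follows the space
                    LTRIM(SUBSTRING(@RowData, CHARINDEX(' ', @RowData + ' ') + 1, 8)))
        FETCH NEXT FROM cur INTO @RowData
    END
    CLOSE cur
    DEALLOCATE cur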

    If you're lucky enough to have data that carries a key identifier of some kind in each data line, then your parsing process will be substantially different, but in some ways easier to troubleshoot and manage. In such a case, it is just a matter of determining the type of data in each line and parsing it out to the appropriate table for that kind of data, using the key identifier to establish the relationships for your database.
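    For instance, under a hypothetical layout where bytes 1-4 are the record code and bytes 5-12 are the member key, each record type can be peeled off set-based, with no sequence logic at all:

    -- Hypothetical layout and table names; one such statement per record type.
    INSERT INTO MemberName (MemberID, MemberName)
    SELECT SUBSTRING(RowData, 5, 8),
           SUBSTRING(RowData, 13, 788)
    FROM RawTable
    WHERE LEFT(RowData, 4) = '1234'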

    Of course, I simplified this example tremendously. Your data will likely be far more complex.

    As I hope you can see, however, it is important to find a way to establish the entity relationships within the data by key identifier or by sequence rules. If you can't establish a key identifier or standardized sequence rules, or a combination of both, then it is nearly impossible to determine what data is related.

    Of course, if you can work to establish a data extract standard with the people or systems that generate the data in the first place, you will save yourself as much as 90% of the effort on the import process development.

    I hope this makes sense.

    Good luck,

    Al


    "I will not be taken alive!" - S. Hussein

  • Hello,

    I am a newbie to DTS and SQL Server, but I would like to take the time to thank you all for your info and help. I have not got it working the way I want it to yet, but your information has pointed me in the right direction.

    Thank You

    Paul
