VBA Classes, Text File Cleanup, with a DTS Package

  • Let me preface this question by stating that I am very new to using DTS.  I am trying to import several text files that are preformatted for viewing.  Unfortunately, that is currently the only option for getting the data.  I am trying to import the data, but the values I need are in the report headers and also in a fixed-field format (per page).

    I thought of using a VBA class I already have to read through a text file and compile an INSERT statement from the header and then from the fixed fields on each page.  But I am unsure how, or whether, VBA classes can be used in SQL Server.

    My next thought was importing the file into a temp table with nchar fields and extracting the data from there.

    To sum it up, I am looking for suggestions from other users' experience importing formatted text files into SQL Server.  I would like to import these files through a DTS package for simplicity.

    Thanks again for looking at this thread.


    "Life without progression is entropy"
    Sam Jaynes

  • Have you tried using Enterprise Manager (EM) to import these text files? It will generate a DTS transformation for importing them.

    Otherwise, try posting a sample of a few rows of the text file and the DDL (CREATE TABLE) for the tables that need to be populated.

    Andy

  • You may want to look into Monarch if these are human-readable formatted text files. The Monarch Pro version should have a command-line interface so you can pass in files to process and output file names. You can then use DTS/SSIS to import those files.

    Ideally, see what the cost would be to generate a delimited or fixed-format file containing that data. My previous company did that; it took a while, but it was well worth it given how heavily the file was used.

    -Pete

  • I've imported a number of formatted .txt files using DTS just by inserting each entire row into a single column and splitting it up once it landed in my database.
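    A minimal T-SQL sketch of that pattern, assuming a one-column staging table; TargetTable, the column names, and the offsets 1/11/26 are all placeholders for the real layout:

        -- Staging table: each report line lands whole in RawLine
        CREATE TABLE ImportStaging (
            RowID   INT IDENTITY(1,1) PRIMARY KEY,
            RawLine VARCHAR(8000)
        )

        -- After the DTS load, carve each row up by fixed positions
        INSERT INTO TargetTable (ColA, ColB, ColC)
        SELECT LTRIM(RTRIM(SUBSTRING(RawLine,  1, 10))),
               LTRIM(RTRIM(SUBSTRING(RawLine, 11, 15))),
               LTRIM(RTRIM(SUBSTRING(RawLine, 26, 20)))
        FROM   ImportStaging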

  • Are you using an insert trigger to split it?


    "Life without progression is entropy"
    Sam Jaynes

  • After reading this a little more closely, I'd have to agree with the poster who suggested reading it into a table, then splitting the text with T-SQL commands. You can use VBScript in your transforms, but that will be painful. If you move up to SSIS, you can use something closer to VB.NET.

    I think your best bet may be to import into a table with an Identity field. You should have sequential inserts at that point, so you can pull the "header" data out of one row and then all text up to the row before the next header. Format those rows consistently and insert them into a table or transform them appropriately, then work on the next set, and so on. You could probably use global variables to store the most recent header and the start/end IDs of the data for that header. Your formats may even be semi-standard and could be generalized into a handful of stored procs that take a format type, start row, and end row as parameters; each proc would then format the data appropriately.
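    A hedged sketch of that stored-proc idea; every table, column, and proc name here is invented, and I'm assuming the literal 'Operator' marks header rows:

        -- Hypothetical proc: formats one header's block of detail rows
        CREATE PROCEDURE dbo.usp_ProcessBlock
            @FormatType INT,
            @StartRow   INT,
            @EndRow     INT
        AS
        BEGIN
            IF @FormatType = 1
                INSERT INTO DetailTable (Field1, Field2)
                SELECT SUBSTRING(RawLine,  1, 20),
                       SUBSTRING(RawLine, 21, 30)
                FROM   ImportStaging
                WHERE  RowID BETWEEN @StartRow AND @EndRow
            -- other format types would get their own branches
        END
        GO

        -- Driver: walk the staging table one header block at a time
        DECLARE @Hdr INT, @NextHdr INT
        SELECT @Hdr = MIN(RowID) FROM ImportStaging
        WHERE  RawLine LIKE '%Operator%'
        WHILE  @Hdr IS NOT NULL
        BEGIN
            SELECT @NextHdr = MIN(RowID) FROM ImportStaging
            WHERE  RowID > @Hdr AND RawLine LIKE '%Operator%'
            EXEC dbo.usp_ProcessBlock 1, @Hdr + 1,
                 COALESCE(@NextHdr - 1, 2147483647)
            SET @Hdr = @NextHdr
        END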

    If the data is delimited on each page but ragged-right (not the same number of columns, or even differing by page), check out one of the recent posts by Jamie Thomson (http://blogs.conchango.com/jamiethomson) - he had one on strangely structured delimited files at http://blogs.conchango.com/jamiethomson/archive/2006/07/14/4226.aspx

    -Pete

  • Thank you for your responses... I got pulled away on another project.

    Using the DTS Wizard, I was able to build a fixed-width DTS package to at least import the text.  Since the report has a jagged right edge, I ran into a couple of problems: I had to force the wizard to skip the first 15 lines so it could see the true length of each row (to create the column lengths).  Consequently, I am missing the first 15 lines, which contain an Operator number I need.  Is there a way to change it to look at the first line again?

    Now since the data is somewhat in a table, I need to accomplish a few things:

    1) Trim the left and right of each column (after import, or through a VBScript transform during import?). Suggestions?

    2) Remove the empty columns that separate the needed columns.

    3) And lastly (for now), I need to read through the table, obtain the operator number in col0015, and apply it to the data a few rows later.  By the way, the operator ID, since it gets chopped in this 'viewable text file report', looks like this in col0015: [erator:  0101].

    All rows in between can be ignored, deleted, or truncated.

    Any suggestions, tips, or examples of looping through the table while holding either a local or global variable would be helpful.  Thanks again, and sorry for the delay in re-posting.


    "Life without progression is entropy"
    Sam Jaynes

  • Sam, your best bet really will be Monarch in this case. It's not easy, but it is designed to take these human-readable text-file reports and put them into something like Excel. Your next best bet would be to ask your mainframe maintainers to get you this data in either a fixed-format or delimited file you can parse.

    If you need to get this to work, I think you're going to have a lot of trouble: that Operator ID could easily be chopped off repeatedly throughout the file if they combine multiple operators into one file, and you'll end up parsing something incorrectly.

    One possibility that comes to mind would be to work on the first 15 rows of data in one task and rows 16 and beyond in another task. That way you could set up two types of formatting and read that Operator ID into a global variable.
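    For that piece, a sketch of pulling the chopped Operator ID out in T-SQL; the table and column names are placeholders, and I'm assuming the square brackets in your post were just quoting:

        -- The cell holds something like 'erator:  0101' once the
        -- label is chopped; take what follows the colon and trim it
        DECLARE @Operator VARCHAR(10)
        SELECT @Operator =
               LTRIM(RTRIM(SUBSTRING(col0015,
                      CHARINDEX(':', col0015) + 1, 10)))
        FROM   ImportStaging
        WHERE  col0015 LIKE '%erator:%'
        -- NB: if several rows match, @Operator keeps the last one,
        -- so scope this to a single header block

    In DTS, an Execute SQL task can then hand a row value like this off to a global variable for the later steps.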

    As for 1 - you can trim the data after import.
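    For reference, a minimal T-SQL pass for the trimming (the column names are placeholders):

        -- Trim leading/trailing spaces in place after the import
        UPDATE ImportStaging
        SET    col001 = LTRIM(RTRIM(col001)),
               col002 = LTRIM(RTRIM(col002))
        -- ...and so on for each imported column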

    2 - Don't map them to your main table - that will eliminate the empty columns. You'll still have to work with/around them, but you can define large blocks of data you don't need as single columns. That may ease the pain some.

    I'd recommend a permanent table with an Identity value that you RESEED when starting this package. That will start your values over at a set point, and you can use integer math (% and /) to get every Nth row and work on those as a set.
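    Roughly like this, with a page length of 66 as a stand-in for whatever the real report uses:

        -- Clear the last load and restart the identity at a known point
        DELETE FROM ImportStaging
        DBCC CHECKIDENT ('ImportStaging', RESEED, 0)

        -- Integer math then picks out every Nth row, e.g. the first
        -- row of each 66-line page
        SELECT RowID, RawLine
        FROM   ImportStaging
        WHERE  RowID % 66 = 1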

    Good luck and seriously - consider Monarch or getting a data dump of just the data. There will be some up-front cost, but you won't have to worry about the format changing suddenly or other oddities and it will give you all of the data you need and then some. I've been down this road and after spending several man-days on the project, we decided that paying the 4-8 hours of programming time to get a new file was _well_ worth it.

    -Pete

  • Thanks for your message... unfortunately, the data source that creates these text files is an encrypted, vendor-supported database.  Even the views that our apps support people have are encrypted.  I have gone that route, and got heavy pushback and astronomical prices for custom coding by the vendor.

    For the time being, we can get the text files imported and cleaned up in Access (through VBA and ADODB.Recordset objects); however, this is not the preferred method at all.  Given the classes and libraries used in Access, I figured this existing code would NOT work in SQL Server.

    You are correct: there is a heading for each operator, and I believe the semi-colon is only used once in the report (prior to each operator number).

    The first 15 rows hold extraneous data, except in the one area where the Operator Number is.

    I also need to verify that the operator data appears every nth row, with one row per operator.


    "Life without progression is entropy"
    Sam Jaynes

  • Well, then, I will return to my first suggestion - Monarch. It still requires some work, but if you get the slightly more expensive version, it can be driven from a command line to take in a file and output it using whatever template/code you've defined around it. That can generate a delimited file or an Excel file, and you can suck that in through DTS. There's a learning curve, but you will be able to pull in just about anything on this report using it. I'm not a huge expert on it, but I can see where it's useful for things like this.

    -Pete

  • Unfortunately, Monarch is not an option right now. The thing that irks me is that the report imports just fine into Access or Excel when set to OEM LF. SQL Server just destroys the data, as the first three rows are empty and the rest have ragged right edges. Is there any transformation that can be run from Excel into SQL Server, or can you create an ADODB.Recordset from the text file?


    "Life without progression is entropy"
    Sam Jaynes

  • If you can link to the file as a linked table, you can pull it in via Access using that linked table. Not wonderful, but I've used it before. I understand about the ragged-right thing. All too well.
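    On the ADODB.Recordset question: a related trick, if the Jet provider is installed on the server and ad hoc access to it is allowed, is to query the file straight from T-SQL. The folder and file name below are placeholders, and a schema.ini in that folder tells Jet how to parse the file:

        -- Read the raw report through the Jet text driver
        SELECT *
        FROM   OPENROWSET('Microsoft.Jet.OLEDB.4.0',
               'Text;Database=C:\Imports\;HDR=NO',
               'SELECT * FROM [report.txt]')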

    Do you have access to SSIS? It may work better for you, as it handles ragged-right files better than DTS. MS used to sell the SQL Server Developer Edition for $50; I'm not sure if that's still true for 2005, but it may be worth a quick try to see if it meets your needs and could be used in the future.

    -Pete

  • Thanks for your feedback, Pete... I am using SQL Server 2000 and haven't migrated to 2005 yet. I will explore the Access scenario or see if there is a way to run a BAT file that cleans up the text file.


    "Life without progression is entropy"
    Sam Jaynes
