Exporting to a fixed length FF with varying record lengths

  • I have two tables in SQL Server 2005 - header and detail. Every document will have exactly one header row and one to many detail rows. The two tables have different sets of columns, but each has a document number column. So if there are two documents in the db, what I need to end up with is a fixed length ff with CR/LF for a row delimiter. The first line in the file should be document #1 header, followed by doc#1 detail lines, doc#2 header, and finally doc#2 detail lines. The total length of the header row will be different than the length of the detail row - the sum of the header columns is 100 chars, and detail is for example 150.

    I've tried two OLE DB Sources, sorted by document number, into a Merge and a FF Destination, but am not having any luck. Any thoughts?

  • Is it the sorting, or the physical output that's failing?

  • I'm not quite sure how to get the output to be fixed length. I started by generating comma delimited output as described here:

    http://vsteamsystemcentral.com/cs21/blogs/steve_fibich/archive/2007/09/25/multi-record-formated-flat-file-with-

    ssis.aspx

    where you create a single derived column that contains all of your data, commas and all. That was working fine, but the requirements changed and now I need fixed length output. The method above is really the only experience I have in this area, and I wasn't sure how to shoehorn (cleanly) the fixed length requirement into that solution. I believe there is a monster SQL statement that would include adding pad chars and trimming each column, but I wanted to avoid that if at all possible.

  • You might be able to accomplish this using a script component in your data flow which finds the length of the longest output line, then pads all of the others with a FOR loop.

  • But I need the output lines to have fixed length columns. So if the Header columns total 125 characters, each header line in the output file needs to be 125 characters long, with every column padded to its particular length.

    Column 1 will always be positions 1-x, column 2 is positions x+1 to y, etc...

  • If you don't want to use BCP with a format file (absolutely the fastest), then something like this is probably in order... brute force 😉

    USE AdventureWorks

    GO

    SELECT STR(SalesPersonID,3)

    + STR(ISNULL(TerritoryID,0),2)

    + STR(ISNULL(SalesQuota,0),8,2)

    + STR(ISNULL(Bonus,0),7,2)

    + STR(ISNULL(SalesYtd,0),12,2)

    + STR(ISNULL(SalesLastYear,0),14,4)

    + CONVERT(CHAR(8),ModifiedDate,112)

    FROM Sales.SalesPerson

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • having had to do this in the past you have two choices.

    Use all the pads and trims and various other things your trying to avoid doing or

    build a separate application to perform this task.

    There is no inbuilt function of SQL Server (that I've ever found) that will output a fixed width file with differing record lengths.

  • Thanks Nick,

    I appreciate the word from the school of been there, done that! I am going to try again to push for a comma delimited solution out of SS and another step in the workflow to handle the delimited -> fixed length issue.

    Mike

  • By the way... there shouldn't be a header on fixed width output...

    ... and, contrary to what's been said, BCP makes a nice fixed width output if you use a format file.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • By the way... there shouldn't be a header on fixed width output...

    That depends (as per everything).

    When you have logical files within a physical file, or if you prefer, batches within a file, then you need some way of separating these. Headers/Footers are great for this purpose. Headers/Footers are also used to convey information about the complete batch and validations, when they can be separated by time (I come across this time and time again in the finance world where it is possible to be working with systems and structures more than 40 years old).

  • No... they are not necessary. When the format of the file changes, then you know there is a change. Headers are not necessary in fixed format.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And how do you know that the format has changed rather than just failed a validation, without some defining flag/feature to indicate a different type of record?

  • Because in a proper mixed format fixed field data set, each row will have a "rowtype" marker. Mixing header information in the data will frequently cause bulk loads to fail because the header will usually not contain the same datatypes.

    This is why a lot of folks use XML to transmit data... no discussion required on how to setup a transmission file correctly. People forget that a data file should follow the same basic rules as a table... things like a single datatype per column and that header information is part of the definition of the table, not part of the data, etc, etc.

    I understand that this type of file "Nirvana" is not the norm... I'm saying that header information being buried throughout the file should not be the norm. But people take even less time to design a correct transmission file than they do with their tables and they end up with some real crap in the files and then moan about how slow the imports are.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • So a "rowtype" marker is what I said before, that there should be some defining flag/feature or if you like value or attribute that indicates that this row has different meaning (whether type, structure etc).

    It is not always the case that a header will only ever be the very first row of a file. Sometimes you just can't have the perfect format.

    Part of this is the definition of a file.

    When transmitting a stream of data, let say down a TCP socket, you may never close the socket between each data set as opening and closing sockets is expensive, but you will end up with multiple header records to provide some break between data sets. Consider each data set a "file"

    Now apply the same concept to a text file. Opening and closing text files is an expensive operation (albeit, not as bad as a socket) and takes up handles and other resources, so if you have multiple sets of data, split by header records, then you have a more efficient data resource. You also have files in a file (or I like to consider them batches as it makes it much less confusing when trying to communicate with business people). Many specifications in finance have this concept (and tend to speak in "files" when they mean a Batch).

    The fact that utililities like BCP dont cope with this is understandable as it woudl then require complex processing logic, which is time and resource expensive, thus slowing down what should be a speedy process.

    I agree that XML is much better for defining data and types, and you can use Soap concepts of envelope and data, but after having had to deal with systems where data was transmitted over 300 baud modems (originally - pehaps even slower) I can understand why the creators of the of the specs didn't use a format like XML (not that it existed at the time, but anything that increased data size was a bad thing).

    The Nirvana that you hint about. This is only likely to exist when all IT systems are in a constant state of renewel, and there is no legacy to deal with. Trying to get a complete industry to change the standardised format for transmission of data is an uphill battle which takes years, and in some cases decades.

  • The Nirvana that you hint about. This is only likely to exist when all IT systems are in a constant state of renewel, and there is no legacy to deal with. Trying to get a complete industry to change the standardised format for transmission of data is an uphill battle which takes years, and in some cases decades.

    It's also likely to exist when a knowledgeable consumer of the data works with a knowledgeable and concerned producer of the data. It's easy... and not (concern must live on both sides). Most people just don't give a damn... just like some folks that write code for a living 😀

    And, yeah, I remember the days of 110 baud being the standard... 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply