SSIS, FlatFile Destination with Multiple record types and formats

  • Hello all,
    I have a requirement to create a flat file for a one of our vendors that has fixed length columns and multiple record types. Each Record type has a different length. Source data is staged in  SQL tables, I'm using VS2015 and SQL Server 2016.
    So far, I have set up my package with 6 data flows to :
    1. OLE DB Source to get File Header and write out to Flat File destination  > Fileabc.txt
    2. OLE DB Source to get Batch Header and write out to Flat File destination > Fileabc.txt (OverWrite is unchecked so that data is appended)
    3. OLE DB Source to get E Records and write out to Flat File destination > Fileabc.txt (OverWrite is unchecked so that data is appended)
    4. OLE DB Source to get W Records and write out to Flat File destination > Fileabc.txt (OverWrite is unchecked so that data is appended) **This is the part I'm having trouble with
    5. OLE DB Source to get Batch Trailer and write out to Flat File destination > Fileabc.txt (OverWrite is unchecked so that data is appended)
    6. OLE DB Source to get File Trailer and write out to Flat File destination > Fileabc.txt (OverWrite is unchecked so that data is appended)

    Header and trailer data is stored in an audit table. The E records and W records are stored in separate tables, both of which have EmployeeID.
    So, each dataflow selects concatenated data(as 1 column) from an sql table and writes to the same flat file destination, using separate flat file connection managers.  Each flat file connection manager has Format-Ragged Right, row delimiter is CR/LF.
    My question/difficulty is how to ensure each E record is followed by 0 to many related W records in the flat file.  The EmployeeID is not a field on the W record layout specification.  I am successful at getting the File and Batch headers, E Records and the File and Batch trailers in the proper layout and sequence.  I'm just not seeing how to add the W records into the mix in the correct sequence.  And then, B records too (haven't included details on those, once I get the W records done, I'll just apply the same technique)

    Here is a visual of what I need as output  (all columns are fixed length and should line up, but it is difficult to do that in the editor):

    HFile Header Record
    BBatch Header Record  
    E10  John       Smith     12 3rd St     AnyTown     NYYM2018-01-04          
    W2007    50.00     2   Chief Cook
    W2007.1 40.00     2   Chief Cook
    W1998    10.00     2   Chief Cook
    BOtherDataForJohnSmith
    E20    Jane      Doe     987 8th Ave   AnyTown     NYNF2018-08-172018-12-31
    W2002    25.00     1   Bottle Washer
    W2008     5.00      1   Bottle Washer
    W2022     2.00      1   Bottle Washer
    BOtherDataForJane
    TBatchTrailerRowCount
    FFileTrailerRowCount

    Here is the table defs and insert for play data:

    USE

    AHMC

    GO

    DROP

    TABLE IF EXISTS #WRecs

    CREATE TABLE #WRecs
    ([EmployeeID] [varchar](15) NOT NULL,

    [TaxCode] [varchar](12) NOT NULL,

    [QTDTotalWithheld] [decimal](12, 2) NULL,

    [Exemptions] [int] NULL,

    [JobTitle] [varchar](255) NULL,

    [ExtractDate] [datetime] NOT NULL

    )

    INSERT

    INTO #WRecs

    SELECT '10'

    ,'2007'

    ,50.00

    ,2

    ,'Chief Cook'

    ,GETDATE()

    UNION ALL

    SELECT '10'

    ,'2007.1'

    ,40.00

    ,2

    ,'Chief Cook'

    ,GETDATE()

    UNION ALL

    SELECT '10'

    ,'1998'

    ,10.00

    ,2

    ,'Chief Cook'

    ,GETDATE()

    UNION ALL

    SELECT '20'

    ,'2002'

    ,25.00

    ,1

    ,'Bottle Washer'

    ,GETDATE()

    UNION ALL

    SELECT '20'

    ,'2008'

    ,5.00

    ,1

    ,'Bottle Washer'

    ,GETDATE()

    UNION ALL

    SELECT '20'

    ,'2022'

    ,2.00

    ,1

    ,'Bottle Washer'

    ,GETDATE()

    ;

    DROP TABLE IF EXISTS #ERecs

    CREATE TABLE #ERecs

    ([EmployeeID] [varchar](15) NOT NULL,

    [EmployeeNumber] [varchar](20) NOT NULL,

    [FirstName] [varchar](50) NULL,

    [LastName] [varchar](50) NULL,

    [Address1] [varchar](50) NULL,

    [City] [varchar](50) NULL,

    [State] [varchar](5) NULL,

    [PostalCode] [varchar](12) NULL,

    [OnlinePostingInd] [varchar](1) NULL,

    [Gender] [varchar](1) NULL,

    [HireDate] [varchar](10) NULL,

    [TerminationDate] [varchar](10) NULL,

    [Dept] [varchar](12) NULL,

    [ExtractDate] [datetime] NOT NULL)


    INSERT INTO #ERecs

    SELECT '10'

    ,'200'

    ,'John'

    ,'Smith'

    ,'12 3rd St'

    ,'AnyTown'

    ,'NY'

    ,'10017'

    ,'Y'

    ,'M'

    ,'2018-01-04'

    ,NULL

    ,'D1'

    ,GETDATE()

    UNION ALL

    SELECT '20'

    ,'001'

    ,'Jane'

    ,'Doe'

    ,'987 8th Ave'

    ,'AnyTown'

    ,'NY'

    ,'10043'

    ,'N'

    ,'F'

    ,'2010-08-17'

    ,'2018-12-31'

    ,'D5'

    ,GETDATE()

    ;


    Thanks in advance for any help/suggestions...

  • So what i'm understanding is that you need to output different rec formats within the same flat file. 
    Anytime i need to output fixed width i usually stage the data in a table that is setup with Char columns at the widths i need them. 

    So I took your tables and changed them to be all Char types and inserted the data you provided.

    Then found an example of XML output that will get the entire row into a single string (one column) then unioned both these tables together so you can deal with all different record layouts within a single data flow since there is only 1 column of varchar(max type). probably need to change this to a fixed width Char(300) or something to make sure the last column is properly padded. 
    Here is a screenshot to show you what it looks like. 

    USE sandbox
    GO
    select CAST(
        (
       select (
         select ''+T2.N.value('./text()[1]', 'varchar(max)')
         from (
           select T.*
           for xml path(''), type
           ) as T1(N)
          cross apply T1.N.nodes('/*') as T2(N)
         for xml path(''), type
         ).value('substring(./text()[1], 3)', 'varchar(max)')
       --for xml path(''), type
       ) AS VARCHAR(MAX    )) AS myrecord
    from dbo.WRecs as T
    UNION ALL

    select CAST(
        (
       select (
         select ''+T2.N.value('./text()[1]', 'varchar(max)')
         from (
           select T.*
           for xml path(''), type
           ) as T1(N)
          cross apply T1.N.nodes('/*') as T2(N)
         for xml path(''), type
         ).value('substring(./text()[1], 3)', 'varchar(max)')
       --for xml path(''), type
       ) AS VARCHAR(MAX    )) AS myrecord
    from dbo.eRecs as T

    Edit: this came from an example i found here, Also there may be other methods to concatenate these but i've used the XML methods before so that's what i was looking for. 

  • Thanks for your suggestion and clear demonstration.  The issue I am having isn't so much getting different record layouts into a single flat file (I can do that with the headers, detail/E records and trailers), it is more about getting the related records in the correct sequence.  An E record has to be followed by the related W records.
    Here is a snip of a better representation of what I need. Just a partial portion of each record is shown. The W records have the tax code. The employee id is not in the output flat file on the W records.
    

  • OK, yeah i need to read things more closely. So you are already concatenating the columns then writing output to the SAME flat file using different flat file destinations. 
    So you do one data flow then dump to the file. (overwrite for the first one then append all the rest)
    does this happen in some kind of nested loop? Are there multiple batches in a single file?

    So what i would do is concatenate these and dump it all back into a sql table with the needed sort info so you can do a single query ordered by your sort fields and dump the concatenated data into the the flat file using a single destination.

  • No nested loop, one batch per file.  Here is what the control flow looks like:
    

    And each data flow is simply an sql query writing out to a flat file destination.
    Like this:
    

      Each dataflow has it's own flat file connection manager, all of which point to the same data set name. First one overwrites, all rest append.

    I'm starting to consider padding the E recs with spaces to make them the same length as the W recs. Adding the EmployeeID at the end of both E and W recs so I can Union All and sort.  Then I'll have to trim the w recs to the specified length.  Still struggling with this part...how can I keep the record sequence while writing rows with different lengths to the same flat file destination...
    I suspect a script component may be in my future, something I'm not well versed in...

  • AlphaTangoWhiskey - Tuesday, January 22, 2019 8:45 AM

    OK, yeah i need to read things more closely. So you are already concatenating the columns then writing output to the SAME flat file using different flat file destinations. 
    So you do one data flow then dump to the file. (overwrite for the first one then append all the rest)
    does this happen in some kind of nested loop? Are there multiple batches in a single file?

    So what i would do is concatenate these and dump it all back into a sql table with the needed sort info so you can do a single query ordered by your sort fields and dump the concatenated data into the the flat file using a single destination.

    This is the only way I could think of too. But making this work as a fixed-width output is a bit of a nightmare ... all of the columns in the queries will need to be padded out appropriately.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • I guess going back to your original question of how to get the related W recs with this setup i would set a variable in the E recs data flow with the value that you need for selecting W recs. 
    Do this with a script component transformation that uses a read/write variable.
    Inside the script create a class level variable that you can set with the key value. 
    I'm thinking that you cant update the variable value withing the input buffer method... vaguely remembering something like that...could have been from SSIS  2005 🙂
    Anyway pretty sure you can do it in post execute method, something like Variables.Name.value = ClassLevelVariable
    Or maybe its something like DTS.Variables["VarNameFromPackage"].value = ClassLevelVariable
    Class level is just another way of saying a global variable. It must be accessible from the ProcessInput and from PostExecute methods. 

    Then when you hit the W recs data flow you add that var as a parameter to the OLE Source.

    MyWrecValue = ? -- in the sql statement
    then click on parms button and map the parm to the variable. If you need to have more variables then you just use the "?" and they map in order of appearance in the SQL statement.

  • This is what I have now...
    My OLE DB Source selects multiple columns concatenated into 1 column, along with EmployeeID and RecordType (E or W respectively)
    The E records are supposed to be 258 bytes long, while the W records are 651.
    The Union All and Sort work great and from the data viewer, it all looks really good.  I would just have to trim off the EmployeeID and the RecordType.
    But the Flat File destination seems to not be set up to allow for the inconsistent record lengths. Even though I have the flat file connection manager set to ragged right, all the CR/LF are in position 651, instead of 258 for the E records.
    But here is an interesting thing...when I copy the data out of the data viewer (after the sort), the CR/LF characters are where they should be for the E records, position 258.  And the CR/LF characters are in the correct position (651) for the W records as well.  Have I gotten something wrong with the flat file connection manager?  What else should I look at?

    

    Flat file Connection Manager:
    

    So close, yet so far...

  • Are you able to create a staging table?

    What I would do is have your data flows populate the staging table. Your staging table should have a LineText column for the text data, and then one or more sequencing columns to sort your rows. So it might look like this:

    When you populate your staging table from your different sources, you format LineText with the contents as it should appear in the flat file.

    In your SSIS dataflow source, use "SQL command" for the data access mode instead of a table or view:
     
    WARNING 1: Make sure you truncate the staging table before each run!
    WARNING 2: I would caution you against using the Sort SSIS transformation in SSIS. Sort is a performance hog that will kill your ETL Server. Instead I would toggle the IsSorted property for your OLEDB Source to "True", since you are sorting using your ORDER BY clause. (This is under the Input and Output Properties tab in the Advanced Editor).

  • I have created a staging table as suggested.

    CREATE TABLE [dbo].[ADPQuarterMixed](

    [ExtractDate] [datetime] NOT NULL,

    [EmployeeID] [varchar](15) NOT NULL,

    [RecordType] [varchar](1) NOT NULL,

    [RowData] [varchar](1000) NULL            --more about this issue later


    The RowData column contains all fields concatenated from either E or W rows from their respective tables.
    When Order By EmployeeID, RecordType, it looks great.
    Here is the Data Flow:
    

    The OLE DB query is simply

    SELECT RowData

    FROM ADPQuarterMixed

    ORDER BY EmployeeID, RecordType

    And the flat file connection manager:

    

    When I look at the data from the data viewer, it looks perfect.  The CR/LF position varies according to the record type, which is the main issue now. 
    But the Flat File has all the CR/LF in a fixed rightmost position.  So, this is not good.
    What  could be happening between the data viewer and the flat file destination?

    Getting back to the staging table...for the E records, the last 35 characters are blanks. So making the RowData varchar results in these being truncated.  If I make it char, it will force the row length to be as long as the (much longer)  W record types.

    So I even tried to change the query in the package to retrieve a substring, but same results

  • Use a For Each Loop on the E records to control what is returned for the W records and then the B records in separate Data Flow Tasks inside the loop. You can have a 2nd column returned in the E records for EmployeeID which you store in a variable to then use to filter the W records and B records.

    I have also used a merge join in similar instances where I add a surrogate sort order column to each data source then use a merge join and sort after the join using the surrogate values to give me the correct order. The additonal coluns are then ignored for the export into the file.

  • I think I've got it!  The problem was with the flat file connection manager after all.
    I changed it to Delimited and used a Column delimiter of CR and a Row Delimiter of LF.

    Now it finally has created a flat file that has the CRLF in varying positions depending on the record type.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply