SSIS Data Load Question.

  • I get all my data in flat files, and the last row in each flat file is a count of the total rows in the file.

    Question 1. In SSIS, when I use a flat file source, how do I skip the last row of the data file while loading the data into the table?

    Question 2. Is there a way I can capture that last row, which has the row count as its only column, in a variable so I can validate that all the rows above it were loaded into my table? Would appreciate your response.

  • 1. The flat file source will not do this automatically for you.

    2. Maybe.

    The real question is whether you can tell the last record from all the other records by something OTHER than the fact that it is the last record.

    CEWII

  • I cannot tell which one is the last record, as it varies by file and by month. So one of the challenges is to identify the last record.

  • Yes, that is your big challenge. I don't know of any way to catch the last row; even using a script component I don't know of a way to tell. Also, if that last row has a different structure it will probably break the flat file source.

    So that leaves an unpleasant method: you can create a Script Source, read the file yourself, and write each record into the buffer.

    Here is an example that takes variables that were passed into the data-flow script source and creates a buffer row from them:

    ' Microsoft SQL Server Integration Services user script component
    ' This is your new script component in Microsoft Visual Basic .NET
    ' ScriptMain is the entry point class for script components

    Imports System
    Imports System.Data
    Imports System.Math
    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    Public Class ScriptMain
        Inherits UserComponent

        Public Overrides Sub CreateNewOutputRows()
            ' Add rows by calling the AddRow method on the output buffer,
            ' e.g., MyOutputBuffer.AddRow() if your output was named "My Output"
            With DiskUsageVariablesBuffer
                .AddRow()
                .Description = Variables.Description
                .DeviceID = Variables.DeviceID
                .FreeSpace = Variables.FreeSpace
                .Size = Variables.Size
                .ServerName = Variables.SystemName
            End With
        End Sub
    End Class

    The output from the task was called DiskUsageVariables, so you are adding records to the DiskUsageVariablesBuffer.

    I hope this gives you some ideas.

    CEWII

  • If the flat file source can read all rows, even the last one, you can try to filter the last row out by using a conditional split.

    The condition to find the last row, well, that is up to you, since I don't know your file structure.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • That has already been suggested and discussed.

    CEWII

  • Elliott W (3/3/2010)


    That has already been suggested and discussed.

    CEWII

    I'm confused. I saw nothing about a conditional split.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Must have been a different post. Either way, identifying the last record is a big problem, especially if its structure is different.

    CEWII

  • The file structure has 10 comma-delimited columns, but the last row has only one number, so the schema of the last row differs from that of all the valid rows. Yes, the challenge is identifying the last row for a conditional split, unless there is a way to do this that I am not aware of.

    Alternately, I thought I could do the following:

    1. Use the Row Count component with a flat file source to save the total number of rows in a variable, TotalRows.

    2. Then use the Bulk Insert task to insert (TotalRows - 1) rows.

    Do you think that will work?
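    The validation idea behind those two steps can be sketched outside SSIS. This is plain Python for illustration only, not SSIS code; the helper name and the assumption that the trailer is a single number on the last line are mine:

```python
# Sketch: the last line of the file is expected to hold a count of the
# data rows above it. `lines` stands in for the flat file's lines.
def validate_rows(lines):
    rows = [ln for ln in lines if ln.strip()]
    declared = int(rows[-1].split(",")[0])  # trailer: a single number
    data = rows[:-1]                        # the rows that should be loaded
    return declared == len(data)
```

    In the package itself, `declared` would come from the captured trailer row and `len(data)` from the Row Count variable.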

    So let me see if I understand this: the last row has a number, and all the other "fields" are null/blank?

    If that is true AND that NEVER happens on any other record, then you have your criterion for the conditional split. The big problem was finding a condition that is true for the last record but not true for ANY other record. I hope this is it.
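    That criterion can be written down as a predicate. This is a Python sketch of the test, not the SSIS expression language; in the package you would express the same check over your ten columns in the conditional split:

```python
# Sketch of the last-row criterion: the first field is a number and
# every other field is blank. True only for the trailer row.
def is_trailer(fields):
    first_is_number = fields[0].strip().isdigit()
    rest_are_blank = all(f.strip() == "" for f in fields[1:])
    return first_is_number and rest_are_blank
```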

    CEWII

  • Yes, that is correct: the last row has only one column, which is the row count. So do you think my row count approach will work?

  • I do. I'm looking at it to see how you are going to fail it if the numbers are different.

    You can capture the record and get the number, but you need a Row Count component right before the destination. I say there because it is the last place a record passes through before being inserted. The problem is that if you compare the record count at the moment you get the last record, it will probably be different. Records flow through the pipeline, so the number of records read from the source is usually different from the number seen at the bottom of the pipeline, at least until the flow is finished. You might have to add some wait logic to hold off the comparison for a bit: if the counts don't match, wait three minutes and try again, for a couple of retries. If they still don't match, return failure from the script.
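    That retry logic might look something like this, sketched in Python rather than an SSIS script task; `get_loaded_count` is a hypothetical callable (say, a COUNT(*) against the target table), and the retry/wait values are just the ones suggested above:

```python
import time

# Sketch: compare the trailer value against the loaded-row count,
# waiting between attempts because rows may still be in the pipeline.
def counts_match(expected, get_loaded_count, retries=3, wait_seconds=180):
    for attempt in range(retries):
        if get_loaded_count() == expected:
            return True
        if attempt < retries - 1:
            time.sleep(wait_seconds)  # hold off, then re-check
    return False  # still no match: fail the package
```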

    CEWII

  • Maybe this is an option:

    Right before you read your flat file, edit it with a script task: go to the last row, place something before the number, and add some commas.

    For example:

    before

    xxx,yyy,zzz,fff

    frrrrrr,gggg,hhhh,u

    2 (this is the last row, indicating that there are 2 rows of data; the data consists of 4 columns)

    after

    xxx,yyy,zzz,fff

    frrrrrr,gggg,hhhh,u

    <last_row>,2,<dummy_data>,<dummy_data>,

    Now the last row also has 4 columns and it can be read by the flat file source.

    After the source, place a conditional split that searches for <last_row> and place the value of the second column in a variable (with a script transform for example).
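    The preprocessing step can be sketched like this. It is plain Python for illustration (the function name, the `<last_row>` tag, and the padding scheme are the assumptions from the example above, not an SSIS API):

```python
# Sketch: pad the trailer line so it has the same number of columns as
# the data rows, and tag it so a conditional split can find it.
def pad_trailer(lines, n_columns):
    body, trailer = lines[:-1], lines[-1].strip()
    padded = "<last_row>," + trailer + "," * (n_columns - 2)
    return body + [padded]
```

    After this rewrite the flat file source reads every row with a consistent schema, and the split condition is simply "first column == `<last_row>`".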

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP
