Compare two flat files

  • Essentially, I'm tasked with performing a SQL task without using SQL Server.

    Basically, we have an error file (csv) that is created once per day. When this is created, yesterdays error file becomes the archive file. What is required, is for an e-mail to be sent to a specified address, containing all of the items that appear in error file that don't appear in archive file. However, these files are not sorted and have no key to sort on.

    I achieved the desired result in a .net script task, asked to test it before putting it live since it had only been tested on my local machine, was ignore and two days later went live with no testing. It failed due to not being able to find Microsoft.Jet.OLEDB 4.0 on the server.

    Rather than add this, I've been tasked with re-writing the code ASAP in a different way. I had thought to do a dataflow task, then use a script component - e.g. (this isn't pretty 😛 )

    Imports System

    Imports System.Data

    Imports System.IO

    Imports System.Collections

    Imports System.Collections.Generic

    Imports Microsoft.SqlServer.Dts.Runtime

    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

    Public Class ScriptMain

    Inherits UserComponent

    'Variables

    Dim ErrorArray As New ArrayList()

    Dim ArchiveArray As New ArrayList()

    Dim SomethingArray As New ArrayList()

    Dim myMetadata As IDTSComponentMetaData90

    Dim outerLoopIndex As Integer

    Dim innerLoopIndex As Integer

    Dim matchcheck As Integer

    Dim check As Integer = 0

    Dim email As String

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    While Row.NextRow()

    If Row.Type.ToString = "Error" Then

    ErrorArray.Add(Row.Filename.ToString)

    ElseIf Row.Type.ToString = "Archive" Then

    ArchiveArray.Add(Row.Filename.ToString)

    Else

    SomethingArray.Add(Row.Filename.ToString)

    End If

    End While

    If Row.EndOfRowset = True Then

    email = Me.Variables.fileServer.ToString

    Dim objWriter As New StreamWriter(email)

    Dim ThisText As String = "Email Text!"

    logWriter.Write("Log Text")

    objWriter.Write(ThisText)

    objWriter.Write("

    Errors

    ")

    For outerLoopIndex = 0 To ErrorArray.Count - 1

    matchcheck = 0

    For innerLoopIndex = 0 To ArchiveArray.Count - 1

    If ErrorArray(outerLoopIndex) Is Nothing Or ArchiveArray(innerLoopIndex) Is Nothing Then

    'Nothing

    ElseIf ErrorArray(outerLoopIndex).ToString <> ArchiveArray(innerLoopIndex).ToString Then

    'Nothing

    Else

    matchcheck = 1

    check = 1

    End If

    Next innerLoopIndex

    If matchcheck = 0 Then

    'Add to e-mail file

    objWriter.Write(ErrorArray(outerLoopIndex).ToString & "

    ")

    End If

    Next outerLoopIndex

    If check = 1 Then

    objWriter.Flush()

    objWriter.Close()

    objWriter = New StreamWriter(email, False)

    objWriter.Write(ThisText)

    End If

    objWriter.Flush()

    objWriter.Close()

    logWriter.Flush()

    logWriter.Close()

    End If

    End Sub

    End Class

    But for some reason, it appears that my arrayLists are not being populated. Any ideas? Or better yet, a better way to do it?


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Hi,

    Is there a way you can have the primary key output from the system that produces the error file? This would essentially solve your problem. If the answer to that is no, and the error files for comparision have the same columns and datatypes, then you can actually just do a left outer join in SSIS from one file to another on all the columns. The non-matches will show in a blank column (right side of the join) which you can then filter to using a conditional split.

    Strick

  • Agreed. I would stay far away from using a script task for this. Use a data flow, compare the inputs as described by Strick.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • stricknyn (12/1/2010)


    Hi,

    Is there a way you can have the primary key output from the system that produces the error file? This would essentially solve your problem. If the answer to that is no, and the error files for comparision have the same columns and datatypes, then you can actually just do a left outer join in SSIS from one file to another on all the columns. The non-matches will show in a blank column (right side of the join) which you can then filter to using a conditional split.

    Strick

    Unfortunately, I'm messing with an old system that was not exactly designed well, so primary keys are not in use. I'll try a left outer join, thanks.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply