Check file name coming from FTP before loading to staging table

  • Hi, I am pretty new to SSIS. I am to come up with a package in SSIS to perform as below.

    Problem 1

    I have a csv files coming in through ftp with the file name 'FILE_yyyymmdd.csv' .Another department is responsible to send each days records and those records will have that day's file name. After each file is sent to ftp, a dummy csv file with the name 'FILE_yyyymmdd.csv.dummy' would be sent over to just indicate that the FTP process did not fail.I have to come up with a solution to only import files ONLY when the dummy appears there. Otherwise go back and check after 30 minutes and this continues until the dummy file arrives there which simply means the file has successfully come to the FTP site.The .csv file would be loaded into a staging table.

    Problem 2

    What would be the best SSIS data transformation choice, if I have 3 source file (.csv).Each file coming in once a day from different locations at different time zone.These 3 source files are to come through an ftp site.After which these files would be loaded to a single staging table.The SSIS package should be able to check if incoming rows already exist previously, perform updates if there were changes on any columns, and insert a new row if data does not exist.Source files coming in would have fields in comma delimitered format.The final destination would be a table with updated rows from all 3 source files.

    I hope someone would be able to help me out.I would be glad to give any other additional information if required.Do help me out ASAP !

    Thanking You in advance

    [font="TimesNewRoman"] “I haven't failed, I've found 10,000 ways that don't work”........Thomas Alva Edison[/font]

  • Hi..

    Can you please let me know if you got any solution for this. Even I too have similar requirement.

    Thanks in advance

    Suresh Kumar

  • Hi Suresh,

    I did not find any solution up till now.I have done a lot of reading and

    have made an assumption on the transformation that I might need to use.

    I am still working on how FTP would distinguish which file to process based on the file name.Let me know if you get a solution.

    [font="TimesNewRoman"] “I haven't failed, I've found 10,000 ways that don't work”........Thomas Alva Edison[/font]

  • I had a similar situation which I solved in this manner:

    I created a task to retrieve the file that indicates the real file has arrived successfully. (in your case the dummy file). If this task succeeds the package continues by retrieving the real file and processing the data in it; if it fails, it sends an email to myself and the sender and the package quits. In my case, the data is only sent once daily, in the evening after normal working hours. The failure message alerts us to problems which can then be resolved before the next file is due.

  • Thanks Sarah, but I do have 3 files coming at 3 different times and I have to automate this to run by itself each time selecting each day's date file. Do you have any idea how to automate the FTP to do this every time? I am kindda stuck here! How do we totally make the system to do it?

    [font="TimesNewRoman"] “I haven't failed, I've found 10,000 ways that don't work”........Thomas Alva Edison[/font]

  • You may have more luck having a OS process (like a .CMD file or a .NET console application) check for these files and fire the SSIS package off one FTP file at a time, instead of doing it all from within SQL Server.

    You can get there pretty quickly by looking up the .NET DIR() command (the examples show how to loop through directories and "do things" to files). SSIS has a command line utility call DTEXEC that would allow you to call it outisde of SQL Server; it also has mechanisms for passing in parameters (for example - the file you want it to work on).

    You can then use the Server's Scheduled Tasks interface to run this several times a day.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I do it by running a ftp job that sweeps the server picking up any file it sees mget *.* as often as needed. then I run a vbscript which uses objWMIService.ExecNotificationQuery _ & __InstanceCreationEvent.

    Then a Select CASE objFile.Name ....

    do... Sub ProcessJob()

    ' Execute our DTS Package stored in SQL Server

    It's a little kludgey but it works without fail for as many filenames you want to define.

    Rick

  • I have used a script task to look for specific files on an ftp site. The next task would be an FTP task to actually get the file which is in a variable set by this script task. Here is some sample code. Use this as a starting point to loop and check each of the three file names. The .Net docs have great examples of how to use the FtpWebRequest class.

    Public Class ScriptMain

    Dim ftpRequest As FtpWebRequest

    Dim ftpResponse As WebResponse

    Dim ftpReader As StreamReader

    Dim fileName As String

    Dim ftpErrRequest As FtpWebRequest

    Dim ftpErrResponse As WebResponse

    Dim ftpErrReader As StreamReader

    Dim errFileName As String

    Dim Retry As Integer

    Dim Wait As Timer

    Try

    Retry = 0

    Do

    'Do an "ls .out" on the ftp site

    ftpRequest = CType(FtpWebRequest.Create(New Uri(String.Format("ftp://ftp.site.com/{0}", Dts.Variables("User::DummyFileName").Value))), FtpWebRequest)

    ftpRequest.Credentials = New NetworkCredential("user", "password")

    ftpRequest.Method = WebRequestMethods.Ftp.ListDirectory

    ftpResponse = ftpRequest.GetResponse()

    ftpReader = New StreamReader(ftpResponse.GetResponseStream())

    fileName = ftpReader.ReadToEnd().Trim()

    ftpReader.Close()

    'Increment counters and check the results and perhaps wait to try it again in 60 seconds

    Retry = Retry + 1

    'MsgBox(String.Format("{0} {1} {2} {3}", fileName, fileName.Length, errFileName, errFileName.Length))

    'If the dummy file was not there wait 60 seconds and try again

    If fileName <> Dts.Variables("User::DummyFileName").Value.ToString() Then

    System.Threading.Thread.Sleep(60000)

    End If

    Loop Until fileName = Dts.Variables("User::DummyFileName").Value.ToString() Or Retry > 5

    Dts.TaskResult = Dts.Results.Success

    Catch ex As Exception

    'MsgBox(ex.Message)

    Dts.TaskResult = Dts.Results.Failure

    End Try

    End Sub

    End Class

  • Stupid smilies. Those should be a colon : followed by a D

  • Thanks. But do you know anything easier than this. I am not a hard core programming person. I really appreciate your help in this, but I just can't run the code. Too many things going wrong. Would you mind helping.

    1. There are new files uploaded in the FTP site everyday by other regions. 3 regions and 1 file each. That means 3 files at 3 different times. The file name will be that same day's date.csv. Example; today's file will be A_20071106.csv , B_20071106.csv and C_20071106.csv. Tomorrow's will be A_20071107.csv,B_20071107.csv and C_20071107.csv. How do I do this to run everyday and take the file only for that day from FTP straight into SQL Staging table?

    2. Everyday, immediately after each file is uploaded, to indicate the FTP file is loaded successfully, there will be a 20071106.dummy and so on for all three files will show up on the same folder in FTP. I must make this package runable only if the .done file arrives for that days date and then execute the .csv file. Otherwise check after 30 minutes if the .dummy is there yet or not. Do this until the .dummy file comes. Then execute the package that is on that time. Then do the same for the other two for that particular day. The times are 4pm for A, 6pm for B and 8.30pm for C

    3. Then, if there is a new row, it should INSERT, any changes (based on 5 columns), it should update and if there was a row yesterday and it is not there today, DELETE.

    4. All this must be automated.

    [font="TimesNewRoman"] “I haven't failed, I've found 10,000 ways that don't work”........Thomas Alva Edison[/font]

  • Try this:

    Using BIDS, create an SSIS package with the following components:

    1) a script task that builds the file name you are looking for -- put it into a variable

    2) an ftp task that looks for the dummy file

    3) if 2 succeeds, another ftp task to retrieve the .csv file

    4) a data transformation task to read through the .csv file and take the appropriate action

    5) a clean up ftp task to delete the files (dummy and csv) just processed from the ftp server.

    Once you have the package working as you want, deploy it to your sql server and just schedule it to run every 30 minutes. When the file arrives it will be processed, then deleted. 30 minutes later, the job will kick off, find nothing and quit. (and so on and so on until the next day's file arrives)

    Treat each file as a seperate entity -- create three similar packages/jobs.

  • As suggested earlier, you can pull these files down via a script task or use xp_cmdshell to build a command file for FTP and executed in a batch file (execute process task). In the past I've used the File Watcher Task (available via http://www.sqlis.com/23.aspx) to populate a boolean variable to determine whether or not the file exists - i.e.

    Public Class ScriptMain

    'Purpose: Determines if the file exists and sets the varFileExists boolean value.

    Public Sub Main()

    Dim fileLoc, fileName As String

    If Dts.Variables.Contains("User::varFileName") = True Then

    fileName = CStr(Dts.Variables.Item("User::varFileName").Value)

    'System.Windows.Forms.MessageBox.Show("FileDir:"fileName)

    If File.Exists(fileName) Then

    Dts.Variables.Item("User::varFileExists").Value = True

    'System.Windows.Forms.MessageBox.Show("File Exists!")

    Else

    Dts.Variables.Item("User::varFileExists").Value = False

    'System.Windows.Forms.MessageBox.Show("File Does Not Exist!")

    End If

    Dts.TaskResult = Dts.Results.Success

    Else

    Dts.TaskResult = Dts.Results.Failure

    End If

    End Sub

    End Class

    Based on your requirements, you might also find the file inserter transformation useful (http://www.sqlis.com/305.aspx). Regarding the conditional UPSERT data transformation, SQLIS also has some good articles on how to achieve this (http://www.sqlis.com/311.aspx).

    Good luck! 🙂

  • Do you have the task coding? What transformation? Thanks

    [font="TimesNewRoman"] “I haven't failed, I've found 10,000 ways that don't work”........Thomas Alva Edison[/font]

  • If your looking for specific examples, go to http://www.sqlis.com. All of the tasks, transformations (above) are available here with detailed instructions.

  • Thanks a bunch to Sarah there!!! At least a rescue. I have done as you said, and it worked great. But I am very weak on Script Task. If anybody can help me in detail, I mean very very detail for each things I need to set in the Script Task transformation, I will be done totally!!! I have attached what I have done, but if you see clearly, I have disabled the Script Task. (I will wait for somebody to give me a great advise on that, better of if they can write the whole code for me and show me exactly where I need to declare the variables; I have no idea about the variables or what so ever). Then I have put them in UNION and from there to SCD to make the changes for update and insert. Still looking how to delete a row if the row was there before but when new file comes in if it isn't there anymore. I will also need help on deleting those "not there anymore" rows. Little bit more help guys!!!

    PLEASE SEE THE ATTACHMENTS TO UNDERSTAND BETTER

    [font="TimesNewRoman"] “I haven't failed, I've found 10,000 ways that don't work”........Thomas Alva Edison[/font]

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply