CSV Files

  • Hi,

    I need to extract data from .CSV files and load into temporary table.

    Here are the details:

    For ex.

    Application_20091119_185530.csv

    Application_20091119_203530.csv

    Application_20091120_221530.csv

    I have three files with same name but different Date time stamp; I need to make sure I will load first file with least Date and timestamp

    Vishal

  • I would use a For Each loop container to loop through the files and extract the file name.

    Inside the container

    -insert these filenames into a staging table.

    -select filename based on minimum date, you will need to use substring to extract the date from the file name

    -use this file name in the dynamic connection string expression for the flat file source.

  • Hi,

    thanks for replying could you please elaborate more or is it possible for you to send me package itself.

    Thanks

    Vishal

  • What part do you want more information on?

    There is a nice example of the For-Each loop here

    http://www.sqlis.com/post/Looping-over-files-with-the-Foreach-Loop.aspx"> http://www.sqlis.com/post/Looping-over-files-with-the-Foreach-Loop.aspx

    your solution would be close to this, except you need to make select the connection string from a staging table instead of straight from the variable

  • Hi,

    I understood how to extract mulitiple files based on For Each loop.

    But my quastion is what if there are multiple files with different Date Timestamp.

    Here are the details:

    For ex.

    Application_20091119_185530.csv

    Application_20091119_203530.csv

    Application_20091120_221530.csv

    I have three files with same name but different Date time stamp; I need to make sure I will load first file with least Date and timestamp

    Vishal

  • I would load those filenames into a table and then strip out the timestamp part of the filename and then select the file based upon the earliest occuring timestamp

  • Please can you elaborate since i am new to SSIS.

    Please send me steps how to do the same.

    Vishal

  • -In your For Each Loop select to extract the file name

    -Click on Variable mappings and map the file name to a variable

    then

    -Create a Staging table on your Sql Server, and in the For Each container add a Execute SQL task, this task will execute for each file.

    -In this task add a SQL command to INSERT values into your staging table , to do this use a ? in place of your variable.

    http://technet.microsoft.com/en-us/library/ms140355.aspx"> http://technet.microsoft.com/en-us/library/ms140355.aspx

    -Click on parameter mappings and click add and select the variable that contains your file name from the loop.

    -Add a dataflow task after the container

    In this dataflow create a data source that selects the filename from your staging table based upon the earliest time-stamp.

    -Send this data to a Recordset destination

    -back in the control flow

    -add another for-each container that loops through and ADO dataset

    -use Variable mappings to map the value to a variable

    -Add a dataflow in the container with a flat file data source

    -In the flat file data source properties click on expresssions --> connection string and build a dymanic connection string based on the for-each mapped variable.

    http://www.mssqltips.com/tip.asp?tip=1084"> http://www.mssqltips.com/tip.asp?tip=1084

    -Send this data to your database.

  • thank so very much.........last help it is very difficult for me to grasp everything.

    is it possible for you to put everything in SSIS package and send me the .dtsx file.

    That would be great.

    Vishal

  • I am happy to answer questions and point you in the right direction, but as I am at work it would not be fair to my employer to spend time working on your project, sorry.

    You should start of with a simple example and then expand form there, if you are working in a dev environment then you will not do any damage. Also you could look at getting a local SSIS contractor in to build the package for you and you can learn from this, I wouldn't think it would take longer than a day.

  • Instead, You can use bulk insert to load files, if you are not comfortable with SSIS.

    Check if the below code helps you.

    Create procedure usp_ImportMultipleFiles @filepath varchar(500),

    @pattern varchar(100), @TableName varchar(128)

    as

    set quoted_identifier off

    declare @query varchar(1000)

    declare @max1 int

    declare @count1 int

    Declare @filename varchar(100)

    set @count1 =0

    create table #x (name varchar(200))

    set @query ='master.dbo.xp_cmdshell "dir '+@filepath+@pattern +' /b"'

    insert #x exec (@query)

    delete from #x where name is NULL

    select identity(int,1,1) as ID, name into #y from #x

    drop table #x

    set @max1 = (select max(ID) from #y)

    --print @max1

    --print @count1

    While @count1 <= @max1

    begin

    set @count1=@count1+1

    set @filename = (select name from #y where [id] = @count1)

    set @Query ='BULK INSERT '+ @Tablename + ' FROM "'+ @Filepath+@Filename+'"

    WITH ( FIELDTERMINATOR = ",",ROWTERMINATOR = "")'

    --print @query

    exec (@query)

    insert into logtable (query) select @query

    end

    drop table #y

    /*

    Execute the Procedure

    Execute the above procedure by passing parameters as shown below.

    Example 1: To import all .csv files from the folder c:\myimport to a table Account

    Exec usp_ImportMultipleFiles 'c:\myimport\', '*.csv', 'Account'

    Example 2: To import all files from the folder c:\myimport to a table Account

    Exec usp_ImportMultipleFiles 'c:\myimport\', '*.*', 'Account'

    */

  • Jus,

    I do imports in a manner similar to what you code does. I just keep running into DBA's that don't know how to protect their system and still allow the use of xp_CmdShell so I have to resort to other trickery. Here's a gift for you in case you ever run into the same thing... Judging by your code example, I'm pretty sure that you'll be able to expand upon it quite easily...

    EXEC xp_DirTree 'C:\',1,1

    Before you jump and say that xp_DirTree only returns directories, take a look at the command above... it has 3 operands instead of just the two that most folks know about. 😉 Of course the 'C:\' part can be replaced with either a valid UNC or file path. It won't take wild cards but I believe you won't find that to be a problem once you get the output loaded into a table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi All,

    thanks for the response and i understood now how to use For each loop but my question remains the same.

    how to work with date timestamp files, because one file may come 3-4 time a day in that case i have to run my SSIS Package 3-4 time.

    How to retrieve those file in SSIS and if files did nt come that particular day in that case i have to load from the day one my SSIS package.

    vishal

  • Thanks Jeff for your clue on xp_DirTree..It really helps....

  • Vishal,

    Sorry... I'd help if I knew enough about SSIS. Perhaps this post will act as a bit of a bump.

    I will say that you should probably take a look in Books Online (the help system that comes with SQL Server) and at least figure out how to import one file and how to use variables so you know what others are actually talking about when they try to help. I also wouldn't expect much in the form of complete package offerings... most folks are going to point you to a reference and you'll need to figure it out for yourself.

    I believe there's also a short tutorial on SSIS built into Books Online.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply