HT Import About 8000+ Text Files from 4 Hard Drive Folders Into SQL Table?

  • Hello,

    I am new to using MS SQL, but I hope to learn it well.  Currently I have SQL Server 2000.

    I have over 8000 text files residing in 4 folders on my hard drive and I need to import those files into a pre-existing table in SQL.  All the files have exactly the same format.  Also, each file has an initial column-header record (row).

    I have looked into DTS but it only seems to allow import of 1 file at a time.  Someone suggested I try to merge all 8000+ files into 1 large file and then use DTS to do the import.

    I am willing to do that, but I have no idea how to accomplish the merge.

    Can this be done in DTS, or can someone suggest a known pre-existing routine or procedure that I can use?

    Any suggestions to help me are appreciated.  Many thanks. 

    Josef

     

     

  • You could try this with the BULK INSERT command depending on complexity.

  • Would I be creating some logic for the Bulk Insert command from within Query Analyzer, or is there some way to access the Bulk Insert from within DTS?

    Please forgive my lack of awareness on the import subject.  It's all part of my learning process as a Newbie.

    Thanks for your reply.

    Josef

     

  • To merge the files, I would use a language called AWK to remove the 1st line from a file using a program such as:

    {if (lc++){print}}

    then create a batch file that would pipe all of the source files through this AWK program, appending each one to the end of a target file.

    Then import the resulting merged file.

     

     

  • here's how i would do it if restricted to just using TSQL:

    --a table to loop thru filenames

    CREATE TABLE ALLFILENAMES(WHICHPATH VARCHAR(255),WHICHFILE varchar(255))

    --the source table: yours already exists, but needed for this example.

    CREATE TABLE BULKACT(RAWDATA VARCHAR (8000))

    --some variables

    declare @filename varchar(255),

            @path     varchar(255),

     @sql      varchar(8000),

            @cmd      varchar(1000)

    --get the list of files to process:

    --#########################################

    SET @path = 'C:\MetaStock Ascii\AMEX\'

    SET @cmd = 'dir ' + @path + '*.txt /b'

    INSERT INTO  ALLFILENAMES(WHICHFILE)

    EXEC Master..xp_cmdShell @cmd

    UPDATE ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null

    SET @path = 'C:\MetaStock Ascii\NASDAQ'

    SET @cmd = 'dir ' + @path + '*.txt /b'

    INSERT INTO  ALLFILENAMES(WHICHFILE)

    EXEC Master..xp_cmdShell @cmd

    UPDATE ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null

    SET @path = 'C:\MetaStock Ascii\NYSE\'

    SET @cmd = 'dir ' + @path + '*.txt /b'

    INSERT INTO  ALLFILENAMES(WHICHFILE)

    EXEC Master..xp_cmdShell @cmd

    UPDATE ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null

    SET @path = 'C:\MetaStock Ascii\Preferred\'

    SET @cmd = 'dir ' + @path + '*.txt /b'

    INSERT INTO  ALLFILENAMES(WHICHFILE)

    EXEC Master..xp_cmdShell @cmd

    UPDATE ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null

    --#########################################

    --cursor loop

    declare c1 cursor for SELECT WHICHPATH,WHICHFILE FROM ALLFILENAMES where WHICHFILE like '%.txt%'

    open c1

    fetch next from c1 into @path,@filename

    While @@fetch_status <> -1

     begin

    --bulk insert won't take a variable name, so make a sql and execute it instead:

    --CHANGE THE TABLE NAME TO YOUR REAL TABLE

       set @sql = 'BULK INSERT StockData FROM ''' + @path + @filename + ''' '

           + '     WITH (

                   DATAFILETYPE = ''char'',

                   FIELDTERMINATOR = '','',

                   ROWTERMINATOR = ''\n'',

                   FIRSTROW = 2

                ) '

    print @sql

    exec (@sql)

     fetch next from c1 into @path,@filename

     end

    close c1

    deallocate c1

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Aye... nicely done, Lowell... and, I think you'll really like this, as well

    Just in case someone can't use xp_CmdShell, there's a little known undocumented stored procedure with an even less known option that you can use to get file names for a given path WITHOUT using xp_CmdShell... here's some code to demo that bad boy... read the comments, folks...

    /*******************
    drop table #MyFiles
    drop table #MyDir
    *******************/
    --===== Assign a path for the files to be found
    DECLARE @Path VARCHAR(1000)
        SET @Path = 'C:\' --Must be UNC if outside of server
    --===== Table to hold all the directory info from xp_DirTree.
         -- Will have both file names and directory names.
     CREATE TABLE #MyDir
            (
            FileNum  INT IDENTITY (1,1) PRIMARY KEY CLUSTERED,
            LongName VARCHAR(1000),
            Depth    INT,
            IsFile   INT
            )
    --===== Table to hold just file names and a number for
         -- loops without cursors
     CREATE TABLE #MyFiles
            (
            FileNum  INT IDENTITY (1,1),
            LongName VARCHAR(1000)
            )
    --===== Store all the directory info for path just one level deep
         -- Syntax note: Master.dbo.xp_DirTree path,level,markfiles
         -- "level" should probably always be "1". "0" means "all levels".
         -- "markfiles" will produce extra column "file" if is not null or "0".
         -- Could stop demo here but we'll go on to make life easier...
     INSERT INTO #MyDir
            (LongName,Depth,IsFile)
       EXEC Master.dbo.xp_DirTree @Path,1,1
    --===== Move just the file info to a separate table for ease of processing.
     INSERT INTO #MyFiles 
            (LongName)
     SELECT LongName
       FROM #MyDir
      WHERE IsFile = 1 --Can add other criteria to filter by extension, etc.
      ORDER BY LongName
    --===== For demo purposes, display the file names
     SELECT * FROM #MyFiles

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Excellent stuff, guys.

    I noticed that my awk program is too long.  It just needs to be "lc++"

    The following CMD will create a single file for all of the files in one folder, minus the first line of each file:

    for %I in (*.*) do type %I | awk "lc++" >> c:\ImportFile.txt

    Repeat for each directory, then import the file.

  • Lowell, Jeff Moden, Jeff Gray,

    Before anything else, I want to thank you all for your very interesting and helpful ideas!

    As a newbie, I feel a little overwhelmed.  As a result, I am not totally sure about how to proceed.  Perhaps I should provide some additional info on my situation to see if you all agree that each suggestion still applies to my problem or if 1 or more might be ruled out.

    I am running a P4 computer with Win 2K as my operating system, and SQL Server 2000 installed.  Consequently, I do not have the latest form of SQL Server, and I also do not have the benefit of anything which might tie-in to an XP or Vista operating system.  Also, I am unfamiliar with the program / routine called 'awk', and I do not know how to invoke it.

    Those things being said, let me define precisely the folders containing the files I want to import into SQL:

    C:\MetaStock Ascii\AMEX, C:\MetaStock Ascii\NASDAQ, C:\MetaStock Ascii\NYSE, C:\MetaStock Ascii\Preferred

    The 4 folders above contain my 8000+ text files.  Again, all files have exactly the same format, and each file has a 'Header' record (row).  By the way, I have already created a table in SQL Server 2000 called 'StockData' and my columns are clearly and precisely in-sync with the text data in the folders.

    I appreciate tremendously everyone's efforts and time.  I would also be happy to know if all solutions offered still apply, and so I welcome gratefully all further feedback.

    Josef  

       

  • Hi Josef,

    A windows executable can be downloaded from a link on this page  http://www.cs.bell-labs.com/cm/cs/awkbook/index.html 

    There is also a GNU version called GAWK somewhere out there.

    AWK is a shell utility that has been around for ages in the UNIX world. It is a very useful tool for doing things to text files.

    There may be a way to do it using what's available in the windows CMD shell, but I'm not so good with windows scripting.

    My suggestion would be to try the method that uses straight T-SQL, but I'll try to answer any questions should you decide to pursue the script method.

     

     

     

  • Josef,

    Need a couple of things from you...

    1. Would you post the first 10 lines from any of the files?
    2. Would you post the table creation statement for the StockData table?
    3. If possible, would you post the record layout for the files (not required if tab or comma delimited).
    4. Would you tell us which machine the C: drive is on?  If it's the server, just say it's on the server.  If it's on some other machine, we'll need the machine name and the "share" name for the C: drive on that machine.

    Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff Moden,

    1. Here is a printout of the first 10 records (rows) of one of the files:

    <TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>

    AAC,D,19990721,000000,2.4100,3.5500,2.4000,3.5500,14602,0

    AAC,D,19990722,000000,3.5000,4.4000,3.4000,4.3500,12531,0

    AAC,D,19990723,000000,4.4000,5.1500,4.3200,5.0000,7517,0

    AAC,D,19990726,000000,15.0000,16.0000,6.5000,9.0000,897,0

    AAC,D,19990727,000000,10.5000,11.5000,6.0000,8.0000,671,0

    AAC,D,19990728,000000,8.5000,8.5000,5.5000,6.0000,252,0

    AAC,D,19990729,000000,6.0000,6.0000,4.2500,4.7500,670,0

    AAC,D,19990730,000000,5.2500,5.2500,4.5000,4.5000,638,0

    AAC,D,19990802,000000,5.0000,7.6000,4.6300,4.6300,102,0

    2. Actually, I created the table StockData on my laptop computer, and --- foolishly --- I didn't save the creation statement for that table.  I thought I had, but I screwed up the 'save' of the Create Table statement.

    3. As you can see, the rows are comma-delimited.

    4. The C: drive is on my laptop.  No server is involved.

    With respect to item (2), if necessary I can re-create the table and make a point of correctly saving the create statement, if you wish.  Or, is there some way I can check the properties of the existing StockData table and note them down and then pass that info over to you?

    Thanks again to all of you for your assistance and encouragement.

    Josef

     

     

  • Assuming you have SQL server installed on your laptop, you should be able to open Query Analyzer, Press F8, which will open the object browser...find the database>> open the treeview for tables>>expand the folder>find the table StockData>>RightClick>>Select "Script Object To New Window As...>>Create

    that will give you the Create Table Statement.

    I edited my original example to inlcude your folders and your table name, and should work in QA if you've pointed to the correct database. the FIRSTROW=2 part skips the header row in your files as you described, if all the files end in .txt; otherwise, that portion would need to be tweaked.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks, Josef... I'm on my way to work... will take a whack at it tonight.

    Forgot to ask... can you provide a couple of file names from each directory?  And, Lowell is correct about generating the code for the table... that would be helpful, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Everyone,

    Thank you all very much!  I am heading out for a cup of coffee and then some chores, but I will take my laptop with me and make a point of following the suggestions made in the last few posts.

    I should be able to provide further feedback later today.

    Again, thanks!

    Josef

     

     

     

  • Hello All,

    Sorry I didn't get back to you as fast as I wanted to, but I got in late last night and found myself with things that had to be done at home.  However, here is some pertinent info:

    The create table statement:

    CREATE TABLE [StockData] (

     [TICKER] [varchar] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

     [PER] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

     [DTYYYYMMDD] [int] NOT NULL ,

     [TRNTIME] [int] NOT NULL ,

     [OPENPRC] [decimal](18, 0) NOT NULL ,

     [HIGHPRC] [decimal](18, 0) NOT NULL ,

     [LOWPRC] [decimal](18, 0) NOT NULL ,

     [CLOSEPRC] [decimal](18, 0) NOT NULL ,

     [VOLUME] [int] NOT NULL ,

     [OPENINT] [int] NOT NULL

    ) ON [PRIMARY]

    GO

    Here are the first 2 file names in my folder C:\MetaStock Ascii\AMEX --

    AACD.txt, AAUD.txt

    Josef

     

     

     

     

     

     

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply