September 3, 2007 at 4:04 am
Hello,
I am new to using MS SQL, but I hope to learn it well. Currently I have SQL Server 2000.
I have over 8000 text files residing in 4 folders on my hard drive and I need to import those files into a pre-existing table in SQL. All the files have exactly the same format. Also, each file has an initial column-header record (row).
I have looked into DTS, but it only seems to allow importing one file at a time. Someone suggested I try to merge all 8000+ files into one large file and then use DTS to do the import.
I am willing to do that, but I have no idea how to accomplish the merge.
Can this be done in DTS, or can someone suggest a known pre-existing routine or procedure that I can use?
Any suggestions to help me are appreciated. Many thanks.
Josef
September 3, 2007 at 4:49 am
Depending on the complexity of the files, you could try this with the BULK INSERT command.
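For a single file it would look something like this (the table name and path are placeholders, and the WITH options are only a guess at your layout; adjust them to match your files):
BULK INSERT YourTable
FROM 'C:\SomeFolder\SomeFile.txt'
WITH (
FIELDTERMINATOR = ',', --assuming the columns are comma-delimited
ROWTERMINATOR = '\n', --one record per line
FIRSTROW = 2 --skip the column-header row you mentioned
)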
September 3, 2007 at 6:03 am
Would I be creating some logic for the Bulk Insert command from within Query Analyzer, or is there some way to access the Bulk Insert from within DTS?
Please forgive my lack of awareness on the import subject. It's all part of my learning process as a Newbie.
Thanks for your reply.
Josef
September 4, 2007 at 6:50 am
To merge the files, I would use a language called AWK to remove the 1st line from a file using a program such as:
{if (lc++){print}}
then create a batch file that would pipe all of the source files through this AWK program, appending each one to the end of a target file.
Then import the resulting merged file.
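For a single file, the invocation would look something like this (assuming awk.exe, e.g. GAWK for Windows, is on your PATH; the file names here are just placeholders):
awk "{if (lc++){print}}" SomeFile.txt >> MergedFile.txt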
September 5, 2007 at 6:36 am
Here's how I would do it if restricted to just using T-SQL:
--a table to loop thru filenames
CREATE TABLE ALLFILENAMES(WHICHPATH VARCHAR(255),WHICHFILE varchar(255))
--the source table: yours already exists, but needed for this example.
CREATE TABLE BULKACT(RAWDATA VARCHAR (8000))
--some variables
declare @filename varchar(255),
@path varchar(255),
@sql varchar(8000),
@cmd varchar(1000)
--get the list of files to process:
--#########################################
SET @path = 'C:\MetaStock Ascii\AMEX\'
SET @cmd = 'dir "' + @path + '*.txt" /b'
INSERT INTO ALLFILENAMES(WHICHFILE)
EXEC Master..xp_cmdShell @cmd
UPDATE ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null

SET @path = 'C:\MetaStock Ascii\NASDAQ\'
SET @cmd = 'dir "' + @path + '*.txt" /b'
INSERT INTO ALLFILENAMES(WHICHFILE)
EXEC Master..xp_cmdShell @cmd
UPDATE ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null

SET @path = 'C:\MetaStock Ascii\NYSE\'
SET @cmd = 'dir "' + @path + '*.txt" /b'
INSERT INTO ALLFILENAMES(WHICHFILE)
EXEC Master..xp_cmdShell @cmd
UPDATE ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null

SET @path = 'C:\MetaStock Ascii\Preferred\'
SET @cmd = 'dir "' + @path + '*.txt" /b'
INSERT INTO ALLFILENAMES(WHICHFILE)
EXEC Master..xp_cmdShell @cmd
UPDATE ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null
--#########################################
--cursor loop
declare c1 cursor for SELECT WHICHPATH,WHICHFILE FROM ALLFILENAMES where WHICHFILE like '%.txt%'
open c1
fetch next from c1 into @path,@filename
While @@fetch_status <> -1
begin
--BULK INSERT won't take a variable for the file name, so build the SQL as a string and execute it instead:
--CHANGE THE TABLE NAME TO YOUR REAL TABLE
set @sql = 'BULK INSERT StockData FROM ''' + @path + @filename + ''' '
+ ' WITH (
DATAFILETYPE = ''char'',
FIELDTERMINATOR = '','',
ROWTERMINATOR = ''\n'',
FIRSTROW = 2
) '
print @sql
exec (@sql)
fetch next from c1 into @path,@filename
end
close c1
deallocate c1
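After the loop finishes, a quick sanity check might look something like this (assuming the StockData table name used above):
--how many files were picked up, and how many rows actually landed
SELECT COUNT(*) AS FilesFound FROM ALLFILENAMES WHERE WHICHFILE LIKE '%.txt%'
SELECT COUNT(*) AS RowsLoaded FROM StockData
--optional cleanup of the helper table when you're done
--DROP TABLE ALLFILENAMES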
Lowell
September 5, 2007 at 7:30 am
Aye... nicely done, Lowell... and, I think you'll really like this, as well
Just in case someone can't use xp_CmdShell, there's a little-known undocumented stored procedure with an even lesser-known option that you can use to get file names for a given path WITHOUT using xp_CmdShell... here's some code to demo that bad boy... read the comments, folks...
/*******************
drop table #MyFiles
drop table #MyDir
*******************/
--===== Assign a path for the files to be found
DECLARE @Path VARCHAR(1000)
    SET @Path = 'C:\' --Must be UNC if outside of server

--===== Table to hold all the directory info from xp_DirTree.
     -- Will have both file names and directory names.
 CREATE TABLE #MyDir
        (
        FileNum  INT IDENTITY (1,1) PRIMARY KEY CLUSTERED,
        LongName VARCHAR(1000),
        Depth    INT,
        IsFile   INT
        )

--===== Table to hold just file names and a number for
     -- loops without cursors
 CREATE TABLE #MyFiles
        (
        FileNum  INT IDENTITY (1,1),
        LongName VARCHAR(1000)
        )

--===== Store all the directory info for path just one level deep
     -- Syntax note: Master.dbo.xp_DirTree path,level,markfiles
     -- "level" should probably always be "1". "0" means "all levels".
     -- "markfiles" will produce extra column "file" if is not null or "0".
     -- Could stop demo here but we'll go on to make life easier...
 INSERT INTO #MyDir (LongName,Depth,IsFile)
   EXEC Master.dbo.xp_DirTree @Path,1,1

--===== Move just the file info to a separate table for ease of processing.
 INSERT INTO #MyFiles (LongName)
 SELECT LongName
   FROM #MyDir
  WHERE IsFile = 1 --Can add other criteria to filter by extension, etc.
  ORDER BY LongName

--===== For demo purposes, display the file names
 SELECT * FROM #MyFiles
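And if you wanted the list in #MyFiles to drive the actual import, an untested sketch might look like this (using the StockData table name and the comma/newline layout described earlier in this thread; it assumes @Path ends in a backslash and that #MyFiles holds only the .txt files you want):
--walk #MyFiles by FileNum and BULK INSERT each file (no cursor needed)
DECLARE @Counter INT, @SQL VARCHAR(8000)
    SET @Counter = 1
  WHILE @Counter <= (SELECT MAX(FileNum) FROM #MyFiles)
  BEGIN
         SELECT @SQL = 'BULK INSERT StockData FROM ''' + @Path + LongName + ''''
                     + ' WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', FIRSTROW = 2)'
           FROM #MyFiles
          WHERE FileNum = @Counter
           EXEC (@SQL)
            SET @Counter = @Counter + 1
  END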
--Jeff Moden
Change is inevitable... Change for the better is not.
September 5, 2007 at 8:10 am
Excellent stuff, guys.
I noticed that my awk program is too long. It just needs to be "lc++"
The following CMD will create a single file for all of the files in one folder, minus the first line of each file:
for %I in (*.*) do type %I | awk "lc++" >> c:\ImportFile.txt
Repeat for each directory, then import the file.
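Since the folder names contain spaces, the simplest thing (assuming awk.exe, e.g. GAWK for Windows, is on your PATH) is probably to change into each of the MetaStock Ascii folders mentioned in this thread and run the same command, appending everything to one output file, something like:
rem run these from a CMD prompt; inside a .bat file, double the percent signs (%%I)
cd /d "C:\MetaStock Ascii\AMEX"
for %I in (*.txt) do type "%I" | awk "lc++" >> c:\ImportFile.txt
cd /d "C:\MetaStock Ascii\NASDAQ"
for %I in (*.txt) do type "%I" | awk "lc++" >> c:\ImportFile.txt
cd /d "C:\MetaStock Ascii\NYSE"
for %I in (*.txt) do type "%I" | awk "lc++" >> c:\ImportFile.txt
cd /d "C:\MetaStock Ascii\Preferred"
for %I in (*.txt) do type "%I" | awk "lc++" >> c:\ImportFile.txt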
September 5, 2007 at 3:14 pm
Lowell, Jeff Moden, Jeff Gray,
Before anything else, I want to thank you all for your very interesting and helpful ideas!
As a newbie, I feel a little overwhelmed. As a result, I am not totally sure how to proceed. Perhaps I should provide some additional info on my situation to see if you all agree that each suggestion still applies to my problem, or if one or more might be ruled out.
I am running a P4 computer with Win 2K as my operating system and SQL Server 2000 installed. Consequently, I do not have the latest version of SQL Server, and I also do not have the benefit of anything that might tie in to an XP or Vista operating system. Also, I am unfamiliar with the program/routine called 'awk', and I do not know how to invoke it.
Those things being said, let me define precisely the folders containing the files I want to import into SQL:
C:\MetaStock Ascii\AMEX, C:\MetaStock Ascii\NASDAQ, C:\MetaStock Ascii\NYSE, C:\MetaStock Ascii\Preferred
The 4 folders above contain my 8000+ text files. Again, all files have exactly the same format, and each file has a 'Header' record (row). By the way, I have already created a table in SQL Server 2000 called 'StockData', and my columns are clearly and precisely in sync with the text data in the folders.
I appreciate tremendously everyone's efforts and time. I would also be happy to know if all solutions offered still apply, and so I welcome gratefully all further feedback.
Josef
September 5, 2007 at 3:33 pm
Hi Josef,
A Windows executable can be downloaded from a link on this page: http://www.cs.bell-labs.com/cm/cs/awkbook/index.html
There is also a GNU version called GAWK somewhere out there.
AWK is a shell utility that has been around for ages in the UNIX world. It is a very useful tool for doing things to text files.
There may be a way to do it using what's available in the Windows CMD shell, but I'm not so good with Windows scripting.
My suggestion would be to try the method that uses straight T-SQL, but I'll try to answer any questions should you decide to pursue the script method.
September 5, 2007 at 4:49 pm
Josef,
Need a couple of things from you...
Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 6, 2007 at 5:01 am
Hi Jeff Moden,
1. Here is a printout of the first 10 records (rows) of one of the files:
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
AAC,D,19990721,000000,2.4100,3.5500,2.4000,3.5500,14602,0
AAC,D,19990722,000000,3.5000,4.4000,3.4000,4.3500,12531,0
AAC,D,19990723,000000,4.4000,5.1500,4.3200,5.0000,7517,0
AAC,D,19990726,000000,15.0000,16.0000,6.5000,9.0000,897,0
AAC,D,19990727,000000,10.5000,11.5000,6.0000,8.0000,671,0
AAC,D,19990728,000000,8.5000,8.5000,5.5000,6.0000,252,0
AAC,D,19990729,000000,6.0000,6.0000,4.2500,4.7500,670,0
AAC,D,19990730,000000,5.2500,5.2500,4.5000,4.5000,638,0
AAC,D,19990802,000000,5.0000,7.6000,4.6300,4.6300,102,0
2. Actually, I created the table StockData on my laptop computer, and --- foolishly --- I didn't save the creation statement for that table. I thought I had, but I screwed up the 'save' of the Create Table statement.
3. As you can see, the rows are comma-delimited.
4. The C: drive is on my laptop. No server is involved.
With respect to item (2), I can re-create the table and make a point of correctly saving the CREATE statement, if you wish. Or, is there some way I can check the properties of the existing StockData table, note them down, and then pass that info over to you?
Thanks again to all of you for your assistance and encouragement.
Josef
September 6, 2007 at 5:56 am
Assuming you have SQL Server installed on your laptop, you should be able to open Query Analyzer and press F8, which will open the Object Browser. Find the database >> open the tree view for Tables >> expand the folder >> find the table StockData >> right-click >> select "Script Object to New Window As..." >> Create.
that will give you the Create Table Statement.
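If you'd rather do it with a query instead of the Object Browser, sp_help will also list the columns and their data types (it won't write the CREATE statement for you, but it's enough to copy them down):
--lists columns, data types, lengths, nullability, etc. for the table
EXEC sp_help 'StockData'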
I edited my original example to include your folders and your table name, and it should work in QA if you've pointed to the correct database. The FIRSTROW = 2 part skips the header row in your files as you described. The script assumes all the files end in .txt; otherwise, that portion would need to be tweaked.
Lowell
September 6, 2007 at 8:22 am
Thanks, Josef... I'm on my way to work... will take a whack at it tonight.
Forgot to ask... can you provide a couple of file names from each directory? And, Lowell is correct about generating the code for the table... that would be helpful, as well.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 6, 2007 at 4:27 pm
Everyone,
Thank you all very much! I am heading out for a cup of coffee and then some chores, but I will take my laptop with me and make a point of following the suggestions made in the last few posts.
I should be able to provide further feedback later today.
Again, thanks!
Josef
September 7, 2007 at 1:36 pm
Hello All,
Sorry I didn't get back to you as fast as I wanted to, but I got in late last night and found myself with things that had to be done at home. However, here is some pertinent info:
The create table statement:
CREATE TABLE [StockData] (
[TICKER] [varchar] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[PER] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[DTYYYYMMDD] [int] NOT NULL ,
[TRNTIME] [int] NOT NULL ,
[OPENPRC] [decimal](18, 0) NOT NULL ,
[HIGHPRC] [decimal](18, 0) NOT NULL ,
[LOWPRC] [decimal](18, 0) NOT NULL ,
[CLOSEPRC] [decimal](18, 0) NOT NULL ,
[VOLUME] [int] NOT NULL ,
[OPENINT] [int] NOT NULL
) ON [PRIMARY]
GO
Here are the first 2 file names in my folder C:\MetaStock Ascii\AMEX --
AACD.txt, AAUD.txt
Josef