June 19, 2008 at 4:04 am
Hi all,
I am trying to load txt files into SQL Server 2005 databases.
These txt files come from another process with a fixed format and no field delimiters!
Here is a sample txt line: "0124xxErrkckhhty45569aaerzt0h4h..."
The trouble is that it seems impossible to extract field values, by position, from this txt line!
Oracle provides a solution for this: the user can extract fields by character position.
Sample:
LOAD DATA
INFILE 'table_with_one_million_rows.dat'
INTO TABLE TABLE_WITH_ONE_MILLION_ROWS
(
COL1 POSITION(1:4) INTEGER EXTERNAL
, COL2 POSITION(6:9) INTEGER EXTERNAL
, COL3 POSITION(11:46) CHAR
, col4 position(48:83) CHAR
, col5 position(85:120) CHAR
, COL6 POSITION(122:130) DATE "MMDDYYYY"
)
It's strange that Microsoft did not think about this!!!
So would you mind telling me if there is a similar way to work with non-delimited fields when loading a txt file into the DB?
Thx a lot
June 19, 2008 at 5:43 am
Any answers!!??
June 19, 2008 at 8:12 am
Hi all,
It seems that Microsoft has not thought about this matter yet!
I'll keep on searching and hope I'll find something!
In the meantime, please let me know your opinions.
Regards
June 19, 2008 at 1:16 pm
[font="Arial"]
Hello,
You could use bcp or DTS for this process. There are other ways to do it too, but these two are the simplest and sometimes the most compatible for operational considerations.
DTS will accept a text file as input and will let you prescribe the fixed field lengths. It will let you define each column's type, length, or mathematical precision.
bcp lets you set the fixed field lengths and types to a lesser degree.
As a suggestion, if there were fewer than 65,000 rows in these .txt files instead of 1 million, I would import them into Excel and use the vertical cursors to set the field lengths exactly how you want them. Then I'd export the data as tab-delimited txt, and then use DTS to import it in about 30 seconds. Be sure to set the definitions for each field and don't take the defaults, or you will run out of column space in a row tout de suite.
For the case of 1 million rows of unknown length, I highly recommend you read the bcp instructions. It is very fast. I have to post a half-million-row table to about 20 data sets each month, and it takes less than 5 minutes per data set using bcp. Unless you have data quality issues, it's the way to go for large import text files.
The bcp tool will let you build and save a .fmt file so you can batch load a known fixed-field-length file routinely, if that's the type of operational data load situation you have. DTS also has that capability, along with the ability to reform columnar data via VB code if you need to. Both processes can be added to a scheduler for complete automation on a repetitive basis.
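As a rough sketch of that kind of scheduled load (the server, database, table, and file names here are hypothetical placeholders):

```
rem Load a fixed-width text file into a staging table using a saved
rem format file. -f points at the .fmt file, -S names the server,
rem -T uses a trusted (Windows) connection.
bcp MyDB.dbo.BCPStage in c:\Transfer\bcp1.txt -f c:\Transfer\bcp.fmt -S MYSERVER -T
```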
Not knowing your operational environment goals, I would say that if this is a one-time deal, use DTS and you'll see it's quite straightforward. If this will become a routine data feed, do the bcp thing. If not... post more questions! They are free and no coupons required!
I hope this coaching helps.
Regards,
Terry
[/font]
June 19, 2008 at 2:58 pm
Don't EVER load a file straight into a table.
Even if that file is perfectly formatted for your favourite upload process.
Load it into a staging table (single varchar column, maybe even text), DO DATA VALIDATION(!!!), and only then, if everything is all right, load the data into the database using the SUBSTRING function.
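A minimal sketch of that validation step (the table name, line length, and field positions are hypothetical; adjust them to your file's real layout):

```sql
-- staging table: one raw line per row, no parsing yet
CREATE TABLE dbo.Staging (RawLine varchar(8000))

-- validation: flag lines of the wrong length, or lines whose
-- numeric field holds non-numeric characters (positions assumed)
SELECT RawLine
FROM dbo.Staging
WHERE LEN(RawLine) <> 16
   OR ISNUMERIC(SUBSTRING(RawLine, 7, 2)) = 0

-- only after the checks pass, split by position with SUBSTRING
-- into the production table
```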
_____________
Code for TallyGenerator
June 20, 2008 at 3:44 am
Hi all,
First, I would like to thank Terry and Sergiy for the help they gave me: thanks a lot!
Well, I found the solution for this matter; it's exactly what Sergiy said: using the Substring() function to parse data from a temp table.
Here is a sample:
/*
bcp1.txt
aaammm0120030101
bbbnnn0220030102
cccooo0320030103
bcp2.txt
abcxyz5320030104
defhhh1020030105
cdezzz1120030106
fsajku9920030107
*/
Create the table
create table BCPData
(
fld1 varchar(20) ,
fld2 varchar(20) ,
fld3 int ,
fld4 datetime
)
Now run the import
exec ImportFiles 'c:\Transfer\' , 'c:\Transfer\Archive\' , 'bcp*.txt', 'MergeBCPData'
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[ImportFiles]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)
drop procedure [dbo].[ImportFiles]
GO
create procedure ImportFiles
@FilePath varchar(1000) = 'c:\Transfer\' , --Import files from here
@ArchivePath varchar(1000) = 'c:\Transfer\Archive\' , --Move loaded files to here
@FileNameMask varchar(1000) = 'bcp*.txt' ,
@MergeProc varchar(128) = 'MergeBCPData'
AS
set nocount on
declare @ImportDate datetime
select @ImportDate = getdate()
declare @FileName varchar(1000) ,
@File varchar(1000)
declare @cmd varchar(2000)
create table ##Import (s varchar(8000))
create table #Dir (s varchar(8000))
/*****************************************************************/
-- Import file
/*****************************************************************/
select @cmd = 'dir /B ' + @FilePath + @FileNameMask
delete #Dir
insert #Dir exec master..xp_cmdshell @cmd
delete #Dir where s is null or s like '%not found%'
while exists (select * from #Dir)
begin
select @FileName = min(s) from #Dir
select @File = @FilePath + @FileName
select @cmd = 'bulk insert'
select @cmd = @cmd + ' ##Import'
select @cmd = @cmd + ' from'
select @cmd = @cmd +' ''' + replace(@File,'"','') + ''''
select @cmd = @cmd +' with (FIELDTERMINATOR=''|'''
select @cmd = @cmd +',ROWTERMINATOR = ''' + char(10) + ''')'
truncate table ##Import
-- import the data
exec (@cmd)
-- remove filename just imported
delete #Dir where s = @FileName
exec @MergeProc
-- Archive the file
select @cmd = 'move ' + @FilePath + @FileName + ' ' + @ArchivePath + @FileName
exec master..xp_cmdshell @cmd
end
drop table ##Import
drop table #Dir
go
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[MergeBCPData]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)
drop procedure [dbo].[MergeBCPData]
GO
create procedure MergeBCPData
AS
set nocount on
-- insert data to production table
insert BCPData
(
fld1 ,
fld2 ,
fld3 ,
fld4
)
select
fld1 = substring(s,1,3) ,
fld2 = substring(s,4,3) ,
fld3 = convert(int,substring(s,7,2)) ,
fld4 = convert(datetime,substring(s,9,8))
from ##Import
go
Hope this will help those who find the same problems!
Regards
Houcem
June 20, 2008 at 3:45 am
Any comments would be more than welcome!
June 20, 2008 at 8:10 am
Yeaup... Sergiy is absolutely correct... never load directly into the final table. Always use a staging table.
So far as the fixed-field format you're talking about... yeah, you can load it all into one huge column and use substrings to split the data... but that's slow. Look up BCP format files and BULK INSERT... that's the way to do this load if the data in the file is in reasonably stable condition. Using the combination of those two things, you should be able to populate a staging table with individual columns for each field in the file at a rate of nearly 5 million rows per minute.
Look up BCP Format File and BULK INSERT in Books Online (comes free with SQL Server).
--Jeff Moden
Change is inevitable... Change for the better is not.
June 20, 2008 at 8:28 am
Does the BCP format file insure string parsing?
I mean, could I extract a field from a string field by character position using the BCP format file?
I think it's impossible unless I go through Substring()!
June 20, 2008 at 8:43 am
wolfdeeper (6/20/2008)
Does the BCP format file insure string parsing? I mean, could I extract a field from a string field by character position using the BCP format file?
I think it's impossible unless I go through Substring()!
Heh... nope... it "ensures" it. Look it up in Books Online... the format file can be used many ways and can also mix formats. Fixed-field parsing is one of the fastest methods for a Format File.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 20, 2008 at 8:49 am
OK, I'm on this trail till I find something "E"nsuring what I want.
But I remember having looked into the bcp format file and not seeing such an option!! Strange!
I'll carry on searching, but if you have a keyword or a tip, please let me know.
Thx a lot for your help
June 20, 2008 at 8:54 am
wolfdeeper (6/20/2008)
OK, I'm on this trail till I find something "E"nsuring what I want. But I remember having looked into the bcp format file and not seeing such an option!! Strange!
I'll carry on searching, but if you have a keyword or a tip, please let me know.
Thx a lot for your help
The "key" is to not identify column delimiters in the BCP format file... you would just use "" with nothing in between, and you would ensure that the length column had precisely the correct number to cover the characters in the fixed field of the file.
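As a sketch, for the 16-character sample rows posted earlier (aaammm0120030101: two 3-character fields, a 2-digit int, and an 8-digit date), a SQL Server 2005 format file (version 9.0) against the BCPData columns could look like this:

```
9.0
4
1  SQLCHAR  0  3  ""      1  fld1  SQL_Latin1_General_CP1_CI_AS
2  SQLCHAR  0  3  ""      2  fld2  SQL_Latin1_General_CP1_CI_AS
3  SQLCHAR  0  2  ""      3  fld3  ""
4  SQLCHAR  0  8  "\r\n"  4  fld4  ""
```

With "" terminators, each field is read purely by its length; the terminator on the last field doubles as the row terminator. You'd then point BULK INSERT at it:

```sql
BULK INSERT dbo.BCPData
FROM 'c:\Transfer\bcp1.txt'
WITH (FORMATFILE = 'c:\Transfer\bcp.fmt')
```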
--Jeff Moden
Change is inevitable... Change for the better is not.
June 20, 2008 at 9:32 am
Thanks a lot Jeff!
I will follow this trace and come to you with answers as soon as possible.
Regards