BULK INSERT, bcp, DTS... is there a solution!!!?

  • Hi all,

I am trying to load txt files into SQL Server 2005 databases.

    These txt files come from another process in a fixed format, with no field delimiters!

    Here is a sample txt line: "0124xxErrkckhhty45569aaerzt0h4h..."

    The trouble is that it seems impossible to extract field values by position from this txt line!

    Oracle provides a solution for this: the user can extract fields by character position.

    Sample:

    LOAD DATA

    INFILE 'table_with_one_million_rows.dat'

    INTO TABLE TABLE_WITH_ONE_MILLION_ROWS

    (

    COL1 POSITION(1:4) INTEGER EXTERNAL

    , COL2 POSITION(6:9) INTEGER EXTERNAL

    , COL3 POSITION(11:46) CHAR

    , col4 position(48:83) CHAR

    , col5 position(85:120) CHAR

    , COL6 POSITION(122:130) DATE "MMDDYYYY"

    )

It's strange that Microsoft did not think about this!!! :w00t:

    So would you mind telling me if there is a similar way to work with non-delimited fields when loading a txt file into the DB?

    Thx a lot

  • Any answers!!??

  • Hi all,

It seems that Microsoft has not thought about this matter yet! ;)

    I'll keep on searching and hope I'll find something! :ermm:

    In the meantime, please let me know your opinions.

    Regards

  • [font="Arial"]

    Hello,

You could use bcp or DTS for this process. There are other ways to do it too, but these two are the simplest and sometimes the most compatible with operational considerations.

    DTS will accept a text file as input and will let you prescribe the fixed field lengths. It lets you define each column's type, length, or mathematical precision.

    bcp lets you set the fixed field lengths and types, to a lesser degree.

    As a suggestion, if there were fewer than 65,000 rows in these .txt files instead of 1 million, I would import them into Excel and use the vertical cursors to set the field lengths exactly how you want them. Then I'd export the data as tab-delimited txt and use DTS to import it in about 30 seconds. Be sure to set the definitions for each field, and don't take the defaults, or you will run out of column space in a row tout de suite.

    For the case of 1 million rows of unknown length, I highly recommend you read the bcp instructions. It is very fast. I have to post a half-million-row table to about 20 data sets each month, and it takes less than 5 minutes per data set using bcp. Unless you have data quality issues, it's the way to go for large import text files.

    The bcp tool will let you build and save a .fmt file so you can routinely batch-load a file with known fixed field lengths, if that's the type of operational data-load situation you have. DTS also has that capability, along with the ability to reshape columnar data via VB code if you need to. Both processes can be added to a scheduler for complete automation on a repetitive basis.

    Not knowing your operational environment and goals, I would say that if this is a one-time deal, use DTS and you'll see it's quite straightforward. If this will become a routine data feed, do the bcp thing. If not... post more questions! They are free, and no coupons required!

    I hope this coaching helps.

    Regards,

    Terry

    [/font]

Don't EVER load a file straight into a table.

    Even if that file is perfectly formatted for your favourite upload process.

    Load it into a staging table (a single varchar column, maybe even text), DO DATA VALIDATION(!!!), and then, if everything is all right, load the data into the database using the SUBSTRING function.
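For instance, a minimal sketch of that staging-and-validation pattern (the table names, file path, and the 3+3+2+8 character layout here are all made up for illustration):

```sql
-- Hypothetical staging table: one raw line per row, no parsing yet
CREATE TABLE #RawStage (s varchar(8000))

BULK INSERT #RawStage
FROM 'c:\Transfer\data.txt'
WITH (ROWTERMINATOR = '\n')

-- Validate BEFORE loading: flag short lines and bad numeric/date slices
SELECT s AS BadRow
FROM #RawStage
WHERE LEN(s) < 16
   OR ISNUMERIC(SUBSTRING(s, 7, 2)) = 0
   OR ISDATE(SUBSTRING(s, 9, 8)) = 0

-- Only rows that pass validation reach the real table, parsed by position
INSERT dbo.MyTable (col1, col2, col3, col4)
SELECT SUBSTRING(s, 1, 3),
       SUBSTRING(s, 4, 3),
       CONVERT(int, SUBSTRING(s, 7, 2)),
       CONVERT(datetime, SUBSTRING(s, 9, 8))
FROM #RawStage
WHERE LEN(s) >= 16
  AND ISNUMERIC(SUBSTRING(s, 7, 2)) = 1
  AND ISDATE(SUBSTRING(s, 9, 8)) = 1
```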

    _____________
    Code for TallyGenerator

  • Hi all,

First, I would like to thank Terry and Sergiy for the help they gave me: thanks a lot!

    Well, I found the solution to this problem, and it's exactly what Sergiy said: use the Substring() function to parse the data from a temp table.

    Here is a sample:

    /*

    bcp1.txt

    aaammm0120030101

    bbbnnn0220030102

    cccooo0320030103

    bcp2.txt

    abcxyz5320030104

    defhhh1020030105

    cdezzz1120030106

    fsajku9920030107

    */

    Create the table

    create table BCPData

    (

    fld1 varchar(20) ,

    fld2 varchar(20) ,

    fld3 int ,

    fld4 datetime

    )

Once the procedures below have been created, run the import

    exec ImportFiles 'c:\Transfer\' , 'c:\Transfer\Archive\' , 'bcp*.txt', 'MergeBCPData'

    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[ImportFiles]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)

    drop procedure [dbo].[ImportFiles]

    GO

    create procedure ImportFiles

@FilePath varchar(1000) = 'c:\Transfer\' , --Import files from here

    @ArchivePath varchar(1000) = 'c:\Transfer\Archive\' , --Move loaded files to here

    @FileNameMask varchar(1000) = 'bcp*.txt' ,

    @MergeProc varchar(128) = 'MergeBCPData'

    AS

    set nocount on

    declare @ImportDate datetime

    select @ImportDate = getdate()

    declare @FileName varchar(1000) ,

    @File varchar(1000)

    declare @cmd varchar(2000)

    create table ##Import (s varchar(8000))

    create table #Dir (s varchar(8000))

    /*****************************************************************/

    -- Import file

    /*****************************************************************/

select @cmd = 'dir /B ' + @FilePath + @FileNameMask

    delete #Dir

    insert #Dir exec master..xp_cmdshell @cmd

    delete #Dir where s is null or s like '%not found%'

    while exists (select * from #Dir)

    begin

    select @FileName = min(s) from #Dir

select @File = @FilePath + @FileName

    select @cmd = 'bulk insert'

    select @cmd = @cmd + ' ##Import'

    select @cmd = @cmd + ' from'

    select @cmd = @cmd +' ''' + replace(@File,'"','') + ''''

    select @cmd = @cmd +' with (FIELDTERMINATOR=''|'''

    select @cmd = @cmd +',ROWTERMINATOR = ''' + char(10) + ''')'

    truncate table ##Import

    -- import the data

    exec (@cmd)

    -- remove filename just imported

delete #Dir where s = @FileName

    exec @MergeProc

    -- Archive the file

    select @cmd = 'move ' + @FilePath + @FileName + ' ' + @ArchivePath + @FileName

    exec master..xp_cmdshell @cmd

    end

    drop table ##Import

    drop table #Dir

    go

    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[MergeBCPData]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)

    drop procedure [dbo].[MergeBCPData]

    GO

    create procedure MergeBCPData

    AS

    set nocount on

    -- insert data to production table

insert BCPData

    (

    fld1 ,

    fld2 ,

    fld3 ,

    fld4

    )

    select

fld1 = substring(s,1,3) ,

    fld2 = substring(s,4,3) ,

    fld3 = convert(int,substring(s,7,2)) ,

    fld4 = convert(datetime,substring(s,9,8))

    from ##Import

    go

Hope this helps those who have run into the same problem ;)

    Regards

    Houcem

  • Any comments would be more than welcome!

Yeaup... Sergiy is absolutely correct... never load directly into the final table. Always use a staging table.

    As far as the fixed field format you're talking about... yeah, you can load it all into one huge column and use substrings to split the data... but that's slow. Look up BCP format files and BULK INSERT... that's the way to do this load if the data in the file is in reasonably stable condition. Using the combination of those two things, you should be able to populate a staging table with individual columns for each field in the file at a rate of nearly 5 million rows per minute.

    Look up BCP Format File and BULK INSERT in Books Online (comes free with SQL Server).
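As a sketch of what that combination looks like (the table name and file paths here are assumptions, and the .fmt file would describe the fixed field widths):

```sql
-- Bulk-load a fixed-width file into a staging table using a format file
BULK INSERT dbo.StagingTable
FROM 'c:\Transfer\bcp1.txt'
WITH (
    FORMATFILE = 'c:\Transfer\fixed.fmt',  -- non-XML BCP format file
    TABLOCK                                -- table lock helps bulk-load speed
)
```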

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Does the BCP format file insure string parsing?

    I mean, could I extract a field from a string by character position using the BCP format file?

    I think that it's impossible unless I go through Substring()!

  • wolfdeeper (6/20/2008)


Does the BCP format file insure string parsing?

    I mean, could I extract a field from a string by character position using the BCP format file?

    I think that it's impossible unless I go through Substring()!

Heh... nope... it "ensures" it. Look it up in Books Online... the format file can be used in many ways and can also mix formats. Fixed-field parsing is one of the fastest methods for a format file. :hehe:

    --Jeff Moden



OK, I'm on this trail until I find something "E"nsuring what I want :D

    But I remember having dug into the bcp format file before and not having seen such an option!! Strange!

    I'll carry on searching, but if you have a keyword or a tip, please let me know.

    Thanks a lot for your help

  • wolfdeeper (6/20/2008)


OK, I'm on this trail until I find something "E"nsuring what I want :D

    But I remember having dug into the bcp format file before and not having seen such an option!! Strange!

    I'll carry on searching, but if you have a keyword or a tip, please let me know.

    Thanks a lot for your help

The "key" is to not specify column delimiters in the BCP format file... you would just use "" with nothing in between, and you would make sure the host-data-length column has precisely the right number to cover the characters in that fixed field of the file.
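To illustrate (using the 16-character layout from the sample files earlier in the thread; the column names are assumptions), a non-XML format file for SQL Server 2005 might look like this. Note the empty "" terminators, which mean "no delimiter; read exactly N characters", with only the last field terminated by the row terminator:

```text
9.0
4
1  SQLCHAR  0  3  ""      1  fld1  SQL_Latin1_General_CP1_CI_AS
2  SQLCHAR  0  3  ""      2  fld2  SQL_Latin1_General_CP1_CI_AS
3  SQLCHAR  0  2  ""      3  fld3  ""
4  SQLCHAR  0  8  "\r\n"  4  fld4  ""
```

The per-field columns are: host field order, data type, prefix length, host data length, terminator, server column order, server column name, and collation.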

    --Jeff Moden



Thanks a lot, Jeff!

    I will follow this trail and come back with answers as soon as possible.

    Regards
