create fixed width file using BCP

  • Any one has any script or knows how to create a fixed width file from a table using BCP? I have about 300 mill rows in the table so I need to use BCP.

    I am creating the format file as follows:

    master..xp_cmdshell 'bcp DB.dbo.myTable format nul -c -f c:\ABC.fmt -T'

    and using the BCP out as:

    exec master..xp_cmdshell 'bcp "SELECT top 10000 * FROM myServer.DB.dbo.myTable WHERE col1 is not null" queryout "c:\test\ABC.txt" -f"c:\ABC.fmt" -SmyServer -T'

    The format file seems good. One of the columns in the table is a Comments column which could have line feeds or carriage returns in the field values which is causing the values to skip the formatting.

    sample data:

    ----------------------------------------------------------------------------

    2272237Claassen, White and Asociates, P.C.Joliet36-4129374

    2286837Mahoney, Silverman and Cross, LtdJoliet36-4201623

    2288737Nadelhoffer, Kuhn, Mitche, Moss, SalWarrenville36-3539212

    Deactivated per Colleen Ritter - See ID AA1159. 9/3/2004 11:27:43 AM

    2345337Sachnoff & Weaver, Ltd.Chicago36-2735711

    ----------------------------------------------------------------------------

    The value "Deactivated..." is in a separate line because of a carriage return. How do we fix these.

    Thanks in advance.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • Use native format instead of text

    or

    Update the data before exporting, replacing line breaks 

  • The text files I create have to be passed through some 3rd party ETL tool for encryption. So the files should be readable. If I use native format, the files come out in a non-readable format.

    Also, I cannot modify existing data.

    any other ideas?

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • If you can't touch the existing data, you may scrub it while retrieving.

    You can use a UDF to strip out CR/LF chars, something like below.

    create function dbo.stripCRLF(@s as Nvarchar(4000))

    returns Nvarchar(4000)

    as

    begin

    while patindex('%' + char(10) + '%', @s-2) > 0

     begin

      set @s-2 = replace(@s, substring(@s, patindex('%' + char(10) + '%', @s-2), 1), '')

     end

    while patindex('%' + char(13) + '%', @s-2) > 0

     begin

      set @s-2 = replace(@s, substring(@s, patindex('%' + char(13) + '%', @s-2), 1), '')

     end

    return @s-2

    end

    go

    Optionally you can create a view of the table with the 'stripped' column.

    create view dbo.strippedFoo

    as

    select col1, col2, dbo.stripCRLF(someText) as col3, col4

    from dbo.foo

    go

    Then you can bcp out either through the view or by query

    bcp "select col1, col2, dbo.stripCRLF(someText), col4 from foo" queryout c:\myfile.txt -c -t; -T

    bcp mydb.dbo.strippedFoo out c:\myfile.txt -c -t; -T

    =;o)

    /Kenneth

     

  • Dinakar,

    Kenneth's suggestion is an excellent one but, if the Comments column is VARCHAR or NVARCHAR, you don't need to create the UDF... just create the view with a small modification.... should run quite a bit faster...

    create view dbo.strippedFoo

    as

    select col1, col2, REPLACE(REPLACE(REPLACE(someText,CHAR(13),' '),CHAR(10),' '),'  ',' ') as col3, col4

    from dbo.foo

    go

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The view idea seems more feasible. I have tables with 200 mil rows. Need to see how it works out when I batch the bcp util. Thanks for the suggestion.

    I also have issues with listing out column names as there are about 40+ columns and the length of the string is exceeding the limit of 1k. So the view might help there too.

    Also I am having issues with NULL and empty values disturbing the alignment. Anyone has any ideas on how to fix that?

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • Rather than trying to BCP directly from a 200 Million row table (with or without the view for the mod), why don't you simply do a SELECT/INTO with the correct criteria (and mod formula, by the way) into a new table and BCP from that?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We are encrypting some of our columns in the table. The encrypting tool needs a fixed width file. So I need to create multiple smaller fixed width files which will be fed into the tool. The tool returns another text file with the columns encrypted. Then I would need to BCP IN or BULK INSERT the text files into a new table. Finally, add indexes, constraints, drop the old table, rename the new table to old.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • Wouldn't it be easier to spend a little extra money to buy a tool that does it in SQL Server instead of playing around with text files and all?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I agree. But thats not my call. But this is a one time ordeal for encrypting existing data. For any new data it will be encrypted through the application layer.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • Ok... so why can't you use the application to encrypt the data on a "one time basis"?  Sorry, Dinakar, I just don't understand why this is a problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Use a view as Jeff decribes to strip out cr/lf and format data

    use -c and -t parameters instead of format file

    exec master..xp_cmdshell 'bcp "SELECT top 10000 * FROM myServer.DB.dbo.myView WHERE col1 is not null" queryout "c:\test\ABC.txt" -c -t "" -SmyServer -T'

    output layout from using -c and -t parameters will not be affected by nulls or empty columns just make sure column widths are correct and the data is formatted correctly where necessary

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Jeff

    How would I encrypt existing data? I would need to write a tool to read each record, then encrypt it and update the record back in the table which will take forever.

    David

    I am trying out the view option. for a set of 5000 records the file turned out to be 80 MB. I was thinking of creating files of 500,000 records each. I had created flat files earlier using BCP + format files, and for 500,000 record batch the file size was around 150 MB. I just realized the ETL tool provided by the encrypting company also accepts flat files with delimiters too and not just fixed width files. So I would like to try that option. That will greatly reduce the file size.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • I thought you said you have an app that will do the encryption of new data that goes into the table... I thought that maybe you could "trick" it into thinking that all the data was new data without having to do an export/encrypt/import.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • yes it would but it would take a very long time to feed all the data through the app and let it encrypt and put the data in the table. Our web application will actually make a call to a separate server that is hosting the encryption passing in the SSN/other info that needs to be encrypted. The encrypting server will return an encrypted string. The string along with other info is fed into the table. So to simulate thie for millions of records would be almost impossible. The company providing the encryption has provide us an ETL tool that will take a fixed width file and return the file encrypting the columns that needed to be.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply