bcp format file and import fixed width text file

  • I'm trying to set up a format file using bcp but I'm unsure of how to set it up.

    I used "bcp de_identified.dbo.test format -c -t -f c:\sql\test3.fmt -T" and it seems to take me to the data type, prefix length and field terminator. How do I know what to put for those sections? How do I get it to start on the location I want? I am trying to import a fixed width text file.  Sorry if I'm not clear enough, it is the first time  I've tried. Thanks.

    I have the import specs, these are the first couple:

    delete

     

    This is what the data looks like

    100107065400011987012922991231PEACHMAN TOM A 056442487S 1044 BEST AVE STEELTOWN OH4120410063304803658AP2658613 0107065401

    2001070654000131311200507102299123135049435 22991231VMD A

    2001070654000131318198701292008092535049435 22991231VMD A

    100167693700032002020120160925ESTRADA PATTY V 110823270S 433339 NERRY HIGHWAY GREENVILLE PA16151 4402533428BE5508900G619930167693706

    20016769370003313181998013020080925MD062293L 20201231VMD C

    20016769370003313222005071020160925MD062293L 20201231VMD C

     

    Table I created:

    CREATE TABLE DE_IDENTIFIED.DBO.TEST
    (
    RECORD_TYPE TEXT
    ,PROMISE_NUMBER TEXT
    ,LOCATION TEXT
    ,START_DATE TEXT
    ,END_DATE TEXT
    ,PROVIDER_NAME TEXT
    ,IRS TEXT
    ,IRS_INDICATOR TEXT
    ,MEDICARE_NUMBER TEXT
    ,COUNTY_CODE TEXT
    ,ADDRESS TEXT
    ,CITY TEXT
    ,STATE TEXT
    ,ZIP TEXT
    ,BOX_NUMBER TEXT
    ,PHONE TEXT
    ,DEA_NUMBER TEXT
    ,UPIN TEXT
    ,NPI_MAMIS TEXT
    ,PROVIDER_TYPE TEXT
    ,SPECIALTY TEXT
    ,SPECIALTY_BEGIN TEXT
    ,SPECIALTY_END TEXT
    ,LECENS_NUMBER TEXT
    ,PRACTICE_TYPE TEXT
    ,MEDICA_DEGREE TEXT
    ,STATUS TEXT
    ,FIELD8 TEXT
    )

     

  • Your file format does not appear to match the data you have provided (eg, row 1 is at least 122 characters in length, whereas your layout is 78 characters in length).

    The data which you have provided appears to contain rows with different record types.

    Can you clarify?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Do you mean the spec part? If so that is just a sample of the first few. I can post them all if needed.

    • This reply was modified 4 years, 7 months ago by  smattiko83.
  • Do you have a different source file for every layout?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • and please please do not create those fields as TEXT - define them as varchar if they are character based - and if the data is numeric or date and can be automatically converted by BCP then define them as such.

  • If this file is truly a fixed width file - then every record should be the same length.  However, it looks like this might be a ragged-right type of file where each record type is a fixed length but not necessarily the same length as the other record types.

    For this type of file - BCP will not work even with a format file.  There are other methods - but to determine which one would be better we would need to see the full file specification.  Specifically we need to understand how each record type is related and whether or not the relationship is maintained in each record or by position in the file.

    And then - we would need to be able to identify how to load each record type.  Is the destination into a single table - or will there be separate tables for each record type?

    There are a lot more questions to be asked and answered before a solution can be developed.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Right now they are being imported into access. There is one spec for the rows that start with '1' and one spec for the rows that start with '2'. Then joined on the promise id. Can I do that with a bpc or what other way?

  • I'm a fan of SSIS, and it can certainly be done that way (but it's not trivial).

    You could write a simple PoSh script to pre-process the file & split it into two, based on record type. From there, BCP should work.

    You could import the data into a fairly flat (RecordType, PromiseNumber, EverythingElse) SQL Server staging table and split it out from there.

    Resident T-SQL master and SSISophobe Jeff Moden will almost certainly have a sub-nanosecond way to do this using only T-SQL, so hopefully he will see this post and respond soon.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks. Unfortunately I do not have ssis. Only SSMS.

  • This was removed by the editor as SPAM

  • smattiko83 wrote:

    I'm trying to set up a format file using bcp but I'm unsure of how to set it up.

    I used "bcp de_identified.dbo.test format -c -t -f c:\sql\test3.fmt -T" and it seems to take me to the data type, prefix length and field terminator. How do I know what to put for those sections? How do I get it to start on the location I want? I am trying to import a fixed width text file.  Sorry if I'm not clear enough, it is the first time  I've tried. Thanks.

    I have the import specs, these are the first couple:

    delete

    This is what the data looks like

    100107065400011987012922991231PEACHMAN TOM A 056442487S 1044 BEST AVE STEELTOWN OH4120410063304803658AP2658613 0107065401

    2001070654000131311200507102299123135049435 22991231VMD A

    2001070654000131318198701292008092535049435 22991231VMD A

    100167693700032002020120160925ESTRADA PATTY V 110823270S 433339 NERRY HIGHWAY GREENVILLE PA16151 4402533428BE5508900G619930167693706

    20016769370003313181998013020080925MD062293L 20201231VMD C

    20016769370003313222005071020160925MD062293L 20201231VMD C

    Table I created:

    CREATE TABLE DE_IDENTIFIED.DBO.TEST
    (
    RECORD_TYPE TEXT
    ,PROMISE_NUMBER TEXT
    ,LOCATION TEXT
    ,START_DATE TEXT
    ,END_DATE TEXT
    ,PROVIDER_NAME TEXT
    ,IRS TEXT
    ,IRS_INDICATOR TEXT
    ,MEDICARE_NUMBER TEXT
    ,COUNTY_CODE TEXT
    ,ADDRESS TEXT
    ,CITY TEXT
    ,STATE TEXT
    ,ZIP TEXT
    ,BOX_NUMBER TEXT
    ,PHONE TEXT
    ,DEA_NUMBER TEXT
    ,UPIN TEXT
    ,NPI_MAMIS TEXT
    ,PROVIDER_TYPE TEXT
    ,SPECIALTY TEXT
    ,SPECIALTY_BEGIN TEXT
    ,SPECIALTY_END TEXT
    ,LECENS_NUMBER TEXT
    ,PRACTICE_TYPE TEXT
    ,MEDICA_DEGREE TEXT
    ,STATUS TEXT
    ,FIELD8 TEXT
    )

    As Jeffrey Williams pointed out, the data you posted is decidedly NOT fixed field.  It may be fixed field on your end but what you have posted is definitely NOT fixed field.  This site may have "swallowed" multiple adjacent spaces that would make it fixed field when you pasted it onto this forum.

    Because this is like an EDI file (built in hierarchy of different record types but without all the * place holders), you won't be able to use a simple format file to solve this problem.  SSIS wouldn't like it very much either.

    As Phil Parkin pointed out, there is, however, a fairly easy and nasty fast way to handle this in a set-based manner.  The key will be to establish (on your end) if this is actually a fixed field format or some bastard mix that some genius provider ginned up while drinking profuse amounts of gin. 😀

    Do that first.  All of the type 1 records should be exactly the same length as the rest of the type 1 records and they must match the record layout you have for type 1 records.  The same hold true for the type 2 records.  They must all be the same length and they must match the record layout for type 2 records.  If you have more than those two record types, then the same applies for each of those record types AND we need to know about them.

    If you find that the records are not actually fixed field, you need to call your data provider to either get them to follow the fixed field rules they established or compose the data as TAB delimited (DO NOT USE COMMA DELIMITED FOR THIS).

    Once that's done, post back on this thread and we can show you a fairly easy method (actually, there are two different but fairly easy methods) to import and parse these types of files to the correct destination tables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks. I think they are the same when including spaces. They come from the government as a .dat file and they won't change so I'm stuck with these. I've added in the current import specs. The one is for the lines with '1' and the other is for lines with '2'.

     

    414

  • smattiko83 wrote:

    Thanks. I think they are the same when including spaces. They come from the government as a .dat file and they won't change so I'm stuck with these. I've added in the current import specs. The one is for the lines with '1' and the other is for lines with '2'.

    414

      If you could attach those as a spreadsheet, it would make this a lot easier.  Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've attached a sample text and the import specs. Thanks!

    Attachments:
    You must be logged in to view attached files.
  • smattiko83 wrote:

    I've attached a sample text and the import specs. Thanks!

    Ok... again, there's something wrong with the sample data you provided.  Here's the data using the fixed width font of the code window...

    100107065400011987012922991231PEACHMAN                 TOM    A           056442487S            1044 BEST AVE                                            STEELTOWN        OH4120410063304803658AP2658613      0107065401
    2001070654000131311200507102299123135049435 22991231VMD A
    2001070654000131318198701292008092535049435 22991231VMD A
    100167693700032002020120160925ESTRADA PATTY V 110823270S 433339 NERRY HIGHWAY GREENVILLE PA16151 4402533428BE5508900G619930167693706
    20016769370003313181998013020080925MD062293L 20201231VMD C
    20016769370003313222005071020160925MD062293L 20201231VMD C

    As you can see, there is a column in record type 1 that has Tom and Patty starting at character 56 but there is no column that starts at character 56 in the spec.  Likewise, the address information doesn't start at the same character, etc, etc, etc.

    Did you modify those lines to try to obfuscate some data?  If not, then the provider of the data has screwed you to the floor because the data does NOT meet the spec for row record type #1.  If you did try to obfuscate it, you have to be very careful to do character replacements rather than inserting spaces or deleting characters.

    If you want, PM me with the real UNEDITED data with my promise as a DBA that I'll correctly obfuscate the data and destroy the original.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply