How to efficiently import data in .rpt format into SQL 2008

  • If your file ends with a 'rows affected' line or an extra CRLF then, AFAIK, you will not be able to import using the format file: the line does not conform to the format, and BCP will not arbitrarily stop processing or ignore the format just because a line starts with CRLF. I will be glad to be proved wrong if Jeff can find a working solution.

    As I see it, there are only two possibilities:

    1. Pre-process the data to remove the erroneous lines before the load; you could use something like AWK to do this.

    2. Import the data as one column and then split the data in T-SQL (see the sketch below).

    Either way costs processing and time.
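
    To make option 2 concrete, here is a minimal T-SQL sketch. Everything in it is hypothetical: the temp table, the file path, the filter patterns, and the fixed-width offsets are assumptions, since the actual .rpt layout hasn't been posted.

      -- Stage every line of the file into a single wide column.
      -- Assumes the lines contain no tab characters (the default field terminator).
      CREATE TABLE #RawLines (RawLine varchar(8000) NULL);

      BULK INSERT #RawLines
      FROM 'C:\Data\Export.rpt'                 -- hypothetical path
      WITH (ROWTERMINATOR = '\n', TABLOCK);

      -- Keep only lines that look like data: skip blanks, the dashed
      -- underline row, and the trailing "(n row(s) affected)" line,
      -- then carve out the fixed-width columns with SUBSTRING.
      SELECT LTRIM(RTRIM(SUBSTRING(RawLine,  1, 10))) AS Col1,  -- offsets are guesses
             LTRIM(RTRIM(SUBSTRING(RawLine, 11, 20))) AS Col2
      FROM   #RawLines
      WHERE  LEN(RawLine) > 0
        AND  RawLine NOT LIKE '---%'
        AND  RawLine NOT LIKE '(%row%affected)%';

    The obvious cost is that the file is effectively read twice, which is the processing and time mentioned above.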

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Reading about this after all the discussion, I'm starting to think it would have been a lot faster to try the import with VB or C#, or at least fix the format using VB or C#.



    Alvin Ramard
    Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help

  • Alvin Ramard (7/23/2014)


    Reading about this after all the discussion, I'm starting to think it would have been a lot faster to try the import with VB or C#, or at least fix the format using VB or C#.

    I just can't believe this: it looks like a normal task, doesn't it? bcp should have been able to handle this as far back as SQL 2005 :angry:

    Using a program to import the data would be unrealistic for a GB-sized file.

    Thanks.

  • It looks like a normal task, but your task was made more complicated by the fact that the file is not properly formatted as a "data exchange" file. Sometimes, when that happens, you have to run the file through some sort of "pre-process" in order to format it properly.

    Some say that sucks, but I say better chances of employment for me. ;-D

    They didn't tell me about all of this when I was at Dal. ;-(



    Alvin Ramard
    Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help

  • Alvin Ramard (7/23/2014)


    It looks like a normal task, but your task was made more complicated by the fact that the file is not properly formatted as a "data exchange" file. Sometimes, when that happens, you have to run the file through some sort of "pre-process" in order to format it properly.

    Some say that sucks, but I say better chances of employment for me. ;-D

    They didn't tell me about all of this when I was at Dal. ;-(

    Glad to know you are from Dal.

  • David Burrows (7/23/2014)


    If your file ends with a 'rows affected' line or an extra CRLF then, AFAIK, you will not be able to import using the format file: the line does not conform to the format, and BCP will not arbitrarily stop processing or ignore the format just because a line starts with CRLF. I will be glad to be proved wrong if Jeff can find a working solution.

    As I see it, there are only two possibilities:

    1. Pre-process the data to remove the erroneous lines before the load; you could use something like AWK to do this.

    2. Import the data as one column and then split the data in T-SQL.

    Either way costs processing and time.

    BCP will actually handle such trailing garbage lines quite well, BCP format file or not. I go through that at work, but I also realize that's just my word, and no one should adopt such a claim on word alone. I got fairly busy today (GOOD vacation!) but I'll try to set up a demo for that tomorrow.

    Since the OP says he can use UltraEdit on the file, it means the OP can also run things from the command line. That means he could use BCP for this although, since he can't use xp_CmdShell (which also makes me think he couldn't use an EXEC task in a scheduled job), it will continue to resist automation as a scheduled job.

    What would be MUCH better, though, would be if someone broke the horns off the people providing the data and got them to provide it in a proper format that doesn't include a row count, etc. It would be an easy thing for them to do.

    That being said, I also question the whole exercise because the source of the data is obviously the output of a query in SSMS in the "Results to File" mode. There are much better ways to transfer data.
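
    In the meantime, here is the shape of what I'd try, as a sketch only: the table name, both paths, and FIRSTROW = 3 (to skip the column-name row and the dashed underline row) are all assumptions. The idea is that BATCHSIZE commits the earlier batches and MAXERRORS keeps a handful of nonconforming lines, like the trailing count line, from aborting the whole load.

      BULK INSERT dbo.Target                    -- hypothetical target table
      FROM 'C:\Data\Export.rpt'                 -- hypothetical data file
      WITH (FORMATFILE = 'C:\Data\Export.fmt',  -- hypothetical format file
            FIRSTROW  = 3,                      -- skip the two header rows
            BATCHSIZE = 200,                    -- commit completed batches
            MAXERRORS = 10,                     -- tolerate a few bad lines
            TABLOCK);

    From the command line, the bcp.exe equivalents are -f (format file), -F 3 (first row), -b 200 (batch size), and -m 10 (max errors).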

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • halifaxdal (7/23/2014)


    Alvin Ramard (7/23/2014)


    Reading about this after all the discussion, I'm starting to think it would have been a lot faster to try the import with VB or C#, or at least fix the format using VB or C#.

    I just can't believe this: it looks like a normal task, doesn't it? bcp should have been able to handle this as far back as SQL 2005 :angry:

    Using a program to import the data would be unrealistic for a GB-sized file.

    Thanks.

    Actually, a "normal task" would be to have properly formatted data in the file. 😉 See my comments in my most recent post above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • halifaxdal (7/22/2014)


    For the 400M file, I removed the first two rows (non-data). I also added BATCHSIZE = 200 to the BULK INSERT, and I was able to load all the data until it hit the last row, which is also not a data row.

    While I can use UltraEdit to open the 400M data file and remove the first two rows, I don't want to try to open the 4G file; I can imagine the system hanging there forever. I believe bcp or BULK INSERT must be able to handle situations like this.

    And sorry I can't upload the data file anywhere: 1. it is against my company's policy; 2. it is too big

    Thank you, any clue is appreciated.

    I'm just not feeling the love here. You COULD modify the file you provided so that it contains the exact anomalies you spoke of, right? That would do two things... it would help me help you, because I wouldn't have to take the time to create the data you need help with, and it would take all of the guesswork out of the problem, because I don't actually know what the anomalies in the data look like. You know how to use UltraEdit... modify the file you've already provided.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/23/2014)


    BCP will actually handle such trailing garbage lines quite well, BCP format file or not.

    Yes, you are right, Jeff 😀

    I ran several tests with the data supplied and your format file, and the test data loaded, with the following exceptions:

    If the BOM characters are left at the beginning of the file, then this error is shown

    Invalid character value for cast specification

    and the first row is ignored.

    If the file contains a 'row(s) affected' line (produced by SSMS if NOCOUNT is not switched on), then this error is shown

    Unexpected EOF encountered in BCP data-file

    (This can be ignored, as the data loads.)
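
    Worth adding: if whoever generates the file can touch the script, the count line is trivial to suppress at the source. SET NOCOUNT ON in the generating query does it (the SELECT below is a hypothetical stand-in):

      -- Run in the SSMS session that produces the .rpt via "Results to File".
      SET NOCOUNT ON;   -- suppresses the trailing "(n row(s) affected)" line
      SELECT col1, col2
      FROM   dbo.SourceTable;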

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Jeff Moden (7/23/2014)


    halifaxdal (7/22/2014)


    For the 400M file, I removed the first two rows (non-data). I also added BATCHSIZE = 200 to the BULK INSERT, and I was able to load all the data until it hit the last row, which is also not a data row.

    While I can use UltraEdit to open the 400M data file and remove the first two rows, I don't want to try to open the 4G file; I can imagine the system hanging there forever. I believe bcp or BULK INSERT must be able to handle situations like this.

    And sorry I can't upload the data file anywhere: 1. it is against my company's policy; 2. it is too big

    Thank you, any clue is appreciated.

    I'm just not feeling the love here. You COULD modify the file you provided so that it contains the exact anomalies you spoke of, right? That would do two things... it would help me help you, because I wouldn't have to take the time to create the data you need help with, and it would take all of the guesswork out of the problem, because I don't actually know what the anomalies in the data look like. You know how to use UltraEdit... modify the file you've already provided.

    Yes, and I just realized I can cut out the middle of the data file and keep the head and tail. I'll do it later when I have time. Thanks.

  • Jeff Moden (7/23/2014)


    halifaxdal (7/23/2014)


    Alvin Ramard (7/23/2014)


    Reading about this after all the discussion, I'm starting to think it would have been a lot faster to try the import with VB or C#, or at least fix the format using VB or C#.

    I just can't believe this: it looks like a normal task, doesn't it? bcp should have been able to handle this as far back as SQL 2005 :angry:

    Using a program to import the data would be unrealistic for a GB-sized file.

    Thanks.

    Actually, a "normal task" would be to have properly formatted data in the file. 😉 See my comments in my most recent post above.

    I agree Jeff.

    We all know there's no better way than to use Excel to provide data for import. :w00t:

    Just kidding, of course. 😀



    Alvin Ramard
    Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help

  • halifaxdal (7/24/2014)


    Jeff Moden (7/23/2014)


    halifaxdal (7/22/2014)


    For the 400M file, I removed the first two rows (non-data). I also added BATCHSIZE = 200 to the BULK INSERT, and I was able to load all the data until it hit the last row, which is also not a data row.

    While I can use UltraEdit to open the 400M data file and remove the first two rows, I don't want to try to open the 4G file; I can imagine the system hanging there forever. I believe bcp or BULK INSERT must be able to handle situations like this.

    And sorry I can't upload the data file anywhere: 1. it is against my company's policy; 2. it is too big

    Thank you, any clue is appreciated.

    I'm just not feeling the love here. You COULD modify the file you provided so that it contains the exact anomalies you spoke of, right? That would do two things... it would help me help you, because I wouldn't have to take the time to create the data you need help with, and it would take all of the guesswork out of the problem, because I don't actually know what the anomalies in the data look like. You know how to use UltraEdit... modify the file you've already provided.

    Yes, and I just realized I can cut out the middle of the data file and keep the head and tail. I'll do it later when I have time. Thanks.

    Heh... when you have time... you're the one with the problem so we can wait. 😉

    But, I'll also point out that if you didn't have to do so much manually, you might have more time. :w00t:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Alvin Ramard (7/24/2014)


    Jeff Moden (7/23/2014)


    halifaxdal (7/23/2014)


    Alvin Ramard (7/23/2014)


    Reading about this after all the discussion, I'm starting to think it would have been a lot faster to try the import with VB or C#, or at least fix the format using VB or C#.

    I just can't believe this: it looks like a normal task, doesn't it? bcp should have been able to handle this as far back as SQL 2005 :angry:

    Using a program to import the data would be unrealistic for a GB-sized file.

    Thanks.

    Actually, a "normal task" would be to have properly formatted data in the file. 😉 See my comments in my most recent post above.

    I agree Jeff.

    We all know there's no better way than to use Excel to provide data for import. :w00t:

    Just kidding, of course. 😀

    OMG. I'm going through this right now. MS sure didn't do us any favors with the CSV exports from Excel.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/24/2014)


    Alvin Ramard (7/24/2014)


    Jeff Moden (7/23/2014)


    halifaxdal (7/23/2014)


    Alvin Ramard (7/23/2014)


    Reading about this after all the discussion, I'm starting to think it would have been a lot faster to try the import with VB or C#, or at least fix the format using VB or C#.

    I just can't believe this: it looks like a normal task, doesn't it? bcp should have been able to handle this as far back as SQL 2005 :angry:

    Using a program to import the data would be unrealistic for a GB-sized file.

    Thanks.

    Actually, a "normal task" would be to have properly formatted data in the file. 😉 See my comments in my most recent post above.

    I agree Jeff.

    We all know there's no better way than to use Excel to provide data for import. :w00t:

    Just kidding, of course. 😀

    OMG. I'm going through this right now. MS sure didn't do us any favors with the CSV exports from Excel.

    Certainly not.

    FYI, some of the issues seen when importing data from Excel can also manifest themselves when generating a CSV file from Excel. I guess it's a driver issue.



    Alvin Ramard
    Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help
