Best way to insert records from a CSV file

  • A freind has asked me to import records from a CSV file. There are 10 fields and 208,000 records per file. There will be 30 files to be uploaded at a time.

    I used All Tasks >> IMPORT to load the file into a table. This was quick, less than 30 seconds. But I forsee problems.

    The Import routine creates 10 VarChar(50) fields. I need to define the field types ahead of time. There are decimal, character and DateTime fields.

    There are no indexes.

    I do not know how to IMPORT the second CSV file into the same table.

    I only want to IMPORT 4 fields, not all 10.

    I want all the files to be IMPORTed into the same table without interaction by the user.

    As a VB programmer, my solution would be to write a VB progam to do this. But I do not believe it would be as fast as the IMPORT process in SQL 2008.

    What would be the best way to get some of the fields in all these files into a single table with minimal user interaction and quickly?

    I read the article "SQL Server 2008 Table Valued Parameters Performance " but dealt with XML files and not CSVs.

    Thanks,

    pat

  • Do a books online search for BULK INSERT or BCP. They are exactly what you are looking for. You'll need a formatfile to describe the structure of the files and from there you just import the sections you need.

  • Jeff,

    Thanks for pointing me in the right direction. It looks like the Format File will do all that I need. I assume if I do not specify a field in the Format File that the field in the .txt file will be skipped. Correct?

    However, BOL does not give a very good example of a Format File. Nor does it reference a good list of examples.

    Do all Format Files use XML? Where can I find the specifics of how to write a Format File with the SQL data Types Decimal(9,4), Int, varChar(30) and smalldatetime? The BOL example only shows a Float data type, "SQLFLT8". How do I translate, e.g Decimal(9,4), into a format recognized by the XML Format File?

    Thanks,

    pat

  • I haven't found the complete answer but I wanted to update this post with what I have found.

    The following Bulk Insert command will move data from a text file to a SQL table.

    BULK INSERT Price

    FROM 'C:\Price.csv' WITH (FORMATFILE='C:\PriceFormat.txt')

    Below is the table create statement and one line of data from the text file.

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[Price](

    [name] [varchar](50) NULL,

    [TimePoint] [smalldatetime] NULL,

    [rtPrice] [decimal](9, 2) NULL,

    [rtLoss] [decimal](9, 2) NULL,

    [rtCongestion] [decimal](9, 2) NULL,

    [daPrice] [decimal](9, 2) NULL,

    [daCongestion] [decimal](12, 4) NULL,

    [daLoss] [decimal](12, 4) NULL,

    [versifyId] [bigint] NULL,

    [externalNodeId] [bigint] NULL

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    now the text file:

    ,2010-06-17 05:00:00,35.38,0.04,0.09,35,-0.07,0.01,1711,1

    I have modified the original FormatFile to exclude some data. This excluded data can be identified by having a 0 column number in the SQL Field no column.

    9.0

    10

    1SQLCHAR00","0name""

    2SQLCHAR00","2TimesPoint""

    3SQLCHAR00","3rtPrice""

    4SQLCHAR00","0rtLoss""

    5SQLCHAR00","0rtCongestion""

    6SQLCHAR00","6daPrice""

    7SQLCHAR00","0daCongestion""

    8SQLCHAR00","0daLoss""

    9SQLCHAR00","9versifyId""

    10SQLCHAR00"\r"10externalNodeID""

    Most of my research came from: http://msdn.microsoft.com/en-us/library/ms191479.aspx

    What I still cannot do and would like some advice on is how do is to import 5 fields into a SQL table that has only 5 fields in the SQL table when the original text file has 10 fields.

    Here is the new table Create statement

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[Price1](

    [TimePoint] [smalldatetime] NULL,

    [rtPrice] [decimal](9, 2) NULL,

    [daPrice] [decimal](9, 2) NULL,

    [versifyId] [bigint] NULL,

    [externalNodeId] [bigint] NULL

    ) ON [PRIMARY]

    My attempt at the FormatFile. This DID NOT work.

    9.0

    10

    2SQLCHAR00","1TimesPoint""

    3SQLCHAR00","2rtPrice""

    6SQLCHAR00","3daPrice""

    9SQLCHAR00","4versifyId""

    10SQLCHAR00"\r"5externalNodeID""

    Can anyone help with what the FormatFile would look like to do this?

    thanks,

    pat

  • can try SSIS foreach container to loop the files and load into one table?

  • Changebluesky,

    Thanks for the idea. I will be using VB to download them from the FTP site, to loop through the files and to do the Bulk Insert. The only thing I am having difficulty with at the moment is structuring the FormatFile file so that I only import 5 of the 10 text fields into the SQL Table that only contains 5 fields.

    pat

  • Using OPENROWSET BULK you can specify the column list eg

    INSERT Price1

    ([TimePoint],

    [rtPrice],

    [daPrice],

    [versifyId],

    [externalNodeId]

    )

    SELECT

    TimesPoint,

    rtPrice,

    daPrice,

    versifyId,

    externalNodeID

    FROM OPENROWSET (BULK 'C:\Price.csv', FORMATFILE = 'C:\PriceFormat.txt') AS Z

  • Steve,

    Thank you for adding the Openrowset code. I tried it and it did not work. I received the message, "Invalid column number in the Format File c:\Formatfile.txt". I googled Openrowset csv but couldn't find any examples that import fewer columns than are in originating csv/txt file.

    Could you explain what is wrong with the FormatFile.txt I listed above?

    Thanks,

    pat

    P.S. There is an error in my post and it was copied into your solution. It should be TimePoint and not TimesPoint. Sorry.

  • It seems to me that the formatfile should look more like

    9.0

    10

    1SQLCHAR00","0""

    2SQLCHAR00","2TimePoint""

    3SQLCHAR00","3rtPrice""

    4SQLCHAR00","0""

    5SQLCHAR00","0""

    6SQLCHAR00","6daPrice""

    7SQLCHAR00","0""

    8SQLCHAR00","0""

    9SQLCHAR00","9versifyId""

    10SQLCHAR00"\r"10extermalNodeID""

    If I do not include all the columns from the csv file in this definition, SQL server will not know when the csv file Field 1 begins or ends. In my other shortened FormatFile in the previous reply, the file begins with Field 2. It makes sense to me that this would fails since I didn't tell SQL server what Field 1 in the csv file looks like.

    But I get the same "Invalid column..." error message with the complete FormatFile above.

    This is all very confusing to a first time user of FormatFile.

  • Pat, you will still need to use the full file definition you posted. The format file still has to define each column but when you use OPENROWSET, you can only select the fields you really want.

    9.0

    10

    1 SQLCHAR 0 0 "," 0 name ""

    2 SQLCHAR 0 0 "," 2 TimesPoint ""

    3 SQLCHAR 0 0 "," 3 rtPrice ""

    4 SQLCHAR 0 0 "," 0 rtLoss ""

    5 SQLCHAR 0 0 "," 0 rtCongestion ""

    6 SQLCHAR 0 0 "," 6 daPrice ""

    7 SQLCHAR 0 0 "," 0 daCongestion ""

    8 SQLCHAR 0 0 "," 0 daLoss ""

    9 SQLCHAR 0 0 "," 9 versifyId ""

    10 SQLCHAR 0 0 "\r" 10 externalNodeID ""

    /* Anything is possible but is it worth it? */

  • I believe you need to give each column a name.

    mpdillon (7/15/2010)


    It seems to me that the formatfile should look more like

    9.0

    10

    1SQLCHAR00","0""

    2SQLCHAR00","2TimePoint""

    3SQLCHAR00","3rtPrice""

    4SQLCHAR00","0""

    5SQLCHAR00","0""

    6SQLCHAR00","6daPrice""

    7SQLCHAR00","0""

    8SQLCHAR00","0""

    9SQLCHAR00","9versifyId""

    10SQLCHAR00"\r"10extermalNodeID""

    If I do not include all the columns from the csv file in this definition, SQL server will not know when the csv file Field 1 begins or ends. In my other shortened FormatFile in the previous reply, the file begins with Field 2. It makes sense to me that this would fails since I didn't tell SQL server what Field 1 in the csv file looks like.

    But I get the same "Invalid column..." error message with the complete FormatFile above.

    This is all very confusing to a first time user of FormatFile.

    /* Anything is possible but is it worth it? */

  • Steve,

    I have stumbled upon a half a##ed solution. If I first import the whole csv file into a SQL table that has all the rows. Then I do an Insert using from the intermediate table, Selecting only the columns I want, my final table. This produces the net effect I want but it seems like a round about way and kind of sloppy.

    So I would still like to find the solution that skips the intermediate table and posts only 5 of the 10 fields in the csv file into a SQL table that has 5 fields.

  • Can you show us the code you used to import into your staging table?

    mpdillon (7/15/2010)


    Steve,

    I have stumbled upon a half a##ed solution. If I first import the whole csv file into a SQL table that has all the rows. Then I do an Insert using from the intermediate table, Selecting only the columns I want, my final table. This produces the net effect I want but it seems like a round about way and kind of sloppy.

    So I would still like to find the solution that skips the intermediate table and posts only 5 of the 10 fields in the csv file into a SQL table that has 5 fields.

    /* Anything is possible but is it worth it? */

  • Gatekeeper,

    Thanks. I did not see your post while I continued to work on this issue. You were 100% correct. When i used the full FormatFile the import was successful.

    This contradicts what I posted just a few minutes ago. Obviously, I made some mistake in my testing which led to that post. Please ignore my previous reply if you find this thread while searching for an answer to a similar problem.

    Thank everyone.

    pat

  • Glad to know you got it figured out. BTW, BCP format file issues are a pain to understand 🙂

    You can use the http://msdn.microsoft.com/en-us/library/ms191479.aspx MSDN article to understand the format better.

    mpdillon (7/15/2010)


    Gatekeeper,

    Thanks. I did not see your post while I continued to work on this issue. You were 100% correct. When i used the full FormatFile the import was successful.

    This contradicts what I posted just a few minutes ago. Obviously, I made some mistake in my testing which led to that post. Please ignore my previous reply if you find this thread while searching for an answer to a similar problem.

    Thank everyone.

    pat

    /* Anything is possible but is it worth it? */

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply