How to import multiple CSV files.

  • Guys, you are all brilliant:-D

    If only I had posted on here sooner I would have saved myself a lot of pain.

    Just been trying SSIS development in Visual Studio (never even knew about this, I thought SSIS was the import wizard in SQL Server Management Studio:blush:). I will definitely be learning this.

    In the short term the BULK INSERT method is perfect. I have never used BULK INSERT before, as I simply used OPENROWSET from my VB code and never had a moment's trouble. My hacking attempts at BULK INSERT with an XML format file were clearly not very good:(

    If you guys were in the area it would be free beers all night, I have gone crazy with this.

    I had been attempting to plaster over the old OPENROWSET solution with all manner of patches posted around the net. I am not sure exactly what the problem is with ODBC and 64-bit SQL Server; MDAC is at the heart of the problem, I believe, and there is plenty posted about it around the net.

    Thanks all, I can sleep soundly tonight:-)

  • Lowell (6/23/2011)


    ok, BULK insert's about a zillion times easier to use than SSIS.

    your original table definition that you posted is missing two columns that appear in your data: Country and Selection, I think.

    this works perfectly, 10K rows in 0 seconds; i'd have to trace it to get the milliseconds, it was so quick:

    CREATE TABLE [dbo].[TBLEXAMPLE] (
    [SPORTS_ID] INT NULL,
    [EVENT_ID] INT NULL,
    [Country] VARCHAR(255) NULL,
    [SETTLED_DATE] VARCHAR(255) NULL,
    [FULL_DESCRIPTION] VARCHAR(255) NULL,
    [COURSE] VARCHAR(255) NULL,
    [SCHEDULED_OFF] VARCHAR(255) NULL,
    [EVENT] VARCHAR(255) NULL,
    [ACTUAL_OFF] VARCHAR(255) NULL,
    [SELECTION_ID] INT NULL,
    [SELECTION] VARCHAR(255) NULL,
    [ODDS] DECIMAL(19,4) NULL,
    [NUMBER_BETS] INT NULL,
    [VOLUME_MATCHED] DECIMAL(19,4) NULL,
    [LATEST_TAKEN] VARCHAR(255) NULL,
    [FIRST_TAKEN] VARCHAR(255) NULL,
    [WIN_FLAG] INT NULL,
    [IN_PLAY] VARCHAR(255) NULL)

    BULK INSERT tblexample FROM 'c:\data\abc.csv'
    WITH (
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2
    )

    now from there, if you have thousands of files and xp_cmdshell is enabled, you could use it with a cursor to get the list of all files from a given directory, and then loop through each file with some dynamic SQL.

    I've adapted an old example (this part is untested!) to do exactly that; use this as a model:

    --a table to loop thru filenames
    --drop table ALLFILENAMES
    CREATE TABLE ALLFILENAMES(WHICHPATH VARCHAR(255),WHICHFILE VARCHAR(255))

    --the source table: yours already exists, but needed for this example.
    CREATE TABLE BULKACT(RAWDATA VARCHAR(8000))

    --some variables
    declare @filename varchar(255),
            @path varchar(255),
            @sql varchar(8000),
            @cmd varchar(1000)

    --get the list of files to process:
    --#########################################
    SET @path = 'C:\DB\'
    SET @cmd = 'dir ' + @path + '*.csv /b'
    INSERT INTO ALLFILENAMES(WHICHFILE)
    EXEC Master..xp_cmdShell @cmd
    UPDATE ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null
    --#########################################

    --cursor loop: the filter matches the *.csv files the dir command returned
    declare c1 cursor for SELECT WHICHPATH,WHICHFILE FROM ALLFILENAMES where WHICHFILE like '%.csv%'
    open c1
    fetch next from c1 into @path,@filename
    While @@fetch_status <> -1
    begin
        --bulk insert won't take a variable name, so build the sql and execute it instead:
        set @sql = 'BULK INSERT TBLEXAMPLE FROM ''' + @path + @filename + ''' '
                 + 'WITH (
                        DATAFILETYPE = ''char'',
                        FIELDTERMINATOR = '','',
                        ROWTERMINATOR = ''\n'',
                        FIRSTROW = 2
                    ) '
        print @sql
        exec (@sql)
        fetch next from c1 into @path,@filename
    end
    close c1
    deallocate c1

    I am going to copy this down for sure.

    Never stop learning or you will be left behind.
  • quoted wrong post..

    Still cannot get BULK INSERT working; this is what I am now ending up with, following the above text to the letter:

    Msg 4866, Level 16, State 8, Line 1

    The bulk load failed. The column is too long in the data file for row 1, column 18. Verify that the field terminator and row terminator are specified correctly.

    Msg 7301, Level 16, State 2, Line 1

    Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".

    your table has to exactly match the file you bulk insert... it cannot have more or fewer columns than the file you import;

    I mentioned I had to add two columns, and posted the definition of that table... can you double-check your destination table?

    could it be that a different file has a different definition than the abc.csv example you provided, and has more or fewer columns?

    could it be that the file ends its lines Unix-style, with a bare \n (line feed) rather than the Windows-style \r\n (carriage return + line feed)?
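
    If it is the Unix-style endings, here is a minimal sketch of the workaround (assuming SQL Server 2008 or later, which accepts hexadecimal terminators, and reusing the example table and path from above):

    BULK INSERT dbo.TBLEXAMPLE FROM 'c:\data\abc.csv'
    WITH (
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = ',',
    --'\n' is treated as CRLF on import, so for LF-only (Unix) files
    --spell out the bare line-feed byte in hex instead:
    ROWTERMINATOR = '0x0a',
    FIRSTROW = 2
    )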

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks for the reply.

    I have just done a test and extracted the first 100,000 records from the file that I am trying to bulk insert. Works fine:-D That indicates something lower down in the file is causing the problem; no idea what it can be. The same problem does not happen using my old method of OPENROWSET with the following syntax:

    "SELECT * INTO table1 FROM" & _

    " OPENROWSET ('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=F:\r\;', 'SELECT * from " & mfile & "')"

    Strange problem: the issue is not getting BULK INSERT to work, but getting it to work with my data!
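
    As an aside, OPENROWSET also has a BULK mode that bypasses MSDASQL/ODBC entirely, so it still works on 64-bit servers. A minimal sketch, assuming a format file describing the columns has been created at the hypothetical path shown:

    SELECT *
    INTO table1
    FROM OPENROWSET(
        BULK 'F:\r\abc.csv',           --data file (hypothetical path)
        FORMATFILE = 'F:\r\abc.fmt',   --format file describing the columns (hypothetical)
        FIRSTROW = 2
    ) AS src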

    I've definitely seen that error when my files were \r-terminated and my script expected \n;

    especially since your error said it happened on line 1 / column 18 of what we think is an 18-column table (and the last column is VARCHAR(255), expecting two chars)...

    do some of your files come from different sources?

    I've used .NET to change file formats to make sure they match the PC-style line endings my scripts expect.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

    Hmmm, that's a point. Maybe when I go into Excel to extract the records (as per the abc.csv for this thread and my most recent bigger sample of 100,000 rows) and save the extracted rows as .csv, it is correcting the end-of-line terminator. The raw data file perhaps has an inconsistent or non-standard EOL character sequence.

    It would explain why the small test samples extracted using Excel work and the big raw data file doesn't.
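
    One way to test that theory, as a rough sketch (this reads the whole file into memory, so point it at a manageable sample; the path is hypothetical):

    SELECT CASE
             WHEN CHARINDEX(CHAR(13) + CHAR(10), raw.BulkColumn) > 0 THEN 'CRLF (Windows-style)'
             WHEN CHARINDEX(CHAR(10), raw.BulkColumn) > 0 THEN 'LF only (Unix-style)'
             WHEN CHARINDEX(CHAR(13), raw.BulkColumn) > 0 THEN 'CR only (old Mac-style)'
             ELSE 'no line breaks found'
           END AS row_terminator_guess
    FROM OPENROWSET(BULK 'F:\r\sample.csv', SINGLE_CLOB) AS raw

    Note that a file with mixed endings will report CRLF as long as any one line uses it.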

  • @dji I sense a little frustration in your posts 🙂 It seems like you're "in it" and will be "in it" for a little while longer so I wanted to try and clear some of the air for you in hopes it will help you get around the issues you're facing and prevent you from running towards MySQL which is a hot mess. If you're going to leave the SQL Server platform at least go to Postgres or one of the other commercial vendors.

    MSDASQL is the OLE DB adapter for ODBC and, as you've learned, it is no longer being actively developed or supported by Microsoft. It was actually left out of SSIS 2005, but no proper replacement was offered, i.e. there were no native options for using an ODBC data source in SSIS 2005 (I thought it sucked that they left that out of an ETL tool). However, they made good on it in SSIS 2008 by adding ADO.NET data sources that can reference ODBC connections. MSDASQL is slooooow and is old technology; even if it were still included in the product I would not recommend using it.

    dji (6/23/2011)


    The codeproject solution relies on ODBC, which is part of the problem; MS decided to stop support for ODBC in 64-bit SQL. This is the reason OPENROWSET from a flat file no longer works.

    What you said is not 100% accurate. SQL Server ODBC drivers are alive and well in 64-bit Windows, but the way you can leverage those drivers has changed as told above. It looks as though Lowell gave you a working solution using BULK INSERT and there are no ODBC worries there.

    Re: CSV files... hopefully you are not hitting this issue, but there are no published standards defining what a CSV file really is, so each vendor has their own interpretation of what one is supposed to be. And, wouldn't you know it, Microsoft's interpretation does not match a lot of other vendors', nor is it even consistent across their own products. For instance, SSIS's idea of a CSV is much different from what DTS's was, which is different again from what Excel and BULK INSERT can do with a so-called CSV file. RFC 4180 is the closest thing you'll find to a CSV standard, and TMK MS does not fully support its definition in any of their products. It's a shame really, because CSV files are extremely useful. Where MS usually diverges from the RFC is with respect to embedded delimiters and text qualifiers.

    Have you looked at the data rows that are failing when running BULK INSERT? I hope you're not running into the issue I mentioned, because there is no good workaround other than to pre-process the file into the "CSV" format BULK INSERT likes. Luckily, you can easily instruct BULK INSERT to pipe all failed rows to an error file and continue importing data from the file as if nothing bad happened. Look at the MAXERRORS and ERRORFILE options, http://msdn.microsoft.com/en-us/library/ms188365.aspx, as shown in the sketch below.
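
    To illustrate that last point, a minimal sketch of the error-tolerance options (the paths are hypothetical; option names per the BULK INSERT documentation linked above):

    BULK INSERT dbo.TBLEXAMPLE FROM 'c:\data\abc.csv'
    WITH (
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,
    MAXERRORS = 1000,                      --keep loading after bad rows instead of aborting
    ERRORFILE = 'c:\data\abc_errors.log'   --rejected rows are written here for inspection
    )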

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks for the post, opc.three.

    In a nutshell, Win64 and SQL 2008 R2 was a baptism of fire. At fairly short notice I was required to work at my (sort of) boss's house with a lot of DB stuff I had been developing without a hitch on Windows XP and SQL 2005. I bought a decent laptop, which all seem to come with Windows 7 64-bit these days, and loaded all the software I needed. I loaded a backup of my SQL databases without a hitch and thought all was well. Then the doo daa hit the fan when I was out on site: I suddenly found that OPENROWSET was not supported. Try explaining that to a non-tech person when all they want is an update to the data!! Furthermore, support for the old FoxPro driver I was using to access other files was also removed; these issues were the source of my frustration. Ultimately I had to use a workaround in MySQL, which seemed so easy to use with flat files compared to MSSQL.

    Anyway, I have digressed massively. I am an MS groupie at heart and will solve this problem properly, but I was on the verge of going over to the other side.

    For anyone else planning an upgrade from legacy methods of importing csv files, please do give some thought to an issue you might have otherwise overlooked.

    The source of data I use is a commercial organisation who provide the data 'as is', as an incidental to their service, with a 'take it or leave it' attitude. Dates come in any manner of formats, hence the need to import as varchar and then convert with a function I wrote.

  • I would be interested in seeing the error that you get when importing the csv file using the import wizard.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • The execution was successful

    - Initializing Data Flow Task (Success)

    - Initializing Connections (Success)

    - Setting SQL Command (Success)

    - Setting Source Connection (Success)

    - Setting Destination Connection (Success)

    - Validating (Success)

    - Prepare for Execute (Success)

    - Pre-execute (Success)

    Messages

    Information 0x402090dc: Data Flow Task: The processing of file "F:\BF_Data_unzipped\Horses\NewStyle\bfinf_horse_080707to080713_080716083026.csv" has started.

    (SQL Server Import and Export Wizard)

    - Executing (Success)

    - Copying to [BetfairData_Sports].[dbo].[aaa] (Success)

    454818 rows transferred

    Messages

    Information 0x402090de: Data Flow Task: The total number of data rows processed for file "F:\BF_Data_unzipped\Horses\NewStyle\bfinf_horse_080707to080713_080716083026.csv" is 454819.

    (SQL Server Import and Export Wizard)

    Information 0x402090df: Data Flow Task: The final commit for the data insertion has started.

    (SQL Server Import and Export Wizard)

    Information 0x402090e0: Data Flow Task: The final commit for the data insertion has ended.

    (SQL Server Import and Export Wizard)

    - Post-execute (Success)

    Messages

    Information 0x402090dd: Data Flow Task: The processing of file "F:\BF_Data_unzipped\Horses\NewStyle\bfinf_horse_080707to080713_080716083026.csv" has ended.

    (SQL Server Import and Export Wizard)

    - Cleanup (Success)

    Messages

    Information 0x4004300b: Data Flow Task: "component "Destination - aaa" (64)" wrote 454818 rows.

    (SQL Server Import and Export Wizard)

    No problems at all; BULK INSERT does seem very fussy about what it will import. Using LOAD DATA INFILE on MySQL runs without a hitch, it is just BULK INSERT that is not happy with the data.

    It is sooooo much quicker using the import wizard than using the OPENROWSET method on SQL 2005. It is time to get stuck into the SSIS programming; the example provided earlier in this thread looks just the ticket.

  • dji (6/25/2011)


    The execution was successful

    Sounds like your definition does not match or something?

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    You may want to try narrowing down the size of the csv file you are working with, to help identify the problem.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    I think it odd that people say "BULK INSERT is too picky". It checks for errors in the data and, when it finds an anomaly, it raises an error. It prevents you from loading garbage data, which is precisely what every good bit of ETL should do.

    If you check in Books Online, you'll see that you can allow it to skip a certain number of errors (or all of them, like I do) and use certain "switches" in the command to sequester just the bad rows... all in one nice, tight little command. Or... you can learn SSIS. 😉

    I'll also agree that MS has made it way too difficult to do any ETL in T-SQL or even in SSIS. Try importing a CSV with text qualifiers (quotes) placed only on the elements that contain an embedded comma.

    If you're fortunate enough to have "true" CSV data where all textual data is quoted, you can make a BCP format file (it's like mapping in SSIS) to accommodate that, but how to do so is poorly documented (a sketch follows below). It's also a PITA that the header row must have precisely the same delimiters as the data, or you can't correctly skip the header using BCP or BULK INSERT.
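
    As a hedged illustration of that format-file trick (untested here; the two-column table, paths, and collation are assumptions, and a real file would need one entry per column): a dummy first field consumes the opening quote of each row, and the remaining quotes are folded into the field and row terminators.

    --contents of a non-XML format file, e.g. C:\data\quoted.fmt (saved as its own
    --file, shown here as comments):
    --10.0
    --3
    --1  SQLCHAR  0  0    "\""      0  OPENING_QUOTE  ""
    --2  SQLCHAR  0  255  "\",\""   1  COL_A          SQL_Latin1_General_CP1_CI_AS
    --3  SQLCHAR  0  255  "\"\r\n"  2  COL_B          SQL_Latin1_General_CP1_CI_AS

    BULK INSERT dbo.TwoColExample FROM 'C:\data\quoted.csv'
    WITH (FORMATFILE = 'C:\data\quoted.fmt', FIRSTROW = 2)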

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    I know I am raking up an old thread by replying; it is entirely deliberate, for a couple of reasons.

    1) A collective thank you to all those who helped, especially you guys suggesting I invest some time in SSIS development. It is truly fantastic.

    2) Anyone looking for solutions to the problem of importing CSV files into SQL: stop looking for band-aid solutions and get stuck into SSIS; within a short space of time all your prayers will be answered! You will wonder how you ever managed without it!

    Setting up a Foreach Loop and importing dozens of flat files of a million-plus records each will absolutely fly, making old OPENROWSET solutions look totally prehistoric. Additionally, the levels of error checking and error tolerance available make life so much simpler overall.

    thanks again
