Importing and Parsing CSV file into SQL server 2005

  • Hi,

    I am new to this forum and new to databases, in particular to SQL 2005. I have a task of parsing a csv file (which is generated by IOmeter-benchmarking software for Storage devices) and storing it in the database (SQl 2005).

    The csv file looks like the one below..

    ********************************************************************

    'Test Type,Test Description

    6,

    'Version

    2003.12.16

    'Time Stamp

    2006-01-04 16:05:49:406

    'Access specifications

    'Access specification name,default assignment

    512B Seq Read,1

    'size,% of size,% reads,% random,delay,burst,align,reply

    512,100,100,0,0,1,0,0

    'End access specifications

    'Results

    'Target Type,Target Name,Access Specification Name,# Managers,# Workers,# Disks,IOps,Read IOps,Write IOps,MBps,Read MBps,Write MBps,Transactions per Second,Connections per Second,Average Response Time,Average Read Response Time,Average Write Response Time,Average Transaction Time,Average Connection Time,Maximum Response Time,Maximum Read Response Time,Maximum Write Response Time,Maximum Transaction Time,Maximum Connection Time,Errors,Read Errors,Write Errors,Bytes Read,Bytes Written,Read I/Os,Write I/Os,Connections,Transactions per Connection,Total Raw Read Response Time,Total Raw Write Response Time,Total Raw Transaction Time,Total Raw Connection Time,Maximum Raw Read Response Time,Maximum Raw Write Response Time,Maximum Raw Transaction Time,Maximum Raw Connection Time,Total Raw Run Time,Starting Sector,Maximum Size,Queue Depth,% CPU Utilization,% User Time,% Privileged Time,% DPC Time,% Interrupt Time,Processor Speed,Interrupts per Second,CPU Effectiveness,Packets/Second,Packet Errors,Segments Retransmitted/Second

    ALL,All,512B Seq Read,1,4,4,51416.472139,51416.472139,0.000000,25.105699,25.105699,0.000000,51416.472139,0.000000,0.076987,0.076987,0.000000,0.076987,0.000000,1.816365,1.816365,0.000000,1.816365,0.000000,0,0,0,1581445120,0,3088760,0,0,-1,,,,,,,,,,0,0,1,60.934827,5.790807,55.136289,5.530546,14.639680,,26015.437869,843.794509,0.066627,0.000000,0.000000

    MANAGER,INTEL-STORAGE-1,512B Seq Read,,4,4,51416.472139,51416.472139,0.000000,25.105699,25.105699,0.000000,51416.472139,0.000000,0.076987,0.076987,0.000000,0.076987,0.000000,1.816365,1.816365,0.000000,1.816365,0.000000,0,0,0,1581445120,0,3088760,0,0,-1,853683156693,0,853683156693,0,6520752,0,6520752,0,215663649507,0,0,1,60.934827,5.790807,55.136289,5.530546,14.639680,3590000000.000000,26015.437869,843.794509,0.066627,0.000000,0.000000

    PROCESSOR,CPU 0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,52.710580,5.699715,47.003133,0.000000,8.614638,3590000000.000000,63.995052,,,,

    PROCESSOR,CPU 1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,69.159074,5.881898,63.269445,11.061092,20.664722,3590000000.000000,25951.442816,,,,

    WORKER,Worker 1,512B Seq Read,,,1,12818.693258,12818.693258,0.000000,6.259128,6.259128,0.000000,12818.693258,0.000000,0.077183,0.077183,0.000000,0.077183,0.000000,1.749289,1.749289,0.000000,1.749289,0.000000,0,0,0,394272256,0,770063,0,0,-1,213374752677,0,213374752677,0,6279948,0,6279948,0,215663649507,0,0,1,,,,,,3590000000.000000,,,,,

    DISK,PHYSICALDRIVE:1,,,,,12818.693258,12818.693258,0.000000,6.259128,6.259128,0.000000,12818.693258,0.000000,0.077183,0.077183,0.000000,0.077183,0.000000,1.749289,1.749289,0.000000,1.749289,0.000000,0,0,0,394272256,0,770063,0,0,-1,213374752677,0,213374752677,0,6279948,0,6279948,0,215663649507,0,0,1,,,,,,3590000000.000000,,,,,

    WORKER,Worker 3,512B Seq Read,,,1,12889.226158,12889.226158,0.000000,6.293567,6.293567,0.000000,12889.226158,0.000000,0.076796,0.076796,0.000000,0.076796,0.000000,1.783635,1.783635,0.000000,1.783635,0.000000,0,0,0,396441088,0,774299,0,0,-1,213471089694,0,213471089694,0,6403248,0,6403248,0,215663328108,0,0,1,,,,,,3590000000.000000,,,,,

    DISK,PHYSICALDRIVE:3,,,,,12889.226158,12889.226158,0.000000,6.293567,6.293567,0.000000,12889.226158,0.000000,0.076796,0.076796,0.000000,0.076796,0.000000,1.783635,1.783635,0.000000,1.783635,0.000000,0,0,0,396441088,0,774299,0,0,-1,213471089694,0,213471089694,0,6403248,0,6403248,0,215663328108,0,0,1,,,,,,3590000000.000000,,,,,

    WORKER,Worker 5,512B Seq Read,,,1,12818.706429,12818.706429,0.000000,6.259134,6.259134,0.000000,12818.706429,0.000000,0.077162,0.077162,0.000000,0.077162,0.000000,1.816365,1.816365,0.000000,1.816365,0.000000,0,0,0,394271744,0,770062,0,0,-1,213315239241,0,213315239241,0,6520752,0,6520752,0,215663147865,0,0,1,,,,,,3590000000.000000,,,,,

    DISK,PHYSICALDRIVE:5,,,,,12818.706429,12818.706429,0.000000,6.259134,6.259134,0.000000,12818.706429,0.000000,0.077162,0.077162,0.000000,0.077162,0.000000,1.816365,1.816365,0.000000,1.816365,0.000000,0,0,0,394271744,0,770062,0,0,-1,213315239241,0,213315239241,0,6520752,0,6520752,0,215663147865,0,0,1,,,,,,3590000000.000000,,,,,

    WORKER,Worker 7,512B Seq Read,,,1,12889.846294,12889.846294,0.000000,6.293870,6.293870,0.000000,12889.846294,0.000000,0.076810,0.076810,0.000000,0.076810,0.000000,1.796310,1.796310,0.000000,1.796310,0.000000,0,0,0,396460032,0,774336,0,0,-1,213522075081,0,213522075081,0,6448752,0,6448752,0,215663257467,0,0,1,,,,,,3590000000.000000,,,,,

    DISK,PHYSICALDRIVE:7,,,,,12889.846294,12889.846294,0.000000,6.293870,6.293870,0.000000,12889.846294,0.000000,0.076810,0.076810,0.000000,0.076810,0.000000,1.796310,1.796310,0.000000,1.796310,0.000000,0,0,0,396460032,0,774336,0,0,-1,213522075081,0,213522075081,0,6448752,0,6448752,0,215663257467,0,0,1,,,,,,3590000000.000000,,,,,

    'Time Stamp

    2006-01-04 16:06:59:515

    'Results

    'Target Type,Target Name,Access Specification Name,# Managers,# Workers,# Disks,IOps,Read IOps,Write IOps,MBps,Read MBps,Write MBps,Transactions per Second,Connections per Second,Average Response Time,Average Read Response Time,Average Write Response Time,Average Transaction Time,Average Connection Time,Maximum Response Time,Maximum Read Response Time,Maximum Write Response Time,Maximum Transaction Time,Maximum Connection Time,Errors,Read Errors,Write Errors,Bytes Read,Bytes Written,Read I/Os,Write I/Os,Connections,Transactions per Connection,Total Raw Read Response Time,Total Raw Write Response Time,Total Raw Transaction Time,Total Raw Connection Time,Maximum Raw Read Response Time,Maximum Raw Write Response Time,Maximum Raw Transaction Time,Maximum Raw Connection Time,Total Raw Run Time,Starting Sector,Maximum Size,Queue Depth,% CPU Utilization,% User Time,% Privileged Time,% DPC Time,% Interrupt Time,Processor Speed,Interrupts per Second,CPU Effectiveness,Packets/Second,Packet Errors,Segments Retransmitted/Second

    ALL,All,512B Seq Read,1,4,4,67399.440043,67399.440043,0.000000,32.909883,32.909883,0.000000,67399.440043,0.000000,0.117820,0.117820,0.000000,0.117820,0.000000,2.656720,2.656720,0.000000,2.656720,0.000000,0,0,0,2072611840,0,4048070,0,0,-1,,,,,,,,,,0,0,2,77.351835,6.833498,70.508683,6.364915,17.077237,,33725.676811,871.336017,0.166607,0.000000,0.000000

    MANAGER,INTEL-STORAGE-1,512B Seq Read,,4,4,67399.440043,67399.440043,0.000000,32.909883,32.909883,0.000000,67399.440043,0.000000,0.117820,0.117820,0.000000,0.117820,0.000000,2.656720,2.656720,0.000000,2.656720,0.000000,0,0,0,2072611840,0,4048070,0,0,-1,1712229501681,0,1712229501681,0,9537624,0,9537624,0,215619470145,0,0,2,77.351835,6.833498,70.508683,6.364915,17.077237,3590000000.000000,33725.676811,871.336017,0.166607,0.000000,0.000000

    PROCESSOR,CPU 0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,68.500828,7.341129,61.150045,0.000000,8.746877,3590000000.000000,63.993821,,,,

    PROCESSOR,CPU 1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,86.202842,6.325867,79.867321,12.729831,25.407596,3590000000.000000,33661.682990,,,,

    WORKER,Worker 1,512B Seq Read,,,1,16858.682371,16858.682371,0.000000,8.231779,8.231779,0.000000,16858.682371,0.000000,0.117749,0.117749,0.000000,0.117749,0.000000,2.261254,2.261254,0.000000,2.261254,0.000000,0,0,0,518422528,0,1012544,0,0,-1,428021859357,0,428021859357,0,8117901,0,8117901,0,215617856724,0,0,2,,,,,,3590000000.000000,,,,,

    DISK,PHYSICALDRIVE:1,,,,,16858.682371,16858.682371,0.000000,8.231779,8.231779,0.000000,16858.682371,0.000000,0.117749,0.117749,0.000000,0.117749,0.000000,2.261254,2.261254,0.000000,2.261254,0.000000,0,0,0,518422528,0,1012544,0,0,-1,428021859357,0,428021859357,0,8117901,0,8117901,0,215617856724,0,0,2,,,,,,3590000000.000000,,,,,

    WORKER,Worker 3,512B Seq Read,,,1,16873.491005,16873.491005,0.000000,8.239009,8.239009,0.000000,16873.491005,0.000000,0.117633,0.117633,0.000000,0.117633,0.000000,2.272540,2.272540,0.000000,2.272540,0.000000,0,0,0,518881792,0,1013441,0,0,-1,427979645289,0,427979645289,0,8158419,0,8158419,0,215619470145,0,0,2,,,,,,3590000000.000000,,,,,

    DISK,PHYSICALDRIVE:3,,,,,16873.491005,16873.491005,0.000000,8.239009,8.239009,0.000000,16873.491005,0.000000,0.117633,0.117633,0.000000,0.117633,0.000000,2.272540,2.272540,0.000000,2.272540,0.000000,0,0,0,518881792,0,1013441,0,0,-1,427979645289,0,427979645289,0,8158419,0,8158419,0,215619470145,0,0,2,,,,,,3590000000.000000,,,,,

    WORKER,Worker 5,512B Seq Read,,,1,16818.105449,16818.105449,0.000000,8.211966,8.211966,0.000000,16818.105449,0.000000,0.118050,0.118050,0.000000,0.118050,0.000000,2.656720,2.656720,0.000000,2.656720,0.000000,0,0,0,517175808,0,1010109,0,0,-1,428082678423,0,428082678423,0,9537624,0,9537624,0,215618300226,0,0,2,,,,,,3590000000.000000,,,,,

    DISK,PHYSICALDRIVE:5,,,,,16818.105449,16818.105449,0.000000,8.211966,8.211966,0.000000,16818.105449,0.000000,0.118050,0.118050,0.000000,0.118050,0.000000,2.656720,2.656720,0.000000,2.656720,0.000000,0,0,0,517175808,0,1010109,0,0,-1,428082678423,0,428082678423,0,9537624,0,9537624,0,215618300226,0,0,2,,,,,,3590000000.000000,,,,,

    WORKER,Worker 7,512B Seq Read,,,1,16849.161218,16849.161218,0.000000,8.227130,8.227130,0.000000,16849.161218,0.000000,0.117849,0.117849,0.000000,0.117849,0.000000,2.417285,2.417285,0.000000,2.417285,0.000000,0,0,0,518131712,0,1011976,0,0,-1,428145318612,0,428145318612,0,8678052,0,8678052,0,215618676381,0,0,2,,,,,,3590000000.000000,,,,,

    DISK,PHYSICALDRIVE:7,,,,,16849.161218,16849.161218,0.000000,8.227130,8.227130,0.000000,16849.161218,0.000000,0.117849,0.117849,0.000000,0.117849,0.000000,2.417285,2.417285,0.000000,2.417285,0.000000,0,0,0,518131712,0,1011976,0,0,-1,428145318612,0,428145318612,0,8678052,0,8678052,0,215618676381,0,0,2,,,,,,3590000000.000000,,,,,

    'Time Stamp

    2006-01-04 16:08:09:593

    *******************************************************************

    I am sorry for pasting such a huge chunk of the file.  I have tried various methods of importing this CSV file into the server, but none seemed to work. I tried the bulk insert, but that cannot skip the comments in the file and gives me an error. I tried to import the CSV directly using the Import-export Wizard, by using the Flat file data source. But that cannot see that the CSV file has 56 columns, it comes up with only 2 columned table for the CSV. I even tried changing the column and row delimiter, but was in vain. I tried saving the csv file as txt file and then imported the file into the server database, which kind of worked but that is not I want to do, I should be able to take in CSV directly and should be some how able to delete all the unwanted rows like the repeating Column headers, comments (like Time stamp etc...)..

    Please help me if anyone have an answer to my question. Thanks much.

    Rabab

     

     

  • Basically you either need to format the data better before you try to import it or import it all and play with it afterwards.

    I'd go with the first option - remove all the comments and other unwanted data and makes usre everything else is seperated by a common marker - I think the comma is appropriate here.

  • Looks like something I would tackle with SSIS.

    You could use a conditional split transform to explicitly filter out the non-data rows, or just set up the error action on a text data source to ignore rows that don't have 56 columns.

  • I have a similar situation and avoid using DTS or SSIS by first BULK INSERTing the text file into a temp table with one very wide column. Then I SELECT the rows I want to keep, parsing the data into columns at the same time.

    Parsing the columns from rows becomes the biggest problem. For that I found a udf (from this same newsgroup) that can pick items out of a string by column number. You just have to specify what the delimiter is.

    Otherwise, SSIS can handle it.

  • The first thing you will probably want to do is format the file as just data. For example, strip off the

    'Test Type,Test Description
    6,
    'Version
    2003.12.16
    'Time Stamp
    2006-01-04 16:05:49:406
    'Access specifications
    'Access specification name,default assignment
    512B Seq Read,1
    'size,% of size,% reads,% random,delay,burst,align,reply
    512,100,100,0,0,1,0,0
    'End access specifications
    'Results
    

    And break the file into several smaller files. Assuming you are trying to make 1 table per timestamp. Use something like file1.csv, file2.csv, etc. as the driver to read the files doesn't seem to like complex name for the file.

    Then use an OpenDataSource command

    select 
      * 
    into
      [tbl2006-01-0416:05:49:406]
    from 
      OpenRowset('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\PathToTextFile\;','select * from file1.csv')
    
  • Thank you all for the suggestions. I will try what's suggested.

     

    Thanks.

  • If I had to pick a Plan B, I'd probably pull the file into TextPad and use regular expressions to transform it into something more SQL-friendly.  You could get it to delete all the unwanted lines, and duplicate the timestamp and access specs fields in the results rows.  Then you'd have a flat file that could be easily imported.

    Reading it as one column and processing it in T-SQL would work (I've done it myself), but it wouldn't get higher then #3 or 4 on my list.

    When you say you've got a text file with a mixture of record types to import, it really suggests SSIS and conditional-split transform to me.  I don't know if you're familiar with SSIS, but it would be far more effective than any of the pre- or post-processing alternatives suggested.

  • Thank you all again for the suggestions.

    I experimented with Bulk Insert command and it worked for me. I import the entire CSV into a table with all comments and I have a trigger defined on this table(for insert) which sorts the data and inserts into a second table. So when the Bulk insert command is executed 2 tables are inserted at the same time. (I use the option FIRE_TRIGGERS in the Bulk insert command to fire the tirgger defined on the inserted table).

     

    Thanks much

  • As convoluted as this may sound, it worked for me with a 24000 line CSV file:

    I opened SQL2000 and used the DTS import there to bring the CSV file into a new table in a test database.

    Then I opened SQL2005 and imported the resultant SQL2000 table into the SQL2005 database using the SQL Server Import and Export Wizard found by right-clicking on the database and choosing ALL TASKS.

    NO ERRORS!

    But honestly, Microsoft should have this problem fixed by now... how hard should importing a CSV file into a SQL2005 database be when the OLDER version (SQL2000) did it just fine!??

  • Have had similar problems in the past and have found it quicker to re-export the desired rows and re-import them using the parsing of Bulk Insert or BCP.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply