Improving data import performance - more RAM?

  • I have a daily batch job that loads about 4000 text files.

    I am currently using a Dell 2950 with two quad-core 1.6 GHz CPUs, a 64-bit Windows Server OS, and 14 GB of RAM.

    The job takes about 1.5 hours to load the 4000 text files into the SQL DB.

    What kind of performance improvement would I get from adding another 16 GB of RAM?

    Any other suggestions?

    Thanks

  • This doesn't have enough detail to get a good response. I would suggest providing some more information:

    What tools do you use to import/how are they imported?

    How many at a time do you import (threads)?

    How big are the text files?

    How big is the database?

    Is your memory usage capped out on the server?

    Are the files located on the same server you are importing into?

    I would say that, initially, physical hardware upgrades should be last on the list. Try different loading methods/tools, make sure your database is performing properly (not too many indexes, etc.), use performance tools to see where your bottlenecks are, and use Profiler to see which queries take the longest.
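
    For example, here is one quick way to find the slowest statements from the plan cache (a minimal sketch; assumes SQL Server 2005 or later and VIEW SERVER STATE permission):

    SELECT TOP (10)
        qs.execution_count,
        qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microseconds,
        SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
            ((CASE qs.statement_end_offset
                  WHEN -1 THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset
              END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_elapsed_time DESC;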

  • In addition, the composition of the files and the DDL for the tables are needed.


    * Noel

  • (1) I have a program written in VB6 that parses the text files; then I load them after the parse process is complete. The parsing takes about 1 hour per 1k of text files. Then the load takes about 1.5 hours per 4k text files.

    (2) The job imports one at a time until they are all loaded. 4000 per night.

    (3) The text files are between 30 KB and 35 KB in size.

    (4) The DB is 40 GB

    (5) The DB, application files and text files are located on the server.

    Thanks for your help!

  • ifila (5/4/2009)


    (1) I have a program written in VB6 that parses the text files; then I load them after the parse process is complete. The parsing takes about 1 hour per 1k of text files. Then the load takes about 1.5 hours per 4k text files.

    I take it that this 1k / 4k is the number of files, not the size of the files?

    (2) The job imports one at a time until they are all loaded. 4000 per night.

    (3) The text files are between 30 KB and 35 KB in size.

    (4) The DB is 40 GB

    (5) The DB, application files and text files are located on the server.

    Thanks for your help!

    Is this job in T-SQL, SSIS, or something else?

    Are the files all going to the same table?

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Yes, that is the number of files.

    Each file SIZE is between 30 KB and 35 KB.

    The complete text documents are stored in one table, but the parsed information extracted from the text files (name, address, city, state, phone number, etc.) is stored in separate tables.

    I don't have access to the VB source code, so I don't know the method used for import. But I will find out.

    Thanks

  • ifila (5/4/2009)


    Yes, that is the number of files.

    Each file SIZE is between 30 KB and 35 KB.

    The complete text documents are stored in one table, but the parsed information extracted from the text files (name, address, city, state, phone number, etc.) is stored in separate tables.

    I don't have access to the VB source code, so I don't know the method used for import. But I will find out.

    Thanks

    The problem is very likely the VB source code to begin with. Why can't you use BULK INSERT, which will load a 5.1-million-line, 20-column-wide text file in 60 seconds flat?
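
    As a minimal sketch (the table and file names here are made up for illustration):

    BULK INSERT dbo.Account
    FROM 'C:\Imports\accounts.txt'
    WITH
    (
        FIELDTERMINATOR = ',',  -- comma-delimited columns
        ROWTERMINATOR = '\n',   -- one record per line
        BATCHSIZE = 10000,      -- commit in chunks
        TABLOCK                 -- allows minimally logged bulk loads where possible
    );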

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I was using the VB application since it made the parsing and loading process very low maintenance. The downside is that as my needs have increased, the performance is lacking.

    I have never used the BULK INSERT feature, but based on your numbers, it seems very fast. I now have to find a parsing solution.

    Thanks

  • ifila (5/5/2009)


    I now have to find a parsing solution.

    Thanks

    The numbers I quoted included parsing. Look up BULK INSERT in Books Online... you'll see. 🙂
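
    One common pattern for that (a sketch only; all table, column, and file names here are illustrative) is to BULK INSERT the raw rows into a staging table and then do the parsing set-based in T-SQL:

    -- 1) load the raw delimited rows into a staging table
    CREATE TABLE dbo.ImportStage
    (
        name    varchar(100) NULL,
        address varchar(200) NULL,
        city    varchar(100) NULL
    );

    BULK INSERT dbo.ImportStage
    FROM 'C:\Imports\file0001.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

    -- 2) move the parsed columns into the destination tables in one set-based pass
    INSERT INTO dbo.Contacts (name, address, city)
    SELECT name, address, city
    FROM dbo.ImportStage;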

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks 😉

  • You can use the bcp command-line utility to process the flat files. It is very fast; for more info, see SQL Server Books Online.
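
    For example (a sketch only; the server, database, and file names are made up, and xp_cmdshell must be enabled on the server):

    EXEC master.dbo.xp_cmdshell
        'bcp MyDatabase.dbo.Account in "C:\Imports\accounts.txt" -SMyServer -T -c -t,';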

  • I am getting a BULK INSERT error using the following data in a CSV file.

    Any ideas?

    Thanks

    1, MAK, A9411792711, 3400.25
    2, Claire, A9411452711, 24000.33
    3, Sam, A5611792711, 1200.34

    The target table columns are:

    id (int, null)
    name (varchar(100), null)
    accountno (varchar(100), null)
    balance (money, null)

    (3 row(s) affected)

    (1 row(s) affected)

    (2 row(s) affected)

    Msg 4864, Level 16, State 1, Line 1

    Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (ID).

    Msg 4864, Level 16, State 1, Line 1

    Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 1 (ID).

    Msg 4864, Level 16, State 1, Line 1

    Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 3, column 1 (ID).


  • I am running this procedure to use bcp. This is my .txt file:

    1, mak, a9411792711, 3400.25

    2, bob, b9999999999, 2400.99

    The job runs fine and I get no errors, but I only have data in the logtable, NOT in the Account table. Any ideas? Thanks.

    set quoted_identifier off
    go
    alter procedure usp_ImportMultipleFilesBCP
        @servername varchar(128),
        @DatabaseName varchar(128),
        @filepath varchar(500),
        @pattern varchar(100),
        @TableName varchar(128)
    as
    declare @query varchar(1000)
    declare @max1 int
    declare @count1 int
    declare @filename varchar(100)
    set @count1 = 0

    -- get the list of files matching the pattern in the import directory
    create table #x (name varchar(200))
    set @query = 'master.dbo.xp_cmdshell ''dir ' + @filepath + @pattern + ' /b'''
    insert #x exec (@query)
    delete from #x where name is null

    -- number the files so the loop can walk them one at a time
    select identity(int,1,1) as ID, name into #y from #x
    drop table #x
    set @max1 = (select max(ID) from #y)

    while @count1 < @max1  -- < rather than <=, or the last pass gets a NULL file name
    begin
        set @count1 = @count1 + 1
        set @filename = (select name from #y where [id] = @count1)

        -- -T trusted connection, -c character mode, -t, comma field terminator
        set @query = 'bcp "' + @DatabaseName + '.dbo.' + @TableName + '" in "'
                   + @filepath + @filename + '" -S' + @servername + ' -T -c -t,'
        set @query = 'MASTER.DBO.xp_cmdshell ' + "'" + @query + "'"  -- "'" is a literal quote because QUOTED_IDENTIFIER is OFF
        exec (@query)
        insert into logtable (query) select @query
    end

    drop table #y

  • You're probably getting errors, but you are only logging the command, not its output. Change the last few lines to this:

    ...

    insert into logtable (query) select @query

    insert into logtable (query) EXEC ( @query)

    end

    drop table #y


    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I still get NULLs in the Account table 🙁

    Thanks
