Improving data import performance - More RAM?

  • Jeff Moden (5/7/2009)


    Jeff Moden (5/6/2009)


    ifila (5/6/2009)


    I still get nulls in the Account Table 🙁

    Thanks

    Are all of the files you're trying to import in the exact same format and field order?

    I can help, but I still need to know the Yes/No answer to the above.

    Third time might be the charm... Yes or No, please.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No... they are resumes, and people vary the location of the content.

    Thanks

  • It seems to count the number of text files correctly, but for some reason it only loads a much smaller number?

    (300 row(s) affected)

    (1 row(s) affected)

    (299 row(s) affected)

    (101 row(s) affected)

  • ifila (5/7/2009)


    It seems to count the number of text files correctly, but for some reason it only loads a much smaller number?

    (300 row(s) affected)

    (1 row(s) affected)

    (299 row(s) affected)

    (101 row(s) affected)

    Sorry, I am not following you.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I am trying to load 299 resumes and it only loads 101. Why?

    Thanks

  • Since you haven't told us where this output is coming from or how each line is associated with your routine, I am not sure that I can say.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • OK, here's your problem. Look at the first and last lines here:

    ...
    select identity(int,1,1) as '@ID', resume into #y from #x
    drop table #x
    set @max1 = (select max(@ID) from #y)
    ...

    They are problematic because you are naming a column [@ID], which is also the name of a variable in your procedure. Thus, when you reference it in the last line above without the brackets ("[@ID]"), it uses your variable's value instead of finding the column value. (A minimal repro is sketched at the end of this post.)

    I suggest that you rename this column.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Almost forgot: the "101 rows" have nothing to do with the "299 rows" earlier. The 299 is the number of files that you want to read in; the 101 is just the number of lines in the first file. Your loop is exiting early because @max1 is not set correctly: it picks up the variable @ID's value of 0, so the body runs only once and only that first file is ever imported (a short demonstration follows this post).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I added a new variable and still get the same issue.

    What am I missing?

    Thanks

    alter procedure [dbo].[usp_ImportMultipleFiles]
        @filepath varchar(max),
        @pattern varchar(max),
        @TableName varchar(max)
    as
    set quoted_identifier off

    declare @ID int
    declare @ID2 int
    declare @query varchar(max)
    declare @max1 int
    declare @count1 int
    declare @filename varchar(max)

    set @count1 = 0
    set @ID = 0

    create table #x (resume varchar(max))
    set @query = 'master.dbo.xp_cmdshell ''dir ' + @filepath + @pattern + ' /b'''
    insert #x exec (@query)
    delete from #x where resume is NULL

    select identity(int,1,1) as '@ID2', resume into #y from #x
    drop table #x

    set @max1 = (select max(@ID) from #y)
    --print @max1
    --print @count1

    while @count1 <= @max1
    begin
        set @count1 = @count1 + 1
        set @filename = (select resume from #y where [@ID2] = @count1)
        set @query = 'BULK INSERT ' + @TableName + ' FROM ''' + @filepath + @filename + '''
            WITH ( FIELDTERMINATOR = '','', ROWTERMINATOR = '''' )'
        --print @query
        exec (@query)
        --insert into logtable (query) select @query
    end

    drop table #y

  • No, you're still confusing variable names and column names, only more so now. Try it like this:

    alter procedure [dbo].[usp_ImportMultipleFiles]
        @filepath varchar(max),
        @pattern varchar(max),
        @TableName varchar(max)
    as
    set quoted_identifier off

    declare @query varchar(max)
    declare @max1 int
    declare @count1 int
    declare @filename varchar(max)

    set @count1 = 0
    set @query = 'master.dbo.xp_cmdshell ''dir ' + @filepath + @pattern + ' /b'''

    create table #x (resume varchar(max))
    insert #x exec (@query)
    delete from #x where resume is NULL

    select identity(int,1,1) as [ID], resume into #y from #x
    drop table #x

    set @max1 = (select max(ID) from #y)
    --print @max1
    --print @count1

    while @count1 <= @max1
    begin
        set @count1 = @count1 + 1
        set @filename = (select resume from #y where [ID] = @count1)
        set @query = 'BULK INSERT ' + @TableName + ' FROM ''' + @filepath + @filename + '''
            WITH ( FIELDTERMINATOR = '','', ROWTERMINATOR = '''' )'
        --print @query
        exec (@query)
        --insert into logtable (query) select @query
    end

    drop table #y

    (EDIT: fixed a code bug)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Worked like a charm.

    Many thanks!

    Loaded 14,586 text files of between 30k and 35k each in 2 minutes 55 seconds.

    I owe you some beers 😀

  • Heh, "was that so hard?" 🙂

    Let's see: originally you were loading 4,000 files in 90 minutes, so that's about 45 files/minute. Now you're loading almost 5,000 files a minute; 100x is not bad.

    Glad I could help.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • The real question is: do you really need to import 4,000 files every day? Because that's a pretty impressive customer flow :hehe:

  • ifila (5/7/2009)


    No... they are resumes, and people vary the location of the content.

    Thanks

    BWAA-HAAA! According to what I see in Barry's fine code, they were, in fact, all the same "format".

    Glad you and Barry got this going because BCP/Bulk Insert was definitely the right way to go.

    Hey, Barry... this would be a good one to demo where it can also be done without a cursor. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/11/2009)


    ifila (5/7/2009)


    No... they are resumes, and people vary the location of the content.

    Thanks

    BWAA-HAAA! According to what I see in Barry's fine code, they were, in fact, all the same "format".

    Glad you and Barry got this going because BCP/Bulk Insert was definitely the right way to go.

    Hey, Barry... this would be a good one to demo where it can also be done without a cursor. 🙂

    Heh, you're killing me here, Jeff. I never actually removed the loop. Yeah, I know, that's a first for me, but I was just trying to get all of the syntax and semantics fixed. And the thing that makes it fast (the BULK INSERT) was added by the OP, not me. I was just a debugger on this one. 🙂 (For anyone curious, a loop-free sketch follows this post.)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]
