Tune Stored Proc

  • I am using this proc to insert a huge amount of data (approx. 150 million rows) and it takes around 8 hours. Need suggestions to improve this.

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    ALTER PROCEDURE [dbo].[InsertBudgetTotals_Med]

    -- Add the parameters for the stored procedure here

    @RevState as varchar(2),

    @StageDB as varchar(25),

    @RevStateDB as varchar(25),

    @label as varchar(50),

    @RevStateyr as varchar(4)

    AS

    BEGIN

    /*Document when the insert into production began*/

    declare @query as varchar(4000)

    declare @sql as varchar(4000)

    set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks

    select ''' + @RevState + ''', ''Start Budget insert into ' + @RevState + ''' as BookMark,

    getdate() as ExecutionTime, ''' + @label + ''''

    exec(@query)

    --Insert into production & capture control totals

    --by looping through all years

    while @RevStateyr < '1991'

    Begin

    set @sql = 'Insert Into ' + @RevStateDB + '.dbo.Budget' + @RevStateyr +

    ' (Col1,Col2,Col3…..Col48)

    select Col1,Col2,Col3…..Col48 from ' + @StageDB + '.dbo.Budget' + @RevStateyr +

    ' where RevStateID = ''' + @RevState + ''''

    exec (@sql)

    set @RevStateyr = @RevStateyr + 1

    End

    set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks

    select ''' + @RevState + ''', ''End Budget insert into ' + @RevState + ''' as BookMark,

    getdate() as ExecutionTime, ''' + @label + ''''

    exec(@query)

    set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks

    select ''' + @RevState + ''', ''Start Control Totals for ' + @RevState + ''' as BookMark,

    getdate() as ExecutionTime, ''' + @label + ''''

    exec(@query)

    set @sql = 'insert into OBBEAVER_Benchmarks.dbo.ControlTotals

    select ''' + @RevState + ''', ''Professional'', sum(ControlTotals) from (

    select count(*) as ControlTotals

    from ' + @RevStateDB + '.dbo.Budget1994 where srcflag = ''p''

    union all

    select count(*) as ControlTotals

    from ' + @RevStateDB + '.dbo.Budget1995 where srcflag = ''p''

    union all

    select count(*) as ControlTotals

    from ' + @RevStateDB + '.dbo.Budget1996 where srcflag = ''p''

    union all

    select count(*) as ControlTotals

    from ' + @RevStateDB + '.dbo.Budget1997 where srcflag = ''p''

    union all

    select count(*) as ControlTotals

    from ' + @RevStateDB + '.dbo.Budget1998 where srcflag = ''p'') as cts'

    exec (@sql)

    set @sql = 'insert into OBBEAVER_Benchmarks.dbo.ControlTotals

    select ''' + @RevState + ''', ''Facility'', sum(ControlTotals) from (

    select count(*) as ControlTotals

    from ' + @RevStateDB + '.dbo.Budget1994 where srcflag = ''F''

    union all

    select count(*) as ControlTotals

    from ' + @RevStateDB + '.dbo.Budget1995 where srcflag = ''F''

    union all

    select count(*) as ControlTotals

    from ' + @RevStateDB + '.dbo.Budget1996 where srcflag = ''F''

    union all

    select count(*) as ControlTotals

    from ' + @RevStateDB + '.dbo.Budget1997 where srcflag = ''F''

    union all

    select count(*) as ControlTotals

    from ' + @RevStateDB + '.dbo.Budget1998 where srcflag = ''F'') as cts'

    exec (@sql)

    set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks

    select ''' + @RevState + ''', ''End Control Totals for ' + @RevState + ''' as BookMark,

    getdate() as ExecutionTime, ''' + @label + ''''

    exec(@query)

    END

  • Extremely tough question to answer for anyone on the board, but right off the bat one thing I know I'd do is split it up so I can run several copies of the same process in multiple connections. Maybe do one per year, for instance?

  • We've found SSIS very good for managing multi-stream loads. It takes a while to build but manages them very nicely. If it's not something that needs to be repeated, partition the query approximately evenly by hand and run each stream in a separate window.

    If partitioning the stream, make sure the partitioning is done on the clustered index; otherwise you'll likely get deadlocks. Also make sure there are no non-clustered indexes on the target table: they'll get very fragmented by the load, and it's generally quicker to rebuild them manually after all the data-load streams are complete (another reason why SSIS is handy).

  • There's a heck of a lot here to chew on, but the first thing that jumped out at me was this little bit of RBAR (row-by-agonizing-row) processing:

    while @RevStateyr < '1991'

    Begin

    set @sql = 'Insert Into ' + @RevStateDB + '.dbo.Budget' + @RevStateyr +

    ' (Col1,Col2,Col3…..Col48)

    select Col1,Col2,Col3…..Col48 from ' + @StageDB + '.dbo.Budget' + @RevStateyr +

    ' where RevStateID = ''' + @RevState + ''''

    exec (@sql)

    set @RevStateyr = @RevStateyr + 1

    End

    You should look up the phrase "tally table" and probably add the name "Jeff Moden" and you'll find a much, much, much more efficient way to handle this query.

    From the looks of things you're breaking the data up by year into separate tables. Instead, you might want to look into providing a single table, but partitioning the table based on the year. Then you could eliminate all that ad hoc string building and write some nice clean TSQL code.

    Just doing those two things will speed up your processing quite a lot.
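    A minimal sketch of that single-partitioned-table idea, assuming SQL Server 2005+ table partitioning. The names `pfBudgetYear`, `psBudgetYear`, and the `BudgetYear` column are illustrative inventions, and the real Col1...Col48 list stays elided as in the original post:

    ```sql
    -- Illustrative only: pfBudgetYear, psBudgetYear, and BudgetYear are made-up
    -- names; the actual column list (Col1...Col48) is elided as in the post.
    CREATE PARTITION FUNCTION pfBudgetYear (int)
        AS RANGE RIGHT FOR VALUES (1995, 1996, 1997, 1998);

    CREATE PARTITION SCHEME psBudgetYear
        AS PARTITION pfBudgetYear ALL TO ([PRIMARY]);

    CREATE TABLE dbo.Budget
    (
        BudgetYear int        NOT NULL,
        RevStateID varchar(2) NOT NULL,
        srcflag    char(1)    NOT NULL
        -- Col1 ... Col48 as in the original yearly tables
    ) ON psBudgetYear (BudgetYear);

    -- The WHILE loop and its dynamic SQL then collapse to one set-based insert
    -- (staging table name hypothetical):
    -- INSERT INTO dbo.Budget (BudgetYear, RevStateID, srcflag /* , Col1...Col48 */)
    -- SELECT BudgetYear, RevStateID, srcflag /* , Col1...Col48 */
    -- FROM StageDB.dbo.BudgetStage
    -- WHERE RevStateID = @RevState;
    ```

    As a side benefit, the control-total queries would also lose their five-way UNION ALL, becoming a single GROUP BY srcflag over the one table.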

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (2/26/2009)


    There's a heck of a lot here to chew on, but the first thing that jumped out at me was this little bit of RBAR (row-by-agonizing-row) processing:

    while @RevStateyr < '1991'

    Begin

    set @sql = 'Insert Into ' + @RevStateDB + '.dbo.Budget' + @RevStateyr +

    ' (Col1,Col2,Col3…..Col48)

    select Col1,Col2,Col3…..Col48 from ' + @StageDB + '.dbo.Budget' + @RevStateyr +

    ' where RevStateID = ''' + @RevState + ''''

    exec (@sql)

    set @RevStateyr = @RevStateyr + 1

    End

    You should look up the phrase "tally table" and probably add the name "Jeff Moden" and you'll find a much, much, much more efficient way to handle this query.

    From the looks of things you're breaking the data up by year into separate tables. Instead, you might want to look into providing a single table, but partitioning the table based on the year. Then you could eliminate all that ad hoc string building and write some nice clean TSQL code.

    Just doing those two things will speed up your processing quite a lot.

    I don't believe that's RBAR... the OP is using a loop to load data from the given yearly tables. The WHILE loop really won't take up much processing time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I see a loop and I want to eliminate it, if possible. But you're right, of course.

    I still think partitioning the data would be a better approach rather than having X number of different tables that you have to muck about with in this dynamic process. Queries will be harder too. If you want to do multi-year aggregates you have to add a new UNION each time.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Yes, I am doing partitioning now in my new stuff. Apart from that, I still want to improve my T-SQL to insert records very efficiently and very fast.

    Thanks

  • Grant Fritchey (2/27/2009)


    I see a loop and I want to eliminate it, if possible. But you're right, of course.

    I still think partitioning the data would be a better approach rather than having X number of different tables that you have to muck about with in this dynamic process. Queries will be harder too. If you want to do multi-year aggregates you have to add a new UNION each time.

    Heh... I'm the same way. I see a loop and I see red... then I have to go look at what they're using it for.

    I agree... a sorted insert into partitioned tables would probably do the trick nicely, provided they have the correct clustered index on each underlying table.

    --Jeff Moden



  • Mike, there are a whole lot of things that can make such a large insert slow. For example, a clustered index that's not quite on the right columns for the INSERT can split a lot, and page splits take a lot of extra time. Having a correctly defined clustered index to support the INSERTs can pay off in spades. Also, having too many indexes on a table can be even worse, because if an index decides to split, it splits off a whole new extent (8 pages).

    Last, but not least, if the database is in the FULL recovery model, or it's in the Bulk-Logged/Simple models but the INSERT doesn't qualify as a "minimally logged" operation, then the log file will grow a good amount to hold the INSERT of 150 million rows.

    What I'm saying is that you need to really take a look at things... it may be that you even need to drop some indexes and rebuild them after the INSERT to help speed things up.
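    A sketch of the drop/load/rebuild pattern Jeff describes, with made-up index, database, and value names throughout. Whether the insert is actually minimally logged depends on the recovery model, the TABLOCK hint, your SQL Server version, and the state of the target table:

    ```sql
    -- Check the recovery model first:
    -- SELECT name, recovery_model_desc FROM sys.databases WHERE name = 'TargetDB';

    -- 1) Drop nonclustered indexes before the big load (index name illustrative).
    DROP INDEX IX_Budget1994_srcflag ON dbo.Budget1994;

    -- 2) Under SIMPLE or BULK_LOGGED recovery, INSERT...SELECT WITH (TABLOCK)
    --    into a heap (or an empty clustered table) can be minimally logged.
    INSERT INTO dbo.Budget1994 WITH (TABLOCK)
    SELECT *
    FROM StageDB.dbo.Budget1994      -- hypothetical staging database
    WHERE RevStateID = 'OR';         -- hypothetical @RevState value

    -- 3) Rebuild the indexes once, after the load completes.
    CREATE NONCLUSTERED INDEX IX_Budget1994_srcflag
        ON dbo.Budget1994 (srcflag);
    ```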

    --Jeff Moden



  • Can I do this through an SSIS pkg or BULK INSERT to speed up my load?

    Few questions on BULK INSERT

    1. Does a user need to be bulkadmin to do this, or would just ddl_admin be fine?

    2. Is it used just for flat-file inserts, or can it also be used from SQL table to SQL table?

  • Mike Levan (3/6/2009)


    Can I do this through an SSIS pkg or BULK INSERT to speed up my load?

    Few questions on BULK INSERT

    1. Does a user need to be bulkadmin to do this, or would just ddl_admin be fine?

    2. Is it used just for flat-file inserts, or can it also be used from SQL table to SQL table?

    Probably not... all of your tables already exist and probably have data in them.

    To answer your questions... ddl_admin is not sufficient to do BULK INSERT, and the only way to use it to go from SQL table to SQL table is via a flat file.
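    For completeness, the flat-file round trip would look roughly like this; server name, path, and options are illustrative. BULK INSERT requires the bulkadmin role (or the ADMINISTER BULK OPERATIONS permission) plus INSERT on the target table:

    ```sql
    -- Export with bcp from the command line (illustrative server and path):
    --   bcp StageDB.dbo.Budget1994 out C:\load\Budget1994.dat -n -T -S MYSERVER

    -- Then load the file; TABLOCK helps the load qualify for minimal logging.
    BULK INSERT dbo.Budget1994
    FROM 'C:\load\Budget1994.dat'
    WITH (DATAFILETYPE = 'native', TABLOCK, BATCHSIZE = 1000000);
    ```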

    How many and what are the indexes you have on the target tables?

    --Jeff Moden



  • Mike... how long does just one of these take to run?

    select count(*) as ControlTotals

    from ' + @RevStateDB + '.dbo.Budget1995 where srcflag = ''p''

    That's where I'd start working on tuning this puppy up...

    --Jeff Moden



  • Jeff

    Probably 1 hr for each step.

    thanks

  • An hour just to do a count(*) of a table?

    Got indexes on the srcflag columns?
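    If those columns aren't indexed, something like this (index name illustrative) would let the counts read a narrow index rather than scan the whole table:

    ```sql
    -- Illustrative: one such index per yearly table (1994-1998) would be needed.
    CREATE NONCLUSTERED INDEX IX_Budget1995_srcflag
        ON dbo.Budget1995 (srcflag);

    -- The control-total query then becomes an index seek/scan:
    -- SELECT COUNT(*) FROM dbo.Budget1995 WHERE srcflag = 'p';
    ```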

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • What is the configuration of the server that this is running on?

    How much space does this 150 million rows occupy?

    What is the disk configuration of the source DB's, TargetDB, and tempdb?

    Most importantly, which databases share physical disks for their data files and/or log files?

    What are the relative times for the different sections of the procedure?

    What are the %Idle for the different physical disk used during execution?

    What is the %CPU time of for the CPU's during execution?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]
