February 26, 2009 at 5:39 pm
I am using this proc for inserting a huge amount of data (approx 150 million rows) and it takes around 8 hrs. Need suggestions to improve this.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[InsertBudgetTotals_Med]
-- Add the parameters for the stored procedure here
@RevState as varchar(2),
@StageDB as varchar(25),
@RevStateDB as varchar(25),
@label as varchar(50),
@RevStateyr as varchar(4)
AS
BEGIN
/*Document when the insert into production began*/
declare @query as varchar(4000)
declare @sql as varchar(4000)
set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks
select ''' + @RevState + ''', ''Start Budget insert into ' + @RevState + ''' as BookMark,
getdate() as ExecutionTime, ''' + @label + ''''
exec(@query)
--Insert into production & capture control totals
--by looping through all years
while @RevStateyr < '1991'
Begin
set @sql = 'Insert Into ' + @RevStateDB + '.dbo.Budget' + @RevStateyr +
' (Col1,Col2,Col3…..Col48)
select Col1,Col2,Col3…..Col48 from ' + @StageDB + '.dbo.Budget' + @RevStateyr +
' where RevStateID = ''' + @RevState + ''''
exec (@sql)
set @RevStateyr = @RevStateyr + 1
End
set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks
select ''' + @RevState + ''', ''End Budget insert into ' + @RevState + ''' as BookMark,
getdate() as ExecutionTime, ''' + @label + ''''
exec(@query)
set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks
select ''' + @RevState + ''', ''Start Control Totals for ' + @RevState + ''' as BookMark,
getdate() as ExecutionTime, ''' + @label + ''''
exec(@query)
set @sql = 'insert into OBBEAVER_Benchmarks.dbo.ControlTotals
select ''' + @RevState + ''', ''Professional'', sum(ControlTotals) from (
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1994 where srcflag = ''p''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1995 where srcflag = ''p''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1996 where srcflag = ''p''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1997 where srcflag = ''p''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1998 where srcflag = ''p'') as cts'
exec (@sql)
set @sql = 'insert into OBBEAVER_Benchmarks.dbo.ControlTotals
select ''' + @RevState + ''', ''Facility'', sum(ControlTotals) from (
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1994 where srcflag = ''F''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1995 where srcflag = ''F''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1996 where srcflag = ''F''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1997 where srcflag = ''F''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1998 where srcflag = ''F'') as cts'
exec (@sql)
set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks
select ''' + @RevState + ''', ''End Control Totals for ' + @RevState + ''' as BookMark,
getdate() as ExecutionTime, ''' + @label + ''''
exec(@query)
END
February 26, 2009 at 8:16 pm
Extremely tough question to answer for anyone on the board, but right off the bat one thing I know I'd do is split it up so I can run several of the same process in multiple connections. Maybe do one run per year, for instance?
February 26, 2009 at 8:38 pm
We've found SSIS very good for managing multi-stream loads. It takes a while to build but it manages them very nicely. If it's not something that needs to be repeated, partition the query approximately evenly by hand and run each stream in a separate window.
If partitioning the stream, make sure the partitioning is done on the clustered index, otherwise you'll likely get deadlocks. Also make sure there are no non-clustered indexes on the target table: they'll get very fragmented by the load, and it's generally quicker to rebuild them manually after all the data load streams are complete (another reason why SSIS is handy).
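For example, the drop-and-rebuild pattern would look roughly like this (a sketch only; the index and table names here are made up):

-- Disable the non-clustered index before the load starts (assumed names)
ALTER INDEX IX_Budget1994_srcflag ON dbo.Budget1994 DISABLE;

-- ...run the load streams here...

-- Rebuild it once all the load streams are complete
ALTER INDEX IX_Budget1994_srcflag ON dbo.Budget1994 REBUILD;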
February 26, 2009 at 8:42 pm
There's a heck of a lot here to chew on, but the first thing that jumped out at me was this little bit of RBAR (row-by-agonizing-row) processing:
while @RevStateyr < '1991'
Begin
set @sql = 'Insert Into ' + @RevStateDB + '.dbo.Budget' + @RevStateyr +
' (Col1,Col2,Col3…..Col48)
select Col1,Col2,Col3…..Col48 from ' + @StageDB + '.dbo.Budget' + @RevStateyr +
' where RevStateID = ''' + @RevState + ''''
exec (@sql)
set @RevStateyr = @RevStateyr + 1
End
You should look up the phrase "tally table" and probably add the name "Jeff Moden" and you'll find a much, much, much more efficient way to handle this query.
From the looks of things you're breaking the data up by year into separate tables. Instead, you might want to look into a single table partitioned by year. Then you could eliminate all that ad hoc string building and write some nice clean T-SQL code.
Just doing those two things will speed up your processing quite a lot.
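To illustrate the partitioned-table idea (a sketch only; the boundary values, filegroup, and column names are assumptions, not your actual schema):

-- One partition function/scheme splits a single Budget table by year
CREATE PARTITION FUNCTION pfBudgetYear (int)
AS RANGE RIGHT FOR VALUES (1995, 1996, 1997, 1998);

CREATE PARTITION SCHEME psBudgetYear
AS PARTITION pfBudgetYear ALL TO ([PRIMARY]);

-- One table replaces Budget1994 through Budget1998
CREATE TABLE dbo.Budget (
    BudgetYear int NOT NULL,
    RevStateID varchar(2) NOT NULL,
    srcflag char(1) NULL
    -- ...Col1 through Col48...
) ON psBudgetYear (BudgetYear);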
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 26, 2009 at 10:08 pm
Grant Fritchey (2/26/2009)
There's a heck of a lot here to chew on, but the first thing that jumped out at me was this little bit of RBAR (row-by-agonizing-row) processing:
while @RevStateyr < '1991'
Begin
set @sql = 'Insert Into ' + @RevStateDB + '.dbo.Budget' + @RevStateyr +
' (Col1,Col2,Col3…..Col48)
select Col1,Col2,Col3…..Col48 from ' + @StageDB + '.dbo.Budget' + @RevStateyr +
' where RevStateID = ''' + @RevState + ''''
exec (@sql)
set @RevStateyr = @RevStateyr + 1
End
You should look up the phrase "tally table" and probably add the name "Jeff Moden" and you'll find a much, much, much more efficient way to handle this query.
From the looks of things you're breaking the data up by year into separate tables. Instead, you might want to look into a single table partitioned by year. Then you could eliminate all that ad hoc string building and write some nice clean T-SQL code.
Just doing those two things will speed up your processing quite a lot.
I don't believe that's RBAR... the OP is using a loop to load data from the given yearly tables. The WHILE loop really won't take up much processing time.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2009 at 5:48 am
I see a loop and I want to eliminate it, if possible. But you're right, of course.
I still think partitioning the data would be a better approach than having X number of different tables that you have to muck about with in this dynamic process. Queries will be harder too: if you want to do multi-year aggregates, you have to add a new UNION each time.
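With a single partitioned table, the whole stack of UNION ALLed counts collapses to something like this (assuming the partitioned layout sketched above):

-- One GROUP BY replaces five UNION ALLed COUNT(*) queries
SELECT BudgetYear, COUNT(*) AS ControlTotals
FROM dbo.Budget
WHERE srcflag = 'p'
GROUP BY BudgetYear;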
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 27, 2009 at 1:43 pm
Yes, I am doing partitioning now in my new stuff. Apart from that, I still want to improve my T-SQL to insert records very efficiently and very fast.
Thanks
February 27, 2009 at 6:00 pm
Grant Fritchey (2/27/2009)
I see a loop and I want to eliminate it, if possible. But you're right, of course. I still think partitioning the data would be a better approach than having X number of different tables that you have to muck about with in this dynamic process. Queries will be harder too: if you want to do multi-year aggregates, you have to add a new UNION each time.
Heh... I'm the same way. I see a loop and I see red... then I have to go look at what they're using it for.
I agree... a sorted insert into partitioned tables would probably do the trick nicely, provided they have the correct clustered index on each underlying table.
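Something along these lines, in other words (a sketch only; the clustered key columns, staging database name, and the 'OR' state code are all assumptions):

-- Sorted insert: the ORDER BY matches the assumed clustered index key,
-- so pages are written in sequence and splits are minimized
INSERT INTO dbo.Budget (BudgetYear, RevStateID /* ...Col1 through Col48... */)
SELECT BudgetYear, RevStateID /* ...Col1 through Col48... */
FROM StageDB.dbo.Budget
WHERE RevStateID = 'OR' -- placeholder state code
ORDER BY BudgetYear, RevStateID;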
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2009 at 6:08 pm
Mike, there's a whole lot of things that can make such a large insert slow. For example, a clustered index that's not quite on the right columns for the INSERT can split a lot. Page splits take a lot of extra time. Having a correctly defined clustered index to support the INSERTs can pay off in spades. Also, having too many indexes on a table can be even worse because if an index decides to split, it splits off a whole new extent (8 pages).
Last, but not least, if the database is in the FULL recovery mode, or it's in the BULK_LOGGED/SIMPLE modes but the INSERT doesn't qualify as a "minimally logged" operation, then the log file will grow a good amount to hold the INSERT of 150 million rows.
What I'm saying is that you need to really take a look at things... it may be that you even need to drop some indexes and rebuild them after the INSERT to help speed things up.
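As a rough sketch of the minimally-logged route (database and table names are assumed; on SQL Server 2008 an INSERT...SELECT with a TABLOCK hint can qualify for minimal logging if the other prerequisites are met):

-- Switch to bulk-logged for the duration of the load (assumed names)
ALTER DATABASE RevStateDB SET RECOVERY BULK_LOGGED;

INSERT INTO RevStateDB.dbo.Budget1994 WITH (TABLOCK) (Col1, Col2 /* ...Col48 */)
SELECT Col1, Col2 /* ...Col48 */
FROM StageDB.dbo.Budget1994
WHERE RevStateID = 'OR'; -- placeholder state code

-- Switch back and take a full backup afterwards
ALTER DATABASE RevStateDB SET RECOVERY FULL;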
--Jeff Moden
Change is inevitable... Change for the better is not.
March 6, 2009 at 1:42 pm
Can I do this through an SSIS package or BULK INSERT to speed up my load?
A few questions on BULK INSERT:
1. Does a user need to be bulkadmin to do this, or would just ddl_admin be fine?
2. Is it used just for flat-file inserts, or can it also be used from SQL table to SQL table?
March 6, 2009 at 8:04 pm
Mike Levan (3/6/2009)
Can I do this through an SSIS package or BULK INSERT to speed up my load? A few questions on BULK INSERT:
1. Does a user need to be bulkadmin to do this, or would just ddl_admin be fine?
2. Is it used just for flat-file inserts, or can it also be used from SQL table to SQL table?
Probably not... all of your tables already exist and probably have data in them.
To answer your questions... ddl_admin is not sufficient to run BULK INSERT, and the only way to use it to go from SQL table to SQL table is via a flat file.
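For what it's worth, that flat-file round trip would look roughly like this (a sketch; the server name, file paths, and table names are made up):

-- Export with bcp from the command line:
-- bcp "SELECT Col1, Col2 FROM StageDB.dbo.Budget1994 WHERE RevStateID = 'OR'" queryout C:\loads\budget1994.dat -n -T -S MyServer

-- Then load the file back in:
BULK INSERT RevStateDB.dbo.Budget1994
FROM 'C:\loads\budget1994.dat'
WITH (DATAFILETYPE = 'native', TABLOCK);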
How many and what are the indexes you have on the target tables?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 6, 2009 at 8:16 pm
Mike... how long does just one of these take to run?
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1995 where srcflag = ''p''
That's where I'd start working on tuning this puppy up...
--Jeff Moden
Change is inevitable... Change for the better is not.
March 9, 2009 at 7:25 am
Jeff
Probably 1 hr for each step.
thanks
March 9, 2009 at 9:39 am
An hour just to do a count(*) of a table?
Got indexes on the srcflag columns?
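If not, even a narrow non-clustered index would let those counts seek instead of scanning the whole table (a sketch; the index name is made up):

CREATE NONCLUSTERED INDEX IX_Budget1995_srcflag
ON dbo.Budget1995 (srcflag);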
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 9, 2009 at 10:34 am
What is the configuration of the server that this is running on?
How much space does this 150 million rows occupy?
What is the disk configuration of the source DBs, target DB, and tempdb?
Most importantly, which databases share physical disks for their data files and/or log files?
What are the relative times for the different sections of the procedure?
What are the %Idle values for the different physical disks used during execution?
What is the %CPU time for the CPUs during execution?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]