February 26, 2009 at 5:39 pm
I am using this proc for inserting a huge amount of data (approx 150 million rows) and it takes around 8 hrs. Need suggestions to improve this.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[InsertBudgetTotals_Med]
-- Add the parameters for the stored procedure here
@RevState as varchar(2),
@StageDB as varchar(25),
@RevStateDB as varchar(25),
@label as varchar(50),
@RevStateyr as varchar(4)
AS
BEGIN
/*Document when the insert into production began*/
declare @query as varchar(4000)
declare @sql as varchar(4000)
set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks
select ''' + @RevState + ''', ''Start Budget insert into ' + @RevState + ''' as BookMark,
getdate() as ExecutionTime, ''' + @label + ''''
exec(@query)
--Insert into production & capture control totals
--by looping through all years
while @RevStateyr < '1991'
Begin
set @sql = 'Insert Into ' + @RevStateDB + '.dbo.Budget' + @RevStateyr +
' (Col1,Col2,Col3…..Col48)
select Col1,Col2,Col3…..Col48 from ' + @StageDB + '.dbo.Budget' + @RevStateyr +
' where RevStateID = ''' + @RevState + ''''
exec (@sql)
set @RevStateyr = @RevStateyr + 1
End
set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks
select ''' + @RevState + ''', ''End Budget insert into ' + @RevState + ''' as BookMark,
getdate() as ExecutionTime, ''' + @label + ''''
exec(@query)
set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks
select ''' + @RevState + ''', ''Start Control Totals for ' + @RevState + ''' as BookMark,
getdate() as ExecutionTime, ''' + @label + ''''
exec(@query)
set @sql = 'insert into OBBEAVER_Benchmarks.dbo.ControlTotals
select ''' + @RevState + ''', ''Professional'', sum(ControlTotals) from (
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1994 where srcflag = ''p''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1995 where srcflag = ''p''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1996 where srcflag = ''p''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1997 where srcflag = ''p''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1998 where srcflag = ''p'') as cts'
exec (@sql)
set @sql = 'insert into OBBEAVER_Benchmarks.dbo.ControlTotals
select ''' + @RevState + ''', ''Facility'', sum(ControlTotals) from (
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1994 where srcflag = ''F''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1995 where srcflag = ''F''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1996 where srcflag = ''F''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1997 where srcflag = ''F''
union all
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1998 where srcflag = ''F'') as cts'
exec (@sql)
set @query = 'insert into OBBEAVER_Benchmarks.dbo.BenchMarks
select ''' + @RevState + ''', ''End Control Totals for ' + @RevState + ''' as BookMark,
getdate() as ExecutionTime, ''' + @label + ''''
exec(@query)
END
February 26, 2009 at 8:16 pm
Extremely tough question to answer for anyone on the board, but right off the bat one thing I know I'd do is split it up so I can run several of the same process in multiple connections. Maybe do one run per year, for instance?
February 26, 2009 at 8:38 pm
We've found SSIS very good for managing multi-stream loads. It takes a while to build but it manages them very nicely. If it's not something that needs to be repeated, partition the query approximately evenly by hand and run each stream in a separate window.
If partitioning the stream, make sure the partitioning is done on the clustered index, otherwise you'll likely get deadlocks. Also make sure there are no non-clustered indexes on the target table: they'll get very fragmented by the load, and it's generally quicker to rebuild them manually after all the data load streams are complete (another reason why SSIS is handy).
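For example, the drop-and-rebuild pattern would look roughly like this (a sketch only; the index and table names here are made up):

-- Disable the non-clustered index before the load starts (assumed names)
ALTER INDEX IX_Budget1994_srcflag ON dbo.Budget1994 DISABLE;

-- ...run the load streams here...

-- Rebuild it once all the load streams are complete
ALTER INDEX IX_Budget1994_srcflag ON dbo.Budget1994 REBUILD;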
February 26, 2009 at 8:42 pm
There's a heck of a lot here to chew on, but the first thing that jumped out at me was this little bit of RBAR (row-by-agonizing-row) processing:
while @RevStateyr < '1991'
Begin
set @sql = 'Insert Into ' + @RevStateDB + '.dbo.Budget' + @RevStateyr +
' (Col1,Col2,Col3…..Col48)
select Col1,Col2,Col3…..Col48 from ' + @StageDB + '.dbo.Budget' + @RevStateyr +
' where RevStateID = ''' + @RevState + ''''
exec (@sql)
set @RevStateyr = @RevStateyr + 1
End
You should look up the phrase "tally table" and probably add the name "Jeff Moden" and you'll find a much, much, much more efficient way to handle this query.
From the looks of things you're breaking the data up by year into separate tables. Instead, you might want to look into a single table partitioned by year. Then you could eliminate all that ad hoc string building and write some nice clean T-SQL code.
Just doing those two things will speed up your processing quite a lot.
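To illustrate the partitioned-table idea (a sketch only; the boundary values, filegroup, and column names are assumptions, not your actual schema):

-- One partition function/scheme splits a single Budget table by year
CREATE PARTITION FUNCTION pfBudgetYear (int)
AS RANGE RIGHT FOR VALUES (1995, 1996, 1997, 1998);

CREATE PARTITION SCHEME psBudgetYear
AS PARTITION pfBudgetYear ALL TO ([PRIMARY]);

-- One table replaces Budget1994 through Budget1998
CREATE TABLE dbo.Budget (
    BudgetYear int NOT NULL,
    RevStateID varchar(2) NOT NULL,
    srcflag char(1) NULL
    -- ...Col1 through Col48...
) ON psBudgetYear (BudgetYear);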
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 26, 2009 at 10:08 pm
Grant Fritchey (2/26/2009)
There's a heck of a lot here to chew on, but the first thing that jumped out at me was this little bit of RBAR (row-by-agonizing-row) processing:
while @RevStateyr < '1991'
Begin
set @sql = 'Insert Into ' + @RevStateDB + '.dbo.Budget' + @RevStateyr +
' (Col1,Col2,Col3…..Col48)
select Col1,Col2,Col3…..Col48 from ' + @StageDB + '.dbo.Budget' + @RevStateyr +
' where RevStateID = ''' + @RevState + ''''
exec (@sql)
set @RevStateyr = @RevStateyr + 1
End
You should look up the phrase "tally table" and probably add the name "Jeff Moden" and you'll find a much, much, much more efficient way to handle this query.
From the looks of things you're breaking the data up by year into separate tables. Instead, you might want to look into a single table partitioned by year. Then you could eliminate all that ad hoc string building and write some nice clean T-SQL code.
Just doing those two things will speed up your processing quite a lot.
I don't believe that's RBAR... the OP is using a loop to load data from the given yearly tables. The WHILE loop really won't take up much processing time.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2009 at 5:48 am
I see a loop and I want to eliminate it, if possible. But you're right, of course.
I still think partitioning the data would be a better approach than having X number of different tables that you have to muck about with in this dynamic process. Queries will be harder too: if you want to do multi-year aggregates, you have to add a new UNION each time.
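With a single partitioned table, the whole stack of UNION ALLed counts collapses to something like this (assuming the partitioned layout sketched above):

-- One GROUP BY replaces five UNION ALLed COUNT(*) queries
SELECT BudgetYear, COUNT(*) AS ControlTotals
FROM dbo.Budget
WHERE srcflag = 'p'
GROUP BY BudgetYear;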
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 27, 2009 at 1:43 pm
Yes, I am doing partitioning now in my new stuff. Apart from that, I still want to improve my T-SQL to insert records very efficiently and very fast.
Thanks
February 27, 2009 at 6:00 pm
Grant Fritchey (2/27/2009)
I see a loop and I want to eliminate it, if possible. But you're right, of course. I still think partitioning the data would be a better approach than having X number of different tables that you have to muck about with in this dynamic process. Queries will be harder too: if you want to do multi-year aggregates, you have to add a new UNION each time.
Heh... I'm the same way. I see a loop and I see red... then I have to go look at what they're using it for.
I agree... a sorted insert into partitioned tables would probably do the trick nicely, provided they have the correct clustered index on each underlying table.
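Something along these lines, in other words (a sketch only; the clustered key columns, staging database name, and the 'OR' state code are all assumptions):

-- Sorted insert: the ORDER BY matches the assumed clustered index key,
-- so pages are written in sequence and splits are minimized
INSERT INTO dbo.Budget (BudgetYear, RevStateID /* ...Col1 through Col48... */)
SELECT BudgetYear, RevStateID /* ...Col1 through Col48... */
FROM StageDB.dbo.Budget
WHERE RevStateID = 'OR' -- placeholder state code
ORDER BY BudgetYear, RevStateID;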
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2009 at 6:08 pm
Mike, there's a whole lot of things that can make such a large insert slow. For example, a clustered index that's not quite on the right columns for the INSERT can split a lot. Page splits take a lot of extra time. Having a correctly defined clustered index to support the INSERTs can pay off in spades. Also, having too many indexes on a table can be even worse because if an index decides to split, it splits off a whole new extent (8 pages).
Last, but not least, if the database is in the FULL recovery mode, or it's in the BULK_LOGGED/SIMPLE modes but the INSERT doesn't qualify as a "minimally logged" operation, then the log file will grow a good amount to hold the INSERT of 150 million rows.
What I'm saying is that you need to really take a look at things... it may be that you even need to drop some indexes and rebuild them after the INSERT to help speed things up.
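As a rough sketch of the minimally-logged route (database and table names are assumed; on SQL Server 2008 an INSERT...SELECT with a TABLOCK hint can qualify for minimal logging if the other prerequisites are met):

-- Switch to bulk-logged for the duration of the load (assumed names)
ALTER DATABASE RevStateDB SET RECOVERY BULK_LOGGED;

INSERT INTO RevStateDB.dbo.Budget1994 WITH (TABLOCK) (Col1, Col2 /* ...Col48 */)
SELECT Col1, Col2 /* ...Col48 */
FROM StageDB.dbo.Budget1994
WHERE RevStateID = 'OR'; -- placeholder state code

-- Switch back and take a full backup afterwards
ALTER DATABASE RevStateDB SET RECOVERY FULL;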
--Jeff Moden
Change is inevitable... Change for the better is not.
March 6, 2009 at 1:42 pm
Can I do this through an SSIS package or BULK INSERT to speed up my load?
A few questions on BULK INSERT:
1. Does a user need to be bulkadmin to do this, or would just ddl_admin be fine?
2. Is it used just for flat-file inserts, or can it also be used from SQL table to SQL table?
March 6, 2009 at 8:04 pm
Mike Levan (3/6/2009)
Can I do this through an SSIS package or BULK INSERT to speed up my load? A few questions on BULK INSERT:
1. Does a user need to be bulkadmin to do this, or would just ddl_admin be fine?
2. Is it used just for flat-file inserts, or can it also be used from SQL table to SQL table?
Probably not... all of your tables already exist and probably have data in them.
To answer your questions... ddl_admin is not sufficient to run BULK INSERT, and the only way to use it to go from SQL table to SQL table is via a flat file.
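For what it's worth, that flat-file round trip would look roughly like this (a sketch; the server name, file paths, and table names are made up):

-- Export with bcp from the command line:
-- bcp "SELECT Col1, Col2 FROM StageDB.dbo.Budget1994 WHERE RevStateID = 'OR'" queryout C:\loads\budget1994.dat -n -T -S MyServer

-- Then load the file back in:
BULK INSERT RevStateDB.dbo.Budget1994
FROM 'C:\loads\budget1994.dat'
WITH (DATAFILETYPE = 'native', TABLOCK);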
How many and what are the indexes you have on the target tables?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 6, 2009 at 8:16 pm
Mike... how long does just one of these take to run?
select count(*) as ControlTotals
from ' + @RevStateDB + '.dbo.Budget1995 where srcflag = ''p''
That's where I'd start working on tuning this puppy up...
--Jeff Moden
Change is inevitable... Change for the better is not.
March 9, 2009 at 7:25 am
Jeff
Probably 1 hr for each step.
thanks
March 9, 2009 at 9:39 am
An hour just to do a count(*) of a table?
Got indexes on the srcflag columns?
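If not, even a narrow non-clustered index would let those counts seek instead of scanning the whole table (a sketch; the index name is made up):

CREATE NONCLUSTERED INDEX IX_Budget1995_srcflag
ON dbo.Budget1995 (srcflag);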
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 9, 2009 at 10:34 am
What is the configuration of the server that this is running on?
How much space does this 150 million rows occupy?
What is the disk configuration of the source DBs, target DB, and tempdb?
Most importantly, which databases share physical disks for their data files and/or log files?
What are the relative times for the different sections of the procedure?
What are the %Idle values for the different physical disks used during execution?
What is the %CPU time for the CPUs during execution?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]