July 2, 2009 at 3:11 am
Dear all,
I have a database containing a table with 100 million rows. At the moment there is no
partitioning; one column called code, with a varchar(10) data type, is the primary key.
I also have a stored procedure with a parameter for how many codes I want to create.
In a while loop it looks up the code table to check whether the generated code already exists;
if not, it inserts the code into the code table.
Does anyone have an idea how to increase the performance? If I run the stored procedure
for 1 million rows it takes 15 minutes, which is too long.
Thanks for all replies
Thorsten
July 2, 2009 at 3:46 am
Please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 2, 2009 at 7:07 am
Here is the information you need:
CREATE TABLE [dbo].[Codes](
[Code] [char](10) NOT NULL,
CONSTRAINT [PK_Codes] PRIMARY KEY CLUSTERED
(
[Code] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
-------------------------------------------------------------------------------------------------
THE STORED PROCEDURE
ALTER PROCEDURE [dbo].[CreateCodes]
    -- Add the parameters for the stored procedure here
    @amount bigint
AS
BEGIN
    SET NOCOUNT ON

    declare @tempcode as nvarchar(100)
    declare @counter as bigint
    set @counter = 1

    while @counter <= @amount
    begin
        set @tempcode = (SELECT [dbo].[RNGCharacterMask] (8, '34679CDFGHJKLNPRTVXY'))

        if ((select count(code) from codes where code = @tempcode) = 0)
        begin
            insert into codes (code) values (@tempcode)
            set @counter = @counter + 1
            continue
        end
        else
            continue
    end

    set nocount off
END
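For reference, the 15-minute run mentioned above corresponds to a call like:

EXEC [dbo].[CreateCodes] @amount = 1000000;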
-----------------------------------------------------------------------------------------------
The stored procedure calls a user-defined function which generates the codes, but this is not
the reason why the performance is so bad.
In addition, I have attached the execution plan as a txt file. I think most of the time is consumed by the insert statement, but take a look and give me your feedback.
July 2, 2009 at 7:15 am
What are you trying to achieve here? What is this function supposed to do and why?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 2, 2009 at 7:21 am
Hey,
in the database we have codes which will be printed on packages.
We have two things to do:
First, we have to generate new codes at certain intervals; generating means between 10 and 50 million new codes at a time.
The other thing is to query the database to get a code which is not yet used (the state for "used" will be added in the future).
I hope that clarifies your question.
greetings
July 2, 2009 at 7:24 am
Can you post the code for that function? Running a function repeatedly in a loop is not the fastest way of doing things, but if I put the function into a set-based insert, it'll still be run repeatedly.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 2, 2009 at 7:35 am
I cannot give you the code of the function,
for security reasons....
I can say only this much: the function generates a code using a cryptographic security provider.
But I have one problem: I have to ensure that exactly
the requested amount of codes is generated.
I hope you understand.
July 2, 2009 at 7:41 am
I can possibly write a set-based alternative, but it's not going to perform much better. The function will still have to be called a couple million times and that's going to cost.
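Just to illustrate the shape such an alternative might take (untested, and assuming the RNGCharacterMask signature shown earlier; the outer loop is only there to top up after duplicate random codes are discarded):

ALTER PROCEDURE dbo.CreateCodes
    @amount bigint
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @inserted bigint;
    SET @inserted = 0;

    -- keep topping up until exactly @amount new codes have been inserted,
    -- since random candidates can collide with each other or with the table
    WHILE @inserted < @amount
    BEGIN
        ;WITH Numbers (n) AS
        (
            SELECT TOP (@amount - @inserted)
                   ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
            FROM   master.sys.columns C1, master.sys.columns C2, master.sys.columns C3
        ),
        Candidates (code) AS
        (
            SELECT [dbo].[RNGCharacterMask] (8, '34679CDFGHJKLNPRTVXY')
            FROM   Numbers
        )
        INSERT dbo.Codes (code)
        SELECT DISTINCT c.code
        FROM   Candidates c
        WHERE  NOT EXISTS (SELECT * FROM dbo.Codes x WHERE x.code = c.code);

        SET @inserted = @inserted + @@ROWCOUNT;
    END;
END;

Whether that actually beats the loop depends on where the optimizer places the function call, which is exactly the cost I'm talking about.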
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 2, 2009 at 7:43 am
I have a gut-feel that this is not the best approach. What uses these codes and why can a code not be generated when it's needed rather than a few million getting generated upfront?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 2, 2009 at 9:20 am
Scalar UDFs are generally quite slow and should be avoided.
Try to isolate the issue: what is the performance like if you make it a sequential count?
Also, I notice that code is the clustered primary key; could it be page splitting on the inserts?
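For example, a throwaway baseline along these lines (hypothetical, with a cheap zero-padded sequential value standing in for the UDF call) would separate the cost of the function from the cost of the inserts:

DECLARE @counter bigint;
SET @counter = 1;

WHILE @counter <= 1000000
BEGIN
    -- sequential 'codes' arrive in ascending key order, so this also
    -- sidesteps most of the random page splits on the clustered PK
    INSERT INTO dbo.Codes (code)
    VALUES (RIGHT('0000000000' + CONVERT(varchar(10), @counter), 10));

    SET @counter = @counter + 1;
END;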
July 2, 2009 at 9:40 am
Dave Ballantyne (7/2/2009)
Scalar UDFs are generally quite slow and should be avoided. Try to isolate the issue: what is the performance like if you make it a sequential count?
Also, I notice that code is the clustered primary key; could it be page splitting on the inserts?
Ouch, you can pretty much be sure there are page splits happening in a million row insert.
Like Gail has said, without knowing what the code is in the function there really isn't a way to make it faster because the UDF will still run a million times.
Can you at least give us some pseudocode for the UDF like:
Take the second parameter passed in
Loop through it by character, multiplying the ASCII code by the first parameter; if the result is > 255, subtract 255 and place that character in the 10-character string.
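Rendered as a completely fabricated stand-in with the same shape, that would be something like:

-- purely invented example: walks the mask character by character,
-- scales each ASCII code by the first parameter, wraps anything over
-- 255 back into range, and builds a string of at most 10 characters
CREATE FUNCTION dbo.FakeRNGCharacterMask (@factor int, @mask varchar(50))
RETURNS char(10)
AS
BEGIN
    DECLARE @out varchar(10), @i int, @c int;
    SELECT @out = '', @i = 1;

    WHILE @i <= LEN(@mask) AND LEN(@out) < 10
    BEGIN
        SET @c = ASCII(SUBSTRING(@mask, @i, 1)) * @factor;
        WHILE @c > 255 SET @c = @c - 255; -- loop the wrap so CHAR() stays in range
        SET @out = @out + CHAR(@c);
        SET @i = @i + 1;
    END;

    RETURN @out;
END;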
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
July 2, 2009 at 9:41 am
You said you can't post the code for the UDF, but could you post code for the UDF with a faked process? It just has to show us how it processes the data passed without showing us the actual details. I hope that makes some sense.
July 2, 2009 at 9:01 pm
scziege (7/2/2009)
The stored procedure calls a user-defined function which generates the codes, but this is not the reason why the performance is so bad.
I would encourage you to test that. The script below will allow you to see how expensive the function call is. It is set up to call a test function that doubles a number. The function is executed one million times per run, and there are five runs. You get summary performance statistics at the end. There are comments to explain what it is doing, and where to plug your private function in.
Paul
use tempdb;
go

-- Holds numbers 1 to 1 million
create table [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446]
(
    N BIGINT PRIMARY KEY CLUSTERED
);

-- Generate numbers
with Numbers (N)
as (
    select top (1000000)
        ROW_NUMBER() over (order by (select null))
    from master.sys.columns C1, master.sys.columns C2, master.sys.columns C3
)
insert [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446] with (tablockx) (N)
select N
from Numbers;
go

-- Test function - just doubles the input
create function dbo.[sfn_7125CD1E-CF5A-4386-B82C-CF52AC756A71] (@number bigint)
returns bigint
with schemabinding
as
begin
    return @number * 2;
end;
go

-- Clear ad-hoc sql plans (warning: server-wide, run on test only!)
dbcc freesystemcache(N'SQL Plans');
go

-- Run function one million times - substitute your function here!
declare @Bitbucket BIGINT;

select @Bitbucket = dbo.[sfn_7125CD1E-CF5A-4386-B82C-CF52AC756A71] (N)
from [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446];
go 5

-- Show test results
select statement_text = SUBSTRING([text], statement_start_offset / 2, (case statement_end_offset when -1 then DATALENGTH([text]) else statement_end_offset end) / 2),
       sample_size = execution_count,
       [avg elapsed time µs] = total_elapsed_time / execution_count,
       [avg elapsed time sec] = CONVERT(DEC(9, 3), total_elapsed_time / execution_count / 1000000.0),
       [avg cpu time µs] = total_worker_time / execution_count,
       [avg cpu time sec] = CONVERT(DEC(9, 3), total_worker_time / execution_count / 1000000.0),
       [avg logical reads] = total_logical_reads / execution_count
from   sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text (qs.[sql_handle]) as st
where  st.[text] like '%@BitBucket%[#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446]%'
and    st.[text] not like '%dm_exec_query_stats%';
go

-- Tidy up
drop table [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446];
drop function dbo.[sfn_7125CD1E-CF5A-4386-B82C-CF52AC756A71];
go
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 2, 2009 at 9:21 pm
scziege (7/2/2009)
In addition, I have attached the execution plan as a txt file. I think most of the time is consumed by the insert statement, but take a look and give me your feedback.
You should disregard the estimated cost percentages when looking at plans - they are wildly inaccurate much of the time (for good reasons).
For example, the call to your function ([Expr1000] = Scalar Operator([NestleDECodes].[dbo].[RNGCharacterMask]((8),N'34679CDFGHJKLNPRTVXY'))) is costed at 0.0000001 units (0%) since the server has no way to know how expensive the function call is in reality.
Anyhow, as I hope you will now be able to test for yourself, the function call is probably over 99% of the cost of a real execution run.
A few small points:
IF NOT EXISTS (SELECT * FROM dbo.Codes WHERE code = @tempcode)
...is frequently more efficient than...
IF (SELECT count(code) FROM codes WHERE code = @tempcode) = 0
You should be careful to match data types. That lookup on code = @tempcode is less efficient than it could be due to an implicit conversion:
CONVERT_IMPLICIT(nchar(10),[NestleDECodes].[dbo].[Codes].[Code],0)=[@tempcode]
(The Code column is defined as CHAR(10) not NCHAR(10))
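Putting those two points together, a minimal corrected fragment would be (same logic, just the matching type and EXISTS):

DECLARE @tempcode char(10); -- matches dbo.Codes.Code exactly, so no CONVERT_IMPLICIT

SET @tempcode = [dbo].[RNGCharacterMask] (8, '34679CDFGHJKLNPRTVXY');

IF NOT EXISTS (SELECT * FROM dbo.Codes WHERE code = @tempcode)
    INSERT INTO dbo.Codes (code) VALUES (@tempcode);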
The SET NOCOUNT OFF at the end of the procedure is pointless. SET NOCOUNT reverts to the setting in the outer scope when the procedure returns anyway.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 3, 2009 at 12:03 am
Paul White (7/2/2009)
You should disregard the estimated cost percentages when looking at plans - they are wildly inaccurate much of the time (for good reasons).
And they are always wildly inaccurate when there are scalar UDFs involved, because SQL Server estimates 0% for the UDF execution, regardless of what it does.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability