July 2, 2009 at 3:11 am
Dear all,
I have a database containing a table with 100 million rows. At the moment there is no
partitioning; one column called code, with a varchar(10) data type, is the primary key.
I also have a stored procedure with a parameter for how many codes I want to create.
In a while loop it looks up the code table to check whether the generated code already exists;
if not, it inserts the code into the code table.
Does anyone have an idea how to increase the performance? If I run the stored procedure
for 1 million rows it takes 15 minutes, which is too long.
Thanks for all replies
Thorsten
July 2, 2009 at 3:46 am
Please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 2, 2009 at 7:07 am
Here is the information you need:
CREATE TABLE [dbo].[Codes](
[Code] [char](10) NOT NULL,
CONSTRAINT [PK_Codes] PRIMARY KEY CLUSTERED
(
[Code] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
-------------------------------------------------------------------------------------------------
THE STORED PROCEDURE
ALTER PROCEDURE [dbo].[CreateCodes]
    -- Add the parameters for the stored procedure here
    @amount bigint
AS
BEGIN
    SET NOCOUNT ON

    declare @tempcode as nvarchar(100)
    declare @counter as bigint
    set @counter = 1

    while @counter <= @amount
    begin
        set @tempcode = (SELECT [dbo].[RNGCharacterMask] (8, '34679CDFGHJKLNPRTVXY'))

        if ((select count(code) from codes where code = @tempcode) = 0)
        begin
            insert into codes (code) values (@tempcode)
            set @counter = @counter + 1
            continue
        end
        else
            continue
    end

    set nocount off
END
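For reference, the 15-minute run mentioned above corresponds to a call like:

EXEC [dbo].[CreateCodes] @amount = 1000000;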
-----------------------------------------------------------------------------------------------
The stored procedure calls a user-defined function which generates the codes, but this is not
the reason why the performance is so bad.
In addition, I have attached the execution plan as a txt file. I think most of the time is consumed by the insert statement, but take a look and give me your feedback.
July 2, 2009 at 7:15 am
What are you trying to achieve here? What is this function supposed to do and why?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 2, 2009 at 7:21 am
Hey,
in the database we have codes which will be printed on packages.
We have two things to do:
First, we have to generate new codes at certain intervals; generating means between 10 and 50 million new codes at a time.
The other thing is to query the database to get a code which is not yet used (the state for "used" will be added in the future).
I hope that clarifies your question.
greetings
July 2, 2009 at 7:24 am
Can you post the code for that function? Running a function repeatedly in a loop is not the fastest way of doing things, but if I put the function into a set-based insert, it'll still be run repeatedly.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 2, 2009 at 7:35 am
I cannot give you the code of the function,
for security reasons....
I can say only this much: the function generates a code using a cryptographic security provider.
But I have one problem: I have to ensure that exactly
the requested amount of codes is generated.
I hope you understand.
July 2, 2009 at 7:41 am
I can possibly write a set-based alternative, but it's not going to perform much better. The function will still have to be called a couple million times and that's going to cost.
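Just to illustrate the shape such an alternative might take (untested, and assuming the RNGCharacterMask signature shown earlier; the outer loop is only there to top up after duplicate random codes are discarded):

ALTER PROCEDURE dbo.CreateCodes
    @amount bigint
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @inserted bigint;
    SET @inserted = 0;

    -- keep topping up until exactly @amount new codes have been inserted,
    -- since random candidates can collide with each other or with the table
    WHILE @inserted < @amount
    BEGIN
        ;WITH Numbers (n) AS
        (
            SELECT TOP (@amount - @inserted)
                   ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
            FROM   master.sys.columns C1, master.sys.columns C2, master.sys.columns C3
        ),
        Candidates (code) AS
        (
            SELECT [dbo].[RNGCharacterMask] (8, '34679CDFGHJKLNPRTVXY')
            FROM   Numbers
        )
        INSERT dbo.Codes (code)
        SELECT DISTINCT c.code
        FROM   Candidates c
        WHERE  NOT EXISTS (SELECT * FROM dbo.Codes x WHERE x.code = c.code);

        SET @inserted = @inserted + @@ROWCOUNT;
    END;
END;

Whether that actually beats the loop depends on where the optimizer places the function call, which is exactly the cost I'm talking about.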
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 2, 2009 at 7:43 am
I have a gut-feel that this is not the best approach. What uses these codes and why can a code not be generated when it's needed rather than a few million getting generated upfront?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 2, 2009 at 9:20 am
Scalar UDFs are generally quite slow and should be avoided.
Try to isolate the issue: what is the performance like if you make it a sequential count?
Also, I notice that code is the clustered primary key; could it be page splitting on the inserts?
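For example, a throwaway baseline along these lines (hypothetical, with a cheap zero-padded sequential value standing in for the UDF call) would separate the cost of the function from the cost of the inserts:

DECLARE @counter bigint;
SET @counter = 1;

WHILE @counter <= 1000000
BEGIN
    -- sequential 'codes' arrive in ascending key order, so this also
    -- sidesteps most of the random page splits on the clustered PK
    INSERT INTO dbo.Codes (code)
    VALUES (RIGHT('0000000000' + CONVERT(varchar(10), @counter), 10));

    SET @counter = @counter + 1;
END;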
July 2, 2009 at 9:40 am
Dave Ballantyne (7/2/2009)
Scalar UDFs are generally quite slow and should be avoided. Try to isolate the issue: what is the performance like if you make it a sequential count?
Also, I notice that code is the clustered primary key; could it be page splitting on the inserts?
Ouch, you can pretty much be sure there are page splits happening in a million row insert.
Like Gail has said, without knowing what the code is in the function there really isn't a way to make it faster because the UDF will still run a million times.
Can you at least give us some pseudocode for the UDF like:
Take the second parameter passed in
Loop through it by character, multiplying the ASCII code by the first parameter; if the result is > 255, subtract 255 and place that character in the 10-character string.
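Rendered as a completely fabricated stand-in with the same shape, that would be something like:

-- purely invented example: walks the mask character by character,
-- scales each ASCII code by the first parameter, wraps anything over
-- 255 back into range, and builds a string of at most 10 characters
CREATE FUNCTION dbo.FakeRNGCharacterMask (@factor int, @mask varchar(50))
RETURNS char(10)
AS
BEGIN
    DECLARE @out varchar(10), @i int, @c int;
    SELECT @out = '', @i = 1;

    WHILE @i <= LEN(@mask) AND LEN(@out) < 10
    BEGIN
        SET @c = ASCII(SUBSTRING(@mask, @i, 1)) * @factor;
        WHILE @c > 255 SET @c = @c - 255; -- loop the wrap so CHAR() stays in range
        SET @out = @out + CHAR(@c);
        SET @i = @i + 1;
    END;

    RETURN @out;
END;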
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
July 2, 2009 at 9:41 am
You said you can't post the code for the UDF, but could you post code for the UDF with a faked process? It just has to show us how it processes the data passed without showing us the actual details. I hope that makes some sense.
July 2, 2009 at 9:01 pm
scziege (7/2/2009)
The stored procedure calls a user-defined function which generates the codes, but this is not the reason why the performance is so bad.
I would encourage you to test that. The script below will allow you to see how expensive the function call is. It is set up to call a test function that doubles a number. The function is executed one million times per run, and there are five runs. You get summary performance statistics at the end. There are comments to explain what it is doing, and where to plug your private function in.
Paul
use tempdb;
go

-- Holds numbers 1 to 1 million
create table [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446]
(
    N BIGINT PRIMARY KEY CLUSTERED
);

-- Generate numbers
with Numbers (N)
as (
    select top (1000000)
        ROW_NUMBER() over (order by (select null))
    from master.sys.columns C1, master.sys.columns C2, master.sys.columns C3
)
insert [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446] with (tablockx) (N)
select N
from Numbers;
go

-- Test function - just doubles the input
create function dbo.[sfn_7125CD1E-CF5A-4386-B82C-CF52AC756A71] (@number bigint)
returns bigint
with schemabinding
as
begin
    return @number * 2;
end;
go

-- Clear ad-hoc sql plans (warning: server-wide, run on test only!)
dbcc freesystemcache(N'SQL Plans');
go

-- Run function one million times - substitute your function here!
declare @Bitbucket BIGINT;

select @Bitbucket = dbo.[sfn_7125CD1E-CF5A-4386-B82C-CF52AC756A71] (N)
from [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446];
go 5

-- Show test results
select statement_text = SUBSTRING([text], statement_start_offset / 2, (case statement_end_offset when -1 then DATALENGTH([text]) else statement_end_offset end) / 2),
       sample_size = execution_count,
       [avg elapsed time µs] = total_elapsed_time / execution_count,
       [avg elapsed time sec] = CONVERT(DEC(9, 3), total_elapsed_time / execution_count / 1000000.0),
       [avg cpu time µs] = total_worker_time / execution_count,
       [avg cpu time sec] = CONVERT(DEC(9, 3), total_worker_time / execution_count / 1000000.0),
       [avg logical reads] = total_logical_reads / execution_count
from   sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text (qs.[sql_handle]) as st
where  st.[text] like '%@BitBucket%[#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446]%'
and    st.[text] not like '%dm_exec_query_stats%';
go

-- Tidy up
drop table [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446];
drop function dbo.[sfn_7125CD1E-CF5A-4386-B82C-CF52AC756A71];
go
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 2, 2009 at 9:21 pm
scziege (7/2/2009)
In addition, I have attached the execution plan as a txt file. I think most of the time is consumed by the insert statement, but take a look and give me your feedback.
You should disregard the estimated cost percentages when looking at plans - they are wildly inaccurate much of the time (for good reasons).
For example, the call to your function ([Expr1000] = Scalar Operator([NestleDECodes].[dbo].[RNGCharacterMask]((8),N'34679CDFGHJKLNPRTVXY'))) is costed at 0.0000001 units (0%) since the server has no way to know how expensive the function call is in reality.
Anyhow, as I hope you will now be able to test for yourself, the function call is probably over 99% of the cost of a real execution run.
A few small points:
IF NOT EXISTS (SELECT * FROM dbo.Codes WHERE code = @tempcode)
...is frequently more efficient than...
IF (SELECT count(code) FROM codes WHERE code = @tempcode) = 0
You should be careful to match data types. That lookup on code = @tempcode is less efficient than it could be due to an implicit conversion:
CONVERT_IMPLICIT(nchar(10),[NestleDECodes].[dbo].[Codes].[Code],0)=[@tempcode]
(The Code column is defined as CHAR(10) not NCHAR(10))
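Putting those two points together, a minimal corrected fragment would be (same logic, just the matching type and EXISTS):

DECLARE @tempcode char(10); -- matches dbo.Codes.Code exactly, so no CONVERT_IMPLICIT

SET @tempcode = [dbo].[RNGCharacterMask] (8, '34679CDFGHJKLNPRTVXY');

IF NOT EXISTS (SELECT * FROM dbo.Codes WHERE code = @tempcode)
    INSERT INTO dbo.Codes (code) VALUES (@tempcode);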
The SET NOCOUNT OFF at the end of the procedure is pointless. SET NOCOUNT reverts to the setting in the outer scope when the procedure returns anyway.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 3, 2009 at 12:03 am
Paul White (7/2/2009)
You should disregard the estimated cost percentages when looking at plans - they are wildly inaccurate much of the time (for good reasons).
And they are always wildly inaccurate when there are scalar UDFs involved, because SQL Server estimates 0% for the UDF execution, regardless of what it does.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability