July 8, 2009 at 1:16 am
Thanks for all the good hints.
I modified my UDF and now I can show the code:
public static SqlString RNGCharacterMask(int codesize, string characters)
{
// the pool of characters the code may contain
char[] chars = characters.ToCharArray();
// one cryptographically strong random byte per output character
byte[] data = new byte[codesize];
RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
crypto.GetNonZeroBytes(data);
// map each random byte onto a character from the pool
StringBuilder result = new StringBuilder(codesize);
foreach (byte b in data)
{ result.Append(chars[b % chars.Length]); }
return new SqlString(result.ToString());
}
Hope that clarifies my problem a little bit.
Thanks in advance
July 8, 2009 at 1:37 am
Ouch.
I would modify the CLR UDF to return a data set of '@amount' codes and store those in a temp table.
Then delete the duplicates against the codes table and re-execute the UDF for the rowcount returned by the delete.
Repeat until there are no duplicate rows; a sketch follows below.
This will minimize the number of round trips to the UDF, plus the SQL logic is much more set-based.
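In rough T-SQL, assuming the UDF becomes a table-valued function dbo.GetCodes(@n) returning a 'code' column (both names invented here, not the OP's actual function):
create table #tempcodes (code char(8) not null);
declare @needed int;
set @needed = 100000 -- however many codes you want
while @needed > 0
begin
-- one round trip fetches a whole batch of candidate codes
insert into #tempcodes (code)
select code from dbo.GetCodes(@needed)
-- throw away candidates that already exist in the real table
delete t
from #tempcodes as t
where exists (select * from dbo.Codes as c where c.code = t.code)
-- keep whatever survived (distinct guards against in-batch duplicates)
insert into dbo.Codes (code)
select distinct code from #tempcodes
set @needed = @needed - @@rowcount
truncate table #tempcodes
end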
I've never used the RNGCryptoServiceProvider class, but you may only need a single instance of it if GetBytes returns a different value on each call.
As another thought, though: is returning a string of random characters from '34679CDFGHJKLNPRTVXY' all this function does?
If so, a pure T-SQL solution should be pretty simple.
Post back if this isn't clear
July 8, 2009 at 3:01 am
Can you please provide me with a sample T-SQL
application which replaces the UDF?
Thanks
July 8, 2009 at 3:37 am
Try this; I'll be interested to hear how it performs in comparison.
Note that the CASE statement on the call to GetRandomCode is required, otherwise you will get the same value returned for every row.
On my dev server it creates 15,000,000 rows in about 11 minutes; cutting the union down to your limited character set should improve matters.
CREATE TABLE dbo.Numbers (Num INT NOT NULL PRIMARY KEY CLUSTERED);
GO
DECLARE @i INT;
SELECT @i = 1;
WHILE @i <= 10000
BEGIN
INSERT INTO dbo.Numbers(Num) VALUES (@i);
SELECT @i = @i + 1;
END;
Create View VwNewId
as
Select New_Id = newid()
go
Drop Function GetRandomCode
go
Create Function GetRandomCode(@CharsNeeded integer)
returns table
as
return
(
with cteCharsNeeded(Character)
as(
select '1' union all select '2' union all select '3' union all select '4' union all select '6' union all select '7' union all select '8' union all select '9' union all Select '0' union all
select 'A' union all select 'B' union all select 'C' union all select 'D' union all select 'E' union all select 'F' union all select 'G' union all select 'H' union all Select 'I' union all
select 'J' union all select 'K' union all select 'L' union all select 'M' union all select 'N' union all select 'O' union all select 'P' union all select 'Q' union all Select 'R' union all
select 'S' union all select 'T' union all select 'U' union all select 'V' union all select 'W' union all select 'X' union all select 'Y' union all select 'Z' union all
select 'a' union all select 'b' union all select 'c' union all select 'd' union all select 'e' union all select 'f' union all select 'g' union all select 'h' union all select 'i' union all
select 'j' union all select 'k' union all select 'l' union all select 'm' union all select 'n' union all select 'o' union all select 'p' union all select 'q' union all select 'r' union all
select 's' union all select 't' union all select 'u' union all select 'v' union all select 'w' union all select 'x' union all select 'y' union all select 'z'
),
ctenumbers(num)
as
(
Select Num from numbers
where Num <= @CharsNeeded
),
cteRandomChars(num,c)
as
(
select num,c = chars.c
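-- the trivially-true 'where num = num' correlation below stops the optimizer from caching one TOP 1 result and reusing it for every row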
from ctenumbers cross apply (select top 1 Character from cteCharsNeeded,vwNewId where num = num order by new_id) as chars(c)
)
select (
select c as [text()]
from cteRandomChars
for xml path('')) as Random
)
go
drop table #res
go
create table #res
(
random char(8)
)
insert into #res
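-- the CASE always evaluates to 8, but (as noted above) it forces GetRandomCode to be re-evaluated per row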
select Random from sysobjects cross apply GetRandomCode(case when id is not null then 8 else null end)
July 8, 2009 at 5:30 am
Ok give this a try...
1,000,000 random strings in 24 seconds
CREATE TABLE dbo.Numbers (Num INT NOT NULL PRIMARY KEY CLUSTERED);
GO
DECLARE @i INT;
SELECT @i = 1;
WHILE @i <= 1000
BEGIN
INSERT INTO dbo.Numbers(Num) VALUES (@i);
SELECT @i = @i + 1;
END;
go
update statistics Numbers with fullscan
go
drop View VwNewCheck
go
Create View VwNewCheck
as
Select abs(checksum(NewId())) as New_Id
go
Drop Function GetRandomCode
go
Create Function GetRandomCode(@StrLen integer,@CharsNeeded char(62))
returns table
as
return
(
with cteRandomChars(num,c)
as
(
Select Num, substring(@CharsNeeded, (Select VwNewCheck.new_id % len(@CharsNeeded) + 1 from VwNewCheck where num = num), 1) -- modulo the full length, so the last character of the pool can also be picked
from numbers
where Num <= @StrLen
)
select (
select c as [text()]
from cteRandomChars
for xml path('')) as random
)
go
create table #random
(
random char(8)
)
insert into #random
select top 15000000 Random from sysobjects cross apply sysobjects so1 cross apply GetRandomCode(case when so1.id is not null or sysobjects.id is not null then 8 else 0 end,'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789')
select @@rowcount
July 8, 2009 at 6:47 am
Once you have a way of generating many codes all at once, you need a way to efficiently check for duplicates, and to reach your target for new unique codes.
The script below illustrates some techniques. I wrote this before seeing Dave Ballantyne's posts, so I use a different method to generate 'random' codes.
This site will ruin the formatting, so I have attached it as a file as well.
Paul
-- Safe database
USE tempdb;
GO
-- Drop our test table if it exists
IF OBJECT_ID(N'dbo.UniqueCodes', N'U') IS NOT NULL DROP TABLE dbo.UniqueCodes;
GO
-- Create the test table
CREATE TABLE dbo.UniqueCodes
(
code VARCHAR(10) NOT NULL UNIQUE, -- our unique codes
hash_code INTEGER NOT NULL -- a hash of the unique code to optimize lookups
);
GO
-- Create an index on the hash code
CREATE NONCLUSTERED INDEX nc1 ON dbo.UniqueCodes (hash_code);
--
-- Main script start
-- (you may have to run this several times before unique code collisions start to occur)
--
DECLARE @RowsToAdd BIGINT, -- Unique codes to add to the table
@MaxGen BIGINT, -- Maximum number of random codes to generate (some may already exist)
@MaxPasses INT, -- Maximum number of passes to make before giving up
@Magic BIGINT, -- A small number of extra rows to cope with generated duplicates and duplicates in the table (optimization, not critical)
@pass INT; -- The current pass
--
-- Change these values to experiment
--
SELECT @RowsToAdd = 25000, -- We want to add this many codes
@MaxGen = 25100, -- Prepared to generate up to this many codes on each pass
@Magic = 25, -- This magic number can help produce a plan with a flow distinct if not many rows are needed
@MaxPasses = 5, -- Prepared to try five times
@pass = 0; -- First pass (don't change this)
-- Optional
-- UPDATE STATISTICS dbo.UniqueCodes;
-- Loop while we have unique codes to find, and have passes remaining
WHILE (@RowsToAdd > 0 AND @pass < @MaxPasses)
BEGIN
WITH Random (data) AS
(
-- Generate candidate codes of two to ten random characters (A-Z)
SELECT TOP (@MaxGen)
CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65) +
CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65) +
CASE WHEN RAND() > 0.2 THEN CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65) ELSE '' END +
CASE WHEN RAND() > 0.3 THEN CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65) ELSE '' END +
CASE WHEN RAND() > 0.4 THEN CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65) ELSE '' END +
CASE WHEN RAND() > 0.5 THEN CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65) ELSE '' END +
CASE WHEN RAND() > 0.6 THEN CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65) ELSE '' END +
CASE WHEN RAND() > 0.7 THEN CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65) ELSE '' END +
CASE WHEN RAND() > 0.8 THEN CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65) ELSE '' END +
CASE WHEN RAND() > 0.9 THEN CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65) ELSE '' END
FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3
),
A (data) AS
(
-- Generate the hash code (a simple CHECKSUM will do here)
-- and filter out any duplicate codes we generated
SELECT DISTINCT
TOP (@RowsToAdd + @Magic)
N.data
FROM Random N
),
B (data, hash_code, may_clash) AS
(
-- Create a flag which is set if the code may already exist in the target table
-- If the flag is zero, the code definitely does not already exist, so we can add it
SELECT TOP (@RowsToAdd + @Magic)
data,
CHECKSUM(data),
CASE WHEN EXISTS (SELECT * FROM dbo.UniqueCodes U WHERE U.hash_code = CHECKSUM(A.data)) THEN 1 ELSE 0 END
FROM A
)
-- Add a maximum of @RowsToAdd rows from CTE 'B'
-- The flag allows the optimizer to use start-up filters to optimize lookups on the existing data
INSERT dbo.UniqueCodes (code, hash_code)
SELECT TOP (@RowsToAdd)
B.data, B.hash_code
FROM B
WHERE may_clash = 0 -- Safe to add if the flag is zero
OR(
-- If the flag is set, we need to check the code explicitly, since
-- the hash shows a possible match
may_clash = 1
AND NOT EXISTS
(
-- Can add if the code does not already exist
SELECT *
FROM dbo.UniqueCodes U
WHERE U.code = B.data
)
)
OPTION (RECOMPILE); -- The optimal plan may change significantly between calls
-- Adjust the number of codes we need to find, and increment the pass counter
SELECT @RowsToAdd = @RowsToAdd - @@ROWCOUNT,
@pass = @pass + 1;
-- Informational: pass number & rows remaining
IF @RowsToAdd > 0 SELECT [Pass #] = @pass, [Rows Remaining] = @RowsToAdd;
END;
-- Error if we failed
IF @RowsToAdd > 0 RAISERROR('Failed to add the required number of codes - %I64d remain', 16, 1, @RowsToAdd);
GO
SELECT [Total Unique Codes] = COUNT_BIG(*) FROM dbo.UniqueCodes;
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 10, 2009 at 12:32 am
Hi,
Since your table has more than 100 million rows, why don't you try creating triggers and one additional table? Changes on the primary table, such as deletes or inserts, would update a row in the other table.
The new table needs only one column, of type int or bigint, holding the total number of rows in the primary table. After that you don't need to scan all 100 million rows.
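A minimal sketch of that idea (table and trigger names invented for illustration):
CREATE TABLE dbo.CodesRowCount (TotalRows bigint NOT NULL);
INSERT dbo.CodesRowCount (TotalRows)
SELECT COUNT_BIG(*) FROM dbo.Codes; -- seed once with the current count
GO
CREATE TRIGGER dbo.trg_Codes_RowCount
ON dbo.Codes
AFTER INSERT, DELETE
AS
BEGIN
SET NOCOUNT ON;
-- adjust the stored total by the net change of this statement
UPDATE dbo.CodesRowCount
SET TotalRows = TotalRows
+ (SELECT COUNT_BIG(*) FROM inserted)
- (SELECT COUNT_BIG(*) FROM deleted);
END;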
July 10, 2009 at 12:47 am
It's not the function that takes time here.
It's the LOOP. See this line of code:
if ((select count(code) from codes where code = @tempcode) = 0)
For each created code, you COUNT the Codes table to see whether the code already exists. And THAT will take a LONG TIME.
Try this instead
IF NOT EXISTS (SELECT * FROM Codes WHERE Code = @TempCode)
And you will be better off using IGNORE_DUP_KEY on your table and inserting all created tempcodes. If the insert is successful, @@ROWCOUNT will be 1 and @@ERROR = 0; if not, @@ROWCOUNT will be 0.
Then you can keep track of the successful inserts and increment a variable.
Rewrite the code and make it work.
N 56°04'39.16"
E 12°55'05.25"
July 10, 2009 at 12:52 am
Try this
CREATE TABLE [dbo].[Codes](
[Code] [char](10) NOT NULL,
CONSTRAINT [PK_Codes] PRIMARY KEY CLUSTERED
(
[Code] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
ALTER PROCEDURE [dbo].[CreateCodes]
(
@amount bigint
)
AS
SET NOCOUNT ON
set @amount = abs(@amount )
while @amount > 0
begin
insert codes
(
code
)
SELECT [dbo].[RNGCharacterMask](8, '34679CDFGHJKLNPRTVXY')
IF @@ERROR = 0 AND @@ROWCOUNT = 1
SET @Amount = @Amount - 1
end
N 56°04'39.16"
E 12°55'05.25"
July 10, 2009 at 1:37 am
Peso (7/10/2009)
It's not the function that takes time here. It's the LOOP.
That has already been covered in a previous post.
It rather misses the point anyway - there is no need for a loop, a set-based solution will be much faster (especially since the expensive random-generating class is instantiated for each call). That has also already been covered by several people.
Peso (7/10/2009)
And you will be better off using IGNORE_DUP_KEY on your table and insert all created tempcodes
There are some important caveats to this (many people avoid IGNORE_DUP_KEY for these, and other, reasons).
First, IGNORE_DUP_KEY should be set on a clustered index only, and it would be unwise to change the definition to make the primary key non-clustered. Adding a future non-clustered index with IGNORE_DUP_KEY changes the query plan dramatically, to the point where it is cheaper to check for and eliminate potential duplicates in the query than to rely on the engine to handle them. See this blog entry by Craig Freeman for full details.
The second caveat is to note that IGNORE_DUP_KEY only works with INSERT statements. Any UPDATE statement that would result in a duplicate key will raise an error and roll back as normal.
Third, one needs to be aware that if a key does not exist in the table, but the INSERT set contains duplicates for that key value, one row will be inserted, and one will be rejected. Exactly which row is inserted is non-deterministic. One could argue that this breaks the normally atomic behaviour of a transaction where either all rows are inserted or none are.
Fourth, you should be aware that some client APIs misinterpret SQL Server's 'duplicate key was ignored' warning as an error.
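As an aside, the third caveat is easy to demonstrate with a throwaway sketch:
CREATE TABLE #dupdemo
(
code char(1) NOT NULL,
CONSTRAINT PK_dupdemo PRIMARY KEY CLUSTERED (code)
WITH (IGNORE_DUP_KEY = ON)
);
-- 'A' is not in the table yet, but the insert set contains it twice
INSERT #dupdemo (code)
SELECT 'A' UNION ALL SELECT 'A' UNION ALL SELECT 'B';
-- "Duplicate key was ignored." - 2 rows inserted, not 3, and which
-- of the two 'A' rows was kept is non-deterministic
SELECT @@ROWCOUNT;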
Given that the above considerations are neither well documented nor well known, it would make sense to highlight these issues when recommending a solution based on IGNORE_DUP_KEY.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 10, 2009 at 1:46 am
Yes, I have read the previous posts.
But why suggest an alternative that returns an error saying "Sorry mate, couldn't create as many codes as you wanted"?
This behaviour wasn't there before, so why introduce it at all?
The OP just wants the stored procedure to create the number of codes he asks for, with no fuss.
I admit the set-based way to create the codes is WAY more efficient, but doing this you also need to take care of the duplicates in the temp table.
One way or another, you're bound to check the number of unique records actually created.
And the CHECKSUM approach for the hash value? What can I say other than that it's a BAD choice.
See here how CHECKSUM is calculated. It will produce very many false positives... CHECKSUM('bQ') = CHECKSUM('aa') and CHECKSUM('ABA') = CHECKSUM('ACQ').
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832
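Easy enough to verify for yourself:
SELECT CHECKSUM('bQ'), CHECKSUM('aa'), CHECKSUM('ABA'), CHECKSUM('ACQ');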
N 56°04'39.16"
E 12°55'05.25"
July 10, 2009 at 2:19 am
Or this pseudo code:
INSERT INTO #Temp (Code)
SELECT ... -- generate as many candidate codes as there are rows in Codes, plus the number of new codes wanted
INSERT INTO Codes (Code)
SELECT TOP (@wantedrecords) t.Code
FROM #Temp AS t
WHERE NOT EXISTS (SELECT * FROM Codes AS c WHERE c.Code = t.Code);
It only requires two set-based operations and will always work without any errors, as long as the first creation guarantees unique output.
N 56°04'39.16"
E 12°55'05.25"
July 10, 2009 at 2:19 am
Peso (7/10/2009)
But why suggest an alternative that returns an error saying "Sorry mate, couldn't create as many codes as you wanted." This behaviour wasn't there before and why introduce it at all?
Because it is a script to illustrate technique, not a cut-and-paste solution, as I stated explicitly in that post.
Peso (7/10/2009)
I admit the set-based way to create the codes is WAY more efficient but doing this, you also need to take care of the duplicates in the temp table. One way or another, you're bound to check for number of unique records created in one way or another.
Absolutely. So one might as well do it in a robust and efficient manner.
Peso (7/10/2009)
And the CHECKSUM approach for hash value? What can I say more than it's a BAD choice.
The comment in the code states that CHECKSUM is sufficient for the demo script. If you are that concerned about collisions, use HashBytes instead. I would argue that CHECKSUM is faster and eliminates enough expensive comparisons to make it well worth doing. Try running the script (instead of making assumptions): CHECKSUM is a perfectly usable hash function.
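For example, one way to derive an INT hash key from HashBytes instead of CHECKSUM (a sketch, not part of the demo script):
-- take the first 4 bytes of an MD5 hash as the hash key
SELECT code,
hash_code = CAST(SUBSTRING(HASHBYTES('MD5', code), 1, 4) AS int)
FROM dbo.UniqueCodes;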
Please also see my comments regarding your choice of IGNORE_DUP_KEY in my previous post.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 10, 2009 at 2:36 am
Hi Peso ,
I think you are being a little unfair.
Checksumming the NEWID value is pretty safe due to the size (in terms of bytes) of the uniqueidentifier type. You could reverse-engineer two ids to have the same checksum, but after taking ABS(CHECKSUM()) of 1 million NEWID()s I have 217 collisions. I think generating an 8-character random string with this method is as random as any method can be. Having populated 2 million rows with my test script, I have only 1 duplicate string. I think that it is 'fit for purpose'.
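For anyone wanting to repeat the collision test, a quick sketch along these lines gives the figure:
WITH ids AS
(
SELECT TOP (1000000) id = ABS(CHECKSUM(NEWID()))
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2
CROSS JOIN master.sys.columns c3
)
SELECT collisions = COUNT(*) - COUNT(DISTINCT id)
FROM ids;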
July 10, 2009 at 3:12 am
Hey Dave,
Your logic applies to the hash_code on my UniqueCodes table as well, since the hash is of the code column, which is VARCHAR(10).
In my current UniqueCodes table I have 845K rows. Duplicate hash values based on the CHECKSUM exist for 29K hash keys, which is 3%. Seems reasonable.
In a new run, 25K potentially unique codes generate 280 matches on the hash key, with 19 codes turning out to be unique. That compares well to the alternative of doing a join between those 25K rows and the 845K rows in the main table.
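For reference, the duplicate-hash figure can be read off with a simple aggregate:
SELECT duplicate_hash_keys = COUNT(*)
FROM (
SELECT hash_code
FROM dbo.UniqueCodes
GROUP BY hash_code
HAVING COUNT(*) > 1
) AS d;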
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi