Performance issue with tally solution

Question

Post reply

Performance issue with tally solution

Viewing 15 posts - 376 through 390 (of 522 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 1

See what I mean? This works...

[font="Courier New"] SELECT [Name],LenCount,LenTally,REPLACE(definition,CHAR(13)+CHAR(10),CHAR(1))

FROM (

SELECT TOP(200) o.name, m.definition,

(DATALENGTH(m.Definition)-DATALENGTH(REPLACE(m.Definition,(CHAR(13)+CHAR(10)),'')))/4 AS LenCount,

(

SELECT COUNT(*) AS DoDah

FROM Tally

WHERE N <= LEN(definition)

AND SUBSTRING(definition,N,2) = CHAR(13)+CHAR(10)

)AS LenTally

FROM MASTER.sys.all_objects o

JOIN MASTER.sys.all_sql_modules m ON o.OBJECT_ID = m.OBJECT_ID

WHERE TYPE = 'P'

)d

WHERE LenCount LenTally[/font]

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 2

Actually, it's a "white space" problem.

[font="Courier New"]DECLARE @CrLf NVARCHAR(MAX)

SET @CrLf = 'X '+CHAR(13)+CHAR(10)

SELECT LEN(@CrLf),

LEN(REPLACE(@CrLf,CHAR(13)+CHAR(10),''))[/font]

Any sproc that has spaces just before the last CrLf will have the problem because when the trailing CrLf goes away, those spaces become trailing spaces that LEN will not find.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Phil Factor SSC-Insane Points: 20254 More actions · Answer 3

Jeff, you're absolutely right. Oh dear, I should have realised that, but I stupidly assumed that the two would be the same.

This proves the point

[font="Courier New"]SELECT [name] FROM

(

SELECT TOP(200) o.name, m.definition

FROM MASTER.sys.all_objects o

JOIN MASTER.sys.all_sql_modules m ON o.OBJECT_ID = m.OBJECT_ID

WHERE TYPE = 'P'

) f

WHERE --get the count using the Replace trick

(LEN(REPLACE(definition,' ','|'))- LEN(REPLACE(REPLACE(definition,' ','|'),'

','')))/2

<> --use a tally table

(SELECT COUNT(*) FROM (SELECT SUBSTRING(definition,number,2)AS 'ch' FROM numbers WHERE number<=LEN(definition)) f

WHERE ch='

')

[/font]

Which produces the correct result. We can then use a much neater way of counting the no. of lines.

[font="Courier New"]SELECT [name] FROM

(

SELECT TOP(200) o.name, m.definition

FROM MASTER.sys.all_objects o

JOIN MASTER.sys.all_sql_modules m ON o.OBJECT_ID = m.OBJECT_ID

WHERE TYPE = 'P'

) f

WHERE --get the count using the Replace trick

LEN(definition)- LEN(REPLACE(definition, '

','|'))

<> --use a tally table

(SELECT COUNT(*) FROM (SELECT SUBSTRING(definition,number,2)AS 'ch' FROM numbers WHERE number<=LEN(definition)) f

WHERE ch='

')

[/font]

Best wishes,
Phil Factor

Phil Factor SSC-Insane Points: 20254 More actions · Answer 4

This is the solution to the original problem, based on the Update method. It actually uses a tally table too. Because it has to do a lot of string lifting, it is slower than the other two methods, but I thought I'd put it in just for the sake of completeness. I feel sure that, with a bit of optimisation, it could be as fast as the pure tally, but I hit problems with the UPDATE ... FROM query, whist using variables. SQL Server Bugs, I'm afraid. I backed off.

[font="Courier New"]DECLARE @Delimiter VARCHAR(10)

SELECT @Delimiter='

'

DECLARE @insertTable TABLE

(

insertTable_id INT IDENTITY(1,1) PRIMARY KEY,

[name] NVARCHAR(128),

[start] INT DEFAULT 0,

[end] INT DEFAULT 0,

TheSubString VARCHAR(MAX) DEFAULT '',

Definition VARCHAR(MAX)

)

INSERT INTO @insertTable (name,definition)

SELECT name,definition FROM @source

INNER JOIN @tally ON n<=

(LEN(definition)

-LEN(REPLACE(definition,@Delimiter,'|')))/(LEN(@Delimiter)-1)+1

DECLARE @Start INT, @End INT,@LenDelimiter INT

DECLARE @previousname VARCHAR(80),@dummy VARCHAR(MAX)

SELECT @previousname='', @LenDelimiter=LEN(@Delimiter)

UPDATE @insertTable

SET

@start=start=CASE WHEN [name]=@previousname THEN @end+@LenDelimiter ELSE 0 END,

@end=[end]=CHARINDEX(@delimiter,definition+@delimiter,@start),

@PreviousName=[name],

TheSubString=SUBSTRING(definition,@start,@end-start) -- Results

[/font]

Best wishes,
Phil Factor

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 5

Phil Factor (5/2/2009)
but I hit problems with the UPDATE ... FROM query, whist using variables. SQL Server Bugs, I'm afraid.

Which bugs?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Phil Factor SSC-Insane Points: 20254 More actions · Answer 6

They were rather interesting. I was when doing a 'quirky update' from a join with a temporary table and a table variable. It worked fine on the simple test harness with a small amount of data but failed when using a larger table. I probably still have the code. It was very odd. I sometimes come across this with the Update with variables. Sometimes, the addition of an empty string works wonders. Don't ask why! This time, I couldn't get it to work reliably (it worked fine with up to fifty stored procedures. Then it refused to update the table at all on 51 or more stored procedures.)

Best wishes,
Phil Factor

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 7

Phil Factor (5/2/2009)
They were rather interesting. I was when doing a 'quirky update' from a join with a temporary table and a table variable. It worked fine on the simple test harness with a small amount of data but failed when using a larger table. I probably still have the code. It was very odd. I sometimes come across this with the Update with variables. Sometimes, the addition of an empty string works wonders. Don't ask why! This time, I couldn't get it to work reliably (it worked fine with up to fifty stored procedures. Then it refused to update the table at all on 51 or more stored procedures.)

I can see where the joins on a quirky update may cause a problem with the quirky update... but what does that have to with the code you posted?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 8

Jeff Moden (5/1/2009)
Flo actually used a permutation of Itzik's method. It will quite handily build a billion row table without driving the log file nuts like most every other method will.

Hey Jeff,

Thanks for the test rig - I had fun playing around with it!

I used to use the odd-looking UNION/UNION ALL version, but usually ended up using the ROWNUMBER over master tables one because it was easier.

I didn't realize it was faster too...!

These days, none of them get a look-in because a fixed Numbers/Tally table is even more convenient.

Back to the test rig though:

It really is a shame about the compilation time on the pure UNION ALL method, because firing all the numbers out from a single constant scan operator would be pretty much as fast as it gets, I would think. The best I could manage (still with a shocking compilation time of over a minute!) follows.

I was disappointed that USE PLAN doesn't work on constant scans. 🙁

I guess KEEPFIXED PLAN would be the way - though it doesn't help in the test rig because you free the proc cache!

Anyway, here's my effort:

[font="Courier New"]DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS

PRINT '===== Pure UNION ALL Method'

SET STATISTICS IO ON

SET STATISTICS TIME ON

; WITHN0 (N) AS (SELECT CONVERT(BIT,1) UNION ALL SELECT CONVERT(BIT,1) UNION ALL SELECT CONVERT(BIT,1) UNION ALL SELECT CONVERT(BIT,1)),-- 4 rows

N1 (N) AS (SELECT N FROM N0 UNION ALL SELECT N FROM N0),-- 8

N2 (N) AS (SELECT N FROM N1 UNION ALL SELECT N FROM N1),-- 16

N3 (N) AS (SELECT N FROM N2 UNION ALL SELECT N FROM N2),-- 32

N4 (N) AS (SELECT N FROM N3 UNION ALL SELECT N FROM N3),-- 64

N5 (N) AS (SELECT N FROM N4 UNION ALL SELECT N FROM N4),-- 128

N6 (N) AS (SELECT N FROM N5 UNION ALL SELECT N FROM N5),-- 256

N7 (N) AS (SELECT N FROM N6 UNION ALL SELECT N FROM N6),-- 512

N8 (N) AS (SELECT N FROM N7 UNION ALL SELECT N FROM N7),-- 1024

N9 (N) AS (SELECT N FROM N8 UNION ALL SELECT N FROM N8),-- 2048

NA (N) AS (SELECT N FROM N9 UNION ALL SELECT N FROM N9),-- 4096

NB (N) AS (SELECT N FROM NA UNION ALL SELECT N FROM NA),-- 8192

Numbers (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM NB),

T1 (N) AS (SELECT TOP (9999999999) N FROM Numbers ORDER BY N % 1),

T2 (N) AS (SELECT N FROM Numbers)

SELECTTOP (@TestSize) @Bitbucket = T1.N

FROMT1,T1 AS T2 -- 67,108,864

OPTION (MAXDOP 1);

SET STATISTICS TIME OFF

SET STATISTICS IO OFF[/font]

...and these are the results on my machine:

[font="Courier New"]===== Matt Miller's Method =====

CPU time = 453 ms, elapsed time = 456 ms.

===== Itzek's Method =====

CPU time = 453 ms, elapsed time = 539 ms.

===== Jeff Moden's Method

CPU time = 313 ms, elapsed time = 546 ms.

===== RBarryYoung's Method

SQL Server Execution Times:

CPU time = 344 ms, elapsed time = 455 ms.

===== Combined Method

CPU time = 313 ms, elapsed time = 427 ms.

===== Pure UNION ALL Method

CPU time = 157 ms, elapsed time = 178 ms.[/font]

...as I say, it's a shame about the compilation time.

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 9

Jeff Moden (5/1/2009)
Actually, it's a "white space" problem.
[font="Courier New"]DECLARE @CrLf NVARCHAR(MAX)
SET @CrLf = 'X '+CHAR(13)+CHAR(10)
SELECT LEN(@CrLf),
LEN(REPLACE(@CrLf,CHAR(13)+CHAR(10),''))[/font]
Any sproc that has spaces just before the last CrLf will have the problem because when the trailing CrLf goes away, those spaces become trailing spaces that LEN will not find.

Jeff,

Well spotted! I must admit I looked at it and gave up after a few minutes...

I guess replacing LEN with DATALENGTH might fix it up too?

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 10

Paul White (5/2/2009)
Well spotted! I must admit I looked at it and gave up after a few minutes...
I guess replacing LEN with DATALENGTH might fix it up too?
Paul

Yes... but then you must also account for the run being against an NVARCHAR and divide by 4 instead of just 2 in Phil's example... as I did in the working code I posted earlier to solve the problem.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 11

Jeff Moden (5/2/2009)
Yes... but then you must also account for the run being against an NVARCHAR and divide by 4 instead of just 2 in Phil's example... as I did in the working code I posted earlier to solve the problem.

Yes, so you did - sorry!

This is what I get for replying in post order as I catch up on this thread 🙁

I thought the 2 in Phil's code was for the unicode thing - but it's not is it?

There are 2 characters!! Silly me 😉

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 12

Paul White (5/2/2009)
Jeff Moden (5/2/2009)
Yes... but then you must also account for the run being against an NVARCHAR and divide by 4 instead of just 2 in Phil's example... as I did in the working code I posted earlier to solve the problem.
Yes, so you did - sorry!
This is what I get for replying in post order as I catch up on this thread 🙁
I thought the 2 in Phil's code was for the unicode thing - but it's not is it?
There are 2 characters!! Silly me 😉

Heh... no worries, Paul. I've done the same thing especially on threads with a bazillion posts. Just too busy to go back a catch up to try to put all of the pieces back together.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 13

Paul White (5/2/2009)
...as I say, it's a shame about the compilation time.

Heh... yeah, I agree... for a million rows, it's a shame. For more common numbers like 8000, all the other methods beat the UNION ALL method for CPU usage even after the DBCC commands are removed and the scripts are allowed to cash. At least on my box they do. On faster boxes, like yours, you would probably see zero's across the board for cpu time. After that, you'd have to consider I/O and whether it was cached in memory or not and whether or not it was for reuse by GUI code or not.

As a side bar and for the record (I think it was in the Tally table article I wrote), I did state that some of these CTE methods would actually beat the all too convenient Tally table method. The cursor zealots would love to hear me say that at the speeds that all of these things run at (including the Tally table), that you should go for whatever gives the greatest ease of programming and readability. While there's certainly something to be said about that, I'll still maintain the position that knocking 150 milliseconds off a 450 millisecond run and maybe knocking I/O to 0 is a good thing to go after even if the code becomes a bit more complex because you just can't tell where something is going to be used in the future.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 14

Jeff,

For common numbers like 8000 it's even sillier - the version I posted was a hack-around to get past the fact that the original fails to compile at all over 16384 rows with a message stating that 'there is insufficient system memory to run this query' - like, wow! That's just an estimated execution plan too!

I haven't met a cursor zealot for a while now - which is sad because I would love to have the debate with one. Anyone who thinks that cursor solutions are somehow more natural or easier to understand than the set-based tally has spent too long with procedural code ;c)

BTW my laptop is not a very speed machine - Pentium 4m (one 'core'!) @2GHz! It does have 2GB RAM though which is ok.

I have to start SQL Server with the -P hack just to simulate parallel plans though...

Anyway, here's the memory-busting version, seeing as I referred to it:

[font="Courier New"]; WITHN0 (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),-- 4 rows