May 1, 2009 at 8:04 pm
See what I mean? This works...
[font="Courier New"] SELECT [Name],LenCount,LenTally,REPLACE(definition,CHAR(13)+CHAR(10),CHAR(1))
FROM (
SELECT TOP(200) o.name, m.definition,
(DATALENGTH(m.Definition)-DATALENGTH(REPLACE(m.Definition,(CHAR(13)+CHAR(10)),'')))/4 AS LenCount,
(
SELECT COUNT(*) AS DoDah
FROM Tally
WHERE N <= LEN(definition)
AND SUBSTRING(definition,N,2) = CHAR(13)+CHAR(10)
)AS LenTally
FROM MASTER.sys.all_objects o
JOIN MASTER.sys.all_sql_modules m ON o.OBJECT_ID = m.OBJECT_ID
WHERE TYPE = 'P'
)d
WHERE LenCount LenTally[/font]
--Jeff Moden
Change is inevitable... Change for the better is not.
May 1, 2009 at 8:43 pm
Actually, it's a "white space" problem.
[font="Courier New"]DECLARE @CrLf NVARCHAR(MAX)
SET @CrLf = 'X '+CHAR(13)+CHAR(10)
SELECT LEN(@CrLf),
LEN(REPLACE(@CrLf,CHAR(13)+CHAR(10),''))[/font]
Any sproc that has spaces just before the last CrLf will have the problem because when the trailing CrLf goes away, those spaces become trailing spaces that LEN will not find.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 2, 2009 at 1:54 am
Jeff, you're absolutely right. Oh dear, I should have realised that, but I stupidly assumed that the two would be the same.
This proves the point
[font="Courier New"]SELECT [name] FROM
(
SELECT TOP(200) o.name, m.definition
FROM MASTER.sys.all_objects o
JOIN MASTER.sys.all_sql_modules m ON o.OBJECT_ID = m.OBJECT_ID
WHERE TYPE = 'P'
) f
WHERE --get the count using the Replace trick
(LEN(REPLACE(definition,' ','|'))- LEN(REPLACE(REPLACE(definition,' ','|'),'
','')))/2
<> --use a tally table
(SELECT COUNT(*) FROM (SELECT SUBSTRING(definition,number,2)AS 'ch' FROM numbers WHERE number<=LEN(definition)) f
WHERE ch='
')
[/font]
Which produces the correct result. We can then use a much neater way of counting the no. of lines.
[font="Courier New"]SELECT [name] FROM
(
SELECT TOP(200) o.name, m.definition
FROM MASTER.sys.all_objects o
JOIN MASTER.sys.all_sql_modules m ON o.OBJECT_ID = m.OBJECT_ID
WHERE TYPE = 'P'
) f
WHERE --get the count using the Replace trick
LEN(definition)- LEN(REPLACE(definition, '
','|'))
<> --use a tally table
(SELECT COUNT(*) FROM (SELECT SUBSTRING(definition,number,2)AS 'ch' FROM numbers WHERE number<=LEN(definition)) f
WHERE ch='
')
[/font]
Best wishes,
Phil Factor
May 2, 2009 at 5:47 am
This is the solution to the original problem, based on the Update method. It actually uses a tally table too. Because it has to do a lot of string lifting, it is slower than the other two methods, but I thought I'd put it in just for the sake of completeness. I feel sure that, with a bit of optimisation, it could be as fast as the pure tally, but I hit problems with the UPDATE ... FROM query, whist using variables. SQL Server Bugs, I'm afraid. I backed off.
[font="Courier New"]DECLARE @Delimiter VARCHAR(10)
SELECT @Delimiter='
'
DECLARE @insertTable TABLE
(
insertTable_id INT IDENTITY(1,1) PRIMARY KEY,
[name] NVARCHAR(128),
[start] INT DEFAULT 0,
[end] INT DEFAULT 0,
TheSubString VARCHAR(MAX) DEFAULT '',
Definition VARCHAR(MAX)
)
INSERT INTO @insertTable (name,definition)
SELECT name,definition FROM @source
INNER JOIN @tally ON n<=
(LEN(definition)
-LEN(REPLACE(definition,@Delimiter,'|')))/(LEN(@Delimiter)-1)+1
DECLARE @Start INT, @End INT,@LenDelimiter INT
DECLARE @previousname VARCHAR(80),@dummy VARCHAR(MAX)
SELECT @previousname='', @LenDelimiter=LEN(@Delimiter)
UPDATE @insertTable
SET
@start=start=CASE WHEN [name]=@previousname THEN @end+@LenDelimiter ELSE 0 END,
@end=[end]=CHARINDEX(@delimiter,definition+@delimiter,@start),
@PreviousName=[name],
TheSubString=SUBSTRING(definition,@start,@end-start) -- Results
[/font]
Best wishes,
Phil Factor
May 2, 2009 at 11:15 am
Phil Factor (5/2/2009)
but I hit problems with the UPDATE ... FROM query, whist using variables. SQL Server Bugs, I'm afraid.
Which bugs?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 2, 2009 at 3:02 pm
They were rather interesting. I was when doing a 'quirky update' from a join with a temporary table and a table variable. It worked fine on the simple test harness with a small amount of data but failed when using a larger table. I probably still have the code. It was very odd. I sometimes come across this with the Update with variables. Sometimes, the addition of an empty string works wonders. Don't ask why! This time, I couldn't get it to work reliably (it worked fine with up to fifty stored procedures. Then it refused to update the table at all on 51 or more stored procedures.)
Best wishes,
Phil Factor
May 2, 2009 at 5:00 pm
Phil Factor (5/2/2009)
They were rather interesting. I was when doing a 'quirky update' from a join with a temporary table and a table variable. It worked fine on the simple test harness with a small amount of data but failed when using a larger table. I probably still have the code. It was very odd. I sometimes come across this with the Update with variables. Sometimes, the addition of an empty string works wonders. Don't ask why! This time, I couldn't get it to work reliably (it worked fine with up to fifty stored procedures. Then it refused to update the table at all on 51 or more stored procedures.)
I can see where the joins on a quirky update may cause a problem with the quirky update... but what does that have to with the code you posted?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 2, 2009 at 5:26 pm
Jeff Moden (5/1/2009)
Flo actually used a permutation of Itzik's method. It will quite handily build a billion row table without driving the log file nuts like most every other method will.
Hey Jeff,
Thanks for the test rig - I had fun playing around with it!
I used to use the odd-looking UNION/UNION ALL version, but usually ended up using the ROWNUMBER over master tables one because it was easier.
I didn't realize it was faster too...!
These days, none of them get a look-in because a fixed Numbers/Tally table is even more convenient.
Back to the test rig though:
It really is a shame about the compilation time on the pure UNION ALL method, because firing all the numbers out from a single constant scan operator would be pretty much as fast as it gets, I would think. The best I could manage (still with a shocking compilation time of over a minute!) follows.
I was disappointed that USE PLAN doesn't work on constant scans. 🙁
I guess KEEPFIXED PLAN would be the way - though it doesn't help in the test rig because you free the proc cache!
Anyway, here's my effort:
[font="Courier New"]DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
PRINT '===== Pure UNION ALL Method'
SET STATISTICS IO ON
SET STATISTICS TIME ON
; WITHN0 (N) AS (SELECT CONVERT(BIT,1) UNION ALL SELECT CONVERT(BIT,1) UNION ALL SELECT CONVERT(BIT,1) UNION ALL SELECT CONVERT(BIT,1)),-- 4 rows
N1 (N) AS (SELECT N FROM N0 UNION ALL SELECT N FROM N0),-- 8
N2 (N) AS (SELECT N FROM N1 UNION ALL SELECT N FROM N1),-- 16
N3 (N) AS (SELECT N FROM N2 UNION ALL SELECT N FROM N2),-- 32
N4 (N) AS (SELECT N FROM N3 UNION ALL SELECT N FROM N3),-- 64
N5 (N) AS (SELECT N FROM N4 UNION ALL SELECT N FROM N4),-- 128
N6 (N) AS (SELECT N FROM N5 UNION ALL SELECT N FROM N5),-- 256
N7 (N) AS (SELECT N FROM N6 UNION ALL SELECT N FROM N6),-- 512
N8 (N) AS (SELECT N FROM N7 UNION ALL SELECT N FROM N7),-- 1024
N9 (N) AS (SELECT N FROM N8 UNION ALL SELECT N FROM N8),-- 2048
NA (N) AS (SELECT N FROM N9 UNION ALL SELECT N FROM N9),-- 4096
NB (N) AS (SELECT N FROM NA UNION ALL SELECT N FROM NA),-- 8192
Numbers (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM NB),
T1 (N) AS (SELECT TOP (9999999999) N FROM Numbers ORDER BY N % 1),
T2 (N) AS (SELECT N FROM Numbers)
SELECTTOP (@TestSize) @Bitbucket = T1.N
FROMT1,T1 AS T2 -- 67,108,864
OPTION (MAXDOP 1);
SET STATISTICS TIME OFF
SET STATISTICS IO OFF[/font]
...and these are the results on my machine:
[font="Courier New"]===== Matt Miller's Method =====
CPU time = 453 ms, elapsed time = 456 ms.
===== Itzek's Method =====
CPU time = 453 ms, elapsed time = 539 ms.
===== Jeff Moden's Method
CPU time = 313 ms, elapsed time = 546 ms.
===== RBarryYoung's Method
SQL Server Execution Times:
CPU time = 344 ms, elapsed time = 455 ms.
===== Combined Method
CPU time = 313 ms, elapsed time = 427 ms.
===== Pure UNION ALL Method
CPU time = 157 ms, elapsed time = 178 ms.[/font]
...as I say, it's a shame about the compilation time.
Paul
May 2, 2009 at 5:28 pm
Jeff Moden (5/1/2009)
Actually, it's a "white space" problem.[font="Courier New"]DECLARE @CrLf NVARCHAR(MAX)
SET @CrLf = 'X '+CHAR(13)+CHAR(10)
SELECT LEN(@CrLf),
LEN(REPLACE(@CrLf,CHAR(13)+CHAR(10),''))[/font]
Any sproc that has spaces just before the last CrLf will have the problem because when the trailing CrLf goes away, those spaces become trailing spaces that LEN will not find.
Jeff,
Well spotted! I must admit I looked at it and gave up after a few minutes...
I guess replacing LEN with DATALENGTH might fix it up too?
Paul
May 2, 2009 at 5:36 pm
Paul White (5/2/2009)
Well spotted! I must admit I looked at it and gave up after a few minutes...I guess replacing LEN with DATALENGTH might fix it up too?
Paul
Yes... but then you must also account for the run being against an NVARCHAR and divide by 4 instead of just 2 in Phil's example... as I did in the working code I posted earlier to solve the problem.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 2, 2009 at 5:47 pm
Jeff Moden (5/2/2009)
Yes... but then you must also account for the run being against an NVARCHAR and divide by 4 instead of just 2 in Phil's example... as I did in the working code I posted earlier to solve the problem.
Yes, so you did - sorry!
This is what I get for replying in post order as I catch up on this thread 🙁
I thought the 2 in Phil's code was for the unicode thing - but it's not is it?
There are 2 characters!! Silly me 😉
May 2, 2009 at 5:56 pm
Paul White (5/2/2009)
Jeff Moden (5/2/2009)
Yes... but then you must also account for the run being against an NVARCHAR and divide by 4 instead of just 2 in Phil's example... as I did in the working code I posted earlier to solve the problem.Yes, so you did - sorry!
This is what I get for replying in post order as I catch up on this thread 🙁
I thought the 2 in Phil's code was for the unicode thing - but it's not is it?
There are 2 characters!! Silly me 😉
Heh... no worries, Paul. I've done the same thing especially on threads with a bazillion posts. Just too busy to go back a catch up to try to put all of the pieces back together.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 2, 2009 at 7:01 pm
Paul White (5/2/2009)
...as I say, it's a shame about the compilation time.
Heh... yeah, I agree... for a million rows, it's a shame. For more common numbers like 8000, all the other methods beat the UNION ALL method for CPU usage even after the DBCC commands are removed and the scripts are allowed to cash. At least on my box they do. On faster boxes, like yours, you would probably see zero's across the board for cpu time. After that, you'd have to consider I/O and whether it was cached in memory or not and whether or not it was for reuse by GUI code or not.
As a side bar and for the record (I think it was in the Tally table article I wrote), I did state that some of these CTE methods would actually beat the all too convenient Tally table method. The cursor zealots would love to hear me say that at the speeds that all of these things run at (including the Tally table), that you should go for whatever gives the greatest ease of programming and readability. While there's certainly something to be said about that, I'll still maintain the position that knocking 150 milliseconds off a 450 millisecond run and maybe knocking I/O to 0 is a good thing to go after even if the code becomes a bit more complex because you just can't tell where something is going to be used in the future.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 2, 2009 at 7:55 pm
Jeff,
For common numbers like 8000 it's even sillier - the version I posted was a hack-around to get past the fact that the original fails to compile at all over 16384 rows with a message stating that 'there is insufficient system memory to run this query' - like, wow! That's just an estimated execution plan too!
I haven't met a cursor zealot for a while now - which is sad because I would love to have the debate with one. Anyone who thinks that cursor solutions are somehow more natural or easier to understand than the set-based tally has spent too long with procedural code ;c)
BTW my laptop is not a very speed machine - Pentium 4m (one 'core'!) @2GHz! It does have 2GB RAM though which is ok.
I have to start SQL Server with the -P hack just to simulate parallel plans though...
Anyway, here's the memory-busting version, seeing as I referred to it:
[font="Courier New"]; WITHN0 (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),-- 4 rows
N1 (N) AS (SELECT N FROM N0 UNION ALL SELECT N FROM N0),-- 8
N2 (N) AS (SELECT N FROM N1 UNION ALL SELECT N FROM N1),-- 16
N3 (N) AS (SELECT N FROM N2 UNION ALL SELECT N FROM N2),-- 32
N4 (N) AS (SELECT N FROM N3 UNION ALL SELECT N FROM N3),-- 64
N5 (N) AS (SELECT N FROM N4 UNION ALL SELECT N FROM N4),-- 128
N6 (N) AS (SELECT N FROM N5 UNION ALL SELECT N FROM N5),-- 256
N7 (N) AS (SELECT N FROM N6 UNION ALL SELECT N FROM N6),-- 512
N8 (N) AS (SELECT N FROM N7 UNION ALL SELECT N FROM N7),-- 1024
N9 (N) AS (SELECT N FROM N8 UNION ALL SELECT N FROM N8),-- 2048
NA (N) AS (SELECT N FROM N9 UNION ALL SELECT N FROM N9),-- 4096
NB (N) AS (SELECT N FROM NA UNION ALL SELECT N FROM NA),-- 8192
NC (N) AS (SELECT N FROM NB UNION ALL SELECT N FROM NB),-- 16384
Numbers (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM NC)
SELECTTOP (@TestSize)
@Bitbucket = N
FROMNC
OPTION(KEEPFIXED PLAN, MAXDOP 1);
[/font]
Just how much dev effort is involved in fixing the colour tags anyway?!
Cheers,
Paul
May 4, 2009 at 2:56 am
Hi gents,
I have been following the discussion, half and from a large distance, so I ain't really in the loop...so to speak ;).
I was wondering how this tally function that I been using for a while now does stack up against the other methods being tested here. If it is shockingly different (slower) I sure will spend some time to see why and then improve upon it from there.
-- Tally table function (24 bit, large enaugh for general purpose use)
--
create function dbo.tfnTally24bit( @max-2 int ) returns table
as
return
(
with
nQ( N ) as
(
select 0 union all select 0 union all select 0 union all select 0 union all -- 4
select 0 union all select 0 union all select 0 union all select 0 union all -- 8
select 0 union all select 0 union all select 0 union all select 0 union all -- 12
select 0 union all select 0 union all select 0 union all select 0 -- 16
)
select top ( isnull( @max-2, 0 ) )
row_number() over ( order by anchor.constant ) as n
from
( select 0 as constant ) as anchor
cross join nQ n1 -- 16 ( 4 bit)
cross join nQ n2 -- 256 ( 8 bit)
cross join nQ n3 -- 4096 (12 bit)
cross join nQ n4 -- 65536 (16 bit)
cross join nQ n5 -- 1048576 (20 bit)
cross join nQ n6 -- 16777216 (24 bit)
)
;
go
-- Test the inline "dbo.tfnTally24bit" function
--
select * from dbo.tfnTally24bit( 10 );
go
I saw somone convert the numbers to bits, does that have an effect on memory use when none of these constants are returned in the result?
Viewing 15 posts - 376 through 390 (of 522 total)
You must be logged in to reply to this topic. Login to reply