March 21, 2011 at 7:49 am
I have a performance problem with an inline table-valued function I use to split any string into multiple records of the same length.
Here is script of the function:
alter function util._MC_split_string_by_length
(@inString varchar(max),
@inLength int)
returns table
as
return
with recs as
(select cast(datalength(@inString) / @inLength as int) + 1 dol), -- calculate the number of rows to return, to prevent empty records
cnt as
(select
row_number() over (order by (select 1)) rn -- a makeshift tally table
from
syscolumns),
aa as
(select
@inString zapis)
select
substring(zapis, ((cnt.rn - 1) * @inLength) + 1, @inLength) Token
from
aa
inner join cnt on 1 = 1
inner join recs on 1 = 1
where
cnt.rn <= recs.dol
go
-- testing examples
--example No 1
declare @aa table (polje varchar(max))
insert into @aa select replicate(cast(',uu' as varchar(max)),100000)
select gg.*
from
@aa aa
cross apply util._MC_split_string_by_length(aa.[polje], 17) gg
-- elapsed time < 1s
--example No 2
create table #aa(polje varchar(max))
insert into #aa select replicate(cast(',uu' as varchar(max)),100000)
select gg.*
from
#aa aa
cross apply util._MC_split_string_by_length(aa.[polje], 17) gg
drop table #aa
-- elapsed time < 1s
--example No 3
declare @aa varchar(max)
set @aa = replicate(cast(',uu' as varchar(max)),100000)
select * from util._MC_split_string_by_length(@aa, 17) gg
-- elapsed time < 1s
--example No 4
select * from util._MC_split_string_by_length( replicate(cast(',uu' as varchar(max)),100000),17)
-- elapsed time > 10 min
Can anyone tell me why the last example takes so long?
March 24, 2011 at 9:29 am
I've tried it out locally, and get about the same time for all of them (< 1s)
The last 2 have identical query plans apart from the variable declaration and identical costs. What sort of query plans do you get for them?
March 25, 2011 at 1:33 am
Thanks for your reply.
I get the same execution plans for last two executions. What bothers me is the "segment" operation; it displays cost 493815% ??
Anyhow, the performance is still poor.
But now I seem to have another problem.
When a large amount of data is passed to the function, the tally table no longer covers the required range. The number of rows in the sys.columns table is approx. 40K.
In order to be able to return more than 40K rows, I use a cartesian product on the sys.columns table.
I also made some other modifications and tried them out; here is the script:
"ALTER function [util].[_MC_split_string_by_length]
(@inString varchar(max),
@inLength int)
returns table
return
with recs as
(select cast(datalength(@inString) / @inlength as int) + 1 dol),
/*
Nbrs_4( nbr ) AS ( SELECT 1 UNION SELECT 0 ),
Nbrs_3( nbr ) AS ( SELECT 1 FROM Nbrs_4 n1 CROSS JOIN Nbrs_4 n2 ),
Nbrs_2( nbr ) AS ( SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2 ),
Nbrs_1( nbr ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 ),
Nbrs_0( nbr ) AS ( SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2 ),
Nbrs ( nbr ) AS ( SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2 ),
cnt as(SELECT nbr FROM ( SELECT ROW_NUMBER() OVER (ORDER BY nbr) FROM Nbrs) D ( nbr )),
*/
cnt as
(select
row_number() over (order by (select 1)) rn
from
syscolumns a, syscolumns b), -- major change: now covers up to 1,161,855,396 rows
aa as
(select
@inString zapis),
tally as
(select
cnt.rn
from
cnt
inner join recs on cnt.rn <= recs.dol) -- ?? seems it doesn't work when used via cross apply
select
substring(zapis, ((tally.rn-1) * @inLength) + 1, @inLength) Token
--substring(zapis, ((cnt.nbr-1) * @inLength) + 1, @inLength) Token
/*substring(@inString, ((cnt.rn-1) * @inLength) + 1, @inLength) Token*/
from
aa
inner join tally on 1 = 1"
running the following script:
"declare @aa table (polje varchar(max))
insert into @aa select replicate(cast(',uu' as varchar(max)),100000)
select gg.*
from
@aa aa
cross apply util._MC_split_string_by_length(aa.[polje], 17) gg"
results in the output halting at row 17,593 while the query keeps running. The same happens when using a # table.
running the script:
"declare @aa varchar(max)
set @aa = replicate(cast(',uu' as varchar(max)),1000000)
select * from util._MC_split_string_by_length(@aa, 17) gg"
works perfectly. It returns ~176K records in a few seconds.
I ran out of ideas. I've tried just about everything.
Maybe the tally table within the function scans the entire range of rows produced via the cartesian product in cases where cross apply is used to access the function's result.
I really don't know.
I'd absolutely appreciate any suggestion.
March 26, 2011 at 1:04 am
I have to ask, why are you segmenting on a fixed length instead of by the position of the delimiters? What is the purpose?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 26, 2011 at 2:00 am
Hello Jeff and thanks for your reply.
The story is simple:
I want to re-code an existing function which contains a while loop.
I don't have any delimiter whatsoever; the only thing I know is the length.
And I want to split the string into records of the same length, which will later be processed by some stored procedure.
Lately, loops and cursors are something I avoid as much as possible. Whenever I can, I re-code an existing proc/function to get rid of both.
All that thanks to the SQLServerCentral.com articles and posts (yours as well) that I follow on a daily basis.
Still my problem remains. Do you have any idea?
Thanks in advance,
Marko
March 26, 2011 at 1:29 pm
Hi Marko,
Yep... I certainly understand and fully agree with the idea of trading in Cursors and While loops for something a lot more efficient. I just didn't understand why or how each Token was exactly the same length.
There are several problems you're likely fighting...
The first problem may be how many rows you have in the SysColumns table of the current DB. You say in your comments that the Cross Join of SysColumns with itself produces 1,161,855,396 rows but that would mean that your SysColumns table would have to have 34,086 rows in it. If you do a SELECT COUNT(*) FROM SysColumns in the current database, what number do you actually get?
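The count check suggested above can be written as a single query (a quick sketch; `sys.columns` is the catalog view behind the old `SysColumns` alias, and the cast to `bigint` avoids integer overflow when squaring the count):

```sql
-- Sanity check, run in the target database: how many rows does the tally
-- source really have, and how many rows can the self cross join generate?
SELECT COUNT(*)                            AS rows_in_syscolumns,
       CAST(COUNT(*) AS bigint) * COUNT(*) AS rows_after_cross_join
FROM sys.columns;
```

If `rows_after_cross_join` comes out smaller than the longest string's length divided by the token width, the cross-joined tally still can't cover the input.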
Another problem you may be fighting with the CTE method of making a "Tally Table" is that the Joins and Where clauses you've constructed are mostly "non-SARGable", meaning, in this case, that ALL the numbers in the CTE have to be rendered before they can be limited according to the criteria. Of course, that makes for some very slow code if the "Tally Table" CTE generates numbers large enough to handle all eventualities. I suspect that's why you commented that code out of your function. What we need to do is limit that number right up front, which you'll see I did with a well-placed TOP clause in the code that follows.
Also, let's keep it all very simple. You jumped through some extraordinary code "hoops" trying to resolve the problem when what we really needed to do was simply start the "Tally Table" CTE at 0 to greatly simplify the necessary calculations. The following code has that nuance in it, as well.
Last but not least, I didn't want to build a UTIL schema on my server. Instead, I used "dbo" as the schema name in all the code which follows. You'll need to replace "dbo." with "UTIL." everywhere before you can use it for your purposes, although it'll work just fine as "dbo." for testing in an area such as TempDB. I also added code to verify the number of Tokens returned and to identify the Token "number" which, of course, you can remove if you wish.
It's my habit to document within the code but if you find the explanations above or in the code lacking, please feel free to post back.
First, here's the function I've tested using your test code:
--DROP FUNCTION dbo.[_MC_split_string_by_length]
--;
GO
CREATE FUNCTION dbo.[_MC_split_string_by_length]
(
@inString VARCHAR(MAX),
@inLength INT
)
RETURNS TABLE
RETURN
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
E8(N) AS (SELECT 1 FROM E4 a, E4 b), --100,000,000
E16(N) AS (SELECT 1 FROM E8 a, E8 b), --10,000,000,000,000,000
cteTally(N) AS (
SELECT 0 --Makes for a "zero" based "Tally Table"
UNION ALL
SELECT TOP (DATALENGTH(@inString)-1) ROW_NUMBER() OVER (ORDER BY N) FROM E16
)
SELECT TokenNumber = ROW_NUMBER() OVER (ORDER BY t.N),
Token = SUBSTRING(@inString, t.N+1, @inLength)
FROM cteTally t
WHERE t.N % @inLength = 0
GO
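As a quick sanity check of the modulo logic (hypothetical input, assuming the function was created as above): for an 8-character string and a width of 3, the zero-based tally keeps N = 0, 3 and 6, so SUBSTRING starts at positions 1, 4 and 7, and the last token is allowed to be shorter than the requested width.

```sql
-- Hypothetical smoke test with a tiny input
SELECT * FROM dbo._MC_split_string_by_length('abcdefgh', 3);
-- TokenNumber  Token
-- 1            abc
-- 2            def
-- 3            gh
```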
Here's the test code I used. It's your code with a couple of modifications to double check the number of Tokens that should be returned according to the desired width of the Tokens. I believe your problem may be solved now. Please post back if it isn't.
--===== Suppress the auto-display of rowcounts for performance and
-- to prevent "false" result sets to a GUI if one is being used.
SET NOCOUNT ON
;
--===== Declare the desired Token width
DECLARE @TokenWidth INT
;
SELECT @TokenWidth = 17
;
--===== Create the test table as a table variable
DECLARE @aa TABLE (polje VARCHAR(MAX))
;
--===== Populate one row with 100,000 copies of 3 characters for
-- a total of 300,000 characters
INSERT INTO @aa SELECT REPLICATE(CAST(',uu' AS VARCHAR(MAX)),100000)
;
--===== Calculate and display the expected number of Tokens to be returned.
-- The addition of the 0.0 turns it all into something other than an INT calculation
SELECT ExpectedNumberOfTokens = CEILING((DATALENGTH(polje)+0.0)/@TokenWidth)
FROM @aa
;
--===== Do the slicing according to the desired Token width
SELECT gg.*
FROM @aa aa
CROSS APPLY dbo._MC_split_string_by_length(aa.polje, @TokenWidth) gg
;
{Edit was to correct a spelling error}
--Jeff Moden
Change is inevitable... Change for the better is not.
March 28, 2011 at 3:12 am
Hi Jeff
First I'd like to thank you for the given solution. I tried it out in every way I could think of and it now works perfectly.
You say: »Where clauses you've constructed are mostly "non-SARGable" meaning, in this case, that ALL the numbers in the CTE have to be rendered before it can limit the numbers according to the criteria«, and it puts my general understanding of how things run around here into a bit different perspective.
I did somehow know the problem lied in a tally table but I didn't know why.
Thanks again.
Best regards
Marko
March 29, 2011 at 7:13 pm
Hi Marko,
Thank you very much for taking the time to post the feedback you did. And, it was my pleasure to help. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.