inline table valued function performance problem

  • I have a performance problem with an inline table-valued function I use to split an arbitrary string into multiple records of the same length.

    Here is script of the function:

    alter function util._MC_split_string_by_length
    (
        @inString varchar(max),
        @inLength int
    )
    returns table
    return
    with recs as
    (   -- calculate the number of rows to return, to prevent empty records
        select cast(datalength(@inString) / @inLength as int) + 1 dol
    ),
    cnt as
    (   -- some tally table
        select row_number() over (order by (select 1)) rn
        from syscolumns
    ),
    aa as
    (   select @inString zapis
    )
    select substring(zapis, ((cnt.rn - 1) * @inLength) + 1, @inLength) Token
    from aa
    inner join cnt on 1 = 1
    inner join recs on 1 = 1
    where cnt.rn <= recs.dol
    go

    -- testing examples

    --example No 1
    declare @aa table (polje varchar(max))
    insert into @aa select replicate(cast(',uu' as varchar(max)), 100000)

    select gg.*
    from @aa aa
    cross apply util._MC_split_string_by_length(aa.[polje], 17) gg
    -- elapsed time < 1s

    --example No 2
    create table #aa (polje varchar(max))
    insert into #aa select replicate(cast(',uu' as varchar(max)), 100000)

    select gg.*
    from #aa aa
    cross apply util._MC_split_string_by_length(aa.[polje], 17) gg

    drop table #aa
    -- elapsed time < 1s

    --example No 3
    declare @aa varchar(max)
    set @aa = replicate(cast(',uu' as varchar(max)), 100000)

    select * from util._MC_split_string_by_length(@aa, 17) gg
    -- elapsed time < 1s

    --example No 4
    select * from util._MC_split_string_by_length(replicate(cast(',uu' as varchar(max)), 100000), 17)
    -- elapsed time > 10 min

    Can anyone tell me why the last example takes so long?
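    (For reference, a minimal sketch of the behaviour the function is intended to have, written in Python purely as an illustration — chop a string into fixed-length tokens, where the last token may be shorter:)

```python
def split_by_length(s: str, length: int) -> list[str]:
    # One token per `length`-character slice; the last token may be shorter.
    return [s[i:i + length] for i in range(0, len(s), length)]

# Same test data as the examples above: 100,000 copies of ',uu' = 300,000 chars
tokens = split_by_length(",uu" * 100000, 17)
print(len(tokens))  # 17648 tokens
```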

  • I've tried it out locally, and get about the same time for all of them (< 1s)

    The last two have identical query plans (apart from the variable declaration) and identical costs. What sort of query plans do you get for them?

    Thanks for your reply.

    I get the same execution plans for the last two executions. What bothers me is the "Segment" operation; it displays a cost of 493815%??

    Anyhow, the performance is still poor.

    But now I seem to have another problem.

    When a large amount of data is sent to the function, the tally table no longer satisfies the criteria. The number of rows in the sys.columns table is approx. 40K.

    In order to be able to return more than 40K rows, I use a Cartesian product of sys.columns with itself.
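    (A quick back-of-the-envelope check, sketched in Python using the row counts quoted in this thread, of how many tally rows the split actually needs versus what a self-join of syscolumns can supply:)

```python
string_length = 3 * 100_000   # replicate(',uu', 100000) -> 300,000 characters
token_width = 17

# recs.dol in the function: datalength / length + 1 (integer division)
rows_needed = string_length // token_width + 1

syscolumns_rows = 40_000      # approx. count quoted above
cross_join_rows = syscolumns_rows ** 2

print(rows_needed)      # 17648
print(cross_join_rows)  # 1600000000 -- far more than enough
```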

    I also made some other modifications and tried them out; here is the script:

    ALTER function [util].[_MC_split_string_by_length]
    (
        @inString varchar(max),
        @inLength int
    )
    returns table
    return
    with recs as
    (   select cast(datalength(@inString) / @inLength as int) + 1 dol
    ),
    /*
    Nbrs_4( nbr ) AS ( SELECT 1 UNION SELECT 0 ),
    Nbrs_3( nbr ) AS ( SELECT 1 FROM Nbrs_4 n1 CROSS JOIN Nbrs_4 n2 ),
    Nbrs_2( nbr ) AS ( SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2 ),
    Nbrs_1( nbr ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 ),
    Nbrs_0( nbr ) AS ( SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2 ),
    Nbrs ( nbr ) AS ( SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2 ),
    cnt as ( SELECT nbr FROM ( SELECT ROW_NUMBER() OVER (ORDER BY nbr) FROM Nbrs ) D ( nbr ) ),
    */
    cnt as
    (   -- major change: now covers up to 1,161,855,396 records
        select row_number() over (order by (select 1)) rn
        from syscolumns a, syscolumns b
    ),
    aa as
    (   select @inString zapis
    ),
    tally as
    (   -- ?? seems it doesn't work when used via cross apply
        select cnt.rn
        from cnt
        inner join recs on cnt.rn <= recs.dol
    )
    select substring(zapis, ((tally.rn - 1) * @inLength) + 1, @inLength) Token
    --substring(zapis, ((cnt.nbr - 1) * @inLength) + 1, @inLength) Token
    /*substring(@inString, ((cnt.rn - 1) * @inLength) + 1, @inLength) Token*/
    from aa
    inner join tally on 1 = 1

    Running the following script:

    declare @aa table (polje varchar(max))
    insert into @aa select replicate(cast(',uu' as varchar(max)), 100000)

    select gg.*
    from @aa aa
    cross apply util._MC_split_string_by_length(aa.[polje], 17) gg

    results in the output list somehow halting at row 17593 while the query keeps running. The same happens when using a # table.

    Running the script:

    declare @aa varchar(max)
    set @aa = replicate(cast(',uu' as varchar(max)), 1000000)

    select * from util._MC_split_string_by_length(@aa, 17) gg

    works perfectly, returning ~176K records in a few seconds.

    I ran out of ideas. I've tried just about everything.

    Maybe, when CROSS APPLY is used to access the function result, the tally table within the function scans the entire range of rows produced via the Cartesian product.

    I really don't know.

    I'd absolutely appreciate any suggestion.

  • I have to ask, why are you segmenting on a fixed length instead of by the position of the delimiters? What is the purpose?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hello Jeff and thanks for your reply.

    The story is simple:

    I want to re-code an existing function which contains a WHILE loop.

    I don't have any delimiter whatsoever; the only thing I know is the length.

    And I want to split the string into records of the same length, which would later be processed by some stored procedure.

    Lately, loops and cursors are something I avoid as much as possible. Whenever I can, I re-code an existing proc/function to get rid of both.

    All that thanks to the SQLServerCentral.com articles and posts (yours as well) that I follow on a daily basis.

    Still my problem remains. Do you have any idea?

    Thanks in advance,

    Marko

  • Hi Marko,

    Yep... I certainly understand and fully agree with the idea of trading in cursors and WHILE loops for something a lot more efficient. I just didn't understand why or how each Token was exactly the same length.

    There are several problems you're likely fighting...

    The first problem may be how many rows you have in the SysColumns table of the current DB. You say in your comments that the Cross Join of SysColumns with itself produces 1,161,855,396 rows but that would mean that your SysColumns table would have to have 34,086 rows in it. If you do a SELECT COUNT(*) FROM SysColumns in the current database, what number do you actually get?

    Another problem you may be fighting with the CTE method of making a "Tally Table" is that the Joins and Where clauses you've constructed are mostly "non-SARGable", meaning, in this case, that ALL the numbers in the CTE have to be rendered before the criteria can limit them. Of course, that makes for some very slow code if the "Tally Table" CTE generates numbers large enough to handle all eventualities. I suspect that's why you commented that code out of your function. What we need to do is limit that number right up front, which you'll see I did with a well-placed TOP clause in the code that follows.

    Also, let's keep it all very simple. You jumped through some extraordinary code "hoops" trying to resolve the problem when what we really needed to do was simply start the "Tally Table" CTE at 0 to greatly simplify the necessary calculations. The following code has that nuance in it, as well.

    Last but not least, I didn't want to build a UTIL schema on my server. Instead, I used "dbo" as the schema name in all the code which follows. You'll need to replace "dbo." with "util." everywhere before you can use it for your purposes, although it'll work just fine as "dbo." for testing in an area such as TempDB. I also added code to verify the number of Tokens returned and to identify the Token "number", which, of course, you can remove if you wish.

    It's my habit to document within the code but if you find the explanations above or in the code lacking, please feel free to post back.

    First, here's the function I've tested using your test code:

    --DROP FUNCTION dbo.[_MC_split_string_by_length]
    --;
    GO
    CREATE FUNCTION dbo.[_MC_split_string_by_length]
    (
        @inString VARCHAR(MAX),
        @inLength INT
    )
    RETURNS TABLE
    RETURN
    WITH
    E1(N) AS (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ), --10
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
    E8(N) AS (SELECT 1 FROM E4 a, E4 b), --100,000,000
    E16(N) AS (SELECT 1 FROM E8 a, E8 b), --10,000,000,000,000,000
    cteTally(N) AS (
        SELECT 0 --Makes for a "zero" based "Tally Table"
        UNION ALL
        SELECT TOP (DATALENGTH(@inString)-1) ROW_NUMBER() OVER (ORDER BY N) FROM E16
    )
    SELECT TokenNumber = ROW_NUMBER() OVER (ORDER BY t.N),
           Token       = SUBSTRING(@inString, t.N+1, @inLength)
      FROM cteTally t
     WHERE t.N % @inLength = 0
    GO
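    (A quick Python sketch of the logic above, purely as an illustration: a zero-based list of numbers 0..DATALENGTH-1, filtered down to multiples of @inLength, yields exactly the substring start offsets.)

```python
def split_with_zero_based_tally(s: str, length: int) -> list[str]:
    # cteTally(N): 0, 1, ..., len(s)-1 -- a zero-based tally
    # WHERE t.N % @inLength = 0 keeps only the token start offsets
    starts = [n for n in range(len(s)) if n % length == 0]
    # SUBSTRING(@inString, t.N + 1, @inLength) maps to the slice s[n:n+length]
    return [s[n:n + length] for n in starts]

# Same test data as in this thread: 100,000 copies of ',uu'
result = split_with_zero_based_tally(",uu" * 100000, 17)
print(len(result))  # 17648
```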

    Here's the test code I used. It's your code with a couple of modifications to double check the number of Tokens that should be returned according to the desired width of the Tokens. I believe your problem may be solved now. Please post back if it isn't.

    --===== Suppress the auto-display of rowcounts for performance and
         -- to prevent "false" result sets to a GUI if one is being used.
    SET NOCOUNT ON
    ;
    --===== Declare the desired Token width
    DECLARE @TokenWidth INT
    ;
     SELECT @TokenWidth = 17
    ;
    --===== Create the test table as a table variable
    DECLARE @aa TABLE (polje VARCHAR(MAX))
    ;
    --===== Populate one row with 100,000 copies of 3 characters for
         -- a total of 300,000 characters
    INSERT INTO @aa SELECT REPLICATE(CAST(',uu' AS VARCHAR(MAX)),100000)
    ;
    --===== Calculate and display the expected number of Tokens to be returned.
         -- The addition of the 0.0 turns it all into something other than an INT calculation.
    SELECT ExpectedNumberOfTokens = CEILING((DATALENGTH(polje)+0.0)/@TokenWidth)
      FROM @aa
    ;
    --===== Do the slicing according to the desired Token width
    SELECT gg.*
      FROM @aa aa
     CROSS APPLY dbo._MC_split_string_by_length(aa.polje, @TokenWidth) gg
    ;
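    (Why the +0.0 matters, sketched in Python as an illustration: INT/INT division in T-SQL truncates, so CEILING over a pure-integer division would operate on an already-truncated value and miss the final partial token.)

```python
import math

datalength = 300_000   # 100,000 copies of ',uu'
token_width = 17

# Integer division (what SQL Server does for INT/INT) truncates:
truncated = datalength // token_width
# Adding 0.0 forces a non-integer calculation first, so the ceiling works:
expected_tokens = math.ceil((datalength + 0.0) / token_width)

print(truncated)        # 17647 -- one token short
print(expected_tokens)  # 17648
```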

    {Edit was to correct a spelling error}

    --Jeff Moden



  • Hi Jeff

    First I'd like to thank you for the given solution. I tried it out in every way I could think of and it now works perfectly.

    You say: »Where clauses you've constructed are mostly "non-SARGable" meaning, in this case, that ALL the numbers in the CTE have to be rendered before it can limit the numbers according to the criteria«, and it puts my general understanding of how things run around here into a bit different perspective.

    I did somehow know the problem lay in the tally table, but I didn't know why.

    Thanks again.

    Best regards

    Marko

  • Hi Marko,

    Thank you very much for taking the time to post the feedback you did. And, it was my pleasure to help. 🙂

    --Jeff Moden


