Disappointed with List as table Function

  • I was surprised by the performance issues with the following UDF I have in place:

    ALTER FUNCTION [dbo].[ListAsTable] (
        @pString NVARCHAR(MAX),
        @pDelimiter CHAR(1)
    )
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN
        SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
               SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item
        FROM Common.Tally
        WHERE N < LEN(@pString) + 2
          AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter
          AND SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) <> '' -- added to skip the empty record left by a preceding delimiter
    ;

    We primarily use GUIDs, and when I threw 160 GUIDs at this with a WHERE clause like the following, the plan showed nested loops and lazy spools (90%) consuming most of the cost.

    WHERE [ItemsID] in (Select CAST(Item as Uniqueidentifier) from ListAsTable(@ItemIDList,','))

    When we changed it to the following, inserting into a table variable:

    INSERT INTO @Temp
    SELECT CAST(Item AS UNIQUEIDENTIFIER) FROM ListAsTable(@ItemIDList, ',')

    and put that in the WHERE clause, it improved performance but still did not optimize the way I thought it would: 22% of the cost was in sorting, and 77% of the query was devoted to the table scan and sort.

    WHERE [ItemsID] in (Select ID FROM @TEMP)

    We then joined @Temp to the table on the index, and the sort was removed. This balanced the query out the most, to roughly 50% table scan on @Temp and 50% on the clustered index.

    INNER JOIN @Temp ON ItemsID = ID

    Is there any way to improve it beyond this, so that most of the work comes off @Temp and onto the index itself? I still feel 50% is way too high, but I have to send a CSV in to the stored proc. I do not believe a TVP (table-valued parameter) is available for this project.
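One knob worth trying here: a sketch under assumptions, not something from the thread (dbo.SomeTable and its ItemsID column are placeholders for the real target table). Declaring @Temp with a PRIMARY KEY gives the join a unique, ordered structure to seek or merge against instead of scanning an unindexed heap:

```sql
-- Sketch: a keyed table variable; names other than ListAsTable are illustrative.
DECLARE @Temp TABLE (ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

INSERT INTO @Temp (ID)
SELECT CAST(Item AS UNIQUEIDENTIFIER)
FROM dbo.ListAsTable(@ItemIDList, ',');

SELECT t.*
FROM dbo.SomeTable t                        -- hypothetical target table
INNER JOIN @Temp tmp ON t.ItemsID = tmp.ID; -- join can now use the PK ordering
```

The PRIMARY KEY also deduplicates the list for free, which matters if callers can repeat IDs in the CSV.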

  • Part of the issue is that the explicit conversion to a different datatype after the function call interferes with the optimizer.

    Another is the difference between @Tmp and #tmp: table variables have statistical and optimization issues, since the optimizer gets no column statistics for them and often misestimates row counts.
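    To illustrate the difference, here is a hedged sketch of the same pattern using a #temp table, which (unlike a table variable) gets column statistics and can be explicitly indexed; dbo.SomeTable and ItemsID are placeholder names:

```sql
-- Sketch: #temp table variant of the mid-step table.
CREATE TABLE #Ids (ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

INSERT INTO #Ids (ID)
SELECT CAST(Item AS UNIQUEIDENTIFIER)
FROM dbo.ListAsTable(@ItemIDList, ',');

SELECT t.*
FROM dbo.SomeTable t             -- hypothetical target table
INNER JOIN #Ids i ON t.ItemsID = i.ID;

DROP TABLE #Ids;
```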

    I assume you're trying to keep ListAsTable generic, but you might look into creating a version for UNIQUEIDENTIFIERs and joining directly on the function, instead of using a mid-step table and a generic function that needs a conversion after the fact.
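    As a rough sketch of that suggestion (the name ListAsGuidTable is hypothetical), the same tally-based splitter can do the cast inside the function, so the caller joins directly with no conversion step:

```sql
-- Sketch: GUID-typed variant of the tally splitter; casts inside the function.
CREATE FUNCTION dbo.ListAsGuidTable (
    @pString NVARCHAR(MAX),
    @pDelimiter CHAR(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT CAST(SUBSTRING(@pString, N,
                CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N)
           AS UNIQUEIDENTIFIER) AS Item
    FROM Common.Tally
    WHERE N < LEN(@pString) + 2
      AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter
      AND SUBSTRING(@pString, N,
              CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) <> '';
```

    The query then becomes a direct join, e.g. `INNER JOIN dbo.ListAsGuidTable(@ItemIDList, ',') g ON t.ItemsID = g.Item`, with no CAST in the caller for the optimizer to work around.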

    After that, if we can see the full primary query and the execution plan, we might be able to help further.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Sorry for the late reply, but that did not help the processing out at all; I saw no difference in processing.

    We use this method all over. I am wondering if I could create a persistent table keyed by the computer name/IDs and put an index on it. Curious whether that would solve the workload issue.

    It would be an insert/select/delete.
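    That insert/select/delete flow might look something like the sketch below; the table name dbo.IdStaging is hypothetical, and keying rows by @@SPID is one way to keep concurrent sessions from colliding:

```sql
-- Sketch: persistent staging table, keyed per session. Names are illustrative.
CREATE TABLE dbo.IdStaging (
    Spid INT NOT NULL,
    ID   UNIQUEIDENTIFIER NOT NULL,
    PRIMARY KEY (Spid, ID)
);

-- Insert: split the CSV once per call
INSERT INTO dbo.IdStaging (Spid, ID)
SELECT @@SPID, CAST(Item AS UNIQUEIDENTIFIER)
FROM dbo.ListAsTable(@ItemIDList, ',');

-- Select: join on the indexed staging rows for this session
SELECT t.*
FROM dbo.SomeTable t             -- hypothetical target table
INNER JOIN dbo.IdStaging s
    ON s.Spid = @@SPID AND t.ItemsID = s.ID;

-- Delete: clean up this session's rows
DELETE FROM dbo.IdStaging WHERE Spid = @@SPID;
```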

  • You may want to search this site for other delimited split functions; in fact, you might find this article quite interesting. Since you are splitting an NVARCHAR(MAX) string, you may also want to look at a CLR-based split function.
