Understanding and Using APPLY (Part 1)

  • SQL Kiwi (1/6/2012)


    reinpost (1/6/2012)


    Just a trivial remark: right at the start it says

    APPLY is named after the process of applying a set of input rows to a table-valued function.

    Surely you mean to say applying a table-valued function to a set of input rows.

    I think it makes sense either way! ๐Ÿ˜Ž

    So do I, but mathematicians and programmers tend to say that functions are applied to things. In fact this new operator was probably named after Lisp's 'apply' , which does the same thing.

  • reinpost (1/6/2012)


    SQL Kiwi (1/6/2012)


    reinpost (1/6/2012)


    Just a trivial remark: right at the start it says

    APPLY is named after the process of applying a set of input rows to a table-valued function.

    Surely you mean to say applying a table-valued function to a set of input rows.

    I think it makes sense either way! ๐Ÿ˜Ž

    So do I, but mathematicians and programmers tend to say that functions are applied to things. In fact this new operator was probably named after Lisp's 'apply' , which does the same thing.

    Even though it is irrelevant... I'll agree that the proper is that the function applies to the rows. I thought about it like a car and paint. When you apply the paint to the car, the properties of the paint do not change the car does. So in this case the function does not change at all, it is applied to the rows of which the properties have changed.

    Now that I said that... Who cares, it still accomplishes the same thing ๐Ÿ˜›

    Jared
    CE - Microsoft

  • A word of caution

    I took the tip about SCHEMABINDING and the optimizer's expanding iTVFs and converted a Multi-statement UDF (referred to from here on as the TF) that splits up to 12MB of delimited data using a Numbers table. Perhaps this is SS2005EE-specific and later versions are smarter, but...

    BAD MOVE!

    I have this (inherited) snippet from an SP that uses CROSS APPLY in the FROM clause and has a WHERE clause that contains DBA.dbo.udf_Split(), which is the TF I converted to an iTVF. The SP happens to be the most important SP in the system - the one that gets a User's permissions to do anything. It runs in under 1 second for over 500 Users using the original TF. However, it takes 15+ minutes to run the same SP using the iTVF for a single User, I consider this a very bad move and will pull the plug on changing this TF to an iTVF:-D

    FROM dbo.t_user_ref ur

    CROSS APPLY dbo.fn_GetPermissionByUser(user_id) AS p

    WHERE GETDATE() BETWEEN eff_date AND term_date

    AND ( @CompanyID IS NULL

    OR ( @CompanyID IS NOT NULL

    AND @CompanyID > 0

    AND company_id = @CompanyID

    )

    )

    AND ( @SiteIDs IS NULL

    OR ( @SiteIDs IS NOT NULL

    AND EXISTS ( SELECT 1

    FROM DBA.dbo.udf_Split( @SiteIDs + ', ',',')

    WHERE ',' + r_site_ids + ',' <> REPLACE(',' + r_site_ids + ',', ',' + [value] + ',', '')

    )

    )

    )

    AND ( @user-id IS NULL

    OR ( @user-id IS NOT NULL

    AND @user-id > 0

    AND user_id = @user-id

    )

    )

    AND ( @UserRole IS NULL

    OR ( @UserRole IS NOT NULL

    AND p.r_user_role LIKE '%' + @UserRole

    )

    )

    The optimizer did indeed expand the iTVF form of the UDF into the main SELECT, but then goodness knows why it decides that 12789618 rows must now be scanned for each User row.

    Be very, very careful which TFs you convert to iTVFs... :ermm:

  • SAinCA (1/6/2012)


    A word of caution

    It is certainly still possible to write highly dubious code using in-line table-valued functions if you really try. There are a few cases where a multi-statement function makes sense; in this case it's because the results of the split are materialized into a table variable once and reused many times that way. Of course, this is something that could easily be done without the msTVF - just split the site IDs into a temporary table first. This can be better than using an msTVF since the 'real' temporary table can have statistics and can be indexed - the hidden msTVF table variable cannot do either of those things!

    So my word of caution would be: if you find a multi-statement TVF performing better, check that you shouldn't be materializing something yourself. Do it properly with an iTVF and a temporary table and you'll get statistics, indexes, and probably better performance.

    Fellow MVP Brad Shultz went into some details about why the multi-statement hidden materialization works here:

    http://bradsruminations.blogspot.com/2010/08/integer-list-splitting-sql-fable.html

  • SAinCA (1/6/2012)


    A word of caution

    I took the tip about SCHEMABINDING and the optimizer's expanding iTVFs and converted a Multi-statement UDF (referred to from here on as the TF) that splits up to 12MB of delimited data using a Numbers table. Perhaps this is SS2005EE-specific and later versions are smarter, but...

    It could be the way the split function is using the Numbers table. I've seen it where the entire table gets scanned because of "reflection" (not an official term but that's what I think of it as) back to the table from the "outside" world cause the table to be scanned many times.

    Would you mind posting your split function so I can take a look, please?

    Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/6/2012)


    Would you mind posting your split function so I can take a look, please?

    Can I ask that you guys start a new thread for this? Feel free to post a link here though.

  • SQL Kiwi (1/7/2012)


    Jeff Moden (1/6/2012)


    Would you mind posting your split function so I can take a look, please?

    Can I ask that you guys start a new thread for this? Feel free to post a link here though.

    Thought the discussion would lend itself to the proper application of CROSS APPLY, but sure.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/8/2012)


    Thought the discussion would lend itself to the proper application of CROSS APPLY, but sure.

    Hm, that's actually a good point. As you were, discuss away...:-)

  • As requested, Jeff:

    USE DBA

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[udf_SplitN]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))

    DROP FUNCTION [dbo].[udf_SplitN]

    GO

    CREATE FUNCTION [dbo].[udf_SplitN]

    ( @inString nvarchar(MAX)

    , @delim nvarchar(255)

    )

    RETURNS @arrSplit TABLE

    ( IDX int IDENTITY(0, 1) PRIMARY KEY

    , [value] nvarchar(1024)

    )

    AS

    /* =============================================================================

    ** original idea by Anith Sen, posted on http://www.simple-talk.com at

    ** http://www.simple-talk.com/sql/t-sql-programming/the-helper-table-workbench/

    ** Description: Split a nvarchar string into chunks using a delimiter of up to

    ** 255 characters.

    **

    ** Notes: See the corresponding udf_Split for the varchar version.

    **

    SELECT *

    FROM dbo.udf_Split('one,two,three,four,five,six,seven,eight,nine,ten', ',')

    SELECT *

    FROM dbo.udf_Split('Monday--Tuesday--Wednesday--thursday--friday--saturday--sunday', '--')

    ========================================================================== */

    BEGIN

    INSERT INTO @arrSplit

    ( [value]

    )

    SELECT SUBSTRING(@inString + @delim, number, CHARINDEX(@delim, @inString + @delim, number) - number)

    FROM dbo.Numbers

    WHERE number <= LEN(REPLACE(@inString, N' ', N'`'))

    AND SUBSTRING(@delim + @inString, number, LEN(REPLACE(@delim, N' ', N'`'))) = @delim

    ORDER BY number

    RETURN

    END

    GOThis is used in preference to the function below in cases where 4000 characters is likely to be exceeded, and sometimes by MB. The msTVF works very well at splitting XML files that contain multiple documents - a problem a third-party vendor presented that could not be changed due to their third party software. That's the primary reason for the msTVF's existence in this form.

    When I know the delimited varchar data are short, I'll use the much-discussed and augmented UDF found on SSC (renamed by me to distinguish functionality from my other 2 udf_Split% UDFs):

    USE [DBA]

    GO

    /****** Object: UserDefinedFunction [dbo].[udf_Split8Kv] Script Date: 01/09/2012 09:26:22 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    ALTER FUNCTION [dbo].[udf_Split8Kv]

    /***************************************************************************************************

    Purpose: Split a given string at a given delimiter and return a list of the split elements (items).

    Returns: iTVF containing the following:

    ItemNumber = Index Position of the Item as an int, begins at ZERO.

    Item = Element value as a VARCHAR(8000)

    CROSS APPLY Usage Example:

    -----------------------------------------------------------------------------------------------------

    ===== Conditionally drop the test tables to make reruns easier for testing.

    -- (this is NOT a part of the solution)

    IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL

    DROP TABLE #JBMTest;

    --===== Create and populate a test table on the fly (this is NOT a part of the solution).

    SELECT *

    INTO #JBMTest

    FROM ( SELECT 1,'1,10,100,1000,10000,100000,1000000'

    UNION ALL

    SELECT 2,'2000000,200000,20000,2000,200,20,2'

    UNION ALL

    SELECT 3, 'This,is,a,test'

    UNION ALL

    SELECT 4, 'and so is this'

    UNION ALL

    SELECT 5, 'This, too (no pun intended)'

    UNION ALL

    SELECT 6, LEFT(REPLICATE('Thisisanunbrokenstring',400),7999)

    ) d (SomeID,SomeValue);

    GO

    --===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)

    SELECT test.SomeID

    , split.ItemNumber

    , split.Item

    FROM #JBMTest test

    CROSS APPLY ( SELECT ItemNumber

    , Item

    FROM dbo.udf_Split8Kv(test.SomeValue,',')

    ) split;

    ***************************************************************************************************/

    --===== Define I/O parameters

    ( @pString varchar(7999)

    , @pDelimiter char(1)

    )

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    --===== "Inline" CTE Driven "Tally Tableโ€ produces values up to

    -- 10,000... enough to cover VARCHAR(8000)

    WITH E1 ( N )

    AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    )

    , E2 ( N )

    AS ( SELECT 1

    FROM E1 a

    , E1 b

    ) -- 100

    , E4 ( N )

    AS ( SELECT 1

    FROM E2 a

    , E2 b

    ) --10,000

    , cteTally ( N )

    AS ( SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT N ) )

    FROM E4

    )

    --===== Do the split

    SELECT ROW_NUMBER() OVER ( ORDER BY N ) - 1 AS IDX

    , SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Value

    FROM cteTally

    WHERE N < LEN(@pString) + 2

    AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter

    GO

  • Excellent article. I liked the graphics, although I have to say I liked the code file even better. I have a few questions but I think I will wait until I have had a chance to read part 2 before expressing any ๐Ÿ˜Ž

    Kenneth

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

  • SAinCA (1/9/2012)


    As requested, Jeff:

    I thought the problem may have been with the split function you had listed but, looking at the code you provided for the split functions, I don't see a particular problem with them other than they're slower (because of the delimiter concatenation) than the new split function available at the URL coming up. The attachments to the article have been updated, so be sure to get the code from the attachments and not the article itself. The code in the attachments is about 20% faster after someone was good enough to provide yet another optimization.

    Here's the link to the article with the new splitter functions...

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    One of the "problems" (in quotes because it's a human error, not a problem with CROSS APPLY or iTVFs themselves) with iTVFs and CROSS APPLY is a problem that is very similar to a problem with VIEWs. That is, it's "hidden" code or at least "out of site" code AND the code becomes a part of the execution plan as if it had been written as a part of the code. Because of that, you can miss some of the most terrible performance problems due to a very simple oversight... mismatched datatypes which, of course, will cause table and index scans and even full "rewinds" which are worse than mere table scans. That was the same problem that Brad Shultz ran into in the link that Paul previously posted (I never did post back on that thread to show what the problem was). In that case (IIRC... it was a while ago), it was caused by CHARINDEX returning BIGINT instead of just an INT because the first operand of the CHARINDEX Brad used was MAX datatype. The datatype mismatch was a real killer there. Instead of the Numbers table selecting only the numbers it needed, it "rewound" the table 121,316 times and generated a total of 121,317,000 rows internally through a Lazy Table Spool (I have the execution plan but have misplaced the analysis that showed the precise reason, so operating as best I can from memory).

    Because you haven't included any information as to the datatypes you used for variables or any of the columns used in your code example, I can't even begin to guess where the datatype mismatch (or, possibly, accidental many-to-many join) may be. It may be in the splitters or it (more likely) is in the fn_GetPermissionByUser function you actually used in the CROSS APPLY. And, yeah... because of the lack of information about your code, I can only guess but anytime something like what you've described happens with a conversion to an iTVF and a CROSS APPLY, it's usually due to a datatype mismatch. You have to remember... iTVFs and VIEWs are a lot alike... they "become one" with the calling code and, as with any fully inline code, data type matches become rather critical for any decent performance.

    I agree with your "word of caution" but I wouldn't blame the iTVF or the CROSS APPLY. Chances are it's something like a simple datatype mismatch and that's likely what you have to be most cautious about. Why does multi-line and scalar TVFs let you get away with it (datatype mismatches and accidental many-to-many joins)? Because the execution plan doesn't incorporate the plans for such things in the main-stream execution plan.

    Paul is much more of an expert on execution plans than I am. If you were to follow the procedure (see the second link in my signature line at the end of this post) for posting performance problems and opened a new thread for the problem (we've diverged too much from the intent of this thread already), I'm sure that someone, or even Paul himself, would be happy to show you how to correctly incorporate the very high speed methods of combining CROSS APPLY and iTVFs for this particular problem.

    As a side bar, you've got a huge amount of unnecessary code in your code example that you posted. Take the following snippet, for example...

    WHERE GETDATE() BETWEEN eff_date AND term_date

    AND ( @CompanyID IS NULL

    OR ( @CompanyID IS NOT NULL

    AND @CompanyID > 0

    AND company_id = @CompanyID

    )

    )

    Because NULLs cannot be related to anything else including other NULLs and assuming that you're not actually using negative CompanyIDs, the code can be simplified quite a bit as follows...

    WHERE GETDATE() BETWEEN eff_date AND term_date

    AND (@CompanyID IS NULL OR company_id = NULLIF(@CompanyID,0))

    The fact that you're using BETWEEN to compare dates is a whole other chapter in performing SQL Harakiri but this thread isn't the place for that particular discussion. ๐Ÿ™‚

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/9/2012)


    As a side bar, you've got a huge amount of unnecessary code in your code example that you posted. Take the following snippet, for example...

    WHERE GETDATE() BETWEEN eff_date AND term_date

    AND ( @CompanyID IS NULL

    OR ( @CompanyID IS NOT NULL

    AND @CompanyID > 0

    AND company_id = @CompanyID

    )

    )

    Because NULLs cannot be related to anything else including other NULLs and assuming that you're not actually using negative CompanyIDs, the code can be simplified quite a bit as follows...

    WHERE GETDATE() BETWEEN eff_date AND term_date

    AND (@CompanyID IS NULL OR company_id = NULLIF(@CompanyID,0))

    I can't see any declarations anywhere in any of the code posted that prevents companyID being negative, and unless there is such a restriction this simplification is broken. It seems likely that there is such a restriction (column names with ID that get compared with 0 are often identity columns with 1 as initial seed) but there isn't certain to be, unless I've missed something.

    Tom

  • Thanks, Jeff, for the complete treatment given to the script.

    Thanks, too, to Paul for uncovering the treasure that is APPLY.

    My only defense is "inherited code; wasn't broken so didn't fix it". I grabbed the newest split code (thanks) and looked at the code for implicit data type conversions. The usage of the split-string within the snippet posted takes a varchar and concatenates strings before comparison against a string, so there are no implicit conversions there.

    The get permissions UDF is unchanged in my tenure apart from reformatting it from unreadable spaghetti. It's a msTVF that could probaby be rewritten as an iTVF using some CTEs and UNION ALL statements, but time doesn't permit that change just yet. Folks no longer with the Company said the underlying design needed complete overhaul and I can wait another day until it no longer supports the business - I have bigger fish to fry. The defensive "company > 0" was already there in case the UI sent garbage (outside my control); as I said, it ain't broke and under 2-second response to get all permissions for all users is acceptable by my Director.

    BTW: Did you have a URL for the reference to the dates-between-harakiri, please? Wouldn't want to bump into misleading posts, as sometimes is the case. The term_date is defaulted to 9999-12-31 23:59:59 and if someone leaves a client company, we usually find out after the fact, so their term_date bereft of time is OK.

    Thanks again for such an in-depth, meaty post on the posted scripts. If I get into performance issues (growth is just beginning in earnest), I'll be sure to post a new thread per your directive.

  • L' Eomot Inversรฉ (1/10/2012)


    Jeff Moden (1/9/2012)


    As a side bar, you've got a huge amount of unnecessary code in your code example that you posted. Take the following snippet, for example...

    WHERE GETDATE() BETWEEN eff_date AND term_date

    AND ( @CompanyID IS NULL

    OR ( @CompanyID IS NOT NULL

    AND @CompanyID > 0

    AND company_id = @CompanyID

    )

    )

    Because NULLs cannot be related to anything else including other NULLs and assuming that you're not actually using negative CompanyIDs, the code can be simplified quite a bit as follows...

    WHERE GETDATE() BETWEEN eff_date AND term_date

    AND (@CompanyID IS NULL OR company_id = NULLIF(@CompanyID,0))

    I can't see any declarations anywhere in any of the code posted that prevents companyID being negative, and unless there is such a restriction this simplification is broken. It seems likely that there is such a restriction (column names with ID that get compared with 0 are often identity columns with 1 as initial seed) but there isn't certain to be, unless I've missed something.

    Heh... agreed. That's why I very specifically stated...

    ...and assuming that you're not actually using negative CompanyIDs...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/10/2012)


    Heh... agreed. That's why I very specifically stated...

    ...and assuming that you're not actually using negative CompanyIDs...

    Yes, I'm getting really erratic :blush:. I even quoted that statement and still didn't notice it. How erratic can I get?

    Tom

Viewing 15 posts - 76 through 89 (of 89 total)

You must be logged in to reply to this topic. Login to reply