UDF, Computed Column, Persisted, Determinism

  • Hi folks,

    Having a tough go at this, but have learned a lot in the process. Here's my dilemma:

    1) I have a Computed Column that calls a Scalar-Valued UDF

    2) I want this Computed Column to be Persisted, so that it doesn't query the UDF every single time the column is accessed

    3) The UDF is not deterministic (not sure I understand why, but the function is below):

    -- Parameters

    @ParentID int,

    @Keyword nvarchar(255)

    -- Output Variable

    DECLARE @Score int

    SELECT @Score = SUM(k.KeywordCount)

    FROMTopics AS t,

    Keywords AS k

    WHEREk.Keyword = @Keyword

    ANDt.PostID = k.PostID

    ANDt.ParentID = @ParentID

    SET @Score = ISNULL(@Score,0)

    RETURN @Score

    4) I've tried using WITH SCHEMABINDING to make the UDF deterministic, which seemed to work...but then I can't use it in a Computed Column because that apprently doesn't allow user access.

    I've read quite a bit about what defines "deterministic", but I'm still missing something I guess. It would help if I understood why my UDF is not deterministic - so that maybe I can figure out how to make it so (without using SCHEMABINDING).

    Any thoughts would be greatly appreciated. If I'm barking up the wrong tree, I'd love to hear some alternate solutions.

    As well, I can upgrade to SQL 2008 if that helps any...

    Thanks for any advice!

    AL

    Al Bsharah
    Blog / Twitter

  • Why don't you use a view instead?

    And, if you really want to use a UDF, at least use a table-valued function.

    Something like

    CREATE FUNCTION your_function (@ParentID int, @Keyword nvarchar(255))

    RETURNS TABLE

    AS

    RETURN

    (

    SELECT ISNULL(SUM(k.KeywordCount),0) as ReturnValue

    FROM Topics AS t

    INNER JOIN Keywords AS k

    ON k.Keyword = @Keyword

    AND t.PostID = k.PostID

    AND t.ParentID = @ParentID

    );



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hi Lutz, thanks for the (quick) reply!

    I'm not sure I follow...please help me understand what I'm missing from your suggestions:

    1) How and where is a View going to help my cause? My UDF queries for a subset of data that depends on other columns in the row where the computation occurs. Creating a view will still require me to put a query into the "Computed Column" field, and that doesn't seem to be allowed (Error: "Subqueries are not allowed in this context")

    2) Is returning a TABLE in my UDF going to help make it Deterministic? Otherwise, what's the benefit when I just need one integer returned?

    Please let me know what I'm missing...thanks for the help!

    AL

    Al Bsharah
    Blog / Twitter

  • lmu92 (4/5/2010)


    ...at least use a table-valued function.

    Teaching opportunity here, Lutz. Tell the op "why" and how you'd use an iTVF in a computed column. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • How many rows in the table you are querying that you use the UDF?

    Also, to learn a bit about scalar and in-line TVF, you might want to read the following blog: Comparing Hardcoded functions, In-Line TVF's, and Scalar Functions.

    You may find it a bit enlightening.

  • Hi folks,

    Lynn:

    At this point I'm querying a table that contains 3,000,000 rows (that will likely have 10x that sooner or later). I'd like to add a Persistent Calculated Column within this table that references my UDF.

    All:

    I appreciate all the input on Scalar vs. Table-Valued functions, I have seen the light. <smile> Nice find, Lynn.

    I still need to figure out why my UDF is not Deterministic (TVF version isn't either)...so that I can persist this calculated column. I feel like I may be trying to do something not possible, I'd just like to understand why if that's the case.

    Thanks again!

    Al Bsharah
    Blog / Twitter

  • Here is the problem, you are only giving us PART of the problem. If you want better answers, give use the whole problem, plus the code you have already developed.

    Please read the first article I reference below in my signature block to see how you should actually post a question. Not mentioned in the article, but also provide the expected results based on the sample data. This should be as a query output, not just descibed in words.

  • Jeff Moden (4/5/2010)


    lmu92 (4/5/2010)


    ...at least use a table-valued function.

    Teaching opportunity here, Lutz. Tell the op "why" and how you'd use an iTVF in a computed column. 😉

    I guess I completely forgot to mention that neither the view nor the iTVF is a replacement of the UDF used so far and I wouldn't use a computed column at all... :blush:

    I obviously made too many assumptions (for instance: that someone would look up iTVF's to see that those are used just like views/tables). Sorry about that.

    Here's what I would do:

    1) evaluate the business case whether it is really required to have that column in that table (seems like violating normalization) with no other option

    Side note: I would consider this as an option: using a CTE to limit the number of rows that are selected (WHERE clause inside the CTE) and apply the function I mentioned earlier to the limited result set.

    2) if that column really needs to be there I would investigate how often that specific column would need to be updated based on changes of the underlying columns (Insert/Update/Delete).

    3) Based on that I may decide to use a trigger after insert/update/delete either on one or both of the tables referenced in the original UDF limited to changes of the columns in question using the UPDATE() clause within the trigger together with the INSERTED logical table provided inside the trigger and the iTVF from above. This trigger then would update the column in question.

    4) I finally would set up a test environment where I could compare the various soultions that are available and test, test, and test again. My testing would include evaluation of the execution plans while applying various common scenarios, index optimization (if required), concurrency testing (if required). The "usual stuff". I'd probably include the UDF in the testing scenario to see how the alternatives perform.

    Once I'd be done with those steps I'd discuss the results with the people in charge.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Al B (4/5/2010)


    Any thoughts would be greatly appreciated. If I'm barking up the wrong tree, I'd love to hear some alternate solutions.

    The problem here is not one of being deterministic or not: current versions (including 2008) of SQL Server do not support persisted columns that reference a UDF that does data access. The reason is that SQL Server has no way to keep the persisted value up-to-date.

    The alternative here is to use the mechanism that does guarantee that updated values will be reflected in the persisted value - and that solution is an indexed view.

  • Paul White NZ (4/6/2010)


    Al B (4/5/2010)


    Any thoughts would be greatly appreciated. If I'm barking up the wrong tree, I'd love to hear some alternate solutions.

    The problem here is not one of being deterministic or not: current versions (including 2008) of SQL Server do not support persisted columns that reference a UDF that does data access. The reason is that SQL Server has no way to keep the persisted value up-to-date.

    The alternative here is to use the mechanism that does guarantee that updated values will be reflected in the persisted value - and that solution is an indexed view.

    I'd like to agree but it seems like that's not going to work here, since the OP used the ISNULL function to get a non-NULL result:

    SET @Score = ISNULL(@Score,0)

    And, AFAIK, an indexed view is not allowed if (straight from BOL SS2K5):

    a SUM function [...] references a nullable expression.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • lmu92 (4/6/2010)


    I'd like to agree but it seems like that's not going to work here, since the OP used the ISNULL function to get a non-NULL result:

    SET @Score = ISNULL(@Score,0)

    And, AFAIK, an indexed view is not allowed if (straight from BOL SS2K5):

    a SUM function [...] references a nullable expression.

    SET ANSI_NULLS,

    ANSI_PADDING,

    ANSI_WARNINGS,

    ARITHABORT,

    CONCAT_NULL_YIELDS_NULL,

    QUOTED_IDENTIFIER

    ON;

    SET NUMERIC_ROUNDABORT

    OFF;

    GO

    USE tempdb;

    GO

    CREATE TABLE dbo.Lutz

    (

    row_id INTEGER IDENTITY PRIMARY KEY,

    group_id INTEGER NOT NULL,

    data INTEGER NULL

    );

    INSERT dbo.Lutz

    (group_id, data)

    SELECT 1, 100 UNION ALL

    SELECT 1, 200 UNION ALL

    SELECT 1, 300 UNION ALL

    SELECT 2, 100 UNION ALL

    SELECT 2, 150 UNION ALL

    SELECT 2, NULL;

    GO

    CREATE VIEW dbo.LutzView

    WITH SCHEMABINDING

    AS

    SELECT group_id,

    total = SUM(ISNULL(data, 0)),

    item_count = COUNT_BIG(*)

    FROM dbo.Lutz

    GROUP BY

    group_id;

    GO

    CREATE UNIQUE CLUSTERED INDEX c

    ON dbo.LutzView (group_id);

    GO

    SELECT *

    FROM dbo.LutzView WITH (NOEXPAND);

    GO

    DROP VIEW dbo.LutzView;

    DROP TABLE dbo.Lutz;

    GO

  • Paul White NZ (4/6/2010)


    Al B (4/5/2010)


    Any thoughts would be greatly appreciated. If I'm barking up the wrong tree, I'd love to hear some alternate solutions.

    The problem here is not one of being deterministic or not: current versions (including 2008) of SQL Server do not support persisted columns that reference a UDF that does data access. The reason is that SQL Server has no way to keep the persisted value up-to-date.

    The alternative here is to use the mechanism that does guarantee that updated values will be reflected in the persisted value - and that solution is an indexed view.

    THAT's the teaching opportunity I was hoping for. Thanks, Paul.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @ Paul:

    Interesting!

    What do I misread/misunderstand when reading BOL?

    I thought what you just demonstrated wouldn't be possible...

    It looks like query optimizer is using ISNULL to override the columns NULL definition.

    VERY NICE!! Never thought that's possible to "cheat QO" that way so I never even tried...

    That, of course, changes everything and makes my posts in this thread irrelevant if not misleading... :ermm::crying::ermm:



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • lmu92 (4/6/2010)


    @ Paul:

    Interesting!

    What do I misread/misunderstand when reading BOL?

    I thought what you just demonstrated wouldn't be possible...

    It looks like query optimizer is using ISNULL to override the columns NULL definition.

    VERY NICE!! Never thought that's possible to "cheat QO" that way so I never even tried...

    That, of course, changes everything and makes my posts in this thread irrelevant if not misleading... :ermm::crying::ermm:

    No, no...not at all irrelevant or misleading...just another example of how things work around here at Team SSC 😀 We always get a solution in the end, but it often requires input from many people.

    Anyhow. I am a big fan of indexed views - when used appropriately (a bit like SQLCLR!)

    I tend to consider them wherever I might previously have been forced to store aggregates using triggers...

    Indexed views are a lot more flexible than many folks realise - and the BOL documentation doesn't always help with that. The issue here is not that the underlying column is NULLable - it is that the aggregated expression cannot be NULLable. I hope that makes sense?

    That said, indexed views do have many restrictions (for extremely good reasons - mostly related to correctness concerns and view-maintenance plan complexity) but can often be used to at least 'break the back' of many kinds of problems. As a last resort, of course, we can always hack something together with triggers 🙁

    Finally - thank you, Jeff.

  • Paul White NZ (4/6/2010)


    The problem here is not one of being deterministic or not: current versions (including 2008) of SQL Server do not support persisted columns that reference a UDF that does data access. The reason is that SQL Server has no way to keep the persisted value up-to-date.

    The problem is one of being not deterministic. A deterministic function should return the same value each time it is passed the same set of values for its parameters. A function that does data access is not deterministic because the data accessed may change over time and it may return different values for the same set of values for its input parameters in different calls. Non-determinism is the reason that SQL Server has no way to keep the persisted value up-to-date.

    Peter

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply