Controlling the evaluation of scalar functions

  • I am facing a strange issue and, even though I found a solution, I wonder if a better solution exists.

    I have a table containing a multi-lingual dictionary in a xml field. A mock-up of the table can be created and filled using the following statement:

    CREATE TABLE #Dict (ID INT PRIMARY KEY, XmlValue XML)
    INSERT INTO #Dict SELECT 0, CONVERT(XML, '<labels><label lang="en" value="en-0" /><label lang="fr" value="fr-0" /></labels>')
    INSERT INTO #Dict SELECT 1, CONVERT(XML, '<labels><label lang="en" value="en-1" /><label lang="fr" value="fr-1" /></labels>')
    INSERT INTO #Dict SELECT 2, CONVERT(XML, '<labels><label lang="en" value="en-2" /><label lang="fr" value="fr-2" /></labels>')
    INSERT INTO #Dict SELECT 3, CONVERT(XML, '<labels><label lang="en" value="en-3" /><label lang="fr" value="fr-3" /></labels>')
    INSERT INTO #Dict SELECT 4, CONVERT(XML, '<labels><label lang="en" value="en-4" /><label lang="fr" value="fr-4" /></labels>')
    INSERT INTO #Dict SELECT 5, CONVERT(XML, '<labels><label lang="en" value="en-5" /><label lang="fr" value="fr-5" /></labels>')
    INSERT INTO #Dict SELECT 6, CONVERT(XML, '<labels><label lang="en" value="en-6" /><label lang="fr" value="fr-6" /></labels>')
    INSERT INTO #Dict SELECT 7, CONVERT(XML, '<labels><label lang="en" value="en-7" /><label lang="fr" value="fr-7" /></labels>')
    INSERT INTO #Dict SELECT 8, CONVERT(XML, '<labels><label lang="en" value="en-8" /><label lang="fr" value="fr-8" /></labels>')
    INSERT INTO #Dict SELECT 9, CONVERT(XML, '<labels><label lang="en" value="en-9" /><label lang="fr" value="fr-9" /></labels>')

    The following scalar function is called to retrieve the value of a given index for a given language:

    CREATE OR ALTER FUNCTION GetValueInLanguage(@lang CHAR(2), @XmlValue XML)
    RETURNS NVARCHAR(MAX) AS BEGIN
    RETURN (SELECT X.Y.value('@value', 'NVARCHAR(MAX)') AS Value FROM @XmlValue.nodes('labels/label') AS X(Y) WHERE X.Y.value('@lang', 'CHAR(2)') = @lang)
    END

    Now, let's create and fill a big table containing references to the dictionary

    CREATE TABLE #Big (ID INT PRIMARY KEY, DictID INT)
    DECLARE @I INT = 0
    WHILE @I < 1000000 BEGIN
    INSERT INTO #Big SELECT @I, @I % 10
    SET @I = @I + 1
    END

    It is now time to perform some tests. The following statement runs in 76 seconds on my computer:

    SELECT
    B.ID,
    dbo.GetValueInLanguage('fr', D.XmlValue)
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID

    We can avoid calling the GetValueInLanguage function a million times by using a variable table. The following statement runs in 4 seconds:

    DECLARE @D TABLE (ID INT PRIMARY KEY, Value NVARCHAR(MAX))
    INSERT INTO @D SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) FROM #Dict
    SELECT
    B.ID,
    D.Value
    FROM #Big B
    INNER JOIN @D D ON D.ID = B.DictID

    My goal becomes to find a way to get the same performance using a single SELECT statement. The following function sill runs in 76 seconds:

    SELECT
    B.ID,
    D.Value
    FROM #Big B
    CROSS APPLY (SELECT Value FROM (
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict
    ) X WHERE ID = B.DictID) D

    But the following function runs in 4 seconds:

    SELECT
    B.ID,
    D.Value
    FROM #Big B
    CROSS APPLY (SELECT Value FROM (
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 0 UNION ALL
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 1 UNION ALL
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 2 UNION ALL
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 3 UNION ALL
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 4 UNION ALL
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 5 UNION ALL
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 6 UNION ALL
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 7 UNION ALL
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 8 UNION ALL
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 9
    ) X WHERE ID = B.DictID) D

    Using UNIONs has the query plan call the GetValueInLanguage function only once per entry in the dictionary. Unfortunately, I have to hard code all entries. Fortunately, the following statement is just as fast:

    SELECT
    B.ID,
    D.Value
    FROM #Big B
    CROSS APPLY (SELECT Value FROM (
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = -1 UNION ALL
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID <> -1
    ) X WHERE ID = B.DictID) D

    While the last statement works perfectly, it looks strange to anyone not aware of the problem I tried to solve. Is anyone aware of a more "standard" way to get the same results with similar performance?

    Regards.

  • Change it to a table valued function.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • The following statement runs in 4 seconds and is definitely better:

    SELECT
    B.ID,
    D.Value
    FROM #Big B
    LEFT MERGE JOIN (
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict
    ) D ON D.ID = B.DictID
    OPTION (RECOMPILE)

    I can specify either MERGE or HASH join but not LOOP join

  • Or just do the xml query inline...

    DECLARE @startTime DATETIME2;
    DECLARE @endTime DATETIME2;
    SET @startTime = SYSDATETIME();

    DECLARE @lang CHAR(2) = 'fr';

    SET @startTime = SYSDATETIME();

    SELECT
    B.ID,
    --dbo.GetValueInLanguage('fr', D.XmlValue)
    X.Y.value('@value', 'NVARCHAR(MAX)')
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID
    CROSS APPLY D.XmlValue.nodes('labels/label') AS X(Y)
    WHERE X.Y.value('@lang', 'CHAR(2)') = @lang

    SET @endTime = SYSDATETIME();
    SELECT DATEDIFF(MILLISECOND,@startTime,@endTime) AS SelectDurationMS;

    Note: Using a tally table or function to insert the test data in #Big is MUCH faster than looping to insert one row at a time (I didn't have time to let my poor  overloaded PC finish that insert).

  • cmartel 20772 wrote:

    The following statement runs in 4 seconds and is definitely better:

    SELECT
    B.ID,
    D.Value
    FROM #Big B
    LEFT MERGE JOIN (
    SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict
    ) D ON D.ID = B.DictID
    OPTION (RECOMPILE)

    I can specify either MERGE or HASH join but not LOOP join

    Have you done as Michael suggested and tried converting it to an iTVF (inline Table Valued Function)?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've tried all those queries and they all run in about 4 seconds on my machine even the one that runs in 76 seconds on your machine.

    I'm using SQL 2019 Developer Edition.

     

  • Well, here we go.

    I used this to load the table. Took a few milliseconds

     INSERT INTO #Big (id, Dictid)
    SELECT TOP (1000000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 10
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    ;

    I changed the function to a iTVF

    CREATE OR ALTER   FUNCTION  [dbo].[GetValueInLanguage2]
    (@lang CHAR(2),
    @XmlValue XML)
    RETURNS TABLE
    RETURN (
    SELECT
    X.Y.value('@value', 'NVARCHAR(MAX)') AS Value
    FROM @XmlValue.nodes('labels/label') AS X(Y)
    WHERE X.Y.value('@lang', 'CHAR(2)') = @lang
    )
    GO

    And the results

    --20 seconds, new function
    SELECT
    B.ID,
    (SELECT VALUE FROM dbo.[GetValueInLanguage2]('fr', D.XmlValue))
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID

    --53 seconds, old function
    SELECT
    B.ID,
    dbo.GetValueInLanguage('fr', D.XmlValue)
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID

    --6 seconds, new function with a cross apply
    SELECT
    B.ID,
    X.VALUE
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID
    CROSS APPLY dbo.[GetValueInLanguage2]('fr', D.XmlValue) X

    There's probably more that can be gained.  But that should get you started.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John wrote:

    --20 seconds, new function
    SELECT
    B.ID,
    (SELECT VALUE FROM dbo.[GetValueInLanguage2]('fr', D.XmlValue))
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID

    --53 seconds, old function
    SELECT
    B.ID,
    dbo.GetValueInLanguage('fr', D.XmlValue)
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID

    --6 seconds, new function with a cross apply
    SELECT
    B.ID,
    X.VALUE
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID
    CROSS APPLY dbo.[GetValueInLanguage2]('fr', D.XmlValue) X

    Which version of SQL Server are you using?

    These are the results I got from your code:

    drop table if exists #x
    drop table if exists #y
    drop table if exists #z
    -- 7689 ms, new function
    SELECT
    B.ID,
    (SELECT VALUE FROM dbo.[GetValueInLanguage2]('fr', D.XmlValue)) x
    INTO #x
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID
    go
    -- 226 ms, old function
    SELECT
    B.ID,
    dbo.GetValueInLanguage('fr', D.XmlValue) x
    INTO #y
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID
    go
    -- 787 ms, new function with a cross apply
    SELECT B.ID,
    X.VALUE
    into #z
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID
    CROSS APPLY dbo.[GetValueInLanguage2]('fr', D.XmlValue) X

    So the fastest one on my machine is the old code with 226 ms elapsed.

    I'm using Microsoft SQL Server 2019 (RTM-GDR) (KB5014356) - 15.0.2095.3 (X64) Apr 29 2022 18:00:13 Copyright (C) 2019 Microsoft Corporation Developer Edition (64-bit) on Windows 10 Home 10.0 <X64> (Build 19044: )

    And running it on a 7 year old Dell XPS 8700 desktop with 28GB RAM and i7-4790 CPU.

     

  • I performed my tests on a local SQL Server 2019. I will experiment with all received suggestions during the holidays. Many thanks to all of you and happy holidays!

  • [/quote]Which version of SQL Server are you using?[/quote]

    2019. My laptop is 32 GB ram with an 8 core i7 processor

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • I've just pasted in my results in my last comment above. Your results are very different to mine.

    I thought Microsoft made a change in a previous release of SQL Server to make scalar valued functions return as fast as iTVFs.

  • There is a scoped configuration that can be set in SQL 2019 to improve the performance of scalar UDFs:

    To check its value:

    select * from sys.database_scoped_configurations

    udf

    To switch it on:

    ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;

    This makes a huge difference the following query

    drop table if exists #y;
    SELECT B.ID,
    dbo.GetValueInLanguage('fr', D.XmlValue) x
    INTO #y
    FROM #Big B
    INNER JOIN #Dict D ON D.ID = B.DictID;

    runs in 56000 ms with it switched off and 256 ms with it switched on.

    Read this: Scalar UDF inlining

    Note: Before you do this you might need to set the database compatibility level to 150:

    ALTER DATABASE [myDatabase] SET COMPATIBILITY LEVEL 150;

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply