Parallelising a Scalar Function Call within SSMS

  • Hello

    Scenario:

    I have a script that is the following:

    INSERT INTO
    StagingTable2 (LotsOfColumns,
    FunctionResult
    )
    SELECT
    s1.LotsOfColumns,
    MyScalarFuntion(s1.Col1, s1.Col2) AS FunctionResult
    FROM
    StagingTable1 AS s1

    That takes very long to run as it runs in serial. The function only uses those two columns, nothing from any other table. No rows are dependant on any other rows. The way that I parallelised this was to add a column to StagingTable1 with the first character of a string, open up 26 tabs in SSMS, and run the following:

    INSERT INTO
    StagingTable2 (LotsOfColumns,
    FunctionResult
    )
    SELECT
    s1.LotsOfColumns,
    MyScalarFuntion(s1.Col1, s1.Col2) AS FunctionResult
    FROM
    StagingTable1 AS s1 WITH (NOLOCK)
    WHERE
    s1.FirstCharOfString = 'A'

    This cut the time down drastically, as expected.

    Question:

    How can I parallelise this function call within SSMS without opening up multiple tabs and calling each letter individually, or using something like SSIS or Alteryx to do what is effectively the same thing? This would be *SO* useful for investigations on large-for-us data sets like I am currently doing..

    Thank you

    Andrew

  • Quick thought, the best way would be to inline the function's code in the query. This would enable the server to use a parallel execution plan and also remove the execution overhead of the scalar function.

    ๐Ÿ˜Ž

     

    Can you post the full code of the function?

  • Unless you're planning to move to the most recent release of Sql Server, you cannot. A good alternative is to rework the UDF as an inline table-valued function. Jeff Moden has an article on this if I can find it...

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thank you for your replies. I'm taking over from a contractor who put this Levenshtein matching function in place https://stackoverflow.com/questions/36492073/sql-full-text-search-result-priority/36613504#36613504 (literally this function, he copied the code and renamed the two functions to fit into our naming schema.)

    If SQL Server has built-in better and faster matching functions, I'm also open to hearing it please.

  • It doesn't but there are a few TSQL equivalents around. I've written a Levenshtein workalike which is quite fast, but I'm having some trouble finding it!

     

    Edit - found it: https://www.sqlservercentral.com/forums/topic/how-to-use-scalar-function-without-while

     

    • This reply was modified 5 years, 8 months ago by  ChrisM@Work.
    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work wrote:

    It doesn't but there are a few TSQL equivalents around. I've written a Levenshtein workalike which is quite fast, but I'm having some trouble finding it! ย  Edit - found it: https://www.sqlservercentral.com/forums/topic/how-to-use-scalar-function-without-while ย 

     

    I remember that one, did one about year earlier but at work but not certain that I posted it on SSC

    ๐Ÿ˜Ž

    Will dig around and see if I can find it

  • Quick update, this is the most efficient code I've come up with so far, beats the bashes out of anything I've seen so far

    ๐Ÿ˜Ž

     

    DECLARE @STRING01   NVARCHAR(100)   = N'levenshtein distance sql'--N'giff' -- ;
    DECLARE @STRING02 NVARCHAR(100) = N'levenshtein sql server' -- N'sitting' -- N'gxaff' --;

    ;WITH T(N) AS (SELECT X.N FROM ( VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    ,MAX_STRING_LEN (MXSTRLEN) AS
    (
    SELECT
    CASE WHEN LEN(@STRING01)>= LEN(@STRING02) THEN LEN(@STRING01)
    ELSE LEN(@STRING02)
    END AS MXSTRLEN
    )
    ,NUMS(N) AS
    (
    SELECT TOP((SELECT MSL.MXSTRLEN FROM MAX_STRING_LEN MSL)) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
    FROM T T1 CROSS JOIN T T2
    )
    ,STRING_SET_SIMPLE AS
    (
    SELECT
    NM.N
    ,SUBSTRING(@STRING01,NM.N,1) AS CHAR01
    ,SUBSTRING(@STRING02,NM.N,1) AS CHAR02
    FROM NUMS NM
    )
    ,COMPARE_STRINGS AS
    (
    SELECT
    SSS.N
    ,SSS.CHAR01
    ,SSS.CHAR02
    ,NULLIF((CHARINDEX(SUBSTRING(@STRING01,SSS.N,1),@STRING02,SSS.N) - SSS.N),SSS.N) AS CMP01
    ,NULLIF((CHARINDEX(SUBSTRING(@STRING02,SSS.N,1),@STRING01,SSS.N) - SSS.N),SSS.N) AS CMP02
    FROM STRING_SET_SIMPLE SSS
    )
    SELECT
    (SUM(CASE
    WHEN CS.CMP01 < 0 THEN 1.0
    ELSE 0
    END)
    + SUM(CASE
    WHEN CS.CMP02 < 0 THEN 1.0
    ELSE 0
    END)) / 2.0 AS C2SUM
    ,SUM(CASE
    WHEN CS.CMP01 < 0 THEN 1.0
    ELSE 0
    END) AS C1SUM
    ,SUM(CASE
    WHEN CS.CMP02 < 0 THEN 1.0
    ELSE 0
    END) AS C2SUM
    FROM COMPARE_STRINGS CS;

     

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply