April 11, 2019 at 7:36 am
Hello
Scenario:
I have a script that is the following:
INSERT INTO
StagingTable2 (LotsOfColumns,
FunctionResult
)
SELECT
s1.LotsOfColumns,
MyScalarFuntion(s1.Col1, s1.Col2) AS FunctionResult
FROM
StagingTable1 AS s1
That takes very long to run as it runs in serial. The function only uses those two columns, nothing from any other table. No rows are dependant on any other rows. The way that I parallelised this was to add a column to StagingTable1 with the first character of a string, open up 26 tabs in SSMS, and run the following:
INSERT INTO
StagingTable2 (LotsOfColumns,
FunctionResult
)
SELECT
s1.LotsOfColumns,
MyScalarFuntion(s1.Col1, s1.Col2) AS FunctionResult
FROM
StagingTable1 AS s1 WITH (NOLOCK)
WHERE
s1.FirstCharOfString = 'A'
This cut the time down drastically, as expected.
Question:
How can I parallelise this function call within SSMS without opening up multiple tabs and calling each letter individually, or using something like SSIS or Alteryx to do what is effectively the same thing? This would be *SO* useful for investigations on large-for-us data sets like I am currently doing..
Thank you
Andrew
April 11, 2019 at 7:53 am
Quick thought, the best way would be to inline the function's code in the query. This would enable the server to use a parallel execution plan and also remove the execution overhead of the scalar function.
๐
Can you post the full code of the function?
April 11, 2019 at 7:57 am
Unless you're planning to move to the most recent release of Sql Server, you cannot. A good alternative is to rework the UDF as an inline table-valued function. Jeff Moden has an article on this if I can find it...
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 11, 2019 at 8:13 am
Thank you for your replies. I'm taking over from a contractor who put this Levenshtein matching function in place https://stackoverflow.com/questions/36492073/sql-full-text-search-result-priority/36613504#36613504 (literally this function, he copied the code and renamed the two functions to fit into our naming schema.)
If SQL Server has built-in better and faster matching functions, I'm also open to hearing it please.
April 11, 2019 at 12:14 pm
It doesn't but there are a few TSQL equivalents around. I've written a Levenshtein workalike which is quite fast, but I'm having some trouble finding it!
Edit - found it: https://www.sqlservercentral.com/forums/topic/how-to-use-scalar-function-without-while
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 11, 2019 at 1:58 pm
It doesn't but there are a few TSQL equivalents around. I've written a Levenshtein workalike which is quite fast, but I'm having some trouble finding it! ย Edit - found it: https://www.sqlservercentral.com/forums/topic/how-to-use-scalar-function-without-while ย
I remember that one, did one about year earlier but at work but not certain that I posted it on SSC
๐
Will dig around and see if I can find it
April 17, 2019 at 3:27 pm
Quick update, this is the most efficient code I've come up with so far, beats the bashes out of anything I've seen so far
๐
DECLARE @STRING01 NVARCHAR(100) = N'levenshtein distance sql'--N'giff' -- ;
DECLARE @STRING02 NVARCHAR(100) = N'levenshtein sql server' -- N'sitting' -- N'gxaff' --;
;WITH T(N) AS (SELECT X.N FROM ( VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
,MAX_STRING_LEN (MXSTRLEN) AS
(
SELECT
CASE WHEN LEN(@STRING01)>= LEN(@STRING02) THEN LEN(@STRING01)
ELSE LEN(@STRING02)
END AS MXSTRLEN
)
,NUMS(N) AS
(
SELECT TOP((SELECT MSL.MXSTRLEN FROM MAX_STRING_LEN MSL)) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
FROM T T1 CROSS JOIN T T2
)
,STRING_SET_SIMPLE AS
(
SELECT
NM.N
,SUBSTRING(@STRING01,NM.N,1) AS CHAR01
,SUBSTRING(@STRING02,NM.N,1) AS CHAR02
FROM NUMS NM
)
,COMPARE_STRINGS AS
(
SELECT
SSS.N
,SSS.CHAR01
,SSS.CHAR02
,NULLIF((CHARINDEX(SUBSTRING(@STRING01,SSS.N,1),@STRING02,SSS.N) - SSS.N),SSS.N) AS CMP01
,NULLIF((CHARINDEX(SUBSTRING(@STRING02,SSS.N,1),@STRING01,SSS.N) - SSS.N),SSS.N) AS CMP02
FROM STRING_SET_SIMPLE SSS
)
SELECT
(SUM(CASE
WHEN CS.CMP01 < 0 THEN 1.0
ELSE 0
END)
+ SUM(CASE
WHEN CS.CMP02 < 0 THEN 1.0
ELSE 0
END)) / 2.0 AS C2SUM
,SUM(CASE
WHEN CS.CMP01 < 0 THEN 1.0
ELSE 0
END) AS C1SUM
,SUM(CASE
WHEN CS.CMP02 < 0 THEN 1.0
ELSE 0
END) AS C2SUM
FROM COMPARE_STRINGS CS;
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply