I am facing a strange issue and, even though I found a solution, I wonder if a better solution exists.
I have a table containing a multi-lingual dictionary in a xml field. A mock-up of the table can be created and filled using the following statement:
CREATE TABLE #Dict (ID INT PRIMARY KEY, XmlValue XML)
INSERT INTO #Dict SELECT 0, CONVERT(XML, '<labels><label lang="en" value="en-0" /><label lang="fr" value="fr-0" /></labels>')
INSERT INTO #Dict SELECT 1, CONVERT(XML, '<labels><label lang="en" value="en-1" /><label lang="fr" value="fr-1" /></labels>')
INSERT INTO #Dict SELECT 2, CONVERT(XML, '<labels><label lang="en" value="en-2" /><label lang="fr" value="fr-2" /></labels>')
INSERT INTO #Dict SELECT 3, CONVERT(XML, '<labels><label lang="en" value="en-3" /><label lang="fr" value="fr-3" /></labels>')
INSERT INTO #Dict SELECT 4, CONVERT(XML, '<labels><label lang="en" value="en-4" /><label lang="fr" value="fr-4" /></labels>')
INSERT INTO #Dict SELECT 5, CONVERT(XML, '<labels><label lang="en" value="en-5" /><label lang="fr" value="fr-5" /></labels>')
INSERT INTO #Dict SELECT 6, CONVERT(XML, '<labels><label lang="en" value="en-6" /><label lang="fr" value="fr-6" /></labels>')
INSERT INTO #Dict SELECT 7, CONVERT(XML, '<labels><label lang="en" value="en-7" /><label lang="fr" value="fr-7" /></labels>')
INSERT INTO #Dict SELECT 8, CONVERT(XML, '<labels><label lang="en" value="en-8" /><label lang="fr" value="fr-8" /></labels>')
INSERT INTO #Dict SELECT 9, CONVERT(XML, '<labels><label lang="en" value="en-9" /><label lang="fr" value="fr-9" /></labels>')
The following scalar function is called to retrieve the value of a given index for a given language:
CREATE OR ALTER FUNCTION GetValueInLanguage(@lang CHAR(2), @XmlValue XML)
RETURNS NVARCHAR(MAX) AS BEGIN
RETURN (SELECT X.Y.value('@value', 'NVARCHAR(MAX)') AS Value FROM @XmlValue.nodes('labels/label') AS X(Y) WHERE X.Y.value('@lang', 'CHAR(2)') = @lang)
END
Now, let's create and fill a big table containing references to the dictionary
CREATE TABLE #Big (ID INT PRIMARY KEY, DictID INT)
DECLARE @I INT = 0
WHILE @I < 1000000 BEGIN
INSERT INTO #Big SELECT @I, @I % 10
SET @I = @I + 1
END
It is now time to perform some tests. The following statement runs in 76 seconds on my computer:
SELECT
B.ID,
dbo.GetValueInLanguage('fr', D.XmlValue)
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID
We can avoid calling the GetValueInLanguage function a million times by using a variable table. The following statement runs in 4 seconds:
DECLARE @D TABLE (ID INT PRIMARY KEY, Value NVARCHAR(MAX))
INSERT INTO @D SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) FROM #Dict
SELECT
B.ID,
D.Value
FROM #Big B
INNER JOIN @D D ON D.ID = B.DictID
My goal becomes to find a way to get the same performance using a single SELECT statement. The following function sill runs in 76 seconds:
SELECT
B.ID,
D.Value
FROM #Big B
CROSS APPLY (SELECT Value FROM (
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict
) X WHERE ID = B.DictID) D
But the following function runs in 4 seconds:
SELECT
B.ID,
D.Value
FROM #Big B
CROSS APPLY (SELECT Value FROM (
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 0 UNION ALL
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 1 UNION ALL
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 2 UNION ALL
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 3 UNION ALL
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 4 UNION ALL
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 5 UNION ALL
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 6 UNION ALL
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 7 UNION ALL
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 8 UNION ALL
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = 9
) X WHERE ID = B.DictID) D
Using UNIONs has the query plan call the GetValueInLanguage function only once per entry in the dictionary. Unfortunately, I have to hard code all entries. Fortunately, the following statement is just as fast:
SELECT
B.ID,
D.Value
FROM #Big B
CROSS APPLY (SELECT Value FROM (
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID = -1 UNION ALL
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict WHERE ID <> -1
) X WHERE ID = B.DictID) D
While the last statement works perfectly, it looks strange to anyone not aware of the problem I tried to solve. Is anyone aware of a more "standard" way to get the same results with similar performance?
Regards.
December 21, 2022 at 7:40 pm
Change it to a table valued function.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
December 21, 2022 at 10:16 pm
The following statement runs in 4 seconds and is definitely better:
SELECT
B.ID,
D.Value
FROM #Big B
LEFT MERGE JOIN (
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict
) D ON D.ID = B.DictID
OPTION (RECOMPILE)
I can specify either MERGE or HASH join but not LOOP join
December 21, 2022 at 10:52 pm
Or just do the xml query inline...
DECLARE @startTime DATETIME2;
DECLARE @endTime DATETIME2;
SET @startTime = SYSDATETIME();
DECLARE @lang CHAR(2) = 'fr';
SET @startTime = SYSDATETIME();
SELECT
B.ID,
--dbo.GetValueInLanguage('fr', D.XmlValue)
X.Y.value('@value', 'NVARCHAR(MAX)')
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID
CROSS APPLY D.XmlValue.nodes('labels/label') AS X(Y)
WHERE X.Y.value('@lang', 'CHAR(2)') = @lang
SET @endTime = SYSDATETIME();
SELECT DATEDIFF(MILLISECOND,@startTime,@endTime) AS SelectDurationMS;
Note: Using a tally table or function to insert the test data in #Big is MUCH faster than looping to insert one row at a time (I didn't have time to let my poor overloaded PC finish that insert).
December 21, 2022 at 11:51 pm
The following statement runs in 4 seconds and is definitely better:
SELECT
B.ID,
D.Value
FROM #Big B
LEFT MERGE JOIN (
SELECT ID, dbo.GetValueInLanguage('fr', XmlValue) AS Value FROM #Dict
) D ON D.ID = B.DictID
OPTION (RECOMPILE)I can specify either MERGE or HASH join but not LOOP join
Have you done as Michael suggested and tried converting it to an iTVF (inline Table Valued Function)?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 22, 2022 at 12:23 am
I've tried all those queries and they all run in about 4 seconds on my machine even the one that runs in 76 seconds on your machine.
I'm using SQL 2019 Developer Edition.
Well, here we go.
I used this to load the table. Took a few milliseconds
INSERT INTO #Big (id, Dictid)
SELECT TOP (1000000)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 10
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
I changed the function to a iTVF
CREATE OR ALTER FUNCTION [dbo].[GetValueInLanguage2]
(@lang CHAR(2),
@XmlValue XML)
RETURNS TABLE
RETURN (
SELECT
X.Y.value('@value', 'NVARCHAR(MAX)') AS Value
FROM @XmlValue.nodes('labels/label') AS X(Y)
WHERE X.Y.value('@lang', 'CHAR(2)') = @lang
)
GO
And the results
--20 seconds, new function
SELECT
B.ID,
(SELECT VALUE FROM dbo.[GetValueInLanguage2]('fr', D.XmlValue))
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID
--53 seconds, old function
SELECT
B.ID,
dbo.GetValueInLanguage('fr', D.XmlValue)
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID
--6 seconds, new function with a cross apply
SELECT
B.ID,
X.VALUE
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID
CROSS APPLY dbo.[GetValueInLanguage2]('fr', D.XmlValue) X
There's probably more that can be gained. But that should get you started.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
December 22, 2022 at 2:11 pm
--20 seconds, new function
SELECT
B.ID,
(SELECT VALUE FROM dbo.[GetValueInLanguage2]('fr', D.XmlValue))
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID
--53 seconds, old function
SELECT
B.ID,
dbo.GetValueInLanguage('fr', D.XmlValue)
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID
--6 seconds, new function with a cross apply
SELECT
B.ID,
X.VALUE
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID
CROSS APPLY dbo.[GetValueInLanguage2]('fr', D.XmlValue) X
Which version of SQL Server are you using?
These are the results I got from your code:
drop table if exists #x
drop table if exists #y
drop table if exists #z
-- 7689 ms, new function
SELECT
B.ID,
(SELECT VALUE FROM dbo.[GetValueInLanguage2]('fr', D.XmlValue)) x
INTO #x
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID
go
-- 226 ms, old function
SELECT
B.ID,
dbo.GetValueInLanguage('fr', D.XmlValue) x
INTO #y
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID
go
-- 787 ms, new function with a cross apply
SELECT B.ID,
X.VALUE
into #z
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID
CROSS APPLY dbo.[GetValueInLanguage2]('fr', D.XmlValue) X
So the fastest one on my machine is the old code with 226 ms elapsed.
I'm using Microsoft SQL Server 2019 (RTM-GDR) (KB5014356) - 15.0.2095.3 (X64) Apr 29 2022 18:00:13 Copyright (C) 2019 Microsoft Corporation Developer Edition (64-bit) on Windows 10 Home 10.0 <X64> (Build 19044: )
And running it on a 7 year old Dell XPS 8700 desktop with 28GB RAM and i7-4790 CPU.
December 22, 2022 at 2:15 pm
I performed my tests on a local SQL Server 2019. I will experiment with all received suggestions during the holidays. Many thanks to all of you and happy holidays!
December 22, 2022 at 2:16 pm
[/quote]Which version of SQL Server are you using?[/quote]
2019. My laptop is 32 GB ram with an 8 core i7 processor
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
December 22, 2022 at 2:28 pm
I've just pasted in my results in my last comment above. Your results are very different to mine.
I thought Microsoft made a change in a previous release of SQL Server to make scalar valued functions return as fast as iTVFs.
December 22, 2022 at 5:13 pm
There is a scoped configuration that can be set in SQL 2019 to improve the performance of scalar UDFs:
To check its value:
select * from sys.database_scoped_configurations
To switch it on:
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;
This makes a huge difference the following query
drop table if exists #y;
SELECT B.ID,
dbo.GetValueInLanguage('fr', D.XmlValue) x
INTO #y
FROM #Big B
INNER JOIN #Dict D ON D.ID = B.DictID;
runs in 56000 ms with it switched off and 256 ms with it switched on.
Read this: Scalar UDF inlining
Note: Before you do this you might need to set the database compatibility level to 150:
ALTER DATABASE [myDatabase] SET COMPATIBILITY LEVEL 150;
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply