August 24, 2015 at 12:54 am
Hi, I'm migrating electronic records from a legacy system and the new system has strict requirements for ASCII characters in certain metadata fields. I wrote a UDF to display illegal characters, so I can work out how to map them.
The UDF used a while loop and to improve performance, I wrote the equivalent UDF using a tally table. The tally table version actually ran significantly slower. Query calling UDF using cross apply took 26 secsfor the while loop versus 119 secs for Tally table, for test data of 97000 rows
I would like to work out why, as I will use similar code to replace the illegal characters
-- while loop version of UDF
CREATE FUNCTION [dbo].[DisplayIllegalChars](@strText VARCHAR(4000))
RETURNS @TableVariable TABLE (
Chr CHAR(1)
,AsciiValue INT)
AS
BEGIN
DECLARE @intCount INT
DECLARE @chrCheck CHAR
SET @intCount = 1
WHILE @intCount <= LEN(@strText)
BEGIN
SET @chrCheck = SUBSTRING(@strText, @intCount, 1)
IF (ASCII(@chrCheck) > 126)
OR ((ASCII(@chrCheck) < 31) AND NOT ASCII(@chrCheck) IN (9, 10, 13))
INSERT
INTO @TableVariable (Chr, AsciiValue)
SELECT @chrCheck, ASCII(@chrCheck)
SET @intCount = @intCount + 1
END
RETURN
END
-- tally table, per Jeff Moden article
SELECT TOP 4000
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
-- tally table version of UDF (could be inline function but want to compare like for like)
CREATE FUNCTION [dbo].[DisplayIllegalChars2](@strText VARCHAR(4000))
RETURNS @TableVariable TABLE (
Chr CHAR(1)
,AsciiValue INT)
AS
BEGIN
;with CTE AS (
SELECT SUBSTRING(@strText, N, 1) AS Chr
FROM dbo.Tally)
INSERT INTO @TableVariable (Chr, AsciiValue)
SELECT Chr, ASCII(Chr)
FROM CTE
WHERE (ASCII(Chr) > 126)
OR ((ASCII(Chr) < 31) AND NOT ASCII(Chr) IN (9, 10, 13))
RETURN
END
-- ColumnText contains characters to be checked
CREATE TABLE [dbo].[TestData](
[ID Key] [varchar](255) NULL,
[RecordType] [varchar](256) NOT NULL,
[ColumnType] [varchar](256) NOT NULL,
[ColumnText] [varchar](4000) NULL
) ON [PRIMARY]
-- for each illegal character, display original text and the illegal characters
-- whle loop UDF took 26 secs
select [id key]
,RecordType
,ColumnType
,ColumnText
from TestData t
cross apply [dbo].[DisplayIllegalChars](t.ColumnText) i
-- tally table USD took 119 secs
select [id key]
,RecordType
,ColumnType
,ColumnText
from TestData t
cross apply [dbo].[DisplayIllegalChars2](t.ColumnText) i
August 24, 2015 at 1:31 am
There are three key problems with the way you have structured the tally table version.
Firstly, your tally table should have a unique clustered index on n, or define it as a primary key in the ddl.
Secondly, you aren't restricting the number of rows pulled from the tally table. You're using the full 4000 with every call.
Thirdly, your function, a multi-statement table-valued function, will be much faster if it's written as an inline table-valued function, like this:
CREATE FUNCTION [dbo].[iTVF_DisplayIllegalChars2]
(@strText VARCHAR(4000))
RETURNS TABLE AS
RETURN (
SELECT
Chr = SUBSTRING(@strText, N, 1),
AsciiValue = ASCII(SUBSTRING(@strText, N, 1))
FROM dbo.Tally
WHERE n <= LEN(@strText)
AND (
(ASCII(Chr) > 126)
OR ((ASCII(Chr) < 31) AND NOT ASCII(Chr) IN (9, 10, 13))
)
)
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 24, 2015 at 2:17 am
Hi Chris, I tried your suggestions,
1) clustered index on tally table didn't make any difference, still 120 secs on test data
2) using LEN(@vstrText) < N made huge difference, down to 15 secs, thanks! Makes sense that this is so signficant
3) using a multi-statement table-valued function, this one I was aware of and was keeping as multi-statement to compare like for like, as the while loop version has to be multi-statement. For the record, using inline function brought it down to 5 secs.
thanks for your help π
Josh
August 24, 2015 at 3:17 am
Josh Leane-155117 (8/24/2015)
Hi Chris, I tried your suggestions,1) clustered index on tally table didn't make any difference, still 120 secs on test data
2) using LEN(@vstrText) < N made huge difference, down to 15 secs, thanks! Makes sense that this is so signficant
3) using a multi-statement table-valued function, this one I was aware of and was keeping as multi-statement to compare like for like, as the while loop version has to be multi-statement. For the record, using inline function brought it down to 5 secs.
thanks for your help π
Josh
Seems a little slow. Here's a tested version this time, with two changes which may be significant.
Firstly, the tally table is inline, secondly, I've put in a couple of LIKE comparisons.
From your point of view, the inline tally table may be worth a shot vs the hard table.
ALTER FUNCTION [dbo].[iTVF_DisplayIllegalChars2]
(@strText VARCHAR(4000))
RETURNS TABLE AS
RETURN (
WITH
E1(n) AS (SELECT 1 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)), --10E+1 or 10 rows
E2(n) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
Tally(n) AS (SELECT TOP (LEN(@strText)) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E2 a, E2 b)
SELECT
x.Chr,
AsciiValue = ASCII(x.Chr)
FROM Tally
CROSS APPLY (SELECT Chr = SUBSTRING(@strText, CAST(N AS INT), 1)) x
WHERE 1 = 1
AND x.Chr NOT LIKE '[A-Z]'
AND x.Chr NOT LIKE '[0-9]'
AND (
(ASCII(x.Chr) > 126)
OR
((ASCII(x.Chr) < 31) AND NOT ASCII(x.Chr) IN (9, 10, 13))
)
)
GO
SELECT * FROM dbo.iTVF_DisplayIllegalChars2 ('JND\VC879Y2`' + CHAR(8) + '34HMBzsmfcB`2' + CHAR(127) + '13')
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 24, 2015 at 11:26 am
Josh Leane-155117 (8/24/2015)
Hi Chris, I tried your suggestions,1) clustered index on tally table didn't make any difference, still 120 secs on test data
2) using LEN(@vstrText) < N made huge difference, down to 15 secs, thanks! Makes sense that this is so signficant
3) using a multi-statement table-valued function, this one I was aware of and was keeping as multi-statement to compare like for like, as the while loop version has to be multi-statement. For the record, using inline function brought it down to 5 secs.
thanks for your help π
Josh
Just to add to the mix, did you just create a clustered index or did you create the required UNIQUE clustered index?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 25, 2015 at 1:54 am
Thanks for the additional suggestions Chris. I tried with the Tally table in the function, that slowed it slightly from 5 secs to 7 secs. Presumably due to the tally table being created each time?
With the additional LIKE comparisons, it was slower again, 19 secs
ChrisM@Work (8/24/2015)
WHERE 1 = 1
AND x.Chr NOT LIKE '[A-Z]'
AND x.Chr NOT LIKE '[0-9]'
Also, whats the reason behind the "1 = 1" comparision? Does that force a certain evaluation order for the other comparisons in the where clause?
Using the cross apply to alias the "SELECT Chr = SUBSTRING(@strText, CAST(N AS INT), 1)" is a useful trick, thanks
Josh
August 25, 2015 at 2:06 am
Josh Leane-155117 (8/25/2015)
Thanks for the additional suggestions Chris. I tried with the Tally table in the function, that slowed it slightly from 5 secs to 7 secs. Presumably due to the tally table being created each time?With the additional LIKE comparisons, it was slower again, 19 secs
ChrisM@Work (8/24/2015)
WHERE 1 = 1
AND x.Chr NOT LIKE '[A-Z]'
AND x.Chr NOT LIKE '[0-9]'
Also, whats the reason behind the "1 = 1" comparision? Does that force a certain evaluation order for the other comparisons in the where clause?
Using the cross apply to alias the "SELECT Chr = SUBSTRING(@strText, CAST(N AS INT), 1)" is a useful trick, thanks
Josh
Hi Josh
IIRC a hard tally table is usually slightly faster than an inline one. Google will tell you for sure.
WHERE 1 = 1 is only a placeholder, no tricky stuff, the optimiser ignores it and so can you. SQL Server evaluates the filters in whatever order results in the lowest calculated cost in the chosen plan. Have you ever tried to cast a varchar to a date, using ISDATE() to check if the value can be cast, only to find that the cast is performed before the check?
Did you create the clustered index as unique as Jeff mentions above?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 25, 2015 at 3:17 am
Jeff Moden (8/24/2015)
Josh Leane-155117 (8/24/2015)
Hi Chris, I tried your suggestions,1) clustered index on tally table didn't make any difference, still 120 secs on test data
2) using LEN(@vstrText) < N made huge difference, down to 15 secs, thanks! Makes sense that this is so signficant
3) using a multi-statement table-valued function, this one I was aware of and was keeping as multi-statement to compare like for like, as the while loop version has to be multi-statement. For the record, using inline function brought it down to 5 secs.
thanks for your help π
Josh
Just to add to the mix, did you just create a clustered index or did you create the required UNIQUE clustered index?
Hi Jeff, I did only have a clustered index. When I originally said the clustered index made no difference, that was against my original tally table, which was using the 4000 rows each time.
I retested with Chris' solution, using LEN(@strText) rows from the tally table, and having the clustered index did make as difference of about 40% improvement. But it didn't matter if it was unique or not, though there might be a difference with a larger dataset.
thanks for your help
August 25, 2015 at 3:20 am
Hi Josh
IIRC a hard tally table is usually slightly faster than an inline one. Google will tell you for sure.
WHERE 1 = 1 is only a placeholder, no tricky stuff, the optimiser ignores it and so can you. SQL Server evaluates the filters in whatever order results in the lowest calculated cost in the chosen plan. Have you ever tried to cast a varchar to a date, using ISDATE() to check if the value can be cast, only to find that the cast is performed before the check?
Did you create the clustered index as unique as Jeff mentions above?
Hey Chris, did try with the unique clustered index and posted a comment in reply to Jeff. I wasn't aware that the optimiser could change the order of the cast and isdate() check, good to know π
thanks for all your help, appreciate it
Josh
August 25, 2015 at 9:40 am
ChrisM@Work (8/25/2015)
Hi JoshIIRC a hard tally table is usually slightly faster than an inline one. Google will tell you for sure.
No need to Google. Here's a chart from the "Hidden RBAR" article at http://www.sqlservercentral.com/articles/T-SQL/74118/ .
Note that use of a hardcoded Tally Table can cause a huge number of logical reads from memory once the relatively small physical Tally Table has been cached. The advantage of Itzik's method is that it generates absolutely no reads by itself.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 25, 2015 at 6:07 pm
Jeff Moden (8/25/2015)
ChrisM@Work (8/25/2015)
Hi JoshIIRC a hard tally table is usually slightly faster than an inline one. Google will tell you for sure.
No need to Google. Here's a chart from the "Hidden RBAR" article at http://www.sqlservercentral.com/articles/T-SQL/74118/ .
Note that use of a hardcoded Tally Table can cause a huge number of logical reads from memory once the relatively small physical Tally Table has been cached. The advantage of Itzik's method is that it generates absolutely no reads by itself.
thanks Jeff π I'll definitely read that article
Josh
August 26, 2015 at 2:17 am
Jeff Moden (8/25/2015)
ChrisM@Work (8/25/2015)
Hi JoshIIRC a hard tally table is usually slightly faster than an inline one. Google will tell you for sure.
No need to Google. Here's a chart from the "Hidden RBAR" article at http://www.sqlservercentral.com/articles/T-SQL/74118/ .
Note that use of a hardcoded Tally Table can cause a huge number of logical reads from memory once the relatively small physical Tally Table has been cached. The advantage of Itzik's method is that it generates absolutely no reads by itself.
Thanks Jeff, I knew it was out there somewhere.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 26, 2015 at 7:20 am
You bet. It's one of my favorite charts because of the line for the rCTE. Absolutely amazing how comparatively slow those bad boys can be.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply