July 15, 2015 at 4:40 am
Hi,
I am looking for the fastest way to strip non-numeric characters from a string.
I have a user database that has a column (USER_TELNO) in which the user can drop a telephone number (for example '+31 (0)12-123 456'). An extra computed column (FORMATTED_TELNO) should contain the formatted telephone number (31012123456 in the example)
Note: the column FORMATTED_TELNO must be indexed, so the UDF in the computed column has WITH SCHEMABINDING.... I think this implicates that a CLR call won't work....
Thanks!
July 15, 2015 at 4:55 am
Raymond van Laake (7/15/2015)
Hi,I am looking for the fastest way to strip non-numeric characters from a string.
I have a user database that has a column (USER_TELNO) in which the user can drop a telephone number (for example '+31 (0)12-123 456'). An extra computed column (FORMATTED_TELNO) should contain the formatted telephone number (31012123456 in the example)
Note: the column FORMATTED_TELNO must be indexed, so the UDF in the computed column has WITH SCHEMABINDING.... I think this implicates that a CLR call won't work....
Thanks!
Have a look at this thread from last year
😎
July 15, 2015 at 5:53 am
Yes, I saw older posts, was just wondering what fastest would be.
I eventually came up with this based on older posts:
ALTER FUNCTION [dbo].[__fx_test] (@countryNr INT, @telNr NCHAR(30))
RETURNS CHAR(30)
WITH SCHEMABINDING
AS
BEGIN
DECLARE @fTelNr VARCHAR(30) = ''
SELECT @fTelNr = @fTelNr + SUBSTRING(@telNr,N,1)
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)) AS x(N)
WHERE N<=LEN(@telNr)
AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')
RETURN @fTelNr
END
July 15, 2015 at 6:53 am
Raymond van Laake (7/15/2015)
Yes, I saw older posts, was just wondering what fastest would be.I eventually came up with this based on older posts:
ALTER FUNCTION [dbo].[__fx_test] (@countryNr INT, @telNr NCHAR(30))
RETURNS CHAR(30)
WITH SCHEMABINDING
AS
BEGIN
DECLARE @fTelNr VARCHAR(30) = ''
SELECT @fTelNr = @fTelNr + SUBSTRING(@telNr,N,1)
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)) AS x(N)
WHERE N<=LEN(@telNr)
AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')
RETURN @fTelNr
END
Since you're looking for the fastest way, might I suggest a couple of small improvements:
1. You have this coded as a scalar function. An inline table-valued function will almost always perform better (usually significantly). This would require changing your query calling the function to CROSS APPLY to this function, instead of calling it in the column list (see my blog post[/url] showing how an inline function works better).
2. Even though it is unlikely, since the query doesn't have an ORDER BY clause, the numbers could possibly be returned out of order. No ORDER BY clause = no guarantee of order.
3. String concatenation in Windows is notoriously slow. Better would be to:
4. Using FOR XML PATH(''), TYPE is usually the quickest way to create strings.
Based upon all of that, I suggest:
CREATE FUNCTION dbo.[__fx_test] (@countryNr INT, @telNr NCHAR(30))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT (SELECT SUBSTRING(@telNr,N,1)
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)) AS x(N)
WHERE N<=LEN(@telNr)
AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')
ORDER BY N
FOR XML PATH(''), TYPE).value('.','NCHAR(30)')
GO
You can read about how this works in my SSC article[/url]
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 15, 2015 at 7:16 am
Wow... thanks for the improvements!
Raymond
July 15, 2015 at 7:19 am
I'm glad that you can use it.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 15, 2015 at 7:34 am
Wayne,
Why NCHAR(30) instead of VARCHAR(30) ? For a phone number, it makes no sense to waste the extra byte per digit when you only need to account for 0 thru 9, and you certainly need not waste the space on all the additional characters out to 30 for each and every record.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
July 15, 2015 at 7:37 am
Just piling on a little, here is a test harness with three different versions
😎
Echoing Wain's reply, don't recommend using the variable type concatenation as the order is not guarantied (but it is faster than XML concatenation )
USE tempdb;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.TEST_CLEAN_NUM_FUNC') IS NOT NULL DROP TABLE dbo.TEST_CLEAN_NUM_FUNC;
CREATE TABLE dbo.TEST_CLEAN_NUM_FUNC
(
TCNF_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TEST_CLEAN_NUM_FUNC_TCFN_ID PRIMARY KEY CLUSTERED
,TCNF_NUMSTR VARCHAR(30) NOT NULL
)
;
GO
IF OBJECT_ID(N'dbo._iTVFN_tEEst') IS NOT NULL DROP FUNCTION dbo._iTVFN_tEEst;
GO
CREATE FUNCTION [dbo]._iTVFN_tEEst (@countryNr INT, @telNr NCHAR(30))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT
(
SELECT SUBSTRING(@telNr,N,1)
FROM (
SELECT TOP(LEN(@telNr)) N
FROM (VALUES (1) ,(2) ,(3) ,(4) ,(5) ,(6) ,(7) ,(8) ,(9) ,(10)
,(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)
,(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)
) AS Y(N)
) AS X(N)
WHERE ((ASCII(SUBSTRING(@telNr,X.N,1)) - 48) & 0x7FFF) < 10
FOR XML PATH(''),TYPE).value('.','NCHAR(30)'
) AS STR_OUT
;
GO
IF OBJECT_ID(N'dbo.__fx_tEEst') IS NOT NULL DROP FUNCTION dbo.__fx_tEEst;
GO
CREATE FUNCTION [dbo].__fx_tEEst (@countryNr INT, @telNr NCHAR(30))
RETURNS CHAR(30)
WITH SCHEMABINDING
AS
BEGIN
DECLARE @fTelNr VARCHAR(30) = ''
SELECT @fTelNr = @fTelNr + SUBSTRING(@telNr,N,1)
FROM (
SELECT TOP(LEN(@telNr)) N
FROM (VALUES (1) ,(2) ,(3) ,(4) ,(5) ,(6) ,(7) ,(8) ,(9) ,(10)
,(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)
,(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)
) AS Y(N)
) AS X(N)
WHERE ((ASCII(SUBSTRING(@telNr,X.N,1)) - 48) & 0x7FFF) < 10
RETURN @fTelNr
END
GO
IF OBJECT_ID(N'dbo.__fx_test') IS NOT NULL DROP FUNCTION dbo.__fx_test;
GO
CREATE FUNCTION [dbo].[__fx_test] (@countryNr INT, @telNr NCHAR(30))
RETURNS CHAR(30)
WITH SCHEMABINDING
AS
BEGIN
DECLARE @fTelNr VARCHAR(30) = ''
SELECT @fTelNr = @fTelNr + SUBSTRING(@telNr,N,1)
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)) AS x(N)
WHERE N<=LEN(@telNr)
AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')
RETURN @fTelNr
END
GO
DECLARE @SAMPLE_NUMS TABLE (SN_NUM VARCHAR(8))
DECLARE @SAMPLE_DIGIT TABLE (SD_DIGIT VARCHAR(2))
INSERT INTO @SAMPLE_NUMS (SN_NUM)
SELECT
CONVERT(VARCHAR(8),ABS(SAC.object_id) % 10000000,0)
FROM sys.all_columns SAC
INSERT INTO @SAMPLE_DIGIT(SD_DIGIT)
VALUES
('(')
,(')')
,('-')
,(' ')
,('-(')
,(')-')
,(',')
,('.')
,('/')
,('@');
INSERT INTO dbo.TEST_CLEAN_NUM_FUNC(TCNF_NUMSTR)
SELECT TOP 1000000
S1.SD_DIGIT + N1.SN_NUM + S2.SD_DIGIT + N3.SN_NUM + S3.SD_DIGIT
FROM @SAMPLE_DIGIT S1
CROSS JOIN @SAMPLE_NUMS N1
CROSS JOIN @SAMPLE_DIGIT S2
CROSS JOIN @SAMPLE_NUMS N3
CROSS JOIN @SAMPLE_DIGIT S3;
DECLARE @CHAR_BUCKET VARCHAR(30) = '';
DECLARE @timer TABLE (T_TEXT VARCHAR(50) NOT NULL, T_TIME DATETIME2(7) NOT NULL DEFAULT(SYSDATETIME()));
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN');
SELECT
@CHAR_BUCKET = TF.TCNF_NUMSTR
FROM dbo.TEST_CLEAN_NUM_FUNC TF;
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN');
INSERT INTO @timer(T_TEXT) VALUES('[dbo].[__fx_test]');
SELECT
@CHAR_BUCKET = dbo.__fx_test('123', TF.TCNF_NUMSTR)
FROM dbo.TEST_CLEAN_NUM_FUNC TF;
INSERT INTO @timer(T_TEXT) VALUES('[dbo].[__fx_test]');
INSERT INTO @timer(T_TEXT) VALUES('[dbo].[__fx_tEEst]');
SELECT
@CHAR_BUCKET = dbo.__fx_tEEst('123', TF.TCNF_NUMSTR)
FROM dbo.TEST_CLEAN_NUM_FUNC TF;
INSERT INTO @timer(T_TEXT) VALUES('[dbo].[__fx_tEEst]');
INSERT INTO @timer(T_TEXT) VALUES('dbo._iTVFN_tEEst');
SELECT
@CHAR_BUCKET = X.STR_OUT
FROM dbo.TEST_CLEAN_NUM_FUNC TF
CROSS APPLY _iTVFN_tEEst('123', TF.TCNF_NUMSTR) AS X;
INSERT INTO @timer(T_TEXT) VALUES('dbo._iTVFN_tEEst');
SELECT
T.T_TEXT
,DATEDIFF(MICROSECOND,MIN(T.T_TIME),MAX(T.T_TIME)) AS DURATION
FROM @timer T
GROUP BY T.T_TEXT
ORDER BY DURATION;
Results
T_TEXT DURATION
-------------------------------------------------- -----------
DRY RUN 147009
[dbo].[__fx_tEEst] 11648666
dbo._iTVFN_tEEst 16361936
[dbo].[__fx_test] 17223985
July 15, 2015 at 7:52 am
Eirikur,
I changed your (very nice) test harness to put the results into a temp table vs. using @CHAR_BUCKET. My results are:
T_TEXT DURATION
-------------------------------------------------- -----------
DRY RUN 194011
dbo._iTVFN_tEEst 2383161
[dbo].[__fx_tEEst] 13615779
[dbo].[__fx_test] 20967199
Which shows the ITVF to be appreciably faster.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 15, 2015 at 7:53 am
Raymond van Laake (7/15/2015)
For the loop it should be the same:
DECLARE @a CHAR(30)
DECLARE @b-2 VARCHAR(30)
SET @a = 'hello'
SET @b-2 = 'hello'
SELECT LEN(@a), LEN(@b)
results in 5 on both variables
That's not where the concern is. It's the extra storage for the table to hold 30 characters instead of some lesser but variable number, and doubled because of NCHAR instead of CHAR.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
July 15, 2015 at 7:55 am
sgmunson (7/15/2015)
Wayne,Why NCHAR(30) instead of VARCHAR(30) ? For a phone number, it makes no sense to waste the extra byte per digit when you only need to account for 0 thru 9, and you certainly need not waste the space on all the additional characters out to 30 for each and every record.
Steve,
You're right... I just grabbed the datatype that was being passed in, and didn't even think about it.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 15, 2015 at 8:07 am
WayneS (7/15/2015)
Eirikur,I changed your (very nice) test harness to put the results into a temp table vs. using @CHAR_BUCKET. My results are:
T_TEXT DURATION
-------------------------------------------------- -----------
DRY RUN 194011
dbo._iTVFN_tEEst 2383161
[dbo].[__fx_tEEst] 13615779
[dbo].[__fx_test] 20967199
Which shows the ITVF to be appreciably faster.
Nice touch;-), thanks Wayne!
😎
July 15, 2015 at 8:25 am
What's the final code now?
July 15, 2015 at 8:30 am
Raymond van Laake (7/15/2015)
What's the final code now?
This one here is by far the fastest as Wayne's test clearly shows
😎
IF OBJECT_ID(N'dbo._iTVFN_tEEst') IS NOT NULL DROP FUNCTION dbo._iTVFN_tEEst;
GO
CREATE FUNCTION [dbo]._iTVFN_tEEst (@countryNr INT, @telNr NCHAR(30))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT
(
SELECT SUBSTRING(@telNr,N,1)
FROM (
SELECT TOP(LEN(@telNr)) N
FROM (VALUES (1) ,(2) ,(3) ,(4) ,(5) ,(6) ,(7) ,(8) ,(9) ,(10)
,(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)
,(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)
) AS Y(N)
) AS X(N)
WHERE ((ASCII(SUBSTRING(@telNr,X.N,1)) - 48) & 0x7FFF) < 10
FOR XML PATH(''),TYPE).value('.','NCHAR(30)'
) AS STR_OUT
;
GO
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply