Updated 20190307 with correctly applied ORDER BY statements for each function
Intro
In June, 2014 someone asked if anyone had a StripNonNumeric ITVF function. This request would lead to a few RBAR-free, set-based functions designed to clean strings. For the next few months members of the SQL Server community put their heads together and, after a bunch of testing came up with DigitsOnlyEE, the fastest T-SQL function for removing non-numeric characters from a string available today. Designing DigitsOnlyEE as an inline table valued function makes it possible to speed it up even more by forcing a parallel execution plan with Adam Machanic's make_parallel. A couple tweaks later we also had an AlphaNumericOnly ITVF. In April, 2017 I added AlphaOnly which removes all non-alphabetical characters.
What about PatExclude8K or PatReplace8K?
PatExclude8K and PatReplace8K came out of the aforementioned effort and could be used for removing non-numeric or non-alphanumeric characters as shown below.
--===== PatExclude8K: SELECT NewString FROM dbo.PatExclude8K('???ABC-123!!!','[^0-9]'); -- remove non-numeric characters SELECT NewString FROM dbo.PatExclude8K('???ABC-123!!!','[^0-9A-Za-z]'); -- remove non-alphanumeric --===== PatReplace8K: SELECT NewString FROM dbo.PatReplace8K('???ABC-123!!!','[^0-9]',''); -- remove non-numeric characters SELECT NewString FROM dbo.PatReplace8K('???ABC-123!!!','[^0-9A-Za-z]',''); -- remove non-alphanumeric
DigitsOnlyEE and AlphaNumericOnly are much faster and better suited for this task, let me explain why: PatExclude8K and PatReplace8K both use a tally table to split the string into unigrams, then use the FOR XML PATH('') trick to put the string back together excluding characters that match the exclusion pattern (@pattern). Here's the code from PatExlclude8K:
... SELECT NewString = ( SELECT SUBSTRING(@String,N,1) FROM iTally WHERE 0 = PATINDEX(@Pattern,SUBSTRING(@String COLLATE Latin1_General_BIN,N,1)) FOR XML PATH(''),TYPE ).value('.[1]','varchar(8000)');
The TYPE clause specifies to return the data as an XML type. The .value('.','varchar(max)') takes each value, and converts it into a varchar(max) data type. The combination of the TYPE and .value means that values are created at XML tags (such as the ampersand (&), and the greater than (>) and less than (<) signs), will not be tokenized into their XML representations and will remain as is.
We don't need to include the TYPE and .value('.[1]', 'varchar(8000)') code for DigitsOnlyEE or AlphaNumericOnly because the preserved characters are numbers and letters only. There are other performance enhancements which are documented in the comment section of each function.
Happy string manipulating!
DigitsOnlyEE
IF OBJECT_ID('dbo.digitsOnlyEE', 'IF') IS NOT NULL DROP FUNCTION dbo.digitsOnlyEE;
GO
CREATE FUNCTION dbo.digitsOnlyEE
(
@pString VARCHAR(8000)
)
/****************************************************************************************
[Purpose]:
Given a VARCHAR(8000) or less string, return only the numeric digits from the string.
[Author]:
Alan Burstein, EE, Jeff Moden
[Compatibility]:
SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse & Parallel Data Warehouse
[Parameters]:
@pString = VARCHAR(8000); Input string to be cleaned
[Returns]:
digitsOnly = NVARCHAR(MAX)
[Syntax]:
--===== Autonomous
SELECT ca.digitsOnly
FROM dbo.digitsOnly(@pString) AS ca;
--===== CROSS APPLY example
SELECT ca.digitsOnly
FROM dbo.SomeTable
CROSS APPLY dbo.digitsOnly(SomeVarcharCol) AS ca;
[Dependencies]:
N/A
[Developer Notes]:
1. This is an iTVF (Inline Table Valued Function) that performs the same task as a
scalar user defined function (UDF) accept that it requires the APPLY table operator.
Note the usage examples below and See this article for more details:
http://www.sqlservercentral.com/articles/T-SQL/91724/
The function will be slightly more complicated to use than a scalar UDF but will yeild
much better performance. For example - unlike a scalar UDF, this function does not
restrict the query optimizer's ability generate a parallel query plan. Initial testing
showed that the function generally gets a parallel execution plan
2. Runs 2-4 times faster with a parallel query plan. For optimal performance use Adam
Machanic's make_parallel() function (provided that you are on a machine with two or
more logical CPUs). make_parallel can be found here:
sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx
3. This is an iTVF (Inline Table Valued Function) that will be used as an iSF
(Inline Scalar Function) in that it returns a single value in the returned
table and should normally be used in the FROM clause as with any other iTVF.
4. CHECKSUM returns an INT and will return the exact number given if given an
INT to begin with. It's also faster than a CAST or CONVERT and is used as a
performance enhancer by changing the bigint of ROW_NUMBER() to a more
appropriately sized INT.
5. Another performance enhancement is using a WHERE clause calculation to prevent
the relatively expensive XML PATH concatentation of empty strings normally
determined by a CASE statement in the XML "loop".
6. Another performance enhancement is not making this function a generic function
that could handle a pattern. That allows us to use all integer math to do the
comparison using the high speed ASCII function convert characters to their numeric
equivalent. ASCII characters 48 through 57 are the digit characters of 0 through 9.
7. Keep in mind that digitsOnlyEE returns an nvarchar(max) value. If you are returning
small numbers consider casting or converting yout values to a numeric data type if
you are inserting the return value into a new table or using it for joins or comparison
purposes.
8. digitsOnlyEE is deterministic; for more about deterministic and nondeterministic
functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx
Kudos:
1. Hats off to Eirikur Eiriksson for the ASCII conversion idea and for the reminders
that dedicated functions will always be faster than generic functions and that
integer math beats the tar out of character comparisons that use LIKE or PATINDEX.
2. Hats off to all of the good people that submitted and tested their code on the
following thread. It's this type of participation and interest that makes code
better. You've just gotta love this commmunity.
http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx#bm1629360
[Examples]:
--===== 1. Basic use against a literal
SELECT f.digitsOnly
FROM dbo.digitsOnlyEE('xxx123abc999!!!') AS f;
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('abc123'),('!!!555!!!'),('000.999'),(NULL);
SELECT st.txtID, OldTxt = st.txt, f.digitsOnly
FROM @sampleTxt AS st
CROSS APPLY dbo.digitsOnlyEE(st.txt) AS f;
---------------------------------------------------------------------------------------
[Revision History]:
Rev 00 - 20141029 - Initial Creation - Jeff Moden
Rev 01 - 20141210 - TOP clause changed to handle NULL inputs - Eirikur Eiriksson
Rev 02 - 20160512 - Substantial updates to the comments & examples - Alan Burstein
Rev 03 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside
the final cte (digitsonly) so that ORDER BY N does not get sorted.
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH
E1(N) AS
(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N)
),
iTally(N) AS
(
SELECT TOP (LEN(ISNULL(@pString,CHAR(32))))
(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))
FROM E1 AS a CROSS JOIN E1 AS b CROSS JOIN E1 AS c CROSS JOIN E1 AS d
)
SELECT digitsOnly =
(
SELECT SUBSTRING(@pString,CHECKSUM(N),1)
FROM iTally AS i
WHERE ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 48) & 0x7FFF) < 10
ORDER BY i.N
FOR XML PATH('')
);
GO
AlphaNumericOnly
IF OBJECT_ID('dbo.alphaNumericOnly8K', 'IF') IS NOT NULL
DROP FUNCTION dbo.alphaNumericOnly8K;
GO
CREATE FUNCTION dbo.alphaNumericOnly8K
(
@pString varchar(8000)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
/****************************************************************************************
[Purpose]:
Given a varchar(8000) string or smaller, this function strips all but the alphanumeric
characters that exist in @pString.
[Author]:
Alan Burstein, EE, Jeff Moden
[Compatibility]:
SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse
[Parameters]:
@pString = VARCHAR(8000); Input string to be cleaned
[Returns]:
AlphaNumericOnly - VARCHAR(8000)
[Syntax]:
--===== Autonomous
SELECT ca.AlphaNumericOnly
FROM dbo.AlphaNumericOnly(@pString) AS ca;
--===== CROSS APPLY example
SELECT ca.AlphaNumericOnly
FROM dbo.SomeTable AS st
CROSS APPLY dbo.AlphaNumericOnly(st.SomeVarcharCol) AS ca;
[Dependencies]:
N/A
[Developer Notes]:
1. Based on Jeff Moden/Eirikur Eiriksson's DigitsOnlyEE function. For more details see:
http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx#bm1629360
2. This is an iTVF (Inline Table Valued Function) that performs the same task as a
scalar user defined function (UDF) accept that it requires the APPLY table operator.
Note the usage examples below and see this article for more details:
http://www.sqlservercentral.com/articles/T-SQL/91724/
The function will be slightly more complicated to use than a scalar UDF but will yeild
much better performance. For example - unlike a scalar UDF, this function does not
restrict the query optimizer's ability generate a parallel query plan. Initial testing
showed that the function generally gets a parallel execution plan.
3. AlphaNumericOnly runs 2-4 times faster when using make_parallel() (provided that you
have two or more logical CPU's and MAXDOP is not set to 1 on your SQL Instance).
4. This is an iTVF (Inline Table Valued Function) that will be used as an iSF (Inline
Scalar Function) in that it returns a single value in the returned table and should
normally be used in the FROM clause as with any other iTVF.
5. CHECKSUM returns an INT and will return the exact number given if given an INT to
begin with. It's also faster than a CAST or CONVERT and is used as a performance
enhancer by changing the bigint of ROW_NUMBER() to a more appropriately sized INT.
6. Another performance enhancement is using a WHERE clause calculation to prevent
the relatively expensive XML PATH concatentation of empty strings normally
determined by a CASE statement in the XML "loop".
7. Note that AlphaNumericOnly returns an nvarchar(max) value. If you are returning small
numbers consider casting or converting yout values to a numeric data type if you are
inserting the return value into a new table or using it for joins or comparison purposes.
8. AlphaNumericOnly is deterministic; for more about deterministic and nondeterministi
functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx
[Examples]:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM dbo.alphaNumericOnly8K('xxx123abc999!!!') AS ao;
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT st.txtID, OldTxt = st.txt, f.AlphaNumericOnly
FROM @sampleTxt AS st
CROSS APPLY dbo.alphaNumericOnly8K(st.txt) AS f;
---------------------------------------------------------------------------------------
Rev 00 - 20150526 - Inital Creation - Alan Burstein
Rev 01 - 20150526 - 3rd line in WHERE clause to correct something that was missed
- Eirikur Eiriksson
Rev 02 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside
the final cte (digitsonly) so that ORDER BY N does not get sorted.
****************************************************************************************/
WITH
E1(N) AS
(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS x(N)
),
iTally(N) AS
(
SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E1 AS a CROSS JOIN E1 AS b CROSS JOIN E1 AS c CROSS JOIN E1 AS d
)
SELECT AlphaNumericOnly =
(
SELECT SUBSTRING(@pString,CHECKSUM(N),1)
FROM iTally AS i
WHERE
((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 48) & 0x7FFF) < 10
OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 65) & 0x7FFF) < 26
OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 97) & 0x7FFF) < 26
ORDER BY i.N
FOR XML PATH(''));
GO
AlphaOnly
IF OBJECT_ID('samd.alphaOnly8K') IS NOT NULL DROP FUNCTION samd.alphaOnly8K; GO CREATE FUNCTION samd.alphaOnly8K ( @pString VARCHAR(8000) ) RETURNS TABLE WITH SCHEMABINDING AS RETURN /***************************************************************************************** [Purpose]: Given a VARCHAR(8000) or less string, returns only the alphabetical digits from the input string (@pString). [Author]: Alan Burstein, EE, Jeff Moden [Compatibility]: SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse [Parameters]: @pString = varchar(8000); Input string to be cleaned [Returns]: AlphaOnly = varchar(8000) [Syntax]: --===== Autonomous SELECT ca.AlphaOnly FROM dbo.AlphaOnly(@pString) AS ca; --===== CROSS APPLY example SELECT ca.AlphaOnly FROM dbo.SomeTable st CROSS APPLY dbo.AlphaOnly(st.SomeVarcharCol) AS ca; [Dependencies]: N/A [Developer Notes]: 1. Based on Jeff Moden/Eirikur Eiriksson's DigitsOnlyEE function. For more details see: https://goo.gl/Qo8rpy 2. This is an iTVF (Inline Table Valued Function) that performs the same task as a scalar user defined function (UDF) accept that it requires the APPLY table operator. Note the usage examples below and see this article for more details: https://goo.gl/3utbgx The function will be slightly more complicated to use than a scalar UDF but will yeild much better performance. For example - unlike a scalar UDF, this function does not restrict the query optimizer's ability generate a parallel query plan. Initial testing showed that the function generally gets a 3. AlphaOnly runs 2-4 times faster when using make_parallel() (provided that you have two or more logical CPU's and MAXDOP is not set to 1 on your SQL Instance). 4. CHECKSUM returns an INT and will return the exact number given if given an INT to begin with. It's also faster than a CAST or CONVERT and is used as a performance enhancer by changing the bigint of ROW_NUMBER() to a more appropriately sized INT. 5. Another performance enhancement is using a WHERE clause calculation to prevent the relatively expensive XML PATH concatentation of empty strings normally determined by a CASE statement in the XML "loop". 6. Note that AlphaOnly returns an nvarchar(max) value. If you are returning small numbers consider casting or converting yout values to a numeric data type if you are inserting the return value into a new table or using it for joins or comparison purposes. 8. AlphaOnly is deterministic; for more about deterministic and nondeterministic functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx [Examples]: --===== 1. Basic use against a literal SELECT ao.AlphaOnly FROM samd.alphaOnly8K('xxx123abc999!!!') AS ao; --===== 2. Against a table DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100)); INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999'); SELECT st.txtID, OldTxt = st.txt, AlphaOnly FROM @sampleTxt AS st CROSS APPLY samd.alphaOnly8K(st.txt) AS f; --------------------------------------------------------------------------------------- [Revision History]: Rev 00 - 20170411 - Inital Creation - Alan Burstein Rev 01 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside the final cte (digitsonly) so that ORDER BY N does not get sorted Rev 03 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside the final cte (digitsonly) so that ORDER BY N does not get sorted. ****************************************************************************************/WITH E1 AS (SELECT N FROM (VALUES ($),($),($),($),($),($),($),($),($),($)) AS x(N)), iTally(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E1 AS a CROSS JOIN E1 AS b CROSS JOIN E1 AS c CROSS JOIN E1 AS d ) SELECT AlphaOnly = ( SELECT SUBSTRING(@pString,CHECKSUM(i.N),1) FROM iTally AS i WHERE ((ASCII(SUBSTRING(@pString,CHECKSUM(i.N),1)) - 65) & 0x7FFF) < 26 OR ((ASCII(SUBSTRING(@pString,CHECKSUM(i.N),1)) - 97) & 0x7FFF) < 26 ORDER BY i.N FOR XML PATH('') ); GO