TSQL Fastest way to strip non-numeric characters from a string

  • If I read my SQL correct, one conversion takes 0.2 msec.... way to go! Thanks to all!

  • I am very surprised that the XML_PATH is fastest. I always thought XML handling in SQL Server was horribly inefficient and bloated (especially from a memory-use perspective).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I didn't notice it before, but that's not quite the code that I posted, which is (after modifying to varchar(30) as Steve suggested):

    CREATE FUNCTION dbo._tTVFN_tEEst2 (@countryNr INT, @telNr NCHAR(30))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    SELECT (SELECT SUBSTRING(@telNr,N,1)

    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)) AS x(N)

    WHERE N<=LEN(@telNr)

    AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')

    ORDER BY N

    FOR XML PATH(''), TYPE).value('.','VARCHAR(30)') AS STR_OUT

    GO

    So, I added this to the test harness (as dbo._iTVFN_tEEst2), and ran it to compare. The results are:

    T_TEXT DURATION

    -------------------------------------------------- -----------

    DRY RUN 182010

    dbo._iTVFN_tEEst2 2026713

    dbo._iTVFN_tEEst 2421165

    [dbo].[__fx_tEEst] 13453769

    [dbo].[__fx_test] 21526231

    So, this is even faster.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • TheSQLGuru (7/15/2015)


    I am very surprised that the XML_PATH is fastest. I always thought XML handling in SQL Server was horribly inefficient and bloated (especially from a memory-use perspective).

    XML shredding is; for building strings, this is the fastest T-SQL way without using CLR.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (7/15/2015)


    TheSQLGuru (7/15/2015)


    I am very surprised that the XML_PATH is fastest. I always thought XML handling in SQL Server was horribly inefficient and bloated (especially from a memory-use perspective).

    XML shredding is; for building strings, this is the fastest T-SQL way without using CLR.

    In this case, one reason the iTVFNs using FOR XML are faster is simply because the server chooses a parallel execution (cost threshold 50)

    😎

  • WayneS (7/15/2015)


    I didn't notice it before, but that's not quite the code that I posted, which is (after modifying to varchar(30) as Steve suggested):

    CREATE FUNCTION dbo._tTVFN_tEEst2 (@countryNr INT, @telNr NCHAR(30))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    SELECT (SELECT SUBSTRING(@telNr,N,1)

    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)) AS x(N)

    WHERE N<=LEN(@telNr)

    AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')

    ORDER BY N

    FOR XML PATH(''), TYPE).value('.','VARCHAR(30)') AS STR_OUT

    GO

    So, I added this to the test harness (as dbo._iTVFN_tEEst2), and ran it to compare. The results are:

    T_TEXT DURATION

    -------------------------------------------------- -----------

    DRY RUN 182010

    dbo._iTVFN_tEEst2 2026713

    dbo._iTVFN_tEEst 2421165

    [dbo].[__fx_tEEst] 13453769

    [dbo].[__fx_test] 21526231

    So, this is even faster.

    Proportionally identical to the results I'm getting but when I run this with MAXDOP 1 I get this

    😎

    T_TEXT DURATION

    -------------------------------------------------- -----------

    DRY RUN 625035

    [dbo].[__fx_tEEst] 13870793

    dbo._iTVFN_tEEst 16993972

    dbo._tTVFN_tEEst2 17880023

    [dbo].[__fx_test] 20197155

  • WayneS (7/15/2015)


    I didn't notice it before, but that's not quite the code that I posted, which is (after modifying to varchar(30) as Steve suggested):

    CREATE FUNCTION dbo._tTVFN_tEEst2 (@countryNr INT, @telNr NCHAR(30))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    SELECT (SELECT SUBSTRING(@telNr,N,1)

    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)) AS x(N)

    WHERE N<=LEN(@telNr)

    AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')

    ORDER BY N

    FOR XML PATH(''), TYPE).value('.','VARCHAR(30)') AS STR_OUT

    GO

    So, I added this to the test harness (as dbo._iTVFN_tEEst2), and ran it to compare. The results are:

    T_TEXT DURATION

    -------------------------------------------------- -----------

    DRY RUN 182010

    dbo._iTVFN_tEEst2 2026713

    dbo._iTVFN_tEEst 2421165

    [dbo].[__fx_tEEst] 13453769

    [dbo].[__fx_test] 21526231

    So, this is even faster.

    And using your code within the test harness, and just adjusting for data type instead of methodology, here's the code:

    USE tempdb

    GO

    CREATE FUNCTION dbo.iTVFN_GET_NUMS_ONLY_VC (@telNr nvarchar(30))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    SELECT (

    SELECT SUBSTRING(@telNr,N,1)

    FROM (

    VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),

    (11),(12),(13),(14),(15),(16),(17),(18),(19),(20),

    (21),(22),(23),(24),(25),(26),(27),(28),(29),(30)

    ) AS x(N)

    WHERE N<=LEN(@telNr)

    AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')

    FOR XML PATH(''), TYPE).value('.','VARCHAR(30)') AS STR_OUT

    ;

    GO

    CREATE FUNCTION dbo.iTVFN_GET_NUMS_ONLY_ORDER_VC (@telNr nvarchar(30))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    SELECT (

    SELECT SUBSTRING(@telNr,N,1)

    FROM (

    VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),

    (11),(12),(13),(14),(15),(16),(17),(18),(19),(20),

    (21),(22),(23),(24),(25),(26),(27),(28),(29),(30)

    ) AS x(N)

    WHERE N<=LEN(@telNr)

    AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')

    ORDER BY N

    FOR XML PATH(''), TYPE).value('.','VARCHAR(30)') AS STR_OUT

    ;

    GO

    CREATE FUNCTION dbo.iTVFN_GET_NUMS_ONLY_NC (@telNr nvarchar(30))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    SELECT (

    SELECT SUBSTRING(@telNr,N,1)

    FROM (

    VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),

    (11),(12),(13),(14),(15),(16),(17),(18),(19),(20),

    (21),(22),(23),(24),(25),(26),(27),(28),(29),(30)

    ) AS x(N)

    WHERE N<=LEN(@telNr)

    AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')

    FOR XML PATH(''), TYPE).value('.','NCHAR(30)') AS STR_OUT

    ;

    GO

    CREATE FUNCTION dbo.iTVFN_GET_NUMS_ONLY_ORDER_NC (@telNr nvarchar(30))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    SELECT (

    SELECT SUBSTRING(@telNr,N,1)

    FROM (

    VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),

    (11),(12),(13),(14),(15),(16),(17),(18),(19),(20),

    (21),(22),(23),(24),(25),(26),(27),(28),(29),(30)

    ) AS x(N)

    WHERE N<=LEN(@telNr)

    AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')

    ORDER BY N

    FOR XML PATH(''), TYPE).value('.','NCHAR(30)') AS STR_OUT

    ;

    GO

    CREATE FUNCTION dbo.iTVFN_GET_NUMS_ONLY_C (@telNr nvarchar(30))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    SELECT (

    SELECT SUBSTRING(@telNr,N,1)

    FROM (

    VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),

    (11),(12),(13),(14),(15),(16),(17),(18),(19),(20),

    (21),(22),(23),(24),(25),(26),(27),(28),(29),(30)

    ) AS x(N)

    WHERE N<=LEN(@telNr)

    AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')

    FOR XML PATH(''), TYPE).value('.','CHAR(30)') AS STR_OUT

    ;

    GO

    CREATE FUNCTION dbo.iTVFN_GET_NUMS_ONLY_ORDER_C (@telNr nvarchar(30))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    SELECT (

    SELECT SUBSTRING(@telNr,N,1)

    FROM (

    VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),

    (11),(12),(13),(14),(15),(16),(17),(18),(19),(20),

    (21),(22),(23),(24),(25),(26),(27),(28),(29),(30)

    ) AS x(N)

    WHERE N<=LEN(@telNr)

    AND SUBSTRING(@telNr,N,1) LIKE ('[0-9]')

    ORDER BY N

    FOR XML PATH(''), TYPE).value('.','CHAR(30)') AS STR_OUT

    ;

    GO

    SET NOCOUNT ON;

    DECLARE @TEST_CLEAN_NUM_FUNC AS TABLE (

    TCNF_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,

    TCNF_NUMSTR VARCHAR(30) NOT NULL

    );

    DECLARE @SAMPLE_NUMS TABLE (SN_NUM VARCHAR(8))

    DECLARE @SAMPLE_DIGIT TABLE (SD_DIGIT VARCHAR(2))

    INSERT INTO @SAMPLE_NUMS (SN_NUM)

    SELECT

    CONVERT(VARCHAR(8),ABS(SAC.object_id) % 10000000,0)

    FROM sys.all_columns SAC

    INSERT INTO @SAMPLE_DIGIT(SD_DIGIT)

    VALUES ('('),(')'),('-'),(' '),('-('),(')-'),(','),('.'),('/'),('@');

    INSERT INTO @TEST_CLEAN_NUM_FUNC(TCNF_NUMSTR)

    SELECT TOP 1000000

    S1.SD_DIGIT + N1.SN_NUM + S2.SD_DIGIT + N3.SN_NUM + S3.SD_DIGIT

    FROM @SAMPLE_DIGIT S1

    CROSS JOIN @SAMPLE_NUMS N1

    CROSS JOIN @SAMPLE_DIGIT S2

    CROSS JOIN @SAMPLE_NUMS N3

    CROSS JOIN @SAMPLE_DIGIT S3;

    DECLARE @CHAR_BUCKET VARCHAR(30) = '';

    DECLARE @timer TABLE (

    T_TEXT VARCHAR(50) NOT NULL,

    T_TIME DATETIME2(7) NOT NULL DEFAULT(SYSDATETIME())

    );

    INSERT INTO @timer(T_TEXT) VALUES('DRY RUN');

    SELECT

    @CHAR_BUCKET = TF.TCNF_NUMSTR

    FROM @TEST_CLEAN_NUM_FUNC AS TF;

    INSERT INTO @timer(T_TEXT) VALUES('DRY RUN');

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_VC');

    SELECT @CHAR_BUCKET = FN.STR_OUT

    FROM @TEST_CLEAN_NUM_FUNC AS TF

    CROSS APPLY dbo.iTVFN_GET_NUMS_ONLY_VC(TF.TCNF_NUMSTR) AS FN;

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_VC');

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_ORDER_VC');

    SELECT @CHAR_BUCKET = FN.STR_OUT

    FROM @TEST_CLEAN_NUM_FUNC AS TF

    CROSS APPLY dbo.iTVFN_GET_NUMS_ONLY_ORDER_VC(TF.TCNF_NUMSTR) AS FN;

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_ORDER_VC');

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_NC');

    SELECT @CHAR_BUCKET = FN.STR_OUT

    FROM @TEST_CLEAN_NUM_FUNC AS TF

    CROSS APPLY dbo.iTVFN_GET_NUMS_ONLY_NC(TF.TCNF_NUMSTR) AS FN;

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_NC');

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_ORDER_NC');

    SELECT @CHAR_BUCKET = FN.STR_OUT

    FROM @TEST_CLEAN_NUM_FUNC AS TF

    CROSS APPLY dbo.iTVFN_GET_NUMS_ONLY_ORDER_NC(TF.TCNF_NUMSTR) AS FN;

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_ORDER_NC');

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_C');

    SELECT @CHAR_BUCKET = FN.STR_OUT

    FROM @TEST_CLEAN_NUM_FUNC AS TF

    CROSS APPLY dbo.iTVFN_GET_NUMS_ONLY_C(TF.TCNF_NUMSTR) AS FN;

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_C');

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_ORDER_C');

    SELECT @CHAR_BUCKET = FN.STR_OUT

    FROM @TEST_CLEAN_NUM_FUNC AS TF

    CROSS APPLY dbo.iTVFN_GET_NUMS_ONLY_ORDER_C(TF.TCNF_NUMSTR) AS FN;

    INSERT INTO @timer(T_TEXT) VALUES('dbo.iTVFN_GET_NUMS_ONLY_ORDER_C');

    SELECT T.T_TEXT,

    DATEDIFF(MICROSECOND, MIN(T.T_TIME), MAX(T.T_TIME)) / 1000000. AS AVG_DURATION_USEC

    FROM @timer T

    GROUP BY T.T_TEXT

    ORDER BY AVG_DURATION_USEC;

    DROP FUNCTION dbo.iTVFN_GET_NUMS_ONLY_VC

    GO

    DROP FUNCTION dbo.iTVFN_GET_NUMS_ONLY_ORDER_VC

    GO

    DROP FUNCTION dbo.iTVFN_GET_NUMS_ONLY_NC

    GO

    DROP FUNCTION dbo.iTVFN_GET_NUMS_ONLY_ORDER_NC

    GO

    DROP FUNCTION dbo.iTVFN_GET_NUMS_ONLY_C

    GO

    DROP FUNCTION dbo.iTVFN_GET_NUMS_ONLY_ORDER_C

    GO

    And here are the results:

    T_TEXT AVG_DURATION_USEC

    --------------------------------------------------

    DRY RUN 0.19402000

    dbo.iTVFN_GET_NUMS_ONLY_C 2.40224100

    dbo.iTVFN_GET_NUMS_ONLY_NC 2.49124900

    dbo.iTVFN_GET_NUMS_ONLY_VC 2.69126900

    dbo.iTVFN_GET_NUMS_ONLY_ORDER_VC3.05230500

    dbo.iTVFN_GET_NUMS_ONLY_ORDER_C 3.18431800

    dbo.iTVFN_GET_NUMS_ONLY_ORDER_NC3.32533200

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply