how to remove characters char(0) to char(31)

  • Jeff Moden (11/19/2012)


    @Mark,

    That was also pretty quick. Have you been using such a thing? If so, what's your take on performance here (And, yes, I still do the test because I said I would and because I'm insanely curious about these things. 🙂 )

    @jeff

    Nope, I don't tend to use things like this. Although performance-wise it's probably faster than pure T-SQL,

    I find the overhead of deployment and maintenance doesn't make it worthwhile. Unless this sort of function is going

    to be very heavily used and is performance critical, I'd stick to a nested REPLACEs. We only use SQLCLRs for a few

    functions that can't be practically implemented in T-SQL, Levenshtein distance for strings for example.

    Writing a half dozen lines of C# shouldn't take long anyway and it was kinda fun 🙂

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark-101232 (11/19/2012)


    Jeff Moden (11/19/2012)


    @Mark,

    That was also pretty quick. Have you been using such a thing? If so, what's your take on performance here (And, yes, I still do the test because I said I would and because I'm insanely curious about these things. 🙂 )

    @jeff

    Nope, I don't tend to use things like this. Although performance-wise it's probably faster than pure T-SQL,

    I find the overhead of deployment and maintenance doesn't make it worthwhile. Unless this sort of function is going

    to be very heavily used and is performance critical, I'd stick to a nested REPLACEs. We only use SQLCLRs for a few

    functions that can't be practically implemented in T-SQL, Levenshtein distance for strings for example.

    Writing a half dozen lines of C# shouldn't take long anyway and it was kinda fun 🙂

    I'm of the same ilk. It's frequently not worth the performance gain (if there is one) to handle separately managed code, etc.

    Speaking of performance gains, there frequently either isn't one or it actually causes a performance problem. I just spent a bunch of time replacing some audit log triggers that were written as a CLR because someone wanted it so that it would automatically figure out if you added a column to the given table. It would take 4 minutes to update just 5 columns on a 137 column (don't ask... I didn't design it). I wrote some code to build hardcoded triggers from the table schema and 10K rows could suddenly be upated in milliseconds.

    Things like the Levenshtein distance algorithm are a whole 'nuther story.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/19/2012)


    ...

    To answer the rest, the only way to know for sure is to run a test or two. If someone would create a CLR to drop characters 0 thru 31 and 127, script it out and send it to me (I don't have a C# environment setup and wouldn't know what to do with it if I did), I'd be happy to test it, publish the results, and share the documented test code.

    First, I would really like to mentioned that in my previous post I've tried to make the attention to a bit unfair comparison between dedicated UDF and generic CLR wrapper around RegEx.

    Now looking into more details of this OP question...

    I can see that to do what OP needs I wouldn't not use RegEx at all!

    However I wouldn't use StringBuilder either...

    Here is my version of CLR based on plain char array:

    [Microsoft.SqlServer.Server.SqlFunction]

    public static String RemoveCrap(String val)

    {

    if (val == null) return null;

    char[] chrs = new char[val.Length];

    int indx = 0;

    foreach (char c in val)

    {

    if ((int) c > 31 && (int) c != 127)

    {

    chrs[indx] = c;

    indx++;

    }

    }

    return new string(chrs, 0, indx);

    }

    Now, using Jeff's test setup some performance testing:

    DECLARE @Start DATETIME, @End DATETIME

    DECLARE @var varchar(max)

    PRINT 'T-SQL nested REPLACE'

    SET @Start = GETUTCDATE()

    SELECT @var = ca.CleanedString

    FROM #TestTable tt

    CROSS APPLY dbo.DropControlCharacters(tt.SomeString) ca

    PRINT DATEDIFF(ms,@start,GETUTCDATE());

    PRINT 'CLR with String Builder'

    SET @Start = GETUTCDATE()

    SELECT @var = dbo.RemoveCtrlChars(tt.SomeString)

    FROM #TestTable tt

    PRINT DATEDIFF(ms,@start,GETUTCDATE());

    PRINT 'CLR plain'

    SET @Start = GETUTCDATE()

    SELECT @var = dbo.RemoveCrap(tt.SomeString)

    FROM #TestTable tt

    PRINT DATEDIFF(ms,@start,GETUTCDATE());

    And here results I got:

    T-SQL nested REPLACE

    2503

    CLR with String Builder

    546

    CLR plain

    343

    CLR with StringBuilder is five times faster than T-SQL UDF with nested REPLACE.

    CLR based on char array gives another 40% of boost.

    Yes, CLR functions do add a bit more to database maintenance, but their performance (eg. in this case) justify such cost easily, at least for myself...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • hi i have this test code which replace our problem on production

    first replace for CHAR(0) removes character 'C' --- I dont know Why

    when u set input to for example 'Calling' C dont go away

    Can u help me?

    DECLARE @badStrings TABLE (item NVARCHAR(1))

    DECLARE @input VARCHAR(250)

    SET @input='Chluemcká' --COLLATE SQL_Latin1_General_CP1_CI_AS

    INSERT INTO @badStrings(item)

    SELECT CHAR(0) UNION ALL

    SELECT CHAR(1) UNION ALL

    SELECT CHAR(2) UNION ALL

    SELECT CHAR(3) UNION ALL

    SELECT CHAR(4) UNION ALL

    SELECT CHAR(5) UNION ALL

    SELECT CHAR(6) UNION ALL

    SELECT CHAR(7) UNION ALL

    SELECT CHAR(8) UNION ALL

    SELECT CHAR(11) UNION ALL

    SELECT CHAR(12) UNION ALL

    SELECT CHAR(14) UNION ALL

    SELECT CHAR(15) UNION ALL

    SELECT CHAR(16) UNION ALL

    SELECT CHAR(17) UNION ALL

    SELECT CHAR(18) UNION ALL

    SELECT CHAR(19) UNION ALL

    SELECT CHAR(20) UNION ALL

    SELECT CHAR(21) UNION ALL

    SELECT CHAR(22) UNION ALL

    SELECT CHAR(23) UNION ALL

    SELECT CHAR(24) UNION ALL

    SELECT CHAR(25) UNION ALL

    SELECT CHAR(26) UNION ALL

    SELECT CHAR(27) UNION ALL

    SELECT CHAR(28) UNION ALL

    SELECT CHAR(29) UNION ALL

    SELECT CHAR(30) UNION ALL

    SELECT CHAR(31)

    SELECT @input

    SELECT * FROM @badStrings bs

    DECLARE @zank nvarchar(1)

    DECLARE znaky CURSOR FAST_FORWARD READ_ONLY FOR SELECT bs.item FROM @badStrings bs

    OPEN znaky

    FETCH NEXT FROM znaky INTO @zank

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SELECT @input = REPLACE(@input, item, ' ') FROM @badStrings

    SELECT @input

    FETCH NEXT FROM znaky INTO @zank

    END

    CLOSE znaky

    DEALLOCATE znaky

  • lady.konvicka (11/15/2016)


    hi i have this test code which replace our problem on production

    first replace for CHAR(0) removes character 'C' --- I dont know Why

    when u set input to for example 'Calling' C dont go away

    Can u help me?

    DECLARE @badStrings TABLE (item NVARCHAR(1))

    DECLARE @input VARCHAR(250)

    SET @input='Chluemcká' --COLLATE SQL_Latin1_General_CP1_CI_AS

    INSERT INTO @badStrings(item)

    SELECT CHAR(0) UNION ALL

    SELECT CHAR(1) UNION ALL

    SELECT CHAR(2) UNION ALL

    SELECT CHAR(3) UNION ALL

    SELECT CHAR(4) UNION ALL

    SELECT CHAR(5) UNION ALL

    SELECT CHAR(6) UNION ALL

    SELECT CHAR(7) UNION ALL

    SELECT CHAR(8) UNION ALL

    SELECT CHAR(11) UNION ALL

    SELECT CHAR(12) UNION ALL

    SELECT CHAR(14) UNION ALL

    SELECT CHAR(15) UNION ALL

    SELECT CHAR(16) UNION ALL

    SELECT CHAR(17) UNION ALL

    SELECT CHAR(18) UNION ALL

    SELECT CHAR(19) UNION ALL

    SELECT CHAR(20) UNION ALL

    SELECT CHAR(21) UNION ALL

    SELECT CHAR(22) UNION ALL

    SELECT CHAR(23) UNION ALL

    SELECT CHAR(24) UNION ALL

    SELECT CHAR(25) UNION ALL

    SELECT CHAR(26) UNION ALL

    SELECT CHAR(27) UNION ALL

    SELECT CHAR(28) UNION ALL

    SELECT CHAR(29) UNION ALL

    SELECT CHAR(30) UNION ALL

    SELECT CHAR(31)

    SELECT @input

    SELECT * FROM @badStrings bs

    DECLARE @zank nvarchar(1)

    DECLARE znaky CURSOR FAST_FORWARD READ_ONLY FOR SELECT bs.item FROM @badStrings bs

    OPEN znaky

    FETCH NEXT FROM znaky INTO @zank

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SELECT @input = REPLACE(@input, item, ' ') FROM @badStrings

    SELECT @input

    FETCH NEXT FROM znaky INTO @zank

    END

    CLOSE znaky

    DEALLOCATE znaky

    You better start a new thread with your question, not many are going to be watching this one.

    😎

  • Eirikur Eiriksson (11/15/2016)


    lady.konvicka (11/15/2016)


    hi i have this test code which replace our problem on production

    first replace for CHAR(0) removes character 'C' --- I dont know Why

    when u set input to for example 'Calling' C dont go away

    Can u help me?

    You better start a new thread with your question, not many are going to be watching this one.

    😎

    This seems like a solution to the original problem. It also does a lot more work than it should.

    CHAR(0) doesn't remove character 'C'. It shouldn't as there's nothing that relates both of those characters.

    Here's a short version of the code with all unnecessary steps removed.

    DECLARE @badStrings TABLE (item NVARCHAR(1))

    DECLARE @input VARCHAR(250)

    SET @input='Chlu' + CHAR(4) + 'emcká' COLLATE SQL_Latin1_General_CP1_CI_AS

    INSERT INTO @badStrings(item)

    VALUES

    (CHAR(0) ),(CHAR(1) ),(CHAR(2) ),(CHAR(3) ),(CHAR(4) ),(CHAR(5) ),

    (CHAR(6) ),(CHAR(7) ),(CHAR(8) ),(CHAR(11)),(CHAR(12)),(CHAR(14)),

    (CHAR(15)),(CHAR(16)),(CHAR(17)),(CHAR(18)),(CHAR(19)),(CHAR(20)),

    (CHAR(21)),(CHAR(22)),(CHAR(23)),(CHAR(24)),(CHAR(25)),(CHAR(26)),

    (CHAR(27)),(CHAR(28)),(CHAR(29)),(CHAR(30)),(CHAR(31));

    SELECT @input = REPLACE(@input, item, '') FROM @badStrings;

    SELECT @input;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Eirikur Eiriksson (11/15/2016)


    ...

    You better start a new thread with your question, not many are going to be watching this one.

    😎

    Too late, it's sitting in the 'Active Threads' queue, so everyone will drop by to take a look.

    However, this should nail it shut.

    create table MySample

    (

    col1 varchar(30) null

    constraint cc_col1_InvalidData check

    (case when col1 like '%['+char(0)+'-'+char(31)+']%'

    then 0

    else 1

    end = 1)

    );

    insert into MySample ( col1 )

    values ('John'+char(9));

    Msg 547, Level 16, State 0, Line 18

    The INSERT statement conflicted with the CHECK constraint "cc_col1_InvalidData".

    The conflict occurred in database "Test", table "dbo.MySample", column 'col1'.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • its from our supporter and we are triing to resolve it, i dont know that its from the internet, but thanks

  • Here is a quick solution, quite efficient

    😎

    DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';

    ;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))

    , NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)

    SELECT

    (

    SELECT SUBSTRING(@pString,NM.N,1)

    FROM NUMS NM

    WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256

    FOR XML PATH('')

    ) AS OUT_STR

    ;

  • Eirikur Eiriksson (11/15/2016)


    Here is a quick solution, quite efficient

    😎

    DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';

    ;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))

    , NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)

    SELECT

    (

    SELECT SUBSTRING(@pString,NM.N,1)

    FROM NUMS NM

    WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256

    FOR XML PATH('')

    ) AS OUT_STR

    ;

    Try this and see what you get...

    DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + '&' + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';

    ;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))

    , NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)

    SELECT

    (

    SELECT SUBSTRING(@pString,NM.N,1)

    FROM NUMS NM

    WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256

    FOR XML PATH('')

    ) AS OUT_STR

    ;

    You need to use "TYPE" and that will make it about twice as slow.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/15/2016)


    Eirikur Eiriksson (11/15/2016)


    Here is a quick solution, quite efficient

    😎

    DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';

    ;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))

    , NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)

    SELECT

    (

    SELECT SUBSTRING(@pString,NM.N,1)

    FROM NUMS NM

    WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256

    FOR XML PATH('')

    ) AS OUT_STR

    ;

    Try this and see what you get...

    DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + '&' + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';

    ;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))

    , NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)

    SELECT

    (

    SELECT SUBSTRING(@pString,NM.N,1)

    FROM NUMS NM

    WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256

    FOR XML PATH('')

    ) AS OUT_STR

    ;

    You need to use "TYPE" and that will make it about twice as slow.

    One would use the text() function to mitigate the cost, makes a big difference by avoiding the XML reconstruction phase of the output.

    😎

    The cost ratio between the first and the second query is close to 1:99

    DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + '&' + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';

    --Better

    ;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))

    , NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)

    SELECT

    (

    SELECT SUBSTRING(@pString,NM.N,1)

    FROM NUMS NM

    WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256

    FOR XML PATH(''), TYPE

    ).value('(./text())[1]','VARCHAR(8000)') AS OUT_STR

    ;

    -- Bad

    ;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))

    , NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)

    SELECT

    (

    SELECT SUBSTRING(@pString,NM.N,1)

    FROM NUMS NM

    WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256

    FOR XML PATH(''), TYPE

    ).value('(.)[1]','VARCHAR(8000)') AS OUT_STR

    ;

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply