how to remove characters char(0) to char(31)

Question

how to remove characters char(0) to char(31)

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply

Mark Cowne One Orange Chip Points: 26954 More actions · Answer 1

Jeff Moden (11/19/2012)
@Mark,
That was also pretty quick. Have you been using such a thing? If so, what's your take on performance here (And, yes, I still do the test because I said I would and because I'm insanely curious about these things. 🙂 )

@jeff

Nope, I don't tend to use things like this. Although performance-wise it's probably faster than pure T-SQL,

I find the overhead of deployment and maintenance doesn't make it worthwhile. Unless this sort of function is going

to be very heavily used and is performance critical, I'd stick to a nested REPLACEs. We only use SQLCLRs for a few

functions that can't be practically implemented in T-SQL, Levenshtein distance for strings for example.

Writing a half dozen lines of C# shouldn't take long anyway and it was kinda fun 🙂

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 2

Mark-101232 (11/19/2012)
Jeff Moden (11/19/2012)
@Mark,
That was also pretty quick. Have you been using such a thing? If so, what's your take on performance here (And, yes, I still do the test because I said I would and because I'm insanely curious about these things. 🙂 )
@jeff
Nope, I don't tend to use things like this. Although performance-wise it's probably faster than pure T-SQL,
I find the overhead of deployment and maintenance doesn't make it worthwhile. Unless this sort of function is going
to be very heavily used and is performance critical, I'd stick to a nested REPLACEs. We only use SQLCLRs for a few
functions that can't be practically implemented in T-SQL, Levenshtein distance for strings for example.
Writing a half dozen lines of C# shouldn't take long anyway and it was kinda fun 🙂

I'm of the same ilk. It's frequently not worth the performance gain (if there is one) to handle separately managed code, etc.

Speaking of performance gains, there frequently either isn't one or it actually causes a performance problem. I just spent a bunch of time replacing some audit log triggers that were written as a CLR because someone wanted it so that it would automatically figure out if you added a column to the given table. It would take 4 minutes to update just 5 columns on a 137 column (don't ask... I didn't design it). I wrote some code to build hardcoded triggers from the table schema and 10K rows could suddenly be upated in milliseconds.

Things like the Levenshtein distance algorithm are a whole 'nuther story.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 3

Jeff Moden (11/19/2012)

...
To answer the rest, the only way to know for sure is to run a test or two. If someone would create a CLR to drop characters 0 thru 31 and 127, script it out and send it to me (I don't have a C# environment setup and wouldn't know what to do with it if I did), I'd be happy to test it, publish the results, and share the documented test code.

First, I would really like to mentioned that in my previous post I've tried to make the attention to a bit unfair comparison between dedicated UDF and generic CLR wrapper around RegEx.

Now looking into more details of this OP question...

I can see that to do what OP needs I wouldn't not use RegEx at all!

However I wouldn't use StringBuilder either...

Here is my version of CLR based on plain char array:

[Microsoft.SqlServer.Server.SqlFunction]

public static String RemoveCrap(String val)

{

if (val == null) return null;

char[] chrs = new char[val.Length];

int indx = 0;

foreach (char c in val)

{

if ((int) c > 31 && (int) c != 127)

{

chrs[indx] = c;

indx++;

}

return new string(chrs, 0, indx);

}

Now, using Jeff's test setup some performance testing:

DECLARE @Start DATETIME, @End DATETIME

DECLARE @var varchar(max)

PRINT 'T-SQL nested REPLACE'

SET @Start = GETUTCDATE()

SELECT @var = ca.CleanedString

FROM #TestTable tt

CROSS APPLY dbo.DropControlCharacters(tt.SomeString) ca

PRINT DATEDIFF(ms,@start,GETUTCDATE());

PRINT 'CLR with String Builder'

SET @Start = GETUTCDATE()

SELECT @var = dbo.RemoveCtrlChars(tt.SomeString)

FROM #TestTable tt

PRINT DATEDIFF(ms,@start,GETUTCDATE());

PRINT 'CLR plain'

SET @Start = GETUTCDATE()

SELECT @var = dbo.RemoveCrap(tt.SomeString)

FROM #TestTable tt

PRINT DATEDIFF(ms,@start,GETUTCDATE());

And here results I got:

T-SQL nested REPLACE

2503

CLR with String Builder

546

CLR plain

343

CLR with StringBuilder is five times faster than T-SQL UDF with nested REPLACE.

CLR based on char array gives another 40% of boost.

Yes, CLR functions do add a bit more to database maintenance, but their performance (eg. in this case) justify such cost easily, at least for myself...

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

lady.konvicka SSC Enthusiast Points: 108 More actions · Answer 4

hi i have this test code which replace our problem on production

first replace for CHAR(0) removes character 'C' --- I dont know Why

when u set input to for example 'Calling' C dont go away

Can u help me?

DECLARE @badStrings TABLE (item NVARCHAR(1))

DECLARE @input VARCHAR(250)

SET @input='Chluemcká' --COLLATE SQL_Latin1_General_CP1_CI_AS

INSERT INTO @badStrings(item)

SELECT CHAR(0) UNION ALL

SELECT CHAR(1) UNION ALL

SELECT CHAR(2) UNION ALL

SELECT CHAR(3) UNION ALL

SELECT CHAR(4) UNION ALL

SELECT CHAR(5) UNION ALL

SELECT CHAR(6) UNION ALL

SELECT CHAR(7) UNION ALL

SELECT CHAR(8) UNION ALL

SELECT CHAR(11) UNION ALL

SELECT CHAR(12) UNION ALL

SELECT CHAR(14) UNION ALL

SELECT CHAR(15) UNION ALL

SELECT CHAR(16) UNION ALL

SELECT CHAR(17) UNION ALL

SELECT CHAR(18) UNION ALL

SELECT CHAR(19) UNION ALL

SELECT CHAR(20) UNION ALL

SELECT CHAR(21) UNION ALL

SELECT CHAR(22) UNION ALL

SELECT CHAR(23) UNION ALL

SELECT CHAR(24) UNION ALL

SELECT CHAR(25) UNION ALL

SELECT CHAR(26) UNION ALL

SELECT CHAR(27) UNION ALL

SELECT CHAR(28) UNION ALL

SELECT CHAR(29) UNION ALL

SELECT CHAR(30) UNION ALL

SELECT CHAR(31)

SELECT @input

SELECT * FROM @badStrings bs

DECLARE @zank nvarchar(1)

DECLARE znaky CURSOR FAST_FORWARD READ_ONLY FOR SELECT bs.item FROM @badStrings bs

OPEN znaky

FETCH NEXT FROM znaky INTO @zank

WHILE @@FETCH_STATUS = 0

BEGIN

SELECT @input = REPLACE(@input, item, ' ') FROM @badStrings

SELECT @input

FETCH NEXT FROM znaky INTO @zank

END

CLOSE znaky

DEALLOCATE znaky

Eirikur Eiriksson SSC Guru Points: 182915 More actions · Answer 5

lady.konvicka (11/15/2016)
hi i have this test code which replace our problem on production
first replace for CHAR(0) removes character 'C' --- I dont know Why
when u set input to for example 'Calling' C dont go away
Can u help me?
DECLARE @badStrings TABLE (item NVARCHAR(1))
DECLARE @input VARCHAR(250)
SET @input='Chluemcká' --COLLATE SQL_Latin1_General_CP1_CI_AS
INSERT INTO @badStrings(item)
SELECT CHAR(0) UNION ALL
SELECT CHAR(1) UNION ALL
SELECT CHAR(2) UNION ALL
SELECT CHAR(3) UNION ALL
SELECT CHAR(4) UNION ALL
SELECT CHAR(5) UNION ALL
SELECT CHAR(6) UNION ALL
SELECT CHAR(7) UNION ALL
SELECT CHAR(8) UNION ALL
SELECT CHAR(11) UNION ALL
SELECT CHAR(12) UNION ALL
SELECT CHAR(14) UNION ALL
SELECT CHAR(15) UNION ALL
SELECT CHAR(16) UNION ALL
SELECT CHAR(17) UNION ALL
SELECT CHAR(18) UNION ALL
SELECT CHAR(19) UNION ALL
SELECT CHAR(20) UNION ALL
SELECT CHAR(21) UNION ALL
SELECT CHAR(22) UNION ALL
SELECT CHAR(23) UNION ALL
SELECT CHAR(24) UNION ALL
SELECT CHAR(25) UNION ALL
SELECT CHAR(26) UNION ALL
SELECT CHAR(27) UNION ALL
SELECT CHAR(28) UNION ALL
SELECT CHAR(29) UNION ALL
SELECT CHAR(30) UNION ALL
SELECT CHAR(31)
SELECT @input
SELECT * FROM @badStrings bs
DECLARE @zank nvarchar(1)
DECLARE znaky CURSOR FAST_FORWARD READ_ONLY FOR SELECT bs.item FROM @badStrings bs
OPEN znaky
FETCH NEXT FROM znaky INTO @zank
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @input = REPLACE(@input, item, ' ') FROM @badStrings
SELECT @input
FETCH NEXT FROM znaky INTO @zank
END
CLOSE znaky
DEALLOCATE znaky

You better start a new thread with your question, not many are going to be watching this one.

😎

Luis Cazares SSC Guru Points: 183706 More actions · Answer 6

Eirikur Eiriksson (11/15/2016)
lady.konvicka (11/15/2016)
hi i have this test code which replace our problem on production
first replace for CHAR(0) removes character 'C' --- I dont know Why
when u set input to for example 'Calling' C dont go away
Can u help me?
You better start a new thread with your question, not many are going to be watching this one.
😎

This seems like a solution to the original problem. It also does a lot more work than it should.

CHAR(0) doesn't remove character 'C'. It shouldn't as there's nothing that relates both of those characters.

Here's a short version of the code with all unnecessary steps removed.

DECLARE @badStrings TABLE (item NVARCHAR(1))

DECLARE @input VARCHAR(250)

SET @input='Chlu' + CHAR(4) + 'emcká' COLLATE SQL_Latin1_General_CP1_CI_AS

INSERT INTO @badStrings(item)

VALUES

(CHAR(0) ),(CHAR(1) ),(CHAR(2) ),(CHAR(3) ),(CHAR(4) ),(CHAR(5) ),

(CHAR(6) ),(CHAR(7) ),(CHAR(8) ),(CHAR(11)),(CHAR(12)),(CHAR(14)),

(CHAR(15)),(CHAR(16)),(CHAR(17)),(CHAR(18)),(CHAR(19)),(CHAR(20)),

(CHAR(21)),(CHAR(22)),(CHAR(23)),(CHAR(24)),(CHAR(25)),(CHAR(26)),

(CHAR(27)),(CHAR(28)),(CHAR(29)),(CHAR(30)),(CHAR(31));

SELECT @input = REPLACE(@input, item, '') FROM @badStrings;

SELECT @input;

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Eric M Russell SSC Guru Points: 125562 More actions · Answer 7

Eirikur Eiriksson (11/15/2016)
...
You better start a new thread with your question, not many are going to be watching this one.
😎

Too late, it's sitting in the 'Active Threads' queue, so everyone will drop by to take a look.

However, this should nail it shut.

create table MySample

(

col1 varchar(30) null

constraint cc_col1_InvalidData check

(case when col1 like '%['+char(0)+'-'+char(31)+']%'

then 0

else 1

end = 1)

);

insert into MySample ( col1 )

values ('John'+char(9));

Msg 547, Level 16, State 0, Line 18

The INSERT statement conflicted with the CHECK constraint "cc_col1_InvalidData".

The conflict occurred in database "Test", table "dbo.MySample", column 'col1'.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

lady.konvicka SSC Enthusiast Points: 108 More actions · Answer 8

its from our supporter and we are triing to resolve it, i dont know that its from the internet, but thanks

Eirikur Eiriksson SSC Guru Points: 182915 More actions · Answer 9

Here is a quick solution, quite efficient

😎

DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';

;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))

, NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)

SELECT

(

SELECT SUBSTRING(@pString,NM.N,1)

FROM NUMS NM

WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256

FOR XML PATH('')

) AS OUT_STR

;

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 10

Eirikur Eiriksson (11/15/2016)
Here is a quick solution, quite efficient
😎
DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';
;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))
, NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)
SELECT
(
SELECT SUBSTRING(@pString,NM.N,1)
FROM NUMS NM
WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256
FOR XML PATH('')
) AS OUT_STR
;

Try this and see what you get...

DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + '&' + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';

;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))

, NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)

SELECT

(

SELECT SUBSTRING(@pString,NM.N,1)

FROM NUMS NM

WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256

FOR XML PATH('')

) AS OUT_STR

;

You need to use "TYPE" and that will make it about twice as slow.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Eirikur Eiriksson SSC Guru Points: 182915 More actions · Answer 11

Jeff Moden (11/15/2016)
Eirikur Eiriksson (11/15/2016)
Here is a quick solution, quite efficient
😎
DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';
;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))
, NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)
SELECT
(
SELECT SUBSTRING(@pString,NM.N,1)
FROM NUMS NM
WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256
FOR XML PATH('')
) AS OUT_STR
;
Try this and see what you get...
DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + '&' + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';
;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))
, NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)
SELECT
(
SELECT SUBSTRING(@pString,NM.N,1)
FROM NUMS NM
WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256
FOR XML PATH('')
) AS OUT_STR
;
You need to use "TYPE" and that will make it about twice as slow.

One would use the text() function to mitigate the cost, makes a big difference by avoiding the XML reconstruction phase of the output.

😎

The cost ratio between the first and the second query is close to 1:99

DECLARE @pString VARCHAR(8000) = 'Chlu' + CHAR(30) + CHAR(2) + CHAR(0) + CHAR(13) + '&' + CHAR(10) + CHAR(20) + CHAR(31) + 'emcká';

--Better

;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))

, NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)

SELECT

(

SELECT SUBSTRING(@pString,NM.N,1)

FROM NUMS NM

WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256

FOR XML PATH(''), TYPE

).value('(./text())[1]','VARCHAR(8000)') AS OUT_STR

;

-- Bad

;WITH T(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))

, NUMS(N) AS ( SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM T T1,T T2,T T3,T T4)

SELECT

(

SELECT SUBSTRING(@pString,NM.N,1)

FROM NUMS NM

WHERE ((ASCII(SUBSTRING(@pString,NM.N,1)) - 32) & 0x7FFF) < 256

FOR XML PATH(''), TYPE

).value('(.)[1]','VARCHAR(8000)') AS OUT_STR

;