Removing empty carriage returns

Question

Removing empty carriage returns

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 1

GSquared (4/21/2008)
Okay, I have to ask if this is being over-complicated. Why not just:
declare @test-2 varchar(100)
select @test-2 = '1
2
3'
select @test-2 = replace(@test, '
', '')
select @test-2
That works for me. I get "123".
Is there some reason to code out the char values or use a RBAR regex?

He's trying to keep just one of the carriage returns - not dispose of them all.

Regex might be overkill - but the twist seems to be if you stack a "bunch" of carriage returns, and you want to get rid of all but one. T-SQL doesn't seem to get rid of them all in one shot. Otherwise -

Regex isn't exactly RBAR, unless we're defining pretty much every built-in function as RBAR.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Chris Gierlack SSCrazy Points: 2326 More actions · Answer 2

The point was to remove extra carriage returns without concatenating the entire string into one long sequence. So in your example:

set @var = '1

2

3'

I would want to be returned as:

1

2

3

The only time I want anything removed is when it would look like this:

set @var = '1

2

3

4

5'

What I want in return is this:

1

2

3

4

5

Not this:

12345

RyanRandall SSChampion Points: 13623 More actions · Answer 3

Well, I'm confused with this thread. Why not just this?

select replace(@var, char(10) + char(13), '')

This works on my machine. The string you gave is '1yxyx2yx3yxyx4yx5' where y = char(13) and x is char(10). Deleting all instances of 'xy' will give '1yx2yx3yx4yx5' which is what we need.

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

Michael Earl-395764 SSC Guru Points: 53873 More actions · Answer 4

Michael Earl-395764

SSC Guru

Points: 53873

April 21, 2008 at 9:43 am

#805060

Ok, the dead horse is well-kicked.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 5

RyanRandall (4/21/2008)
Well, I'm confused with this thread. Why not just this?
select replace(@var, char(10) + char(13), '')
This works on my machine. The string you gave is '1yxyx2yx3yxyx4yx5' where y = char(13) and x is char(10). Deleting all instances of 'xy' will give '1yx2yx3yx4yx5' which is what we need.

yup that might do it:) Just be careful of that one unlucky time when the pattern ISN't "surrounded" by the correct 2 characters....

e.g.

1xy2yxyx3yxyxyx4yx5xy

I know - remote chance - but still....

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

GSquared SSC Guru Points: 260824 More actions · Answer 6

My last example:

update dbo.Table

set Column = replace(Column, '

', '

')

Will do exactly what you're talking about.

1
2
3
4
5
6
7

Will turn into:

1
2
3
4
5
6
7

Whole table, all at once, all rows.

Matt: On the RBAR thing, it looked to me like the CLR function would run row-by-row, like any other inline UDF. I'm not comfortable with CLR UDFs yet, so maybe I misunderstood that. But it still seems like overkill to me, when a single Replace statement will do the whole thing as specified. I may be missing something, but it seems that way to me.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 7

Matt:

I just tested the CLR UDF vs a straight-up replace. I may have done something incorrectly, since this is the first time I've used a CLF UDF, so please double-check my work.

I copied your CLR from this thread. I changed the name to ReturnReplace, for no good reason. 🙂

Then here's what I ran:

create FUNCTION ReturnRemove

(@String_in nvarchar(100),

@Find_in nvarchar(100),

@Replace_in nvarchar(100))

RETURNS nvarchar(100)

AS EXTERNAL NAME ReturnRemove.UserDefinedFunctions.RegexReplace

create table #Temp (

ID int identity,

Chars1 nvarchar(100),

Chars2 nvarchar(100))

insert into #temp (chars1, chars2)

select '1

2

3','1

2

3'

from common.dbo.bignumbers

set statistics io on

set statistics time on

update #temp

set chars1 = dbo.returnremove(chars1, '(\r)+', char(13)+char(10))

The temp table had 1,000,001 rows, and here are the stats from the CLR UDF:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 5 ms.
Table '#Temp'. Scan count 1, logical reads 252067, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 76047 ms, elapsed time = 79162 ms.
(1000001 row(s) affected)

Then I ran:

update #temp

set chars2 = replace(chars2, '

', '

')

Results:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 4 ms.
Table '#Temp'. Scan count 1, logical reads 130400, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 39719 ms, elapsed time = 39939 ms.
(1000001 row(s) affected)

Which means just over half the time of the Regex version.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 8

GSquared (4/21/2008)
Which means just over half the time of the Regex version.

Which is what I would expect if you know the actual pattern (meaning get rid of exactly 2 lines, replacing them with one). The fun comes in when you randomize how many duplicate line breaks you want to get rid of. At which point you'd have to run the replace several times over in order to find and replace them all.

For a little more of a challenge -

create table t1 (desk varchar(1000), d2 varchar(1000))

insert t1

select top 1000000

'a' + replicate(char(13)+char(10),cast(rand(checksum(newid()))*150+1 as integer)) +

'b' + replicate(char(13)+char(10),cast(rand(checksum(newid()))*150+1 as integer)) +

'c' + replicate(char(13)+char(10),cast(rand(checksum(newid()))*150+1 as integer)) + 'd',

''

from sys.all_columns sc1, sys.all_columns sc2

Otherwise - no doubt - the RegexReplace is overkill. Again - same as before - if you have a pattern and know a pattern - the built-in function will likely do the trick.

And you're right - Regex is a function, so will evaluate row-by-row, just like every other function (including the built-in ones).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

GSquared SSC Guru Points: 260824 More actions · Answer 9

Matt: I must be doing something wrong with the regex. It doesn't appear to handle the situation in your last post. I posted how I put it together in my prior post about the testing. What am I missing?

Edit: Never mind. I wasn't including the backslashes correctly in this test.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 10

Okay, now that I got the regex working (more accurately, now that I stopped making stupid mistakes while trying to use it), I got some figures on speed:

On the table Matt proposed, the regex took 1:34 on my computer, and got the desired result.

Looping through a simple replace command till @@rowcount = 0 on updated rows, took 1:32 seconds. (Took 3-10 iterations each time I tried this method.)

Those are the best times out of several runs for each one. Each time, I dropped and re-created the table and cleared the cache.

In other words, no real advantage either direction, in terms of speed.

For a more complex pattern, the regex would definitely be better.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

RyanRandall SSChampion Points: 13623 More actions · Answer 11

GSquared (4/22/2008)
Okay, now that I got the regex working (more accurately, now that I stopped making stupid mistakes while trying to use it), I got some figures on speed:
On the table Matt proposed, the regex took 1:34 on my computer, and got the desired result.
Looping through a simple replace command till @@rowcount = 0 on updated rows, took 1:32 seconds. (Took 3-10 iterations each time I tried this method.)
Those are the best times out of several runs for each one. Each time, I dropped and re-created the table and cleared the cache.
In other words, no real advantage either direction, in terms of speed.
For a more complex pattern, the regex would definitely be better.

Thanks GSquared. Did/could you try my suggestion? No iterations required...

select replace(@var, char(10) + char(13), '')

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

GSquared SSC Guru Points: 260824 More actions · Answer 12

Ryan, that will work if you want to get rid of them completely. What's needed here is eliminating duplicates, but keeping one per set. For that, it needs iteration, unless I'm missing something.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

RyanRandall SSChampion Points: 13623 More actions · Answer 13

GSquared (4/22/2008)
Ryan, that will work if you want to get rid of them completely. What's needed here is eliminating duplicates, but keeping one per set. For that, it needs iteration, unless I'm missing something.

I think you're missing something. Have you tried it?

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

GSquared SSC Guru Points: 260824 More actions · Answer 14

RyanRandall (4/22/2008)
GSquared (4/22/2008)
Ryan, that will work if you want to get rid of them completely. What's needed here is eliminating duplicates, but keeping one per set. For that, it needs iteration, unless I'm missing something.
I think you're missing something. Have you tried it?

You're right. I hadn't noticed that you'd inverted the sequence. Yep, that one works.

41 seconds for 1-million rows. Twice as fast as either of the other solutions.

Won't work for anything except extra return characters, but certainly works for those.

Very clever.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

RyanRandall SSChampion Points: 13623 More actions · Answer 15

Yep - it has it's limitations. I appreciate you doing the test. Many thanks 🙂

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.