April 21, 2008 at 9:13 am
GSquared (4/21/2008)
Okay, I have to ask if this is being over-complicated. Why not just:
declare @test-2 varchar(100)
select @test-2 = '1
2
3'
select @test-2 = replace(@test, '
', '')
select @test-2
That works for me. I get "123".
Is there some reason to code out the char values or use a RBAR regex?
He's trying to keep just one of the carriage returns - not dispose of them all.
Regex might be overkill - but the twist seems to be if you stack a "bunch" of carriage returns, and you want to get rid of all but one. T-SQL doesn't seem to get rid of them all in one shot. Otherwise -
Regex isn't exactly RBAR, unless we're defining pretty much every built-in function as RBAR.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 21, 2008 at 9:14 am
The point was to remove extra carriage returns without concatenating the entire string into one long sequence. So in your example:
set @var = '1
2
3'
I would want to be returned as:
1
2
3
The only time I want anything removed is when it would look like this:
set @var = '1
2
3
4
5'
What I want in return is this:
1
2
3
4
5
Not this:
12345
April 21, 2008 at 9:29 am
Well, I'm confused with this thread. Why not just this?
select replace(@var, char(10) + char(13), '')
This works on my machine. The string you gave is '1yxyx2yx3yxyx4yx5' where y = char(13) and x is char(10). Deleting all instances of 'xy' will give '1yx2yx3yx4yx5' which is what we need.
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
April 21, 2008 at 9:43 am
Ok, the dead horse is well-kicked.
April 21, 2008 at 9:44 am
RyanRandall (4/21/2008)
Well, I'm confused with this thread. Why not just this?
select replace(@var, char(10) + char(13), '')
This works on my machine. The string you gave is '1yxyx2yx3yxyx4yx5' where y = char(13) and x is char(10). Deleting all instances of 'xy' will give '1yx2yx3yx4yx5' which is what we need.
yup that might do it:) Just be careful of that one unlucky time when the pattern ISN't "surrounded" by the correct 2 characters....
e.g.
1xy2yxyx3yxyxyx4yx5xy
I know - remote chance - but still....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 21, 2008 at 2:02 pm
My last example:
update dbo.Table
set Column = replace(Column, '
', '
')
Will do exactly what you're talking about.
1
2
3
4
5
6
7
Will turn into:
1
2
3
4
5
6
7
Whole table, all at once, all rows.
Matt: On the RBAR thing, it looked to me like the CLR function would run row-by-row, like any other inline UDF. I'm not comfortable with CLR UDFs yet, so maybe I misunderstood that. But it still seems like overkill to me, when a single Replace statement will do the whole thing as specified. I may be missing something, but it seems that way to me.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 21, 2008 at 3:04 pm
Matt:
I just tested the CLR UDF vs a straight-up replace. I may have done something incorrectly, since this is the first time I've used a CLF UDF, so please double-check my work.
I copied your CLR from this thread. I changed the name to ReturnReplace, for no good reason. 🙂
Then here's what I ran:
create FUNCTION ReturnRemove
(@String_in nvarchar(100),
@Find_in nvarchar(100),
@Replace_in nvarchar(100))
RETURNS nvarchar(100)
AS EXTERNAL NAME ReturnRemove.UserDefinedFunctions.RegexReplace
create table #Temp (
ID int identity,
Chars1 nvarchar(100),
Chars2 nvarchar(100))
insert into #temp (chars1, chars2)
select '1
2
3','1
2
3'
from common.dbo.bignumbers
set statistics io on
set statistics time on
update #temp
set chars1 = dbo.returnremove(chars1, '(\r)+', char(13)+char(10))
The temp table had 1,000,001 rows, and here are the stats from the CLR UDF:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 5 ms.
Table '#Temp'. Scan count 1, logical reads 252067, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 76047 ms, elapsed time = 79162 ms.
(1000001 row(s) affected)
Then I ran:
update #temp
set chars2 = replace(chars2, '
', '
')
Results:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 4 ms.
Table '#Temp'. Scan count 1, logical reads 130400, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 39719 ms, elapsed time = 39939 ms.
(1000001 row(s) affected)
Which means just over half the time of the Regex version.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 21, 2008 at 4:46 pm
GSquared (4/21/2008)
Which means just over half the time of the Regex version.
Which is what I would expect if you know the actual pattern (meaning get rid of exactly 2 lines, replacing them with one). The fun comes in when you randomize how many duplicate line breaks you want to get rid of. At which point you'd have to run the replace several times over in order to find and replace them all.
For a little more of a challenge -
create table t1 (desk varchar(1000), d2 varchar(1000))
insert t1
select top 1000000
'a' + replicate(char(13)+char(10),cast(rand(checksum(newid()))*150+1 as integer)) +
'b' + replicate(char(13)+char(10),cast(rand(checksum(newid()))*150+1 as integer)) +
'c' + replicate(char(13)+char(10),cast(rand(checksum(newid()))*150+1 as integer)) + 'd',
''
from sys.all_columns sc1, sys.all_columns sc2
Otherwise - no doubt - the RegexReplace is overkill. Again - same as before - if you have a pattern and know a pattern - the built-in function will likely do the trick.
And you're right - Regex is a function, so will evaluate row-by-row, just like every other function (including the built-in ones).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 22, 2008 at 8:30 am
Matt: I must be doing something wrong with the regex. It doesn't appear to handle the situation in your last post. I posted how I put it together in my prior post about the testing. What am I missing?
Edit: Never mind. I wasn't including the backslashes correctly in this test.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 22, 2008 at 10:16 am
Okay, now that I got the regex working (more accurately, now that I stopped making stupid mistakes while trying to use it), I got some figures on speed:
On the table Matt proposed, the regex took 1:34 on my computer, and got the desired result.
Looping through a simple replace command till @@rowcount = 0 on updated rows, took 1:32 seconds. (Took 3-10 iterations each time I tried this method.)
Those are the best times out of several runs for each one. Each time, I dropped and re-created the table and cleared the cache.
In other words, no real advantage either direction, in terms of speed.
For a more complex pattern, the regex would definitely be better.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 22, 2008 at 10:29 am
GSquared (4/22/2008)
Okay, now that I got the regex working (more accurately, now that I stopped making stupid mistakes while trying to use it), I got some figures on speed:On the table Matt proposed, the regex took 1:34 on my computer, and got the desired result.
Looping through a simple replace command till @@rowcount = 0 on updated rows, took 1:32 seconds. (Took 3-10 iterations each time I tried this method.)
Those are the best times out of several runs for each one. Each time, I dropped and re-created the table and cleared the cache.
In other words, no real advantage either direction, in terms of speed.
For a more complex pattern, the regex would definitely be better.
Thanks GSquared. Did/could you try my suggestion? No iterations required...
select replace(@var, char(10) + char(13), '')
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
April 22, 2008 at 11:19 am
Ryan, that will work if you want to get rid of them completely. What's needed here is eliminating duplicates, but keeping one per set. For that, it needs iteration, unless I'm missing something.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 22, 2008 at 11:30 am
GSquared (4/22/2008)
Ryan, that will work if you want to get rid of them completely. What's needed here is eliminating duplicates, but keeping one per set. For that, it needs iteration, unless I'm missing something.
I think you're missing something. Have you tried it?
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
April 22, 2008 at 12:10 pm
RyanRandall (4/22/2008)
GSquared (4/22/2008)
Ryan, that will work if you want to get rid of them completely. What's needed here is eliminating duplicates, but keeping one per set. For that, it needs iteration, unless I'm missing something.I think you're missing something. Have you tried it?
You're right. I hadn't noticed that you'd inverted the sequence. Yep, that one works.
41 seconds for 1-million rows. Twice as fast as either of the other solutions.
Won't work for anything except extra return characters, but certainly works for those.
Very clever.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 22, 2008 at 1:39 pm
Yep - it has it's limitations. I appreciate you doing the test. Many thanks 🙂
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
Viewing 15 posts - 16 through 29 (of 29 total)
You must be logged in to reply to this topic. Login to reply