November 16, 2009 at 7:33 am
Nice job, Jeff. Learned a few things myself.
November 16, 2009 at 7:41 am
dhananjay.n.kulkarni (11/16/2009)
Hi , Article is nice. I have little very quick one liner solution for this :declare @str_having_multiplespaces varchar(200)
set @str_having_multiplespaces = 'hi onespace 2spaces 3spaces 25spaces'
select replace(@str_having_multiplespaces , space(2) , space(0))
--- REsult === >>> hi onespace2spaces 3spaces 25spaces
Try this and let us know if this faster
that looked like a better solution but when i tested it
DECLARE @a CHAR(200)
SET @a = 'TEST1 TEST2 TEST3 TEST4 TEST5'
SELECT REPLACE(@A,SPACE(2),SPACE(0))
results
TEST1 TEST2 TEST3TEST4 TEST5
even number of spaces craps things up unfortunately
--
Thiago Dantas
@DantHimself
November 16, 2009 at 7:49 am
jonathan allen-270981 (11/16/2009)
Guys,Rather than get over concerned about collation for choosing the character to replace the 2nd space you can use any string, so you could use
SELECT LTRIM(RTRIM(
REPLACE(REPLACE(REPLACE(OriginalString,' ',' ||*9*9||'),'||*9*9|| ',''),'||*9*9||','')
))
FROM @demo
its just a case of picking a delimiter that isnt 'ever' going to be part of the string that you are splitting...
Jeff: thanks for the post, good to have the example online for reference
Jonathan
Jeff stated in the article that he didn't want to increase the size of the data.
There are some additional requirements. You can't make the data any bigger during the process because it might get too big for VARCHAR(8000) (or whatever size the column is) or you can't use VARCHAR(MAX) because you're using SQL Server 2000.
Jason
November 16, 2009 at 7:49 am
Can anyone do some realistic performance compare on this?
DECLARE @sample VARCHAR(max)
SET @sample= 'r ggg ggg g gg '
SELECT LTRIM(RTRIM(
REPLACE(REPLACE(REPLACE(@sample,' ',' ||*9*9||'),'||*9*9|| ',''),'||*9*9||','')
))
has this kind of result:-
Time Statistics
Client processing time averarge 25.4000
Total execution time average 26.5000
Wait time on server repliesaverage 1.1000
And then this code
DECLARE @sample VARCHAR(max)
SET @sample= 'r ggg ggg g gg '
SELECT dbo.fn_CleanUp(@sample)
has this kind of result:-
Time Statistics
Client processing time 2.8750
Total execution time 4.7500
Wait time on server replies1.8750
C# Gnu
____________________________________________________
November 16, 2009 at 7:59 am
I think it is better not to depend on a single character. So IMO better use set of junk characters as shown here
http://sqlblogcasts.com/blogs/madhivanan/archive/2007/12/06/squeeze-function.aspx
Failing to plan is Planning to fail
November 16, 2009 at 8:14 am
Jason,
I missed that line totally in Jeff's post. sorry to have broken the rules ... 🙁
Jonathan
November 16, 2009 at 8:14 am
To solve this same problem in the past, I've created a CLR function or stored procedure to do a regular expression replace. You could also use the ...OA stored procedures to instantiate the VBScript Regular Expression Object do a regular expression replace.
November 16, 2009 at 8:15 am
I think the results were being affected by one query showing query plan, the other not: now we have :
DECLARE @sample VARCHAR(max)
SET @sample= 'r ggg ggg g gg '
SELECT LTRIM(RTRIM(
REPLACE(REPLACE(REPLACE(@sample,' ',' ||*9*9||'),'||*9*9|| ',''),'||*9*9||','')
))
Time Statistics Client processing time 1.4000
Total execution time 2.5000
Wait time on server replies1.1000
And then this code
DECLARE @sample VARCHAR(max)
SET @sample= 'r ggg ggg g gg '
SELECT dbo.fn_CleanUp(@sample)
has this kind of result:-
Time Statistics
Client processing time 2.7000
Total execution time 3.5000
Wait time on server replies 0.8000
[/quote]
C# Gnu
____________________________________________________
November 16, 2009 at 8:24 am
One other point to consider is sentence endings. Normally there are two spaces at the end of sentences (for readability, etc., and yes, not everyone uses them. So a second character is necessary to "keep double spaces following a period".
November 16, 2009 at 8:27 am
In order to alleviate (but not eliminate) the problem of choosing a delimiter, the final replace could replace Space+delimiter instead of just the delimiter. So as in the examples(using O for space and X for delimiter)
Replace(Replace(replace(text,'OO','OX'),'XO',''),'OX','O')
This way you can use as a delimiter any character that will never follow a space in the dataset which could expand your possibilities.
--
JimFive
November 16, 2009 at 8:42 am
Interesting?
Here is results using C# CLR
DECLARE @sample VARCHAR(max)
SET @sample= 'r ggg ggg g gg '
SELECT dbo.fn_CleanString_CLR(@sample)
Client processing time 1.9000
Total execution time 2.4000
Wait time on server replies 0.5000
Here is the c#
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
public partial class UserDefinedFunctions
{
[SqlFunction()]
public static string fn_CleanString_CLR(string CleanMeUpScottie)
{
// Put your code here
while (CleanMeUpScottie.IndexOf(" ", 0) > 0)
{
CleanMeUpScottie = CleanMeUpScottie.Replace(" ", " ");
}
return CleanMeUpScottie;
}
};
C# Gnu
____________________________________________________
November 16, 2009 at 8:49 am
--PREPARE
CREATE FUNCTION dbo.fn_CleanUp(@FooString VARCHAR(max))
RETURNS VARCHAR(max)
BEGIN
WHILE CHARINDEX(' ',@FooString) > 0
SELECT @FooString = REPLACE(@FooString,' ',' ')
RETURN @FooString
END
GO
CREATE TABLE #TEMP1 (COL1 VARCHAR(900))
CREATE TABLE #TEMP2 (COL2 VARCHAR(900))
--INSERT 200k ROWS WITH RANDOM SPACES ON EACH TABLE, SEPARATE TABLES ARE USED TO AVOID CACHING, THIS MAY TAKE QUITE AWHILE
DECLARE @SPACECOUNT1 INT,@SPACECOUNT2 INT,@SPACECOUNT3 INT,@SPACECOUNT4 INT
SELECT @SPACECOUNT1 = CAST(CAST(NEWID() AS VARBINARY(1)) AS INT),@SPACECOUNT2 = CAST(CAST(NEWID() AS VARBINARY(1)) AS INT)
INSERT INTO #TEMP1
OUTPUT inserted.COL1 INTO #TEMP2
SELECT 'TEST1'+SPACE(@SPACECOUNT1)+'TEST2'+SPACE(@SPACECOUNT2)+'TEST3'
GO 200000
--SELECTS
SELECT LTRIM(RTRIM(
REPLACE(REPLACE(REPLACE(COL1,' ',' ||*9*9||'),'||*9*9|| ',''),'||*9*9||','')
))
FROM #TEMP1
GO
SELECT dbo.fn_CleanUp(COL2)
FROM #TEMP2
GO
--CLEANUP
DROP FUNCTION dbo.fn_CleanUp
DROP TABLE #TEMP1
DROP TABLE #TEMP2
first select
Client processing time: 145453
Total execution time: 145609
second select
Client processing time:80438
Total execution time:80734
personally i thought the first would be faster
--
Thiago Dantas
@DantHimself
November 16, 2009 at 8:54 am
That is interesting,
do you have Visual Studio there ....or I can create T-SQL CREATE ASEMBLY line so u can also test the C# CLR version ... ?
Cheers
C# Gnu
____________________________________________________
November 16, 2009 at 8:55 am
Excellent solution. I would have written a loop:
while (charindex(' ', @myString) > 0) begin
set @myString = replace( @myString, ' ', ' ');
end;
but your solution is nicer in that it can be used directly in a select statement with a result set.
November 16, 2009 at 8:58 am
Thank you Jeff,
Your solution you presented in the article makes so much sense and is very easy to follow. It will come in handy as I'm going through a massive data cleansing and move right now. Great article!
Viewing 15 posts - 31 through 45 (of 425 total)
You must be logged in to reply to this topic. Login to reply