REPLACE Multiple Spaces with One

Question

REPLACE Multiple Spaces with One

Viewing 15 posts - 406 through 420 (of 426 total)

You must be logged in to reply to this topic. Login to reply

Michael Meierruth SSChampion Points: 10051 More actions · Answer 1

Steven Willis (5/26/2013)
I'm late to this party, but here's my entirely TSQL offering to remove duplicate spaces. Most of the code below has been not so shamelessly borrowed from Jeff Moden's work, especially the famous DelimitedSplit8K function which I have found to be one of the most useful tools in the toolbox.
I have no idea how this compares to the other solutions but I thought I'd just throw it into the mix. It's based mostly off of some other variations I've developed from the DelimitedSplit8K function and I think it takes a different approach to the problem. Basically, it splits the string and then puts it back together again. Whether that's better or not I'll let the scorekeeper determine.

OK. Yours is 6344ms. Mine is 2594ms.

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 2

Michael Meierruth (5/28/2013)
Steven Willis (5/26/2013)
I'm late to this party, but here's my entirely TSQL offering to remove duplicate spaces. Most of the code below has been not so shamelessly borrowed from Jeff Moden's work, especially the famous DelimitedSplit8K function which I have found to be one of the most useful tools in the toolbox.
I have no idea how this compares to the other solutions but I thought I'd just throw it into the mix. It's based mostly off of some other variations I've developed from the DelimitedSplit8K function and I think it takes a different approach to the problem. Basically, it splits the string and then puts it back together again. Whether that's better or not I'll let the scorekeeper determine.

OK. Yours is 6344ms. Mine is 2594ms.

Thanks for taking the time to test this. My initial test using client statistics did show your method to be faster on my 1000 row test table. What testing method did you use? Do you think scalability becomes a factor for any of these methods where one might work better than another for a very large or very wide table? Not disputing the results--just wondering.

Sporidium SSCommitted Points: 1550 More actions · Answer 3

Michael Meierruth (5/28/2013)
Steven Willis (5/26/2013)
I'm late to this party, but here's my entirely TSQL offering to remove duplicate spaces. Most of the code below has been not so shamelessly borrowed from Jeff Moden's work, especially the famous DelimitedSplit8K function which I have found to be one of the most useful tools in the toolbox.
I have no idea how this compares to the other solutions but I thought I'd just throw it into the mix. It's based mostly off of some other variations I've developed from the DelimitedSplit8K function and I think it takes a different approach to the problem. Basically, it splits the string and then puts it back together again. Whether that's better or not I'll let the scorekeeper determine.

OK. Yours is 6344ms. Mine is 2594ms.

DelimitedSplit8K has many and varied uses, though I'm afraid this isn't one that flatters it.

Michael's nested REPLACEs might look cumbersome, but they can't be beat for speed, especially inline.

I tried out this more concise UDF variant of his algorithm using CTEs, but it doesn't quite fly fast enough.

CREATE FUNCTION dbo.fnCoalesceSpaces(@Str varchar(max))

RETURNS varchar(max) AS

BEGIN

;WITH A AS (SELECT n FROM (VALUES (32),(16),(8),(4),(2),(1)) A(n))

SELECT @Str = REPLACE(@Str COLLATE Latin1_General_BIN2, SPACE(n + 1), SPACE(1))

FROM A

ORDER BY n DESC

RETURN @Str

END

Gary

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

I believe that the reasons for that are many...

1. It's a scalar UDF.

2. It does calculations instead of using constants.

3. It doesn't take advantage of nested replaces.

4. It has a relatively very expensive ORDER BY.

The nested replaces could actually be made into an Inline Table Valued Function as a kind of "iSF" or Inline Scalar Valued Function. Please see the following for more information on that subject.

http://www.sqlservercentral.com/articles/T-SQL/91724/

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TomThomson SSC Guru Points: 104768 More actions · Answer 5

Jeff Moden (3/26/2015)
I believe that the reasons for that are many...
1. It's a scalar UDF.
2. It does calculations instead of using constants.
3. It doesn't take advantage of nested replaces.
4. It has a relatively very expensive ORDER BY.
The nested replaces could actually be made into an Inline Table Valued Function as a kind of "iSF" or Inline Scalar Valued Function. Please see the following for more information on that subject.
http://www.sqlservercentral.com/articles/T-SQL/91724/

I'm not 100% certain, but I suspect that the ORDER BY is technically meaningless, it can only order the final output rows generated by a query although it may sometimes, if one is lucky, order the internal workings of the query too. So having this UDF produce the right result requires luck - not something I would therefor use in a production environment (because I don't want upgrades - including bugfixes - to the server to risk breaking the code even if I know it currently works - unless I could devise a good reliable method of detecting when things were happening in the wrong order so that I could use a dfferent method in that case, and I can see any efficient method of doing that.

Writing a user defined table valued function with a single select statement nesting the 6 (or however many you choose) replace calls so that the thing runs inline and using explicit string constants instead of calls on SPACE is going to improve performance rather more than somewhat as well as providing formally valid SQL that will work for ever unless someone changes the standard.

Tom

Alan Burstein SSC Guru Points: 61152 More actions · Answer 6

I love this article Jeff and have been using your technique since I discovered this article a little more than a year ago.

I was using your technique to create a function to do this today; out of habit I created an iTVF version (AKA inline scalar version). Out of curiosity I created a scalar version and tested it. As expected the iTVF version was faster (by about 50%). I thought the results were worth sharing.

CREATE FUNCTION NormalizeWhitespace(@string varchar(max))

RETURNS TABLE WITH SCHEMABINDING AS RETURN

SELECT cleanstring =

CASE

WHEN charindex(' ',@string) > 0 THEN

REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(@string)),' ',' '+CHAR(127)),CHAR(127)+' ',''),CHAR(127),'')

ELSE @string

END;

GO

CREATE FUNCTION dbo.NormalizeWhitespace2 (@string varchar(max))

RETURNS varchar(max) WITH SCHEMABINDING AS

BEGIN

RETURN

CASE

WHEN charindex(' ',@string) > 0 THEN

REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(@string)),' ',' '+CHAR(127)),CHAR(127)+' ',''),CHAR(127),'')

ELSE @string

END;

END

I used varchar(max) because that's what my app needs (keeping it "real world"). Here's the test harness:

SET NOCOUNT ON

IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals;

CREATE TABLE #vals (id int identity primary key, val varchar(8000) not null);

INSERT #vals (val)

SELECT TOP 500000 REPLACE(REPLACE(newid(),'-',' '),'A',' ')

FROM sys.all_columns a, sys.all_columns b;

dbcc freeproccache with no_infomsgs

dbcc dropcleanbuffers with no_infomsgs

PRINT '========== itvf version ==========';

DECLARE

@result varchar(8000),

@StartTime DATETIME = GETDATE();

SELECT @result = cleanstring

FROM #vals

CROSS APPLY dbo.NormalizeWhitespace(val);

PRINT CAST(@@rowcount AS varchar(10))+' rows.';

PRINT DATEDIFF(ms,@StartTime,GETDATE());

GO 3

PRINT '========== scalar version ==========';

DECLARE

@result varchar(8000),

@StartTime DATETIME = GETDATE();

SELECT @result = dbo.NormalizeWhitespace2(val)

FROM #vals;

PRINT CAST(@@rowcount AS varchar(10))+' rows.';

PRINT DATEDIFF(ms,@StartTime,GETDATE());

GO 3

And the results:

Beginning execution loop

========== itvf version ==========

500000 rows.

3496

========== itvf version ==========

500000 rows.

3210

========== itvf version ==========

500000 rows.

3256

Batch execution completed 3 times.

Beginning execution loop

========== scalar version ==========

500000 rows.

4813

========== scalar version ==========

500000 rows.

4780

========== scalar version ==========

500000 rows.

4810

Batch execution completed 3 times.

Note: COLLATE Latin1_General_BIN made no difference.

Please forgive me if someone has already done this. Cheers.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

Alan.B (5/26/2015)
I love this article Jeff and have been using your technique since I discovered this article a little more than a year ago.

Thanks, Alan, but you need to revisit the article. Read the Prologue and see the first link there, which leads to Michael's method of using nested replaces just for the spaces. It blows the door off my method and virtually all comers. That's what I like about this community... lot's of people joined in the discussion and the result was a truly nasty fast method that passed my on the highway so fast that it looked like my method was standing still.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Alan Burstein SSC Guru Points: 61152 More actions · Answer 8

Jeff Moden (5/26/2015)
Alan.B (5/26/2015)
I love this article Jeff and have been using your technique since I discovered this article a little more than a year ago.
Thanks, Alan, but you need to revisit the article. Read the Prologue and see the first link there, which leads to Michael's method of using nested replaces just for the spaces. It blows the door off my method and virtually all comers. That's what I like about this community... lot's of people joined in the discussion and the result was a truly nasty fast method that passed my on the highway so fast that it looked like my method was standing still.

I should start reading prologues I guess 😉

Nonetheless, I tried the same exercise with Michael's function and the iTVF was about 25% faster still. Long live the inline scalar iTVF!

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 9

Alan.B (5/26/2015)
Jeff Moden (5/26/2015)
Alan.B (5/26/2015)
I love this article Jeff and have been using your technique since I discovered this article a little more than a year ago.
Thanks, Alan, but you need to revisit the article. Read the Prologue and see the first link there, which leads to Michael's method of using nested replaces just for the spaces. It blows the door off my method and virtually all comers. That's what I like about this community... lot's of people joined in the discussion and the result was a truly nasty fast method that passed my on the highway so fast that it looked like my method was standing still.
I should start reading prologues I guess 😉
Nonetheless, I tried the same exercise with Michael's function and the iTVF was about 25% faster still. Long live the inline scalar iTVF!

Can't go wrong there. Guess I'm going to have to revisit this article. Heh... maybe add a pre-prologue. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Michael Meierruth SSChampion Points: 10051 More actions · Answer 10

Alan B.

How does your function stand up to a varchar(8000) column with an * at the beginning and end and all blanks in between?

Alan Burstein SSC Guru Points: 61152 More actions · Answer 11

Michael Meierruth (5/27/2015)
Alan B.
How does your function stand up to a varchar(8000) column with an * at the beginning and end and all blanks in between?

Okay, first - both functions are much slower with a * at the beginning/end. My first test was at 500K rows and I cancelled before the first test finished. I did 10K, 50K and 100K rows. Here's DDL, Test and results:

-- Ran on my Laptop -- X64 (8 Logical CPU, 2.8ghz, 32GB) Microsoft SQL Server 2014 - 12.0.2254.0 (X64)

USE tempdb

GO

-- CREATE THE FUNCTIONS

-----------------------------------------------------------------

IF OBJECT_ID('dbo.fn_CleanUp_MichaelMeierruth_MKIV') IS NOT NULL

DROP FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV

GO

CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV(@S VARCHAR(max))

RETURNS VARCHAR(max) WITH SCHEMABINDING AS

BEGIN

RETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(replace(@s,CHAR(9),' '))),

' ',' '),

' ',' ')

END;

GO

IF OBJECT_ID('dbo.NormalizeWhiteSpace8K') IS NOT NULL

DROP FUNCTION dbo.NormalizeWhiteSpace8K

GO

CREATE FUNCTION dbo.NormalizeWhiteSpace8K(@String varchar(max))

RETURNS TABLE WITH SCHEMABINDING AS

RETURN

SELECT

NewString =

replace(replace(replace(replace(replace(replace(replace(

ltrim(rtrim(@String)),

' ',' '),

' ',' ');

GO

-- MAKE SURE BOTH STILL WORK (yep)

-----------------------------------------------------------------

DECLARE @string varchar(8000);

SET @string = '*'+REPLICATE(' ',7998)+'*';

SELECT * FROM dbo.NormalizeWhiteSpace8K(@string);

SELECT dbo.fn_CleanUp_MichaelMeierruth_MKIV(@string);

--THE TESTS

-----------------------------------------------------------------

SET NOCOUNT ON

IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals;

CREATE TABLE #vals (id int identity primary key, val varchar(8000) not null);

INSERT #vals (val)

SELECT TOP 100000 '*'+REPLICATE(' ',7998)+'*'

FROM sys.all_columns a, sys.all_columns b;

dbcc freeproccache with no_infomsgs

dbcc dropcleanbuffers with no_infomsgs

PRINT '========== itvf version ==========';

DECLARE

@result varchar(8000),

@StartTime DATETIME = GETDATE();

SELECT @result = NewString

FROM #vals

CROSS APPLY dbo.NormalizeWhiteSpace8K(val);

PRINT CAST(@@rowcount AS varchar(10))+' rows.';

PRINT DATEDIFF(ms,@StartTime,GETDATE());

GO 3

PRINT '========== scalar version ==========';

DECLARE

@result varchar(8000),

@StartTime DATETIME = GETDATE();

SELECT @result = dbo.fn_CleanUp_MichaelMeierruth_MKIV(val)

FROM #vals;

PRINT CAST(@@rowcount AS varchar(10))+' rows.';

PRINT DATEDIFF(ms,@StartTime,GETDATE());

GO 3

/*

--10,000 Records

Beginning execution loop

========== itvf version ==========

10000 rows.

1450

========== itvf version ==========

10000 rows.

1380

========== itvf version ==========

10000 rows.

1413

Batch execution completed 3 times.

Beginning execution loop

========== scalar version ==========

10000 rows.

2300

========== scalar version ==========

10000 rows.

2290

========== scalar version ==========

10000 rows.

2276

Batch execution completed 3 times.

*/

/*

-- 50,000 rows

Beginning execution loop

========== itvf version ==========

50000 rows.

7166

========== itvf version ==========

50000 rows.

7116

========== itvf version ==========

50000 rows.

7070

Batch execution completed 3 times.

Beginning execution loop

========== scalar version ==========

50000 rows.

11503

========== scalar version ==========

50000 rows.

11530

========== scalar version ==========

50000 rows.

11493

Batch execution completed 3 times.

*/

/*

-- 100,000 rows

Beginning execution loop

========== itvf version ==========

100000 rows.

14426

========== itvf version ==========

100000 rows.

14226

========== itvf version ==========

100000 rows.

14226

Batch execution completed 3 times.

Beginning execution loop

========== scalar version ==========

100000 rows.

22900

========== scalar version ==========

100000 rows.

23076

========== scalar version ==========

100000 rows.

22943

Batch execution completed 3 times.

*/

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Sporidium SSCommitted Points: 1550 More actions · Answer 12

Alan.B (5/27/2015)
Michael Meierruth (5/27/2015)
Alan B.
How does your function stand up to a varchar(8000) column with an * at the beginning and end and all blanks in between?
Okay, first - both functions are much slower with a * at the beginning/end. My first test was at 500K rows and I cancelled before the first test finished. I did 10K, 50K and 100K rows. Here's DDL, Test and results:
-- Ran on my Laptop -- X64 (8 Logical CPU, 2.8ghz, 32GB) Microsoft SQL Server 2014 - 12.0.2254.0 (X64)
USE tempdb
GO
-- CREATE THE FUNCTIONS
-----------------------------------------------------------------
IF OBJECT_ID('dbo.fn_CleanUp_MichaelMeierruth_MKIV') IS NOT NULL
DROP FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV
GO
CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV(@S VARCHAR(max))
RETURNS VARCHAR(max) WITH SCHEMABINDING AS
BEGIN
RETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(replace(@s,CHAR(9),' '))),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' ')
END;
GO
IF OBJECT_ID('dbo.NormalizeWhiteSpace8K') IS NOT NULL
DROP FUNCTION dbo.NormalizeWhiteSpace8K
GO
CREATE FUNCTION dbo.NormalizeWhiteSpace8K(@String varchar(max))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT
NewString =
replace(replace(replace(replace(replace(replace(replace(
ltrim(rtrim(@String)),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' ');
GO
-- MAKE SURE BOTH STILL WORK (yep)
-----------------------------------------------------------------
DECLARE @string varchar(8000);
SET @string = '*'+REPLICATE(' ',7998)+'*';
SELECT * FROM dbo.NormalizeWhiteSpace8K(@string);
SELECT dbo.fn_CleanUp_MichaelMeierruth_MKIV(@string);
--THE TESTS
-----------------------------------------------------------------
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals;
CREATE TABLE #vals (id int identity primary key, val varchar(8000) not null);
INSERT #vals (val)
SELECT TOP 100000 '*'+REPLICATE(' ',7998)+'*'
FROM sys.all_columns a, sys.all_columns b;
dbcc freeproccache with no_infomsgs
dbcc dropcleanbuffers with no_infomsgs
PRINT '========== itvf version ==========';
DECLARE
@result varchar(8000),
@StartTime DATETIME = GETDATE();
SELECT @result = NewString
FROM #vals
CROSS APPLY dbo.NormalizeWhiteSpace8K(val);
PRINT CAST(@@rowcount AS varchar(10))+' rows.';
PRINT DATEDIFF(ms,@StartTime,GETDATE());
GO 3
PRINT '========== scalar version ==========';
DECLARE
@result varchar(8000),
@StartTime DATETIME = GETDATE();
SELECT @result = dbo.fn_CleanUp_MichaelMeierruth_MKIV(val)
FROM #vals;
PRINT CAST(@@rowcount AS varchar(10))+' rows.';
PRINT DATEDIFF(ms,@StartTime,GETDATE());
GO 3
/*
--10,000 Records
Beginning execution loop
========== itvf version ==========
10000 rows.
1450
========== itvf version ==========
10000 rows.
1380
========== itvf version ==========
10000 rows.
1413
Batch execution completed 3 times.
Beginning execution loop
========== scalar version ==========
10000 rows.
2300
========== scalar version ==========
10000 rows.
2290
========== scalar version ==========
10000 rows.
2276
Batch execution completed 3 times.
*/
/*
-- 50,000 rows
Beginning execution loop
========== itvf version ==========
50000 rows.
7166
========== itvf version ==========
50000 rows.
7116
========== itvf version ==========
50000 rows.
7070
Batch execution completed 3 times.
Beginning execution loop
========== scalar version ==========
50000 rows.
11503
========== scalar version ==========
50000 rows.
11530
========== scalar version ==========
50000 rows.
11493
Batch execution completed 3 times.
*/
/*
-- 100,000 rows
Beginning execution loop
========== itvf version ==========
100000 rows.
14426
========== itvf version ==========
100000 rows.
14226
========== itvf version ==========
100000 rows.
14226
Batch execution completed 3 times.
Beginning execution loop
========== scalar version ==========
100000 rows.
22900
========== scalar version ==========
100000 rows.
23076
========== scalar version ==========
100000 rows.
22943
Batch execution completed 3 times.
*/

Surely both of these methods would benefit from using a binary collation, allowing the REPLACE function to execute faster?

For example, adding COLLATE Latin1_General_BIN2 to the function's inner @string parameter reference gives a significant speed improvement in my tests.

Maybe there's a downside I haven't thought of, but I'd have thought a space is a space is a space in any collation.

Alan Burstein SSC Guru Points: 61152 More actions · Answer 13

Gary Harding (5/28/2015)
Alan.B (5/27/2015)
Michael Meierruth (5/27/2015)
Alan B.
How does your function stand up to a varchar(8000) column with an * at the beginning and end and all blanks in between?
Okay, first - both functions are much slower with a * at the beginning/end. My first test was at 500K rows and I cancelled before the first test finished. I did 10K, 50K and 100K rows. Here's DDL, Test and results:
-- Ran on my Laptop -- X64 (8 Logical CPU, 2.8ghz, 32GB) Microsoft SQL Server 2014 - 12.0.2254.0 (X64)
USE tempdb
GO
-- CREATE THE FUNCTIONS
-----------------------------------------------------------------
IF OBJECT_ID('dbo.fn_CleanUp_MichaelMeierruth_MKIV') IS NOT NULL
DROP FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV
GO
CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV(@S VARCHAR(max))
RETURNS VARCHAR(max) WITH SCHEMABINDING AS
BEGIN
RETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(replace(@s,CHAR(9),' '))),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' ')
END;
GO
IF OBJECT_ID('dbo.NormalizeWhiteSpace8K') IS NOT NULL
DROP FUNCTION dbo.NormalizeWhiteSpace8K
GO
CREATE FUNCTION dbo.NormalizeWhiteSpace8K(@String varchar(max))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT
NewString =
replace(replace(replace(replace(replace(replace(replace(
ltrim(rtrim(@String)),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' ');
GO
-- MAKE SURE BOTH STILL WORK (yep)
-----------------------------------------------------------------
DECLARE @string varchar(8000);
SET @string = '*'+REPLICATE(' ',7998)+'*';
SELECT * FROM dbo.NormalizeWhiteSpace8K(@string);
SELECT dbo.fn_CleanUp_MichaelMeierruth_MKIV(@string);
--THE TESTS
-----------------------------------------------------------------
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals;
CREATE TABLE #vals (id int identity primary key, val varchar(8000) not null);
INSERT #vals (val)
SELECT TOP 100000 '*'+REPLICATE(' ',7998)+'*'
FROM sys.all_columns a, sys.all_columns b;
dbcc freeproccache with no_infomsgs
dbcc dropcleanbuffers with no_infomsgs
PRINT '========== itvf version ==========';
DECLARE
@result varchar(8000),
@StartTime DATETIME = GETDATE();
SELECT @result = NewString
FROM #vals
CROSS APPLY dbo.NormalizeWhiteSpace8K(val);
PRINT CAST(@@rowcount AS varchar(10))+' rows.';
PRINT DATEDIFF(ms,@StartTime,GETDATE());
GO 3
PRINT '========== scalar version ==========';
DECLARE
@result varchar(8000),
@StartTime DATETIME = GETDATE();
SELECT @result = dbo.fn_CleanUp_MichaelMeierruth_MKIV(val)
FROM #vals;
PRINT CAST(@@rowcount AS varchar(10))+' rows.';
PRINT DATEDIFF(ms,@StartTime,GETDATE());
GO 3
/*
--10,000 Records
Beginning execution loop
========== itvf version ==========
10000 rows.
1450
========== itvf version ==========
10000 rows.
1380
========== itvf version ==========
10000 rows.
1413
Batch execution completed 3 times.
Beginning execution loop
========== scalar version ==========
10000 rows.
2300
========== scalar version ==========
10000 rows.
2290
========== scalar version ==========
10000 rows.
2276
Batch execution completed 3 times.
*/
/*
-- 50,000 rows
Beginning execution loop
========== itvf version ==========
50000 rows.
7166
========== itvf version ==========
50000 rows.
7116
========== itvf version ==========
50000 rows.
7070
Batch execution completed 3 times.
Beginning execution loop
========== scalar version ==========
50000 rows.
11503
========== scalar version ==========
50000 rows.
11530
========== scalar version ==========
50000 rows.
11493
Batch execution completed 3 times.
*/
/*
-- 100,000 rows
Beginning execution loop
========== itvf version ==========
100000 rows.
14426
========== itvf version ==========
100000 rows.
14226
========== itvf version ==========
100000 rows.
14226
Batch execution completed 3 times.
Beginning execution loop
========== scalar version ==========
100000 rows.
22900
========== scalar version ==========
100000 rows.
23076
========== scalar version ==========
100000 rows.
22943
Batch execution completed 3 times.
*/
Surely both of these methods would benefit from using a binary collation, allowing the REPLACE function to execute faster?
For example, adding COLLATE Latin1_General_BIN2 to the function's inner @string parameter reference gives a significant speed improvement in my tests.
Maybe there's a downside I haven't thought of, but I'd have thought a space is a space is a space in any collation.

I have to try Latin1_General_BIN2. I tested with Latin1_General_BIN and saw no improvement.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

jessinthomas012 SSC Rookie Points: 46 More actions · Answer 14

Hi Jeff,

Well explained.. Thanks for sharing this :)..

Thanks,
Jessin

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 15

Thanks for the feedback, Jessin.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)