remove spaces more than double

Question

remove spaces more than double

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply

sgmunson SSC Guru Points: 110640 More actions · Answer 1

GSquared,

My concern was with any processes that replaced any more than 2 spaces with none, which has a potential problem, as I pointed out previously.

With regard to your example of re-occurring bulk data loads, I'd absolutely insist on cleaning up the data long before it was ready for bulk insert. The risks associated with accepting data from sources that provide it in such a "dirty" fashion often go far beyond a mere few extra spaces, and that kind of data integrity problem is no way to run ANY business. Merely cleaning up the spaces has far too much potential for either avoiding or entirely ignoring data integrity problems - and at a level no one should be willing to accept. Ever... and no matter what. And, generally, until management actually feels the pinch from an outage caused by poor data, they may continue to just go happily along believing that they can get away with running a business that way. Again, just my two cents.

Steve

(aka smunson)

:):):)

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

GSquared SSC Guru Points: 260824 More actions · Answer 2

RyanRandall (7/16/2008)
As usual, different methods will be better in different situations. They're all tools for the toolbox 🙂
GSquared, your last idea also cropped up in 2002...
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=21511
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(colname,
REPLICATE(' ',21),' '),
REPLICATE(' ', 6),' '),
REPLICATE(' ', 3),' '),
REPLICATE(' ', 2),' '),
REPLICATE(' ', 2),' ')

That would need to be nested inside the other replace (with the placeholder character), in case the number of spaces was a multiple that wouldn't be fixed by those exact replaces.

Here's a test:

create table #T (

Number int,

String varchar(1000))

insert into #t (number, string)

select number, '|' + replicate(' ', number) + '|'

from dbo.numbers

where number between 1 and 998

update #t

set string = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(string,

REPLICATE(' ',21),' '),

REPLICATE(' ', 6),' '),

REPLICATE(' ', 3),' '),

REPLICATE(' ', 2),' '),

REPLICATE(' ', 2),' ')

select * -- Test for wrong results

from #t

where string like '% %' -- Still has double-space

or string = '||' -- All spaces eliminated

Out of 998 rows, 120 ended up with double-spacing still in them. They were all places where it was a large number of spaces. No rows had all spaces eliminated.

Probably not applicable in any normal situation, but worth keeping in mind. I don't like cases where it's mathematically possible to end up with bad data.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

RyanRandall SSChampion Points: 13623 More actions · Answer 3

GSquared (7/16/2008)
Out of 998 rows, 120 ended up with double-spacing still in them.

Slow down!

The link from 2002 says:

I'm pretty convinced now that it requires 6 REPLACEs. I think that, using 5 REPLACEs it is possible to collapse up to 461 spaces (using the sequence* 21, 6, 3, 2, 2). With 6 REPLACEs, this increases to 53591 (prepending 231 to the sequence).

If you take this into account it works just fine 🙂

Thanks for doing the tests by the way. There are plenty around from over the years, but it's always interesting to see more.

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

GSquared SSC Guru Points: 260824 More actions · Answer 4

Makes sense. (I didn't actually click through the link, because I kind of overlooked it in my overenthusiam for testing.)

On the original test table, with 1-million rows, that nested replace method took 52,671. Faster than the one with the placeholder character, but still much slower than the staged replace.

Results were completely accurate.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

RyanRandall SSChampion Points: 13623 More actions · Answer 5

Thanks GSquared.

By the way, I've just noticed that Test3 references string1 several times. Is that supposed to be the case?

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

GSquared SSC Guru Points: 260824 More actions · Answer 6

Nope. That comes from me editing (I changed some stuff for clarity) after I tested. It's a typo. (Par for the course today. I try to make something more clear, and it ends up with lead-like transparency!)

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Kaushal SSC Enthusiast Points: 118 More actions · Answer 7

Try creating a function like the following:

create function [dbo].[removeMultiSpace] (@cString varchar(300) )

returns varchar(300)

as

BEGIN

declare @k smallint, @left varchar(300), @right varchar(300)

select @cString = ltrim(rtrim(@cString))

select @k = len(ltrim(rtrim(@cString)))

IF @k > 1

BEGIN

SELECT @k = CHARINDEX(' ',@cString)

WHILE @k > 0

BEGIN

SELECT @left = SUBSTRING(@cString,1,(@k - 1))

SELECT @right = SUBSTRING(@cString,@k,LEN(@cString))

SET @cString = @left + ' ' + ltrim(rtrim(@right))

SELECT @k = CHARINDEX(' ',@cString)

END

RETURN @cString

END

;

.. and test that with the following:

IF OBJECT_ID (N'DBO.TEMP1', N'U') IS NOT NULL DROP TABLE DBO.TEMP1;

create table DBO.TEMP1 (COL1 varchar(30));

insert into DBO.TEMP1 VALUES ('JOHN SMITH');

SELECT DBO.removeMultiSpace(COL1) FROM DBO.TEMP1;