November 23, 2009 at 4:43 am
Michael Meierruth (11/23/2009)
The problem is the where clause in your function.It gets interpreted only once.
It took me a while to realise why - I can see now though that the only replace rows that are included are those where there is a charindex hit on the original string. The where clause not being re-evaluated.
If I add spaces to the data with tabs in :
SELECT 'A' + SPACE(10)+ REPLICATE(CHAR(9), 3) etc
[Edit : changed spaces to SPACE() so you see in post]
Then the tab string is cleaned up ok - as the original string matches the 'space replace' records.
So now I understand, thankyou!!
Shame really as I added the WHERE clause to try and cut down on workload...
Thanks
C# Gnu
____________________________________________________
November 23, 2009 at 4:53 am
Just put WHERE clause back:-) and added tabs to your pattern, of course!!:
CREATE FUNCTION dbo.fn_CleanString(@MyString VARCHAR(max))
RETURNS VarChar(MAX)
AS
BEGIN
-- Remove unwanted strings
SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)
FROM tPatterns
WHERE CHARINDEX(RemovePattern, @MyString) > 0
ORDER BY ReplaceOrder
RETURN @MyString
END
go
-- Create a table of all the rotten stuff we want to remove
SELECT 'tablescan' RemovePattern, 'seek' ReplaceWith, 1 AS ReplaceOrder
INTO tPatterns
UNION ALL SELECT 'RBAR', 'r***[censored word]',3 AS ReplaceOrder
UNION ALL SELECT 'screw', 's***[censored word]' ,4 AS ReplaceOrder
UNION ALL SELECT replicate(' ',32+1),' ',5 AS ReplaceOrder
UNION ALL SELECT replicate(' ',16+1),' ',6 AS ReplaceOrder
UNION ALL SELECT replicate(' ',8+1), ' ',7 AS ReplaceOrder
UNION ALL SELECT replicate(' ',4+1), ' ',8 AS ReplaceOrder
UNION ALL SELECT replicate(' ',2+1), ' ',9 AS ReplaceOrder
UNION ALL SELECT replicate(' ',1+1), ' ',10 AS ReplaceOrder
UNION ALL SELECT replicate(' ',1+1), ' ',11 AS ReplaceOrder
UNION ALL SELECT replicate(CHAR(9),32+1),CHAR(9),12 AS ReplaceOrder
UNION ALL SELECT replicate(CHAR(9),16+1),CHAR(9),13 AS ReplaceOrder
UNION ALL SELECT replicate(CHAR(9),8+1), CHAR(9),14 AS ReplaceOrder
UNION ALL SELECT replicate(CHAR(9),4+1), CHAR(9),15 AS ReplaceOrder
UNION ALL SELECT replicate(CHAR(9),2+1), CHAR(9),16 AS ReplaceOrder
UNION ALL SELECT replicate(CHAR(9),1+1), CHAR(9),17 AS ReplaceOrder
UNION ALL SELECT replicate(CHAR(9),1+1), CHAR(9),18 AS ReplaceOrder
-- Create some dirty test data
SELECT 'a b c d e f g h rbar rbar scscrewrew you i tablescan j k' myString
INTO tMyStrings
UNION ALL
SELECT ' tablescan rbar x scscrewrew you b c d e f g h i j k'
-- and a string with duplicate tabs
UNION ALL
SELECT 'A' + REPLICATE(CHAR(9), 3) + 'B' + REPLICATE(CHAR(9), 7) + 'C' + REPLICATE(CHAR(9), 6)
-- Clean & output the our test data
SELECT dbo.fn_CleanString(myString) FROM tMyStrings
go
-- Cleanup
DROP TABLE tPatterns
DROP TABLE tMyStrings
DROP FUNCTION dbo.fn_CleanString
Edit : I found a text file of bad words that could be downloaded and inserted:
please dont follow the link if you are easliy offended!
Enjoy;-)
C# Gnu
____________________________________________________
November 23, 2009 at 5:37 am
Jeff Moden (11/22/2009)
C# Screw (11/22/2009)
:satisfied:!!!!Wait a minute - I thought I was onto something!!!!!!:satisfied:This works nicely without the Nested replace:-):-):
Very cool revelation but I'd suggest using the same spacing scheme that Michael used for two reasons... it's faster and it will handle up to 65536 spaces.
As a side bar, it's interesting that some folks in my group at work recently received some 3rd party files with (you won't believe this) a freakin' million spaces between some words. That's not an exaggeration... there were over 1 million contiguous spaces in several spots in the file. I'm thinking "road trip" to find out in person if those kids like pork chops. 😛
Jeff,
If you read my comments on http://www.sqlservercentral.com/Forums/Topic819042-203-15.aspx you'll find out that, by just increasing to 64+1 the binary stack of spaces, you can handle almost 10 million spaces.
Those kids might like pork chops but they know nothing about the power of binary approaches to quickly clean up their mess. If this is not enough you can always move to 128+1 which will handle a string of spaces 1.27GB big.
November 23, 2009 at 6:09 am
Understood but thanks for the reminder...
The real problem is that this type of thing should never happen... it's simple and easy to prevent. Data providers that cough up that kind of garbage should be forced to eat their own computers. (Just venting a little).
--Jeff Moden
Change is inevitable... Change for the better is not.
November 23, 2009 at 1:24 pm
Jeff Moden (11/23/2009)
Data providers that cough up that kind of garbage should be forced to eat their own computers. (Just venting a little).
Without apple sauce 😎
November 23, 2009 at 2:37 pm
Paul White (11/23/2009)
Jeff Moden (11/23/2009)
Data providers that cough up that kind of garbage should be forced to eat their own computers. (Just venting a little).Without apple sauce 😎
With Chips 😀
C# Gnu
____________________________________________________
November 24, 2009 at 5:18 am
Interesting idea. And it's collation insensitive too.
But this approach will lose data!
Thus try:
DECLARE @STR VARCHAR(8000)
set @STR='*' + replicate(' ',7998) + '*'
select @STR
select @STR = REPLACE(REPLACE(REPLACE(@Str,' ','$#$'),'$$#',''),'$#$',' ')
SELECT @STR
select len(@str)
It loses my trailing asterisk.
November 24, 2009 at 6:08 am
Michael Meierruth (11/24/2009)
Interesting idea. And it's collation insensitive too.But this approach will lose data!
Thus try:
DECLARE @STR VARCHAR(8000)
set @STR='*' + replicate(' ',7998) + '*'
select @STR
select @STR = REPLACE(REPLACE(REPLACE(@Str,' ','$#$'),'$$#',''),'$#$',' ')
SELECT @STR
select len(@str)
It loses my trailing asterisk.
Mind you in 2005 VarChar(max) it would seem pretty cool..
Send it down to Performance Lab?
C# Gnu
____________________________________________________
November 24, 2009 at 6:19 am
Could be used to (maybe) speed up Generic Cleanup function - but WHERE clause has to be excluded so may offset any gain
CREATE FUNCTION dbo.fn_CleanString(@MyString VARCHAR(max))
RETURNS VarChar(MAX)
AS
BEGIN
-- Remove unwanted strings
SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)
FROM tPatterns
--WHERE CHARINDEX(RemovePattern, @MyString) > 0
ORDER BY ReplaceOrder
RETURN @MyString
END
go
-- Create a table of all the rotten stuff we want to remove
SELECT 'tablescan' RemovePattern, 'seek' ReplaceWith, 1 AS ReplaceOrder
INTO tPatterns
UNION ALL SELECT 'RBAR', 'r***[censored word]',3 AS ReplaceOrder
UNION ALL SELECT 'screw', 's***[censored word]' ,4 AS ReplaceOrder
UNION ALL SELECT ' ','$#$',5 AS ReplaceOrder -- Space cleanup
UNION ALL SELECT '$$#','',6 AS ReplaceOrder
UNION ALL SELECT '$#$', ' ',7 AS ReplaceOrder
UNION ALL SELECT CHAR(9),'$#$',8 AS ReplaceOrder -- TAB cleanup
UNION ALL SELECT '$$#','',9 AS ReplaceOrder
UNION ALL SELECT '$#$', CHAR(9),10 AS ReplaceOrder
-- Create some dirty test data
SELECT 'a b c d e f g h rbar rbar scscrewrew you i tablescan j k' myString
INTO tMyStrings
UNION ALL
SELECT ' tablescan rbar x scscrewrew you b c d e f g h i j k'
-- and a string with duplicate tabs
UNION ALL
SELECT 'A' + REPLICATE(CHAR(9), 3) + 'B' + REPLICATE(CHAR(9), 7) + 'C' + REPLICATE(CHAR(9), 6)
-- Clean & output the our test data
SELECT dbo.fn_CleanString(myString) FROM tMyStrings
go
-- Cleanup
DROP TABLE tPatterns
DROP TABLE tMyStrings
DROP FUNCTION dbo.fn_CleanString
C# Gnu
____________________________________________________
November 24, 2009 at 6:25 am
Very strange this MAX stuff on SS2005. The code below behaves very strange. I.e. I set a string to 8004 characters and yet the LEN function shows it to be 8002.
But the code to remove the extra blanks correctly returns * *.
Strange indeed!
declare @STR varchar(max)
declare @s2 varchar(max)
set @s2 = replicate(' ',8002)
set @STR = '*' + @s2 + '*'
select @STR
select len(@Str)
select @STR = REPLACE(REPLACE(REPLACE(@Str,' ','$#$'),'$$#',''),'$#$',' ')
select @STR
select len(@Str)
November 24, 2009 at 6:32 am
Michael Meierruth (11/24/2009)
Very strange this MAX stuff on SS2005. The code below behaves very strange. I.e. I set a string to 8004 characters and yet the LEN function shows it to be 8002.But the code to remove the extra blanks correctly returns * *.
Strange indeed!
declare @STR varchar(max)
declare @s2 varchar(max)
set @s2 = replicate(' ',8002)
set @STR = '*' + @s2 + '*'
select @STR
select len(@Str)
select @STR = REPLACE(REPLACE(REPLACE(@Str,' ','$#$'),'$$#',''),'$#$',' ')
select @STR
select len(@Str)
hmmm and this shows length of 0!
declare @s2 varchar(max)
set @s2 = replicate(space(1),8002)
SELECT LEN (@s2)
C# Gnu
____________________________________________________
November 24, 2009 at 6:34 am
len() removes trailing spaces. Use Datalength()
---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]
"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
November 24, 2009 at 6:39 am
jcrawf02 (11/24/2009)
len() removes trailing spaces. Use Datalength()
solves mine - but not Michaels observation
C# Gnu
____________________________________________________
Viewing 15 posts - 271 through 285 (of 425 total)
You must be logged in to reply to this topic. Login to reply