November 22, 2009 at 6:34 pm
Ok
1/ modified so will cope with multiple tabs
2/ -1 issue resolved! [Edit : too much switching between SQL/C# !]
3/ Performance won't be an issue as I am suggesting this function is called on after update trigger.
After update trigger would have something like:
If INSERTED record count > 1 exit trigger.
Edit
4/ Changed to use Michaels space system
Of course I agree with your comments - just trying to find a neat solution to handle clean up of all sorts of junk.
CREATE FUNCTION dbo.fn_CleanString(@MyString VARCHAR(max))
RETURNS VarChar(MAX)
AS
BEGIN
-- Remove unwanted strings
SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)
FROM tPatterns
WHERE CHARINDEX(RemovePattern, @MyString) > 0
RETURN @MyString
END
go
-- Create a table of all the rotten stuff we want to remove
SELECT 'tablescan' RemovePattern, 'seek' ReplaceWith
INTO tPatterns
UNION ALL SELECT 'RBAR', 'R***'
UNION ALL SELECT 'anyword', 'R*******'
UNION ALL SELECT CHAR(9), ' ' -- Get rid of all tabs
UNION ALL SELECT replicate(' ',32+1),' '
UNION ALL SELECT replicate(' ',16+1),' '
UNION ALL SELECT replicate(' ',8+1), ' '
UNION ALL SELECT replicate(' ',4+1), ' '
UNION ALL SELECT replicate(' ',2+1), ' '
UNION ALL SELECT replicate(' ',1+1), ' '
-- Create some dirty test data
SELECT 'a b c d e f g h rbar i tablescan j k' myString
INTO tMyStrings
UNION ALL
SELECT ' tablescan rbar x b c d e f g h i j k'
-- and a string with duplicate tabs
UNION ALL
SELECT 'A' + REPLICATE(CHAR(9), 3) + 'B' + REPLICATE(CHAR(9), 7) + 'C' + REPLICATE(CHAR(9), 6)
-- Clean & output the our test data
SELECT dbo.fn_CleanString(myString) FROM tMyStrings
go
-- Cleanup
DROP TABLE tPatterns
DROP TABLE tMyStrings
DROP FUNCTION fn_CleanString
Allow me a little excitement in my hour of discovery 🙂 I must go sleep now is 1:30 am!
C# Gnu
____________________________________________________
November 22, 2009 at 6:42 pm
C# Screw,
You are certainly allowed your hour of excitement! Sorry if I come across as beating your code up, it's not intentional - I've been most impressed by the ideas behind your submissions 🙂
Paul
November 22, 2009 at 6:45 pm
Jeff Moden (11/22/2009)
Paul White (11/22/2009)
5. Actually, don't do data access at all in functions!! Unless you really really really know what you are doing, and even then be sure to test it to death!
Fixed 😉
November 22, 2009 at 6:48 pm
Paul White (11/22/2009)
C# Screw,You are certainly allowed your hour of excitement! Sorry if I come across as beating your code up, it's not intentional - I've been most impressed by the ideas behind your submissions 🙂
Paul
Cheers Paul,
this forum is great, its like working with real cool dudes - you guys really know your stuff.
Have to say good night from UK now :doze:
C# Gnu
____________________________________________________
November 22, 2009 at 6:48 pm
C# Screw (11/22/2009)
Of course I agree with your comments - just trying to find a neat solution to handle clean up of all sorts of junk.
Heh... I'm sure that what I'm about to say will surely knock some folks right out of their chair because everyone that knows me know that I believe that speed/performance is only secondary to accuracy... but...
Even if your good function turns out to be a bit slower than some other method that may pop up, having such a thing in a function and being a bit slower is a whole lot better than someone trying to solve the problem on their own with a While Loop or worse... they might get things wrong. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
November 22, 2009 at 6:51 pm
Paul White (11/22/2009)
Jeff Moden (11/22/2009)
Paul White (11/22/2009)
5. Actually, don't do data access at all in functions!! Unless you really really really know what you are doing, and even then be sure to test it to death!Fixed 😉
LMAO!!! Well done! 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
November 22, 2009 at 6:57 pm
Paul White (11/22/2009)
(which I still admire BTW)
Heh... The odd thing is, I don't deserve that... like I said in the article, it's an old method and I didn't come up with it. I've just never seen anyone explain it before (kind of like the basis of the Tally table article... everyone said to use it but never explained it). The only thing I did was get rid of a 4th replace and, I'm sure if you look for it, someone probably beat me there, as well.
I do appreciate the feedback, though... I guess I can start backing out the screws on some of the handrails. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
November 22, 2009 at 7:58 pm
Performance:
This test run was done having 10 replace strings in the Data Driven table, as in the T-SQL sample above.
Not really fair to compare as this is the only solution offering multi-pattern replace so far.
But still it seems quite respectable and time wise is on the line with Jeff original (in a function)..
New Data Driven solution comes in :
C# Gnu
____________________________________________________
November 23, 2009 at 2:15 am
Jeff Moden (11/22/2009)
I'll also agree that nested replaces will still beat the RBAR of doing single replaces even if they are table driven in a single query... we've already seen that in testing.
NopeNot sure on that one, in the test results I provided, looping is 20% faster than the In-Line Replace in the artcle. BUT I agreee it was only 10k rows with 2GB RAM.
Jeff Moden (11/22/2009)
Just in case someone is confused (not you Paul), not all seemingly "loopless" code is Set Based... you also have to look at how many times each row or item is "touched". For example, a "Triangular Join" to do a running total on a lousy 10,000 rows will produce more than 50 MILLION rows of internal RBAR which is why it's so very slow compared to even a Cursor.
In my opinion every single solution posted on this thread is doing RBAR!
The only way you will get away from RBAR is by creating a calculated, peristed column of :
CHARINDEX(SPACE(2),yourcol). Then Create an INDEX on this column.
Say the column was called 'HasSpace', then adding WHERE HasSpace > 0
Only then can you say good bye RBAR, as SQL will be able to use the index and you will have the fastest solution possible [Edit: apart from cleaning data on a trigger of course].
:-P:-)
C# Gnu
____________________________________________________
November 23, 2009 at 2:31 am
C#Screw
My compliments on this neat cleanup engine. Here are some comments:
In the 32+1, 16+1, etc part of the code, you need to add an extra replace of two spaces with one space. See my earlier comments on why this is needed.
I thought, and Paul mentioned this earlier, that you do not have control over the order the rows are read from a table if there is no explicit order by. With a small table like yours the rows seems to get read in the order you load them. But that's just luck. Thus I added a rownum column which forces the 32+1, 16+1,... strings to be loaded and processed in reverse order. This causes a string like '*'+replicate(' ',7998)+'*' to end up as '*' + 5 spaces + '*'.
Now for some laughter. I noticed you were trying to clean up obscenities (including RBAR:w00t:). So here is my version of 'obscenity injection' which your code cannot handle.
Let's say your table wants to remove from all strings any parts containing 'screw you':w00t:. You do this by doing a
union all 'screw you','' into you cleanup table.
However, I'm an obscenity obsessed hacker and will insert into your data table the following:
'scscrew yourew you'
And here you thought you were all done!:-D
November 23, 2009 at 2:37 am
Michael Meierruth (11/23/2009)
'scscrewrew you'
lol !
I wonder if could change to LIKE
then table entry would be '%screw%you%'
SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)
FROM tPatterns
WHERE MyString LIKE RemovePattern
RETURN @MyString
🙂
C# Gnu
____________________________________________________
November 23, 2009 at 2:43 am
C# Screw (11/23/2009)
Michael Meierruth (11/23/2009)
'scscrewrew you'lol !
I wonder if could change to LIKE
then table entry would be '%screw%you%'
SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)
FROM tPatterns
WHERE MyString LIKE RemovePattern
RETURN @MyString
🙂
Actually no that won't work... hey Michael this is a new game altogether!!
But maybe [Edit: replace string table structure]:
SearchString | CSVReplaceBits | CSVReplaceBitsWith
'%screw%you%' | Screw, You | S***,Y**
Lots more RBAR- ing though to get that to work 🙁
C# Gnu
____________________________________________________
November 23, 2009 at 3:08 am
Any maintenance screen would have to look after / allow change of the ReplaceOrder column...
CREATE FUNCTION dbo.fn_CleanString(@MyString VARCHAR(max))
RETURNS VarChar(MAX)
AS
BEGIN
-- Remove unwanted strings
SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)
FROM tPatterns
WHERE CHARINDEX(RemovePattern, @MyString) > 0
ORDER BY ReplaceOrder
RETURN @MyString
END
go
-- Create a table of all the rotten stuff we want to remove
SELECT 'tablescan' RemovePattern, 'seek' ReplaceWith, 1 AS ReplaceOrder
INTO tPatterns
UNION ALL SELECT CHAR(9), ' ',2 AS ReplaceOrder -- Get rid of all tabs
UNION ALL SELECT 'RBAR', 'r***[censored word]',3 AS ReplaceOrder
UNION ALL SELECT 'screw', 's****[censored word]' ,4 AS ReplaceOrder
UNION ALL SELECT replicate(' ',32+1),' ',5 AS ReplaceOrder
UNION ALL SELECT replicate(' ',16+1),' ',6 AS ReplaceOrder
UNION ALL SELECT replicate(' ',8+1), ' ',7 AS ReplaceOrder
UNION ALL SELECT replicate(' ',4+1), ' ',8 AS ReplaceOrder
UNION ALL SELECT replicate(' ',2+1), ' ',9 AS ReplaceOrder
UNION ALL SELECT replicate(' ',1+1), ' ',10 AS ReplaceOrder
UNION ALL SELECT replicate(' ',1+1), ' ',11 AS ReplaceOrder
-- Create some dirty test data
SELECT 'a b c d e f g h rbar scscrewrew you i tablescan j k' myString
INTO tMyStrings
UNION ALL
SELECT ' tablescan rbar x scscrewrew you b c d e f g h i j k'
-- and a string with duplicate tabs
UNION ALL
SELECT 'A' + REPLICATE(CHAR(9), 3) + 'B' + REPLICATE(CHAR(9), 7) + 'C' + REPLICATE(CHAR(9), 6)
-- Clean & output the our test data
SELECT dbo.fn_CleanString(myString) FROM tMyStrings
go
-- Cleanup
DROP TABLE tPatterns
DROP TABLE tMyStrings
DROP FUNCTION fn_CleanString
C# Gnu
____________________________________________________
November 23, 2009 at 3:29 am
Don't know if you noticed, but the TAB replace doesn't seem to work:
output:
a b c d e f g h r***[censored word] scs****[censored word]rew you i seek j k
seek r***[censored word] x scs****[censored word]rew you b c d e f g h i j k
A---B-------C--- << where --- are spaces, I would have expected single space?
Any ideas?
C# Gnu
____________________________________________________
November 23, 2009 at 3:50 am
The problem is the where clause in your function.
It gets interpreted only once.
Viewing 15 posts - 256 through 270 (of 425 total)
You must be logged in to reply to this topic. Login to reply