November 18, 2009 at 1:13 pm
Paul White (11/18/2009)
- no doubt by then someone will have found an improvement which runs in negative time :laugh:Paul
Um, tell the vendor to clean it up on their end? 😀
---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]
"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
November 18, 2009 at 1:15 pm
vliet (11/18/2009)
Hello Jeff,Thank you so much for posting this solution! We will apply your method in a SSIS Derived Column Transformation on data coming from raw files. This data needs to be cleansed before it even reaches the SQL Server staging tables, so we can not use CLR functions or SQL functions.
I will let you know how it performs, I'll try to use the horizontal tab (ASCII code 9) as the special character because this character will be replaced by a single space in this same transformation step. I'm curious about the performance in this context, but I must admit that your solution is much more elegant than my initial version with the 65, 33, 17, 9, 5, 3 an 2 spaces nested REPLACE construction. Your method can be applied on very wide columns without a limit to the number of contiguous spaces, and all it needs is one special character.
We can use the same expression for every string column that needs to be cleansed and we only need to replace the column name once. This is an aspect of your method that comes in handy in real life situations where you need to cleans more than a few columns of a few tables.
Keep up the good work!
Dony.
Hi Dony... thanks for the great feedback. BUT... as so often happens (it's almost designed to happen because so many good people get involved), the method I posted has been soundly beaten by, you guessed it, a nested replace very similar to what you said. Please see the following post on this thread....
http://www.sqlservercentral.com/Forums/Topic819042-203-3.aspx#BM820760
--Jeff Moden
Change is inevitable... Change for the better is not.
November 18, 2009 at 1:20 pm
Paul White (11/18/2009)
Hey everyone, just a quick note before I go to work this morning. I love the way this thread it going - huge thanks especially to C# Screw (lol!) and Flo for tidying my (still slightly amateur) C# as usual! I have a fairly busy morning ahead but will look in this afternoon - no doubt by then someone will have found an improvement which runs in negative time :laugh:I hope Jeff pops by soon because I'm really interested in the huge performance difference (10x) I found working with the different collations with his cool method.
Paul
I'm at work right now so I can only provide quick "verbal" answers... can't take the time to test any code at work. I'll give the collation code a test tonight... very interesting find you made, Paul. Heh... more proof that SQL is NOT SQL.... well, unless it's Tuesday, it's raining, and you have precisely 3 pork chops on the thaw. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
November 18, 2009 at 1:41 pm
I am wondering if most usefull/usable Community result should include strip multiple Tab characters as well?
Thanks
Screw
10k rows
Each execution completed 10 times.
SQL function: Replace Technique : but using replacement chars ||*9*9||
2635
SQL function :Brigzy (C#Screw) Looping
2517
SQL function: Jeff Original : single bell char
1884
CLR: Brigzy (C#Screw) Looping
332
SQL function: Michael Meierruth Original
332
CLR: Not looping using ||*9*9|| technique in C#
312
SQL function: Michael Meierruth : with hard coded spaces
304
CLR: using Jeff's single char technique in C#
243
CLR: C#Screw II - Spaces and TABS
181
CLR :using JCBnew space replacer C#
160
CLR: Michael Meierruth - C#Screw conversion: hard coded spaces
143
CLR: using Flo s space replacer C#
121
CLR: using Paul White space replacer C#
105
[SqlFunction()]
public static string fn_TidySpace_SharpScrewII_CLR(string s)
{
StringBuilder sb = new StringBuilder();
string[] parts = s.Split(new char[] { ' ','\t' }, StringSplitOptions.RemoveEmptyEntries);
int size = parts.Length;
for (int i = 0; i < size; i++)
sb.AppendFormat("{0} ", parts);
return sb.ToString();
}
C# Gnu
____________________________________________________
November 18, 2009 at 1:48 pm
Hi C# Screw
Good point, maybe try this instead of the StringBuilder:
string[] parts = s.Split(new char[] { ' ', '\t' });
return string.Join(" ", parts);
But I think the char-loop based solutions will still be much faster, since they do less memory copies.
Greets
Flo
November 18, 2009 at 1:54 pm
Flo - thank you so much for the Tui billboard :laugh:
I should have known you'd get your revenge! Man, that made me snigger out loud at my desk....!
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 18, 2009 at 1:59 pm
Paul White (11/18/2009)
Flo - thank you so much for the Tui billboard :laugh:I should have known you'd get your revenge! Man, that made me snigger out loud at my desk....!
Always welcome! 😀
Remember, what you give is what you get.
(... Damn! Yesterday I just called you crazy :pinch: )
November 18, 2009 at 2:02 pm
Florian Reischl (11/18/2009)
Hi C# ScrewGood point, maybe try this instead of the StringBuilder:
string[] parts = s.Split(new char[] { ' ', '\t' });
return string.Join(" ", parts);
But I think the char-loop based solutions will still be much faster, since they do less memory copies.
Greets
Flo
Hi Flo
I tried the above but it is returning the original string :unsure:
C# Gnu
____________________________________________________
November 18, 2009 at 2:07 pm
C# Screw (11/18/2009)
Florian Reischl (11/18/2009)
Hi C# ScrewGood point, maybe try this instead of the StringBuilder:
string[] parts = s.Split(new char[] { ' ', '\t' });
return string.Join(" ", parts);
But I think the char-loop based solutions will still be much faster, since they do less memory copies.
Greets
Flo
Hi Flo
I tried the above but it is returning the original string :unsure:
Oups, forgot the StringSplitOptions... Try this:
string[] parts = s.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
return string.Join(" ", parts);
Greets
Flo
Edit:Removed the leading spaces
November 18, 2009 at 2:14 pm
Yep thats it Flo:
10k rows
10x execution AVG
SQL function: Replace Technique : but using replacement chars ||*9*9||
2648
SQL function :Brigzy (C#Screw) Looping
2538
SQL function: Jeff Original : single bell char
1891
CLR: Brigzy (C#Screw) Looping
348
SQL function: Michael Meierruth : with hard coded spaces
342
SQL function: Michael Meierruth Original
317
CLR: Not looping using ||*9*9|| technique in C#
296
CLR: using Jeff's single char technique in C#
255
CLR: C#Screw II - Spaces and TABS
197
CLR: Flo - Spaces and TABS - v.compact
169
CLR :using JCBnew space replacer C#
162
CLR: Michael Meierruth - C#Screw conversion: hard coded spaces
152
CLR: using Paul White space replacer C#
109
CLR: using Flo s space replacer C#
108
C# Gnu
____________________________________________________
November 18, 2009 at 2:21 pm
Oh look ! I have become a ValuedMember :blush:
Screw
C# Gnu
____________________________________________________
November 18, 2009 at 2:23 pm
C# Screw (11/18/2009)
Oh look ! I have become a ValuedMember :blush:
Congrats Screw!
And thanks for all your time to do those tests 🙂
Greets
Flo
November 18, 2009 at 2:42 pm
I noticed that 'vliet' in a post earlier today seems to know about my binary approach calling it the "65, 33, 17, 9, 5, 3 an 2 spaces nested REPLACE construction".
I did notice one bug in the code I published. It appears to not work for strings of length K*2^N+1 for K>1 and 2^N+1 corresponding to the largest string substitution you are doing (33). Strings of this length would end up in a string of length 2 (e.g. 65, 97, 129, etc). Thus one additional replace(@s," "," ") is needed to handle these relatively rare cases. Here is the correct version for C#Screw because this might slow things down a nanosecond or so and put me in a different place in the leaderboard.
CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKII(@S VARCHAR(8000))
RETURNS VARCHAR(8000)
BEGIN
RETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(@s)),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' ')
END
I have also done some quick simulations to see maximum lengths of strings that can be handled with each value of N. These are the results so far:
N=1 10
N=2 38
N=3 286
N=4 4622
N=5 151534
Thus 2^N+1 = 33 for N=5 covers SQL Server's max length of 8000 quite nicely. No need for vliet's 65 for which I haven't figured out yet the maximum length string it could handle. From the above sequence it looks like something very big. Can anyone see the formula for this sequence?
OK. Thought about it a bit.
For N=6 it appears to be 9845678.
The formula seems to be 2^(N+1)+Product for N=0 to N of 2^N+1
E.g. for N=5 we get 64 + 33*17*9*5*3*2 = 151534
November 18, 2009 at 3:08 pm
Michael,
here you are now (included change to c# version):
Slick T-SQL version to strip Spaces and TABS anyone?
Time to sleep here shortly in UK ...:doze:
Thanks
Screw
10k rows
Each execution completed 10 times.
SQL function: Replace Technique : but using replacement chars ||*9*9||
2638
SQL function :Brigzy (C#Screw) Looping
2511
SQL function: Jeff Original : single bell char
1866
SQL function: Michael Meierruth Original
321
CLR: Brigzy (C#Screw) Looping
316
SQL function: Michael Meierruth III - extra replace
306
SQL function: Michael Meierruth : with hard coded spaces
305
CLR: Not looping using ||*9*9|| technique in C#
287
CLR: using Jeff's single char technique in C#
248
CLR: C#Screw II - Spaces and TABS
181
CLR: Flo - Spaces and TABS - v.compact
167
CLR :using JCBnew space replacer C#
160
CLR: Michael Meierruth - C#Screw conversion: with extra Replace
149
CLR: Michael Meierruth - C#Screw conversion: hard coded spaces
145
CLR: using Flo s space replacer C#
109
CLR: using Paul White space replacer C#
104
C# Gnu
____________________________________________________
November 18, 2009 at 3:25 pm
Mark-101232 (11/18/2009)
Just out of interest I converted your code to use pointers which improved performance by 5-10%. Not that I would recommend using UNSAFE code though...
Hey Mark,
Wouldn't mind seeing that code if you have time to post it - I considered using pointers but ran out of talent! Same goes for a C++ implementation. The pointer thing should be faster, probably more than 5-10%, but IIRC unmanaged code means the thread has to go pre-emptive rather than co-operative, so that probably slows it down a bit overall...? Anyway, would be nice to see the code 🙂
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 15 posts - 136 through 150 (of 425 total)
You must be logged in to reply to this topic. Login to reply