January 26, 2009 at 12:34 pm
Hi all,
I'm trying to rewrite the following function that we use to trim certain catalog numbers in the company. The business rules are easy, if a catalog contains incorrect characters, we trim them and return a correct string containing only correct characters.
It is the only way I found of doing this, but it contains a while loop that I do not actually like.
I'm thinking adding the "RBAR" tag to the title of this article should awake a certain Moden to this article, and see if there is an easy way of doing this.
By the way, it might stop swarming my profiler with that damn while loop, since it is used against roughly a few million rows.
IF EXISTS (SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[fnGICSPFParseGICCatalog]')
AND TYPE IN (N'FN',N'IF',N'TF',N'FS',
N'FT'))
DROP FUNCTION [dbo].[fnGICSPFParseGICCatalog]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION dbo.fnGICSPFParseGICCatalog
(@string VARCHAR(8000),
@TrimAll BIT)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @IncorrectCharLoc SMALLINT
IF @TrimAll = 1
BEGIN
SET @IncorrectCharLoc = PATINDEX('%[^0-9A-Za-z]%',@string)
WHILE @IncorrectCharLoc > 0
BEGIN
SET @string = STUFF(@string,@IncorrectCharLoc,1,'')
SET @IncorrectCharLoc = PATINDEX('%[^0-9A-Za-z]%',@string)
END
SET @string = @string
END
ELSE
BEGIN
SET @IncorrectCharLoc = PATINDEX('%[^-0-9A-Za-z&.$/]%',@string)
WHILE @IncorrectCharLoc > 0
BEGIN
SET @string = STUFF(@string,@IncorrectCharLoc,1,'')
SET @IncorrectCharLoc = PATINDEX('%[^-0-9A-Za-z&.$/]%',@string)
END
SET @string = @string
END
RETURN @string
END
GO
--=== When we send the TRIMALL Parameter a 1, it means we want to trim all weird characters
-- that are not numeric or letters ===--
select dbo.[fnGICSPFParseGICCatalog] ('QO115',1)
-- Gives QO115
select dbo.[fnGICSPFParseGICCatalog] ('QO115&-$?',1)
-- Gives QO115
--=== When we dont, it will trim only the parameters not in the list. ===--
select dbo.[fnGICSPFParseGICCatalog] ('QO115&-$?*',0)
-- Gives QO115&-$ since these 3 characters are permitted in the PAtIndex List.
-- The ? and * were trimmed from the string to avoid problems with the old Cobol system.
Thanks all,
Cheers,
J-F
January 26, 2009 at 12:54 pm
How about this?
[font="Courier New"]
CREATE FUNCTION dbo.fnGICSPFParseGICCatalog_Test(
@string VARCHAR(8000),
@TrimAll BIT)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @B VARCHAR(8000)
SET @B = '' -- Initialize @b-2
IF @TrimAll = 1
SELECT @B = @B + SUBSTRING(@String,N,1)
FROM Tally
WHERE N <= DATALENGTH(@String)
AND
SUBSTRING(@String,N,1) LIKE '%[0-9A-Za-z]%' -- Remove everything but these.
ELSE
SELECT @B = @B + SUBSTRING(@String,N,1)
FROM Tally
WHERE N <= DATALENGTH(@String)
AND
SUBSTRING(@String,N,1) LIKE '%[-0-9A-Za-z&.$/]%' -- Remove everything but these.
RETURN @B
END[/font]
January 26, 2009 at 1:12 pm
Hi Seth,
Thanks for the quick reply,
I've tested the code, and I'm disapointed to see that the code is not performant. It takes twice the time to run against 200 000 rows. (10 secs compared to 5 secs).
I thought it would improve the speed, but it does not... Any idea?
Thanks,
Cheers,
J-F
January 26, 2009 at 1:18 pm
Unfortunately, not right away, no. I know that normally doing any kind of data access in an inline function is a "bad idea", in the same respect that playing darts in a crowded room while blindfolded and drunk is a bad idea... but for some reason I believed this one was the exception to the rule. I just created a very similar function here for the same purpose but my rowsets are much much smaller and I didn't notice any performance hit.
I'll have to do some testing on this and see.
Thanks for the info J-F
January 26, 2009 at 1:31 pm
Thanks for the feedback Seth,
I'm still in need of a quicker function to replace mine, anyone can do it set based? I'll test it to see if it outperforms mine!
Thanks,
Cheers,
J-F
January 27, 2009 at 6:59 am
I'm surprised to see Jeff did not throw in something, I thought RBAR was the trigger for an answer! π
Cheers,
J-F
January 27, 2009 at 8:43 am
Jeff mentioned on another thread yesterday that he's been really busy. There's a few he hasn't had time to reply to yet. Give it time :P.
January 27, 2009 at 9:03 am
I have found that the best solution for this is to use persisted calculated columns!
* Noel
January 27, 2009 at 9:05 pm
noeld (1/27/2009)
I have found that the best solution for this is to use persisted calculated columns!
Haven't tried those yet, Noel... can you give us a quick "how to" on something like this? I'd seriously like to know.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 27, 2009 at 9:07 pm
Jean-François Bergeron (1/27/2009)
I'm surprised to see Jeff did not throw in something, I thought RBAR was the trigger for an answer! π
Heh... what... you think I do a search for these things? Nah... I go through 250+ titles a day. Or, I should say "night"... I don't mix my work time with answering questions on a forum... wouldn't be ethical.
Obviously, I found it... lemme see what's up. I'm also very much looking forward to a response from NoelD... he normally comes up with some good stuff and, I believe, he's been around longer than even me. π
--Jeff Moden
Change is inevitable... Change for the better is not.
January 27, 2009 at 9:16 pm
p.s. Wrapping the problem up in a couple of "Free Pizza" coupons and wrapping that in a cold case of beer would give me better incentive to look for these things. π
--Jeff Moden
Change is inevitable... Change for the better is not.
January 27, 2009 at 9:52 pm
I actually heard about persisted computed the other day and haven't had a chance to look into exactly how one sets them up yet. I'd be interested to see that too.
January 27, 2009 at 11:08 pm
...and Jean-François came up with one of the very few places where a memory-only While Loop actually squeaks past the Tally table (at least on my machine, it does).
CREATE FUNCTION dbo.fnGICSPFParseGICCatalog
(
@String VARCHAR(8000),
@TrimAll BIT
)
RETURNS VARCHAR(8000)
AS
BEGIN
--===== Declare and preset local variables
DECLARE @Len INT
SELECT @Len = LEN(@String)+1
--===== Remove bad characters according to @TrimAll
IF @TrimAll = 0
SELECT @String = STUFF(@String,@Len-t.N,1,'')
FROM dbo.Tally t
WHERE t.N < @Len
AND SUBSTRING(@String,@Len-t.N,1) LIKE '[^0-9A-Z]'
ORDER BY t.N ASC
ELSE
SELECT @String = STUFF(@String,@Len-t.N,1,'')
FROM dbo.Tally t
WHERE t.N < @Len
AND SUBSTRING(@String,@Len-t.N,1) LIKE '[^-0-9A-Z&.$/]'
ORDER BY t.N ASC
--===== Return the cleaned value and exit
RETURN @String
END
I've got a pretty old (but ever dependable) 7 year old 1.8Ghz 1GB desktop computer where the while loop and the code above takes between 23 seconds (yours) and 27 seconds (code above) on 200 k rows of a 69 character column to clean instead of the 5 seconds you got on your machine. So, I've gotta ask, why type of machine do you have?
And, yes, just like your code, the code runs a bit faster with two separate queries and constants for the pattern than it does with a variable for the pattern assigned a value through a decision.
p.s. Make sure your Tally table has a Clustered Primary Key Constraint on "N".
--Jeff Moden
Change is inevitable... Change for the better is not.
January 28, 2009 at 6:24 am
Jeff,
Can you supply the code necessary to generate the test data you used? I have a quad-core 2.83GHz machine with SQL 2005 DE SP2 that I'd like to test this scenario against, so I can start to learn how to generate test data, as well as start to see how to test these kinds of things. Thanks!
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) π π π
Rent Servers for Income (picks and shovels strategy)
January 28, 2009 at 7:18 am
Well, I got a simple 3.0Ghz Dev machine with 4GB of Ram. It's not that quick, trust me. And I ran the query on 200k rows that has around 30 characters in it.
Don't worry, the tally has the clustered key on N, π Thanks for the advice,
I'm looking forward to know about persisted calculated columns, but I do not think it will help in this case. This validation is within a huge procedure that checks for vendors price files, and we try to match on various columns, catalog, UPC, VndCatalog. So, we need to generate a "Home" catalog from the VndCatalog, which can have really weird characters at times.
I wouldn't want to calculate a whole column of GICCatalog from a VndCatalog on a Working table to actually validate the sent data. I was really looking forward a quicker way then a while, but seems it's the best way.
I've been following your posts, articles, and everything made sense to actually remove those damn cursors or while loops, can you tell me why the loop beats the tally in this case? Do I have to test the cursor, and the tally everytime I develop a new solution, to see which one can be quicker?
Thanks in advance,
Cheers,
J-F
Viewing 15 posts - 1 through 15 (of 36 total)
You must be logged in to reply to this topic. Login to reply