September 30, 2014 at 7:36 am
CELKO (9/30/2014)
I use nested REPLACE() functions. Some day we might have the ANSI/ISO Standard TRANSLATE()
Hi Joe,
I agree that using nested replace functions are the berries for performance especially when replacing multiple adjacent spaces with just one. Would that be practical in this case? You'd only be keeping 52 out of 256 characters (ASCII characters) and that's if it's not Unicode. I haven't actually tested it but would seem that the 254 nested replaces required would be a bit of a performance problem even for nested replaces.
Does your alpha-only function have that many replaces or is there another trick to it?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 30, 2014 at 1:48 pm
Not sure if this query extract is helpful but it helped me in a recent situation where I was trying to deal with non-alpha characters:
WHERE ([Bill Freq] IS NOT NULL) AND ([Bill Freq] NOT LIKE '^[A-Za-z0-9]+$') AND ([Bill Freq] NOT LIKE '')
October 1, 2014 at 3:01 pm
The original version I wrote of this was in 2006 for Experts Exchange, and it was to strip nonnumeric chars (although of course the basic idea is the same):
http://www.experts-exchange.com/Database/MS-SQL-Server/Q_21957163.html
Of course unfortunately you'll now need an EE membership -- or free trial -- to see it, grrrr.
Edit: And of course it's certainly possible someone else wrote similar code before that. If they did, I'm unaware of it.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
October 1, 2014 at 5:20 pm
bgalway (9/30/2014)
Not sure if this query extract is helpful but it helped me in a recent situation where I was trying to deal with non-alpha characters:WHERE ([Bill Freq] IS NOT NULL) AND ([Bill Freq] NOT LIKE '^[A-Za-z0-9]+$') AND ([Bill Freq] NOT LIKE '')
The idea isn't to find things... it's to fix things by changing the actual data. Also, your test full null is likely not necessary because, unless you've changed the server defaults or have the appropriate SET statement somewhere in your code, NULLs will naturally be excluded because they cannot be compared even to each other.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 1, 2014 at 6:13 pm
Some day we might have the ANSI/ISO Standard TRANSLATE()
In the meantime you can use the one that I created.
I am working on an article I plan to submit to SSC and was reluctant to share this without a little more testing but, perhaps, this would be a good opportunity to get some feedback/suggestions.
One of them uses this simple getnums function:
CREATE FUNCTION dbo.getnums(@rows int)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
(
WITH
L1 AS (SELECT n=n FROM (values (1),(1)) t(n)),
L2 AS (SELECT n=1 FROM L1 a CROSS APPLY L1 b),
L3 AS (SELECT n=1 FROM L2 a CROSS APPLY L2 b),
L4 AS (SELECT n=1 FROM L3 a CROSS APPLY L3 b),
L5 AS (SELECT n=1 FROM L4 a CROSS APPLY L4 b),
iTally AS
(
SELECT n = row_number() over (order by (select null))
FROM L5 a CROSS APPLY L5 b
)
SELECT TOP (@rows) n
FROM itally
ORDER BY n
);
GO
Here are the first three versions (yes, we all know how bad loops are, I'll explain why I created one with a loop in a moment...)
;-- (1) dbo.loopTranslate (scalar using a loop)
CREATE FUNCTION dbo.loopTranslate
(@string varchar(8000),
@replace varchar(100),
@with varchar(100)
)
/*
-- Use
DECLARE @string varchar(100) = '##!!xyxxyyzz!!##';
SELECT [@string]= @string, newstring = dbo.loopTranslate(@string,'xyz#!','abc')
*/
RETURNS varchar(8000)
AS
BEGIN
DECLARE @i int = 1;
WHILE @i <= len(@replace)
BEGIN
SELECT@string = replace(@string,substring(@replace,@i,1),substring(@with,@i,1)),
@i = @i+1
END
RETURN @string
END;
GO
-- (2) dbo.svfTranslate (scalar using a tally table)
CREATE FUNCTION dbo.svfTranslate
(@string varchar(8000),
@replace varchar(100),
@with varchar(100)
)
/*
-- Use
DECLARE @string varchar(100) = '##!!xyxxyyzz!!##';
SELECT [@string]= @string, newstring = dbo.svfTranslate(@string,'xyz#!','abc')
*/
RETURNS varchar(8000) AS
BEGIN
SELECT @string = replace(@string,substring(@replace,n,1),substring(@with,n,1))
FROM dbo.getnums(len(@replace));
RETURN @string;
END;
GO
-- (3) itvfTranslate (inline tvf using recursive cte)
CREATE FUNCTION dbo.itvfTranslate
(@string varchar(8000),
@replace varchar(100),
@with varchar(100)
)
/*
-- Use
DECLARE @string varchar(100) = '##!!xyxxyyzz!!##';
SELECT [@string]= @string, newstring = x
FROM dbo.itvfTranslate(@string,'xyz#!','abc')
*/
RETURNS table AS
RETURN
(
WITH a AS
(
SELECT n=1, x=replace(@string,substring(@replace,1,1),substring(@with,1,1))
UNION ALL
SELECT n+1, x=replace(x,substring(@replace,n+1,1),substring(@with,n+1,1))
FROM a
WHERE n < len(@replace)
)
SELECT x
FROM a
WHERE n = len(@replace)
);
GO
--Note: I also did a couple variations using the "Quirky Update" method but they performed badly...
...and here's one using a scalar recursive function (note the commented section for examples of use):
-- (4) rTranslate (recursive scalar)
CREATE FUNCTION dbo.rTranslate
(@s-2 varchar(1000)(8000),
@r varchar(32),
@w varchar(32),
@i int=1
)
/*
Created by: Alan Burstein
Created on: 9/2/2014
Usage Examples:
-- (1) basic replace and remove (both functions)
-- remove $ and #, Replace a with A, c with C, and b with x...
DECLARE @string varchar(20)='###$$$aaabbbccc$$$###',
@pre varchar(5)='acb#$',
@post varchar(5)='ACx';
SELECToriginal = @string,
Translated = dbo.rTranslate(@string,@pre,@post,1);
GO
-- (2) format phone numbers
-- (a) format phone (atomic value)
DECLARE @string varchar(20)='(425)555-1212',
@pre varchar(5)=')(', @post varchar(5)='-';
SELECToriginal = @string,
Translated = dbo.rTranslate(@string,@pre,@post,1);
-- (b) format phone numbers(from table)
WITH phoneNbrs(n,pn) AS
(SELECT 1, '(425)555-1212' UNION ALL
SELECT 2, '425.555.1212' UNION ALL
SELECT 3, '425_555_1212' UNION ALL
SELECT 4, '(425)555.1212' )
SELECT n, pn AS before, [after] = dbo.rTranslate(pn,x,y,1)
FROM phoneNbrs
CROSS APPLY (VALUES('.)_('+char(32),'---')) t(x,y);
-- (c) hide phone numbers
WITH phoneNbrs(n,pn) AS
(SELECT 1, '(425) 555-1212' UNION ALL
SELECT 2, '425.555.1212' UNION ALL
SELECT 3, '425-555-1212' UNION ALL
SELECT 4, '4255551212' )
SELECT n, pn AS before, [after] = dbo.rTranslate(pn,x,y,1)
FROM phoneNbrs
CROSS APPLY (VALUES('()-.0123456789','()-.**********')) t(x,y)
GO
-- (3) Replace accent characters with normal characters,
DECLARE @string varchar(100)='Thë Quìck Greeñ Fox jumpëd over thë lázy dogs back!',
@special1 varchar(32)='áâãäæèïéìëíîçåñòóôöõàøúüûùýÁÃÄ',
@special2 varchar(32)='ÆÈÏÉÌËÍÎÅÑÒÓÔÖÕÀØÚÜÛÙÝ!',
@normal1 varchar(32)='aaaaaeieieiicanoooooaouuuuyAAAAA',
@normal2 varchar(32)='EIEIEIIANOOOOOAOUUUUY.';
SELECT@string AS original,
newstring = dbo.rTranslate(dbo.rTranslate(@string,@special1,@normal1,1),@special2,@normal2,1);
-- note how I deal with the recursion limit, note the query plan
GO
-- (4) using each to find a pattern in a set of strings...
-- looking for the string/pattern combination that will produce ABCABC
DECLARE @strings TABLE(string varchar(20) not null);
DECLARE @patterns TABLE(p_id int identity primary key,
pattern varchar(10) not null,
p_replace varchar(10) not null);
DECLARE @SearchFor varchar(10)='ABCABC';
INSERT @strings VALUES('123abcABC1'),('222defACC'),('123ABCmno');
INSERT @patterns VALUES('123','E'),('mno','XYZ'),('abc123','ABC'),('ABC','XYZ');
WITH pattern_search AS
(
SELECT string, pattern, p_replace, newstring = dbo.rTranslate(string,pattern,p_replace,1)
FROM @strings s
CROSS APPLY @patterns p
)
SELECT *, [matched] = case newstring when @SearchFor then 'yes' else 'no' end
FROM pattern_search;
*/
RETURNS varchar(8000)
AS
BEGIN
RETURN
(CASE
WHEN @i <= len(@r)
THEN dbo.rTranslate(replace(@s,substring(@r,@i,1),substring(@w,@i,1)),@r,@w,@i+1)
ELSE @s-2
END
)
END
GO
The downside to this guy, of course, is the 32 char limit but this can be circumvented using nesting; note example #3 in my code
dbo.rTranslate(dbo.rTranslate(@string,@special1,@normal1,1),@special2,@normal2,1);
...
The test harness I've been using:
IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals;
SELECT val = replicate(cast(newid() as char(36)),200)
INTO #vals
FROM dbo.getnums(5000);
GO
--SELECT * FROM #vals
SET NOCOUNT ON;
dbcc freeproccache
dbcc dropcleanbuffers
DECLARE @r varchar(20)='ABCDEF190',
@w varchar(20)='00000000';
DECLARE @x varchar(8000);
SET STATISTICS TIME ON;
--SET STATISTICS IO ON;
PRINT char(13)+'svf loop version:'
SELECT @x = dbo.loopTranslate(val, @r, @w)
FROM #vals;
PRINT char(13)+'svf tally version:'
SELECT @x = dbo.svfTranslate(val, @r, @w)
FROM #vals;
PRINT char(13)+'itvf recursive cte version:'
SELECT @x = x
FROM #vals
CROSS APPLY dbo.itvfTranslate(val, @r, @w);
PRINT char(13)+'recursive version:'
SELECT @x = dbo.rTranslate(val, @r, @w, 1)
FROM #vals;
-- add a replace() function for testing
SET STATISTICS TIME OFF;
--SET STATISTICS IO OFF;
GO
and the results:
svf loop version:
SQL Server Execution Times:
CPU time = 4742 ms, elapsed time = 4792 ms.
svf tally version:
SQL Server Execution Times:
CPU time = 4665 ms, elapsed time = 4735 ms.
itvf recursive cte version:
SQL Server Execution Times:
CPU time = 5460 ms, elapsed time = 5561 ms.
recursive version:
SQL Server Execution Times:
CPU time = 812 ms 4665, elapsed time = 871 ms 4680.
I included the loop version to show how badly the other other 2 performed. I have tested the recursive version with various string lengths and row counts and the results are the same: the recursive scalar version is 5-10 times faster. The code is simple as is the query plan: never mind, they're all basically the same - the recursive scalar version however is the cleanest as is the query plan.
Edit: I realized that the recursive version was taking an input @string input of varchar(1000) vs. varchar(8000) for the other three. Nonetheless, here's a few examples of how to create a translate() function in SQL Server 2000 through SQL Server 2014. For anyone interested.
Final edit: just read through the thread and saw Jeff's comment...
"Note that we cannot use "SET STATISTICS" to measure performance here because there's a scalar function involved. "SET STATISTICS" greatly skews the performance results when scalar functions are present making the scalar function look a whole lot worse than it actually is. Please see the following article for more on that little testing nuance..."
I'll need to re-do my test.
-- Itzik Ben-Gan 2001
October 1, 2014 at 10:56 pm
Ah... my apologies. I had to withdraw my post that was in this spot. At first, it looked like your CROSS APPLY "GetNums" was twice as fast as the traditional CROSS JOIN method. Much to my embarrassment, I found that I had phat-phingered the number "2000000" instead of "1000000" on the traditional method.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 2, 2014 at 5:37 pm
Maybe I'm missing something here, but this is simple and fast:
declare @t varchar(5000) = 'RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row." This-divided-by,that|&/multiplied\by(100)give*you,%;<percent>.'
declare @s-2 varchar(5000) = ''
select @s-2 = @s-2 + substring(@t, i, 1)
from dbo.fnTalleyTable(len(isnull(@t, ''))) tt
where substring(@t, i, 1) like '[a-z]'
select @s-2
Don
Don Simpson
October 2, 2014 at 5:51 pm
DonlSimpson (10/2/2014)
Maybe I'm missing something here, but this is simple and fast:
declare @t varchar(5000) = 'RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row." This-divided-by,that|&/multiplied\by(100)give*you,%;<percent>.'
declare @s-2 varchar(5000) = ''
select @s-2 = @s-2 + substring(@t, i, 1)
from dbo.fnTalleyTable(len(isnull(@t, ''))) tt
where substring(@t, i, 1) like '[a-z]'
select @s-2
Don
Not recommended.
Multi-Row Variable Assignment and ORDER BY
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 2, 2014 at 5:55 pm
The problem with your code, Don, is that it only handles scalar values.
October 2, 2014 at 10:37 pm
dwain.c (10/2/2014)
DonlSimpson (10/2/2014)
Maybe I'm missing something here, but this is simple and fast:
declare @t varchar(5000) = 'RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row." This-divided-by,that|&/multiplied\by(100)give*you,%;<percent>.'
declare @s-2 varchar(5000) = ''
select @s-2 = @s-2 + substring(@t, i, 1)
from dbo.fnTalleyTable(len(isnull(@t, ''))) tt
where substring(@t, i, 1) like '[a-z]'
select @s-2
Don
Not recommended.
Hmmm... Itzik used concatenation of a column of data in his example went haywire. Does anyone have an example of this type of code going crazy?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2014 at 9:59 am
All right then, how about replacing the explicit string concatenation with "for xml path"?
declare @originalText varchar(7500), @includeCharacters varchar(20), @cleanText varchar(7500)
select @originalText = 'RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row." This-divided-by,that|&/multiplied\by(100)give*you,%;<percent>.'
, @includeCharacters = '[a-z]', @cleanText = ''
select @cleanText =
(
select substring(@originalText, i, 1)
from dbo.fnTalleyTable(len(isnull(@originalText, ''))) tt
where substring(@originalText, i, 1) like @includeCharacters
for xml path ('')
)
select @cleanText
Don
Don Simpson
December 3, 2018 at 8:49 pm
Heh... here I go ago... necro'ing the same wicked old post.
@DonlSimpson ... Try the following and see why I prefer not to use XML for this type of thing.
declare @originalText varchar(7500), @includeCharacters varchar(20), @cleanText varchar(7500)
select @originalText = 'RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row." <<&whoopsi&>> This-divided-by,that|&/multiplied\by(100)give*you,%;<percent>.'
, @includeCharacters = '[a-z<>&]', @cleanText = ''
select @cleanText =
(
select substring(@originalText, i, 1)
from dbo.fnTalleyTable(len(isnull(@originalText, ''))) tt
where substring(@originalText, i, 1) like @includeCharacters
for xml path ('')
)
select @cleanText
Of course, it the code were for alpha-numeric only, that problem wouldn't show up.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 3, 2018 at 8:53 pm
Shifting gears a bit,.. Lordy, I miss Dwain. :unsure:
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 13 posts - 16 through 27 (of 27 total)
You must be logged in to reply to this topic. Login to reply