January 23, 2013 at 5:13 am
ChrisM@Work (1/23/2013)
Dwain, that's quite inspired. Who'd have thought of performing a QU in a function?It can be speeded up though:
Thank you you dog you!
Speeding up my approach is cheating isn't it?
Well, guess not. I figured it wouldn't stand.
As I'm often to say, wish I'd've thought of that.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 23, 2013 at 5:25 am
Hold on there one minute. If memory serves me (and I'm working without a net here so bear with me), if I recall all the QU caveats in Jeff Moden's seminal article on it (Solving the Running Total and Ordinal Rank Problems[/url]) one of the caveats mentioned is that you need to use an anchor assignment based on a real column in the table (the way I did).
Your approach would need to be disproved with thorough testing and it would require overlapping replacements that must be ordered to achieve a specific result.
So there! 😛
But I definitely liked the way you eliminated my final DELETE.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 23, 2013 at 8:57 am
Wow...I'm impressed! I don't think the OP realized the impact of his request! The more I think about this the more uses I can see for such a function. For example, I was thinking yesterday how I could use this function as a profanity filter against public user input (reviews, testimonials, etc.). I can read in the values from a stoplist, parse the text to check, then replace any value found that's in the stoplist with a suitable replacement value.
As for the function itself...I knew my first version was going to lose the performance contest with 2 cross joins with DelimitedSplit8K, a cursor loop, and a delete all within a multi-statement TVF. It looks like Chris's NestedReplaceXQU_CM02 is the winner for now.
I realize the testing code from Jeff is involved (I added some perhaps unnecessary complexity to it); Jeff's code was written specifically to test DelimitedSplit8K against competitor functions. But I think the concept of testing behind it gives functions a good, fair test. Unfortunately, a lot of customization is necessary because every function tested has different input requirements. Next time I have an hour or two to spare (:-)) I'll plug in these new functions and post the comparative results.
January 23, 2013 at 5:35 pm
dwain.c (1/23/2013)
Hold on there one minute. If memory serves me (and I'm working without a net here so bear with me), if I recall all the QU caveats in Jeff Moden's seminal article on it (Solving the Running Total and Ordinal Rank Problems[/url]) one of the caveats mentioned is that you need to use an anchor assignment based on a real column in the table (the way I did).Your approach would need to be disproved with thorough testing and it would require overlapping replacements that must be ordered to achieve a specific result.
So there! 😛
But I definitely liked the way you eliminated my final DELETE.
While I am chiding ChrisM about not following the QU rules, I find I am guilty of the same offense.
It's necessary to add:
OPTIONS(MAXDOP 1)
To the QU step to ensure SQL doesn't parallelize the query.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 28, 2013 at 8:16 am
Steven Willis (1/23/2013)
Wow...I'm impressed! I don't think the OP realized the impact of his request! The more I think about this the more uses I can see for such a function. For example, I was thinking yesterday how I could use this function as a profanity filter against public user input (reviews, testimonials, etc.). I can read in the values from a stoplist, parse the text to check, then replace any value found that's in the stoplist with a suitable replacement value.
Good point!
I've been playing around with this function and it performs really well!
select Companyname, a.*
from dbo.sourcetable
cross apply [NestedReplaceXQU_CM02] (Companyname, 'THE', '', ' ') as a
WHERE Companyname like '% THE %'
I don't think it's possible to do nested cross applies so how do I loop through a list of words to remove or replace?
Any ideas anyone?
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
January 28, 2013 at 8:21 am
Is it a "fixed" list? I guess what I mean by that is - would it be useful to you to have a function which contained the list definition, rather than passing it in?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 28, 2013 at 8:26 am
I have a one table which contains various noise words which I would like to remove from the company name field before I generate my phoentic code. The list can grow as more words are added in the future.
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
January 28, 2013 at 8:35 am
Abu Dina (1/28/2013)
I have a one table which contains various noise words which I would like to remove from the company name field before I generate my phoentic code. The list can grow as more words are added in the future.
IIRC this function is the quickest with a fixed table;
ALTER FUNCTION [dbo].[IF_MultipleReplace01]
(
@strTarget VARCHAR(8000)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
(
WITH
MappingTable (Seq, Oldval, Newval) AS (
SELECT 1, 'Lorem', '100' UNION ALL
SELECT 2, 'ipsum', '200' UNION ALL
SELECT 3, 'dolor', '300' UNION ALL
SELECT 4, 'diam', '400' UNION ALL
SELECT 5, 'tincidunt', '500' UNION ALL
SELECT 6, 'laoreet', '600' UNION ALL
SELECT 7, 'aliquam', '700' UNION ALL
SELECT 8, 'not in string', '800'),
Calculator AS (
SELECT
m.Seq,
OutputString = REPLACE(@strTarget COLLATE LATIN1_GENERAL_BIN, Oldval, Newval)
FROM MappingTable m
WHERE Seq = 1
UNION ALL
SELECT
m.Seq,
OutputString = REPLACE(c.OutputString, Oldval, Newval)
FROM Calculator c
INNER JOIN MappingTable m ON m.Seq = c.Seq+1
)
SELECT TOP 1 OutputString
FROM Calculator
ORDER BY Seq DESC
- however, I suspect that one of the posters on this thread may be intending to publish a few more performance comparisons. It wouldn't do you any harm to use the procedure now, but be prepared to accept some changes should this be necessary later.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 28, 2013 at 9:51 am
Thanks Chris.
I'm probably being stupid, but I've tried to modify the function so it works off a permanent table which looks like this:
And the function is changed to:
ALTER FUNCTION [dbo].[NoiseRemover] (@strTarget VARCHAR(8000))
RETURNS TABLE AS
RETURN
(
WITH Calculator AS (
SELECT
m.NoiseID,
OutputString = REPLACE(@strTarget COLLATE LATIN1_GENERAL_BIN, Name, Equivalent)
FROM CompanyNameNoiseWords m
WHERE NoiseID = 1
UNION ALL
SELECT
m.NoiseID,
OutputString = REPLACE(c.OutputString, Name, Equivalent)
FROM Calculator c
INNER JOIN CompanyNameNoiseWords m ON m.NoiseID = c.NoiseID+1
)
SELECT TOP 1 OutputString
FROM Calculator
ORDER BY NoiseID DESC
)
But this is removing words like 'TO' from TOWER and 'EL from HOTEL etc...
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
January 28, 2013 at 11:25 am
Abu Dina (1/28/2013)
Thanks Chris.But this is removing words like 'TO' from TOWER and 'EL from HOTEL etc...
That makes it tricky ... as you need to start looking for word bounds. This would include string start and end, spaces, commas and periods etc. This is probably best done using a regular expression. Using REPLACE to do this could be a bit messy. You would need to add a space to either end of the original string, then have a list of additional replaces in your table. Lastly left and right trim the string.
For example.
" TO "
" TO."
".TO "
" TO,"
",TO "
I suppose this could all be built into the function though
ALTER FUNCTION [dbo].[NoiseRemover] (@strTarget VARCHAR(8000))
RETURNS TABLE AS
RETURN
(
WITH Calculator AS (
SELECT
m.NoiseID,
OutputString = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
' ' + @strTarget + ' ' COLLATE LATIN1_GENERAL_BIN
, ' ' + Name + ' ', Equivalent)
, ' ' + Name + '.', Equivalent)
, '.' + Name + ' ', Equivalent)
, ' ' + Name + ',', Equivalent)
, ',' + Name + ' ', Equivalent)
FROM CompanyNameNoiseWords m
WHERE NoiseID = 1
UNION ALL
SELECT
m.NoiseID,
OutputString = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
c.OutputString
, ' ' + Name + ' ', Equivalent)
, ' ' + Name + '.', Equivalent)
, '.' + Name + ' ', Equivalent)
, ' ' + Name + ',', Equivalent)
, ',' + Name + ' ', Equivalent)
FROM Calculator c
INNER JOIN CompanyNameNoiseWords m ON m.NoiseID = c.NoiseID+1
)
SELECT TOP 1 LTRIM(RTRIM(OutputString))
FROM Calculator
ORDER BY NoiseID DESC
)
Micky
January 28, 2013 at 6:46 pm
ChrisM@Work (1/28/2013)
- however, I suspect that one of the posters on this thread may be intending to publish a few more performance comparisons. It wouldn't do you any harm to use the procedure now, but be prepared to accept some changes should this be necessary later.
Who would that be?
I think if you need to replace "noise words" within a string it might be better to use a pattern splitter, such as the one in my 4th signature link (PatternSplitCM).
DECLARE @Noise TABLE
(ID INT IDENTITY PRIMARY KEY
,NoiseWord VARCHAR(100)
,Replacement VARCHAR(100))
INSERT INTO @Noise
SELECT 'cat', '000' UNION ALL SELECT 'hat', '001'
UNION ALL SELECT 'doctor', '002' UNION ALL SELECT 'seuss', '003'
SELECT NewString=(
SELECT CASE WHEN Replacement IS NOT NULL THEN Replacement ELSE Item + '' END
FROM PatternSplitCM('The cat in the hat, by doctor seuss, is a wonderful tale.', '[a-z]') a
LEFT JOIN @Noise b ON b.NoiseWord = a.Item
ORDER BY ItemNumber
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')
This way you don't need to worry about punctuation and it hits all replacements needed in a single string.
And special thanks again to ChrisM@Work for his collaboration in developing PatternSplitCM!
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 11 posts - 31 through 40 (of 40 total)
You must be logged in to reply to this topic. Login to reply