April 16, 2015 at 7:07 pm
So I have a bit of a puzzle. 🙂
I have an Excel spreadsheet that I got from somebody else. Being Excel, it has a bunch of autocorrect characters - curly quotes, long dashes, and the like, that I needed converted to their more plain counterparts (the whole purpose is for displaying on a webpage, with a limited font). Got it all done, just using a series of UPDATE statements... but then I wondered what happens when I use a single UPDATE statement and a "bad strings" table. So I tried it out (with test data, of course):
(Note: dbo.Table$ is the imported spreadsheet. It has a column called, "Question")
CREATE TABLE [dbo].[ReplaceCharStrings](
[BadString] [char](1) NOT NULL,
[ReplaceString] [char](1) NULL,
CONSTRAINT [PK_ReplaceCharStrings_BadString] PRIMARY KEY CLUSTERED
(
[BadString] ASC
)
GO
INSERT INTO [dbo].[ReplaceCharStrings] (BadString, ReplaceString)
VALUES (CHAR(147),'"'), (CHAR(148),'"'), (CHAR(151),'-'), (CHAR(150),'-'), (CHAR(146),'''')
GO
UPDATE t SET Question = REPLACE(t.Question, c.BadString, c.ReplaceString)
FROM dbo.Table$ t CROSS JOIN dbo.ReplaceCharStrings c
WHERE CHARINDEX(c.BadString,t.Question)>0
Results: mixed. Whenever there's a row where there's a long dash, OR a curly apostrophe, OR a short dash, etc., then it replaces, no problem.
However:
Define “insert one thing here,” “insert another thing here,” and other [things].
...became:
Define "insert one thing here,” "insert another thing here,” and other [things].
It replaced the left quotes (both of them), but not the right quotes.
Then, in a different row:
Describe “insert something to describe here”.
...became:
Describe “insert something to describe here".
Here, it did the opposite (replaced the right quote but not the left).
:blink: Why? And how to fix?
April 16, 2015 at 7:43 pm
I really like the way you think. Unfortunately UPDATE does not work the way you are trying to use it.
Your DDL is bad and some sample data for questions would be helpful. I am thinking of a couple solutions for you. In the meantime - here's some better DDL so other people can take a crack at this...
USE tempdb
GO
IF OBJECT_ID('tempdb.dbo.ReplaceCharStrings') IS NOT NULL DROP TABLE dbo.ReplaceCharStrings;
GO
CREATE TABLE dbo.ReplaceCharStrings
(
BadString char(1) NOT NULL,
ReplaceString char(1) NOT NULL, -- this should be NOT NULL, we're replace one thing with something else right?
CONSTRAINT PK_ReplaceCharStrings_BadString PRIMARY KEY CLUSTERED (BadString ASC)
);
GO
IF OBJECT_ID('tempdb.dbo.questions') IS NOT NULL DROP TABLE dbo.questions;
GO
CREATE TABLE dbo.questions
(
q_id int identity PRIMARY KEY,
question varchar(100) NOT NULL
);
GO
INSERT INTO dbo.ReplaceCharStrings (BadString, ReplaceString)
VALUES (CHAR(147),'"'), (CHAR(148),'"'), (CHAR(151),'-'), (CHAR(150),'-'), (CHAR(146),'''');
GO
INSERT dbo.questions (question)
VALUES
('Im okay'),
('One bad character:'+CHAR(148)),
(CHAR(151)+'flanked by bad things'+CHAR(150)),
('...Poser apostophes:'+CHAR(146)+CHAR(146));
-- SELECT statement to understand the problem with the update
SELECT *, REPLACE(t.Question, c.BadString, c.ReplaceString)
FROM dbo.questions t
CROSS JOIN dbo.ReplaceCharStrings c
WHERE CHARINDEX(c.BadString,t.Question)>0;
UPDATE t SET Question = REPLACE(t.Question, c.BadString, c.ReplaceString)
FROM dbo.questions t CROSS JOIN dbo.ReplaceCharStrings c
WHERE CHARINDEX(c.BadString,t.Question)>0;
Update: fixed sample code
-- Itzik Ben-Gan 2001
April 16, 2015 at 8:03 pm
Alan.B - thanks for the DDL correction. Sorry about that; I totally forgot to indicate NULL/NOT NULL when setting up the table originally, so SQL Server assumed NULL for the non-PK field. My bad. 😀
Table$ is just what the Import/Export wizard (I'm running Express) insisted on calling the imported spreadsheet, btw! :rolleyes:
I think I may have come up with an answer to my own question, and I want to run this by everybody and see if there's a better option. 🙂
In the code above, maybe wrap the UPDATE statement in a loop and break the loop if @@ROWCOUNT=0?
Like:
DECLARE @rowct int
WHILE (1=1)
BEGIN
UPDATE t SET Question = REPLACE(t.Question, c.BadString, c.ReplaceString)
FROM dbo.questions t CROSS JOIN dbo.ReplaceCharStrings c
WHERE CHARINDEX(c.BadString,t.Question)>0;
SELECT @rowct = @@ROWCOUNT;
IF (@rowct=0)
BEGIN
BREAK
END
END
Thoughts?
April 16, 2015 at 9:03 pm
Okay, I can't come up with a pure set-based way to handle this but I have a solution that I think is pretty slick using a recursive CTE wrapped in an inline table valued function.
Here we have a inline table-valued translate function:
ALTER FUNCTION dbo.itvfTranslate
(@string varchar(8000),
@replace varchar(100),
@with varchar(100)
)
/*
-- Use
DECLARE @string varchar(100) = '##!!xyxxyyzz!!##';
SELECT [@string]= @string, newstring = x
FROM dbo.itvfTranslate(@string,'xyz#!','abc')
*/
RETURNS table AS
RETURN
(
WITH a AS
(
SELECT n=1, newstring=replace(@string,substring(@replace,1,1),substring(@with,1,1))
UNION ALL
SELECT n+1, newstring=replace(newstring,substring(@replace,n+1,1),substring(@with,n+1,1))
FROM a
WHERE n < len(@replace)
)
SELECT newstring
FROM a
WHERE n = len(@replace)
);
GO
MS SQL does not have translate so I included a link to Oracle's explanation of what Translate does.
Okay, back to your code. Using the example DDL that I posted above you could use my translate function like so:
WITH q AS
(
SELECT * FROM dbo.questions
)
UPDATE q
SET question = newstring
FROM q
CROSS APPLY dbo.itvfTranslate
(q.question,
(SELECT CAST((SELECT BadString+'' FROM dbo.ReplaceCharStrings FOR XML PATH('')) AS varchar(100))),
(SELECT CAST((SELECT ReplaceString+'' FROM dbo.ReplaceCharStrings FOR XML PATH('')) AS varchar(100)))
)
-- Itzik Ben-Gan 2001
April 16, 2015 at 9:16 pm
Katerine459 (4/16/2015)
Alan.B - thanks for the DDL correction. Sorry about that; I totally forgot to indicate NULL/NOT NULL when setting up the table originally, so SQL Server assumed NULL for the non-PK field. My bad. 😀Table$ is just what the Import/Export wizard (I'm running Express) insisted on calling the imported spreadsheet, btw! :rolleyes:
I think I may have come up with an answer to my own question, and I want to run this by everybody and see if there's a better option. 🙂
In the code above, maybe wrap the UPDATE statement in a loop and break the loop if @@ROWCOUNT=0?
Like:
DECLARE @rowct int
WHILE (1=1)
BEGIN
UPDATE t SET Question = REPLACE(t.Question, c.BadString, c.ReplaceString)
FROM dbo.questions t CROSS JOIN dbo.ReplaceCharStrings c
WHERE CHARINDEX(c.BadString,t.Question)>0;
SELECT @rowct = @@ROWCOUNT;
IF (@rowct=0)
BEGIN
BREAK
END
END
Thoughts?
Our comments crossed paths.. I did not see this when I posted my solution...
What you posted is works. What I posted is actually very similar (a recursive CTE is essentially a loop when you learn how they work). The ideal solution would be one that does not use a loop or a recursive CTE as both generally don't perform very well. That said, what I posted was the best I could do at this late Chicago hour.
I know a guy from Thailand who I bet could come up with a slicker way to handle this - hopefully he chimes in :).
-- Itzik Ben-Gan 2001
April 16, 2015 at 9:52 pm
Alan.B called me out through a PM to this thread, so you can thank him for what follows assuming it works for you.
If I understand correctly, this is an operation that I call "nested REPLACE." I played around some time ago and came up with several SQL FUNCTION approaches to this problem, one of which I'll provide you below.
CREATE FUNCTION [dbo].[NestedReplace]
-- Author: Dwain.C
-- Date: 24-Jan-2013
-- Remarks: NestedReplace is designed to repeatedly apply the REPLACE built-in function using
-- two delimited lists of target strings (those to be replaced in @Target) and source
-- strings (the final values to be replaced).
-- Note: Since strings are replaced in the order they are supplied, beware of dependencies
-- during successive replacements. The direction the replacement is applied may be changed
-- using the fifth parameter (@Direction).
--
-- For example, try this:
-- SELECT Left2Right=a.Item, Right2Left=b.Item
-- FROM NestedReplace('abcdd', 'abc,ddd', 'd,ee', NULL, NULL) a
-- CROSS APPLY NestedReplace('abcdd', 'abc,ddd', 'd,ee', NULL, 'R') b
-- Left2Right Right2Left
-- First replace results in: ddd abcdd
-- Second replace results in: ee ddd
(
-- On calling the FUNCTION this is the source string. Upon completion it holds the target (replaces applied).
@Target VARCHAR(8000),
-- Delimited list of strings to be replaced
@ToReplaceList VARCHAR(8000),
-- Delimited list of strings that replace the list in @ToReplaceList
@ReplaceWithList VARCHAR(8000),
-- The delmiter character for both lists: defaults to comma if NULL specified
@Delim CHAR(1),
-- Direction that the replacements are applied: LEFT (to Right) or RIGHT (to Left)
@Direction VARCHAR(5)
)
RETURNS @Results TABLE (Item VARCHAR(8000))
WITH SCHEMABINDING
AS
BEGIN
DECLARE @SplitStrings TABLE
-- CLUSTERED INDEX to drive the QU
(ID INT PRIMARY KEY CLUSTERED, Item VARCHAR(8000));
-- Defaults for input parameters 4 and 5
SELECT @Delim = LEFT(ISNULL(@Delim, ','), 1), @Direction = LEFT(ISNULL(@Direction, 'L'),1);
-- Initialize the table from the strings to be replaced (@ToReplaceList)
INSERT INTO @SplitStrings
SELECT ItemNumber * CASE @Direction WHEN 'R' THEN -1 ELSE 1 END, Item
FROM dbo.DelimitedSplit8K(@ToReplaceList, @Delim);
-- QU the strings in the table with the nested replacements, while
-- retaining the final result in @Target
UPDATE a
SET @Target = REPLACE(@Target COLLATE Latin1_General_BIN, a.Item, b.Item)
,Item = @Target
FROM @SplitStrings a
-- This section splits the list of items that will replace the original strings
INNER JOIN (
SELECT ItemNumber=ItemNumber * CASE @Direction WHEN 'R' THEN -1 ELSE 1 END, Item
FROM dbo.DelimitedSplit8K(@ReplaceWithList, @Delim)) b
ON a.ID = b.ItemNumber
OPTION (MAXDOP 1);
-- Put the resulting string into the @Results table
INSERT INTO @Results SELECT @Target;
RETURN;
END
So then you'd use it something like this:
USE tempdb;
GO
IF OBJECT_ID('tempdb.dbo.ReplaceCharStrings') IS NOT NULL DROP TABLE dbo.ReplaceCharStrings;
GO
CREATE TABLE dbo.ReplaceCharStrings
(
BadString char(1) NOT NULL,
ReplaceString char(1) NOT NULL, -- this should be NOT NULL, we're replace one thing with something else right?
CONSTRAINT PK_ReplaceCharStrings_BadString PRIMARY KEY CLUSTERED (BadString ASC)
);
GO
IF OBJECT_ID('tempdb.dbo.questions') IS NOT NULL DROP TABLE dbo.questions;
GO
CREATE TABLE dbo.questions
(
q_id int identity PRIMARY KEY,
question varchar(100) NOT NULL
);
GO
INSERT INTO dbo.ReplaceCharStrings (BadString, ReplaceString)
VALUES (CHAR(147),'"'), (CHAR(148),'"'), (CHAR(151),'-'), (CHAR(150),'-'), (CHAR(146),'''');
GO
INSERT dbo.questions (question)
VALUES
('Im okay'),
('One bad character:'+CHAR(148)),
(CHAR(151)+'flanked by bad things'+CHAR(150)),
('...Poser apostophes:'+CHAR(146)+CHAR(146));
DECLARE @BadStrings VARCHAR(8000), @ReplaceStrings VARCHAR(8000);
WITH ReplaceCRs AS
(
SELECT rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), BadString, ReplaceString
FROM dbo.ReplaceCharStrings
)
SELECT @BadStrings=STUFF((
(
SELECT ',' + BadString
FROM ReplaceCRs
ORDER BY rn
FOR XML PATH(''), TYPE)
).value('.', 'VARCHAR(8000)'), 1, 1, '')
,@ReplaceStrings=STUFF((
(
SELECT ',' + ReplaceString
FROM ReplaceCRs
ORDER BY rn
FOR XML PATH(''), TYPE)
).value('.', 'VARCHAR(8000)'), 1, 1, '')
FROM ReplaceCRs;
-- Display the questions with the "poser" characters (love that term)
SELECT *
FROM dbo.Questions;
UPDATE a
SET Question = Item
FROM dbo.Questions a
CROSS APPLY [DB-WHERE-YOU-CREATE-MY-FUNCTION].dbo.NestedReplace(question, @BadStrings, @ReplaceStrings, ',', 'LEFT');
-- Display the questions after cleansing
SELECT *
FROM dbo.Questions;
GO
IF OBJECT_ID('tempdb.dbo.ReplaceCharStrings') IS NOT NULL DROP TABLE dbo.ReplaceCharStrings;
If you use it, please don't place a comment like this in your code:
http://www.sqlservercentral.com/Forums/FindPost1675182.aspx
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 16, 2015 at 9:54 pm
Oh yes, I forgot to mention the reason why you can't use that CROSS JOIN to repeatedly update is A Hazard of Using the SQL Update Statement [/url]
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 17, 2015 at 3:21 am
Quick and somewhat naïve ( compared to Dwain's ) solution
😎
Note: added 'a'-->'@' for demonstration purposes 😉
USE tempdb;
GO
SET NOCOUNT ON;
/* sample data */
IF OBJECT_ID(N'dbo.ReplaceCharStrings') IS NOT NULL DROP TABLE dbo.ReplaceCharStrings;
CREATE TABLE dbo.ReplaceCharStrings
(
BadString char(1) NOT NULL,
ReplaceString char(1) NULL,
CONSTRAINT PK_ReplaceCharStrings_BadString PRIMARY KEY CLUSTERED (BadString ASC )
);
INSERT INTO dbo.ReplaceCharStrings (BadString, ReplaceString)
VALUES (CHAR(147),'"')
,(CHAR(148),'"')
,(CHAR(151),'-')
,(CHAR(150),'-')
,('a','@') -- ;^)
,(CHAR(146),CHAR(39))
;
IF OBJECT_ID('tempdb.dbo.questions') IS NOT NULL DROP TABLE dbo.questions;
CREATE TABLE dbo.questions
(
q_id int identity PRIMARY KEY,
question varchar(100) NOT NULL
);
INSERT dbo.questions (question)
VALUES
('Im okay'),
('One bad character:'+CHAR(148)),
(CHAR(151)+'flanked by bad things'+CHAR(150)),
('...Poser apostophes:'+CHAR(146)+CHAR(146));
/*
Replacing bad characters
Configuration is in the dbo.ReplaceCharStrings table
*/
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
SELECT
QST.q_id
,(
SELECT
ISNULL(RCS.ReplaceString,SUBSTRING(DQ.question,NM.N,1))
FROM dbo.questions DQ
CROSS APPLY
(
SELECT TOP(LEN(DQ.question)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2 /* Add more "T" for longer strings*/
) AS NM
LEFT OUTER JOIN dbo.ReplaceCharStrings RCS
ON ASCII(RCS.BadString) = ASCII(SUBSTRING(DQ.question,NM.N,1))
WHERE QST.q_id = DQ.q_id
FOR XML PATH(''),TYPE).value('.[1]','VARCHAR(100)') AS OUTPUT_STRING
FROM dbo.questions QST;
GO
Results
q_id OUTPUT_STRING
----------- -------------------------------
1 Im ok@y
2 One b@d ch@r@cter:"
3 -fl@nked by b@d things-
4 ...Poser @postophes:''
The same logic as a inline table value function
/* Inline table value function */
IF OBJECT_ID(N'dbo.ITVF_CONFIGURABLE_REPLACE') IS NOT NULL DROP FUNCTION dbo.ITVF_CONFIGURABLE_REPLACE;
GO
CREATE FUNCTION dbo.ITVF_CONFIGURABLE_REPLACE
(
@INPUT_STRING VARCHAR(8000)
)
RETURNS TABLE
WITH SCHEMABINDING
/*
dbo.ITVF_CONFIGURABLE_REPLACE
2015-04-17 Eirikur Eiriksson Initial Coding
Configurable replace function, replaces characters which are defined as bad
with their good counterparts.
USAGE:
DECLARE @INPUT_STRING VARCHAR(8000) = '...Poser apostophes:'+CHAR(146)+CHAR(146);
SELECT
X.OUTPUT_STRING
dbo.ITVF_CONFIGURABLE_REPLACE(@INPUT_STRING) AS X
*/
/* Seed for the inline Tally table with 20 = cube root of 8000 */
RETURN
WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
, NUMS(N) AS (SELECT TOP(LEN(@INPUT_STRING)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3)
SELECT
(
SELECT
ISNULL(RCS.ReplaceString,SUBSTRING(@INPUT_STRING,NM.N,1))
FROM NUMS NM
LEFT OUTER JOIN dbo.ReplaceCharStrings RCS
ON ASCII(RCS.BadString) = ASCII(SUBSTRING(@INPUT_STRING,NM.N,1))
FOR XML PATH(''),TYPE).value('.[1]','VARCHAR(100)') AS OUTPUT_STRING
;
GO
Sample usage
/* Select side-by-side */
SELECT
Q.q_id
,Q.question
,CLEAN.OUTPUT_STRING
FROM dbo.questions Q
OUTER APPLY dbo.ITVF_CONFIGURABLE_REPLACE(Q.question) AS CLEAN;
GO
/* Update and inspect the results */
UPDATE Q
SET Q.question = CLEAN.OUTPUT_STRING
FROM dbo.questions Q
OUTER APPLY dbo.ITVF_CONFIGURABLE_REPLACE(Q.question) AS CLEAN;
GO
SELECT
q.q_id
,q.question
FROM dbo.questions Q;
April 17, 2015 at 6:51 am
Naive, eh Eirikur? Hehe. I like that.
As I said it's been awhile since I was playing around with functional encapsulation of this process, and because of the elapsed time that has passed I'd forgotten why I ultimately abandoned it. The fastest (highest performance) approach is going to be to use dynamic SQL to create and execute a code string that will look something like this (if you will pardon the fact that I'm about to fall asleep and haven't tested it).
UPDATE dbo.Questions
-- As many REPLACEs as there are rows in Alan.B's strings table
SET Question = REPLACE(REPLACE(REPLACE(Question COLLATE Latin1_General_BIN, bs1, gs1), bs2, gs2), bs3, gs3);
Where bs1, bs2, bs3 are the bad strings and gs1, gs2, gs3 are the good ones.
Edit: Note that it will make building the dynamic SQL easier if you put the CHAR offsets (integers) into the character substitution table instead of the strings themselves, so you don't have to worry about the correct number of quotation marks.
You can also do it with a recursive CTE. This actually doesn't perform too badly on small sets of questions and character substitutions, but it generates a whole lot of rows you end up throwing away if you have many subsitutions.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 19, 2015 at 2:50 pm
dwain.c (4/17/2015)
Naive, eh Eirikur? Hehe. I like that.
It is naïve in the sense that it is totally Brute force, kind of like ripping out a patience heart to check if it's OK:-D
😎
As I said it's been awhile since I was playing around with functional encapsulation of this process, and because of the elapsed time that has passed I'd forgotten why I ultimately abandoned it. The fastest (highest performance) approach is going to be to use dynamic SQL to create and execute a code string that will look something like this (if you will pardon the fact that I'm about to fall asleep and haven't tested it)
Used few methods for this, action table, dynamic sql, update procedures etc., in fact for the last one, normally one only has to deal with (26^2) - 26 cases so why not just create all of those:-D
April 19, 2015 at 9:05 pm
With only 5 known characters to replace so far and very few that could remain, I recommend keeping this as simple and fast as possible. You could easily turn this into an iTVF as well.
UPDATE t
SET Question = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
t.Question
,CHAR(147),'"'),CHAR(148),'"'),CHAR(151),'-'),CHAR(150),'-'),CHAR(146),'''')
FROM dbo.Table$ t
;
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply