May 11, 2012 at 1:25 am
Hi,
I am having a table with a column which is having data like # followed by some number for Eg. #123.
I want to strip out the # and the number from the column using TSQL without using clr regex replace.
Any help here would be highly appreciated.
Script for generating test data is given below.
DECLARE @BadData TABLE(DataID INT, Data VARCHAR(50), ExpectedData VARCHAR(50))
INSERT INTO @BadData(DataID,Data,ExpectedData)
SELECT 1,'#123 data','data' UNION
SELECT 2,'#456data','data' UNION
SELECT 3,'data #789','data' UNION
SELECT 4,'data#012','data' UNION
SELECT 5,'data#345 some more data','data some more data' UNION
SELECT 6,'data #678some more data','data some more data' UNION
SELECT 7,'data #901 some more data','data some more data' UNION
SELECT 8,'data#234some more data','datasome more data'
SELECT * FROM @BadData
The column 'Data' has bad values and the column 'ExpectedData' has the expected values.
I will soon post the queries i have tried so far but they didn't work, hence posting it to the forum.
May 11, 2012 at 1:45 am
May 11, 2012 at 2:49 am
Thanks for the link, but i was looking for a non-clr solution.
May 11, 2012 at 3:22 am
I have tried the below query but it does not cover all the cases, especially when the pattern is like #[number][string]
The output of query is attached to this post.
SELECT
DataID
,Data
,LEN(Data) AS [Length of Data]
,CHARINDEX('#',Data) AS [Position of #]
,CHARINDEX(' ',Data) AS [Position of space]
,CHARINDEX(' ',Data,CHARINDEX('#',Data)) AS [Position of space after #]
,CASE
WHEN (CHARINDEX(' ',Data,CHARINDEX('#',Data)) > 0)
THEN SUBSTRING(Data,CHARINDEX('#',Data),CHARINDEX(' ',Data,CHARINDEX('#',Data))-CHARINDEX('#',Data))
ELSE SUBSTRING(Data,CHARINDEX('#',Data),LEN(Data)-CHARINDEX('#',Data))
END AS [The word containing #]
,CASE
WHEN (CHARINDEX(' ',Data,CHARINDEX('#',Data)) > 0)
THEN LTRIM(RTRIM(REPLACE(Data,SUBSTRING(Data,CHARINDEX('#',Data),CHARINDEX(' ',Data,CHARINDEX('#',Data))-CHARINDEX('#',Data)),'')))
ELSE LTRIM(RTRIM(REPLACE(Data,SUBSTRING(Data,CHARINDEX('#',Data),LEN(Data)-CHARINDEX('#',Data)),'')))
END AS [Output]
,ExpectedData
FROM
@BadData
May 11, 2012 at 8:34 am
How about nested replace. You said all you want is to remove "#" and any numbers.
SELECT rtrim(ltrim(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(REPLACE(Data, '#', ''), '1', ''), '2', ''), '3', ''), '4', ''), '5', ''), '6', ''), '7', ''), '8', ''), '9', ''), '0', '')))
, ExpectedData
FROM @BadData
The nested replace approach is easy to read and understand, it also has the advantage that is super fast. Then just trim left and right. This at least works on your sample data.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 11, 2012 at 8:48 am
Have to ask, is it always a # and three digits or is that just the way you configured your sample data?
May 11, 2012 at 9:26 am
Here is a shot at it:
DECLARE @BadData TABLE(DataID INT, Data VARCHAR(50), ExpectedData VARCHAR(50));
INSERT INTO @BadData(DataID,Data,ExpectedData)
SELECT 1,'#123 data','data' UNION
SELECT 2,'#456data','data' UNION
SELECT 3,'data #789','data' UNION
SELECT 4,'data#012','data' UNION
SELECT 5,'data#345 some more data','data some more data' UNION
SELECT 6,'data #678some more data','data some more data' UNION
SELECT 7,'data #901 some more data','data some more data' UNION
SELECT 8,'data#234some more data','datasome more data' UNION
SELECT 9,'data#2some more data','datasome more data';
--SELECT * FROM @BadData;
SELECT
SUBSTRING(Data,1,PATINDEX('%#[0-9]%', Data) - 1) +
REVERSE(SUBSTRING(REVERSE(SUBSTRING(Data,PATINDEX('%#[0-9]%', Data) + 1,DATALENGTH(Data) - PATINDEX('%#[0-9]%', Data))),1,PATINDEX('%[0-9]%',REVERSE(SUBSTRING(Data,PATINDEX('%#[0-9]%', Data) + 1,DATALENGTH(Data) - PATINDEX('%#[0-9]%', Data)))) - 1)),
Data,ExpectedData
FROM
@BadData;
May 11, 2012 at 9:28 am
Lynn, why not the replace I posted above? It is super fast because it doesn't have to do all sorts of string manipulation.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 11, 2012 at 9:30 am
Shaun-884394 (5/11/2012)
Thanks for the link, but i was looking for a non-clr solution.
Any particular reason why not? Let me guess...political reasons in your organization?
RegEx is one of the primary use-cases for using SQLCLR in the database and there are tons of good examples of UDFs and Procedures already written. RegEx can be used in assemblies marked as SAFE as well meaning security is far less of a concern when using SQLCLR this way.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
May 11, 2012 at 9:34 am
Sean Lange (5/11/2012)
Lynn, why not the replace I posted above? It is super fast because it doesn't have to do all sorts of string manipulation.
Didn't see it. Looks good actually as long as there aren't other numbers in the string that need to be kept. Spec did say # followed by numbers.
May 11, 2012 at 9:38 am
Lynn Pettis (5/11/2012)
Sean Lange (5/11/2012)
Lynn, why not the replace I posted above? It is super fast because it doesn't have to do all sorts of string manipulation.Didn't see it. Looks good actually as long as there aren't other numbers in the string that need to be kept. Spec did say # followed by numbers.
Good point about the spec, guess it worked on the sample data so I figured it was good enough. 😉
If there are other numbers you need to keep and CLR is off the table then go with Lynn's approach. Of course CLR is still the easiest way to handle this.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 11, 2012 at 9:41 am
Sean Lange (5/11/2012)
Lynn Pettis (5/11/2012)
Sean Lange (5/11/2012)
Lynn, why not the replace I posted above? It is super fast because it doesn't have to do all sorts of string manipulation.Didn't see it. Looks good actually as long as there aren't other numbers in the string that need to be kept. Spec did say # followed by numbers.
Good point about the spec, guess it worked on the sample data so I figured it was good enough. 😉
If there are other numbers you need to keep and CLR is off the table then go with Lynn's approach. Of course CLR is still the easiest way to handle this.
Actually, mine will fail if there are extra numbers. I just thought of that.
May 11, 2012 at 10:44 am
This should work:
DECLARE @BadData TABLE(DataID INT, Data VARCHAR(50), ExpectedData VARCHAR(50));
INSERT INTO @BadData(DataID,Data,ExpectedData)
SELECT 1,'#123 data','data' UNION
SELECT 2,'#456data','data' UNION
SELECT 3,'data #789','data' UNION
SELECT 4,'data#012','data' UNION
SELECT 5,'data#345 some more data','data some more data' UNION
SELECT 6,'data #678some more data','data some more data' UNION
SELECT 7,'data #901 some more data','data some more data' UNION
SELECT 8,'data#234some more data','datasome more data' UNION
SELECT 9,'data#2some more data','datasome more data' UNION
SELECT 10, 'data#2 4 some more data','data 4 some more data';
--SELECT * FROM @BadData;
SELECT
LEFT(Data, PATINDEX('%#[0-9]%', Data) - 1)
+ RIGHT(Data,CASE WHEN PATINDEX('%[^0-9]%',SUBSTRING(Data, PATINDEX('%#[0-9]%', Data) + 1, DATALENGTH(Data))) = 0 THEN 0 ELSE DATALENGTH(Data) - (PATINDEX('%#[0-9]%', Data) + PATINDEX('%[^0-9]%',SUBSTRING(Data, PATINDEX('%#[0-9]%', Data) + 1, DATALENGTH(Data)))) + 1 END)
,Data,ExpectedData
FROM
@BadData;
December 7, 2016 at 10:54 am
Hi Lynn,
What if the source data contained the same pattern that is repeated many times? Example:
SELECT 1,'001N data 002N003N004N','data' UNION
SELECT 2,'005N005Ndata007N','data' UNION
SELECT 3,'data 008N009N010N','data ' UNION
SELECT 4,'data011Nsome012Nmore013Ndata014N','datasomemoredata' UNION
SELECT 5,'data015N some 016N017Nmore data','data some more data' UNION
SELECT 6,'data 050Nsome more 051N052Ndata','data some more data' UNION
SELECT 7,'data 090N some more data091N092N','data some more data' UNION
SELECT 8,'data093Nsome more 094N095Ndata','datasome more data' UNION
SELECT 9,'data096N some more097N098N data','data some more data' UNION
SELECT 10, 'data099N 4 some100N 5more 101Ndata','data 4 some 5more data';
Note that in my example, the string I want to remove is always 3 characters long, datatype Text, in the format of [0-9][0-9][0-9]N, ie. 001N, 002N, ... 999N, which repeats up to 999 times in each string, and no numbers repeat, they just increment upwards by 1. In addition, eliminating all but 1 space in between all words would also be helpful too. Thanks!
EDIT: I've solved the problem myself by creating a function as follows:
CREATE FUNCTION [dbo].[fnCleanN]
(
@BadString nvarchar(MAX)
)
RETURNS nvarchar(MAX)
AS
BEGIN
DECLARE @npos INTEGER
SELECT @npos = PATINDEX('%[0-9][0-9][0-9]N%', @BadString)
WHILE @npos > 0
BEGIN
SELECT @BadString = STUFF(@BadString, @npos, 4, '')
SELECT @npos = PATINDEX('%[0-9][0-9][0-9]N%', @BadString)
END
RETURN @BadString
END
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply