June 2, 2017 at 10:38 am
hi ,
Here is the function i found in google.
DROP Function [dbo].[RemoveNonAlphaCharacters]
GO
Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(4000))
Returns VarChar(1000)
AS
Begin
Declare @KeepValues as varchar(100) = '%[^a-zA-Z0-9]%'
While PatIndex(@KeepValues, @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
Return @Temp
End
Declare @string nvarchar(200)
SET @string= N'資料來源聯博 No payment ashdk'
SELECT [dbo].[RemoveNonAlphaCharacters](@string)
Result: Nopaymentashdk
But wanted the result : No payment ashdk
From above function how can i get the expected result . The function is replacing the spaces too .
June 2, 2017 at 10:50 am
The problem is that you're only leaving letters and numbers instead of all non-unicode characters.
The following function uses several techniques such as a tally table,and using FOR XML to concatenate characters. It's also being called in a different way to improve performance.
CREATE FUNCTION [dbo].[RemoveNonUnicodeChars]
(
@string nvarchar(4000)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
WITH
E(n) AS(
SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
),
E2(n) AS(
SELECT a.n FROM E a, E b
),
E4(n) AS(
SELECT a.n FROM E2 a, E2 b
),
cteTally(n) AS(
SELECT TOP(LEN(@string))ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n
FROM E4
)
SELECT STUFF(( SELECT SUBSTRING( @string, n, 1)
FROM cteTally
WHERE SUBSTRING( @string, n, 1) = CAST(SUBSTRING( @string, n, 1) AS char(1))
FOR XML PATH(''), TYPE).value('./text()[1]', 'varchar(max)'), 1, 1, '') AS CleanString;
GO
Declare @sample table(
string nvarchar(200))
INSERT INTO @sample VALUES( N'?????? No payment ashdk');
SELECT *
FROM @sample s
CROSS APPLY [dbo].[RemoveNonUnicodeChars](s.string) rnuc;
GO
DROP FUNCTION [dbo].[RemoveNonUnicodeChars]
References:
http://dwaincsql.com/2014/03/27/tally-tables-in-t-sql/
http://www.sqlservercentral.com/articles/comma+separated+list/71700/
June 2, 2017 at 10:52 am
Try this...
Declare @string nvarchar(200)
SET @string= N'?????? No payment ashdk';
SELECT REPLACE(CAST(@string AS VARCHAR(200)), CHAR(63), '');
June 2, 2017 at 11:11 am
Luis Cazares - Friday, June 2, 2017 10:50 AMThe problem is that you're only leaving letters and numbers instead of all non-unicode characters.
The following function uses several techniques such as a tally table,and using FOR XML to concatenate characters. It's also being called in a different way to improve performance.
CREATE FUNCTION [dbo].[RemoveNonUnicodeChars]
(
@string nvarchar(4000)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
WITH
E(n) AS(
SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
),
E2(n) AS(
SELECT a.n FROM E a, E b
),
E4(n) AS(
SELECT a.n FROM E2 a, E2 b
),
cteTally(n) AS(
SELECT TOP(LEN(@string))ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n
FROM E4
)
SELECT STUFF(( SELECT SUBSTRING( @string, n, 1)
FROM cteTally
WHERE SUBSTRING( @string, n, 1) = CAST(SUBSTRING( @string, n, 1) AS char(1))
FOR XML PATH(''), TYPE).value('./text()[1]', 'varchar(max)'), 1, 1, '') AS CleanString;
GO
Declare @sample table(
string nvarchar(200))
INSERT INTO @sample VALUES( N'資料來æºè¯åš No payment ashdk');SELECT *
FROM @sample s
CROSS APPLY [dbo].[RemoveNonUnicodeChars](s.string) rnuc;GO
DROP FUNCTION [dbo].[RemoveNonUnicodeChars]
References:
http://dwaincsql.com/2014/03/27/tally-tables-in-t-sql/
http://www.sqlservercentral.com/articles/comma+separated+list/71700/
How can we modifythis function to retrieve the column value ?
SELECT [dbo].[RemoveNonUnicodeChars](Contractplan) FROm [Stage].[Price] where stageid ='816299'
It throws error saying :Cannot find either column "dbo" or the user-defined function or aggregate "dbo.RemoveNonUnicodeChars", or the name is ambiguous.
June 2, 2017 at 11:16 am
Jason A. Long - Friday, June 2, 2017 10:52 AMTry this...
Declare @string nvarchar(200)
SET @string= N'資料來æºè¯åš No payment ashdk';SELECT REPLACE(CAST(@string AS VARCHAR(200)), CHAR(63), '');
I've always liked the idea of this approach. The trick, of course, is what to do with any question marks in the original string.
For most cases, you could get away with first replacing the question marks in the original string with something very, very uncommon, then doing the convert, removing the introduced question marks, and then replacing your uncommon character with question marks to reintroduce the originals.
If you want something to work in 100% of cases, though, you have to do something fancy to dynamically determine a character/set of characters that doesn't exist in the original string (as a safe placeholder for the original question marks).
By the time that's all said and done, though, the approach isn't really all that simpler or better performing than using the tally table/STUFF/FOR XML approach.
Still fun to work out, though 🙂
June 2, 2017 at 11:28 am
komal145 - Friday, June 2, 2017 11:11 AMHow can we modifythis function to retrieve the column value ?
SELECT [dbo].[RemoveNonUnicodeChars](Contractplan) FROm [Stage].[Price] where stageid ='816299'It throws error saying :Cannot find either column "dbo" or the user-defined function or aggregate "dbo.RemoveNonUnicodeChars", or the name is ambiguous.
There's an example on how to use it in the code that I posted. Did you even tried to understand the code?
June 2, 2017 at 11:45 am
Jacob Wilkins - Friday, June 2, 2017 11:16 AMI've always liked the idea of this approach. The trick, of course, is what to do with any question marks in the original string.For most cases, you could get away with first replacing the question marks in the original string with something very, very uncommon, then doing the convert, removing the introduced question marks, and then replacing your uncommon character with question marks to reintroduce the originals.
If you want something to work in 100% of cases, though, you have to do something fancy to dynamically determine a character/set of characters that doesn't exist in the original string (as a safe placeholder for the original question marks).
By the time that's all said and done, though, the approach isn't really all that simpler or better performing than using the tally table/STUFF/FOR XML approach.
Still fun to work out, though 🙂
Total brain fart on my part. Didn't even take into account the possibility of actual ?'s in the original string. :blush:
As far as it being more or less simple than the tally table/STUFF/FOR XML approach... It should still be significantly faster. While it's far faster than loops/cursors it's still has a significant expense when compared to a CAST and a few REPLACEs.
June 2, 2017 at 11:52 am
Luis Cazares - Friday, June 2, 2017 11:28 AMkomal145 - Friday, June 2, 2017 11:11 AMHow can we modifythis function to retrieve the column value ?
SELECT [dbo].[RemoveNonUnicodeChars](Contractplan) FROm [Stage].[Price] where stageid ='816299'It throws error saying :Cannot find either column "dbo" or the user-defined function or aggregate "dbo.RemoveNonUnicodeChars", or the name is ambiguous.
There's an example on how to use it in the code that I posted. Did you even tried to understand the code?
The code looks simple but it hard to understand but still trying what is really doing to remove such unicodes
June 2, 2017 at 12:09 pm
Updated to handle ? characters in the original string...
Declare @string nvarchar(200)
SET @string= N'?????? No payment ashdk ?&';
SELECT REPLACE(REPLACE(CAST(REPLACE(@string, '?', '```') AS VARCHAR(200)), CHAR(63), ''), '```', '?');
GO
June 2, 2017 at 12:11 pm
Jason A. Long - Friday, June 2, 2017 11:45 AMJacob Wilkins - Friday, June 2, 2017 11:16 AMI've always liked the idea of this approach. The trick, of course, is what to do with any question marks in the original string.For most cases, you could get away with first replacing the question marks in the original string with something very, very uncommon, then doing the convert, removing the introduced question marks, and then replacing your uncommon character with question marks to reintroduce the originals.
If you want something to work in 100% of cases, though, you have to do something fancy to dynamically determine a character/set of characters that doesn't exist in the original string (as a safe placeholder for the original question marks).
By the time that's all said and done, though, the approach isn't really all that simpler or better performing than using the tally table/STUFF/FOR XML approach.
Still fun to work out, though 🙂
Total brain fart on my part. Didn't even take into account the possibility of actual ?'s in the original string. :blush:
As far as it being more or less simple than the tally table/STUFF/FOR XML approach... It should still be significantly faster. While it's far faster than loops/cursors it's still has a significant expense when compared to a CAST and a few REPLACEs.
I agree that it has the potential to be faster. The trick is finding a suitably inexpensive way to find the safe placeholder for the original question marks.
I've only messed with this once a while back, but I remember the only method I tried that was 100% safe ended up being slower than the tally/STUFF/FOR XML method.
I might have (probably) missed some clever tricks then, though. I'll have to revisit to see if I can come up with something better this time around.
Cheers!
EDIT: A solution like your most recent post is mostly safe, but not 100% safe like the other approach; In theory, the original string might also contain '```', or any other fixed placeholder you choose (highly unlikely, of course, but still, 100% accuracy is nice :))
June 2, 2017 at 12:29 pm
Jacob Wilkins - Friday, June 2, 2017 12:11 PMEDIT: A solution like your most recent post is mostly safe, but not 100% safe like the other approach; In theory, the original string might also contain '```', or any other fixed placeholder you choose (highly unlikely, of course, but still, 100% accuracy is nice :))
I suppose there's a trade-off to be had... I don't think it's outrageous to assume that there are ASCII character combinations that will be so unlikely to exist as to assume that they would never exist in the existing text. You could use a GUID with the dashes replaced with tildes for example...
Given the performance difference, I (personally) could live with 99.999999999% safety...
June 2, 2017 at 9:58 pm
Jacob Wilkins - Friday, June 2, 2017 12:11 PMJason A. Long - Friday, June 2, 2017 11:45 AMJacob Wilkins - Friday, June 2, 2017 11:16 AMI've always liked the idea of this approach. The trick, of course, is what to do with any question marks in the original string.For most cases, you could get away with first replacing the question marks in the original string with something very, very uncommon, then doing the convert, removing the introduced question marks, and then replacing your uncommon character with question marks to reintroduce the originals.
If you want something to work in 100% of cases, though, you have to do something fancy to dynamically determine a character/set of characters that doesn't exist in the original string (as a safe placeholder for the original question marks).
By the time that's all said and done, though, the approach isn't really all that simpler or better performing than using the tally table/STUFF/FOR XML approach.
Still fun to work out, though 🙂
Total brain fart on my part. Didn't even take into account the possibility of actual ?'s in the original string. :blush:
As far as it being more or less simple than the tally table/STUFF/FOR XML approach... It should still be significantly faster. While it's far faster than loops/cursors it's still has a significant expense when compared to a CAST and a few REPLACEs.I agree that it has the potential to be faster. The trick is finding a suitably inexpensive way to find the safe placeholder for the original question marks.
I've only messed with this once a while back, but I remember the only method I tried that was 100% safe ended up being slower than the tally/STUFF/FOR XML method.
I might have (probably) missed some clever tricks then, though. I'll have to revisit to see if I can come up with something better this time around.
Cheers!
EDIT: A solution like your most recent post is mostly safe, but not 100% safe like the other approach; In theory, the original string might also contain '```', or any other fixed placeholder you choose (highly unlikely, of course, but still, 100% accuracy is nice :))
Try using CHAR(7). or CHAR(127).
--Jeff Moden
Change is inevitable... Change for the better is not.
June 3, 2017 at 8:37 am
komal145 - Friday, June 2, 2017 11:52 AMLuis Cazares - Friday, June 2, 2017 11:28 AMkomal145 - Friday, June 2, 2017 11:11 AMHow can we modifythis function to retrieve the column value ?
SELECT [dbo].[RemoveNonUnicodeChars](Contractplan) FROm [Stage].[Price] where stageid ='816299'It throws error saying :Cannot find either column "dbo" or the user-defined function or aggregate "dbo.RemoveNonUnicodeChars", or the name is ambiguous.
There's an example on how to use it in the code that I posted. Did you even tried to understand the code?
The code looks simple but it hard to understand but still trying what is really doing to remove such unicodes
It's looking at each character individually. It filters out the characters that don't match the Varchar equivalent of itself (Nvarchar p = Varchar p, Nvarchar åš = Varchar ?). It reassembles them and returns the concatenated string.
The single biggest thing Luis did to help you is that he converted it into an inline table-valued function (ITVF) instead of a scalar function. This is going to result in a huge difference in performance when you use it.
It does require that you call it differently than your scalar function. Luis included an example, so take a look at it and note the difference. This is one of those moments when a real paradigm shift in thinking can occur if you want it to.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply