September 12, 2013 at 6:08 pm
i have productname column in my table and the query i need to get only the alphabets.
column contains the special characters (!@#$%^&*():'"{}[]\|?-+=,) and numbers(0-9), space
sample data
Foodhold USA,mlc.
Beverage Partners Worldwide (North canada)......
Bread World 8
my expected output will be
FoodholdUSAmlc
BeveragePartnersWorldwide(Northcanada)
BreadWorld
SELECT productname,
SUBSTRING(Name,1,ISNULL(NULLIF(PATINDEX('%[^A-Za-z.''0-9]%',LTRIM(RTRIM(productname))),0)-1,LEN(productname))) AS noSpecials
FROM Manufacturer
but it is not working. can anyone please show me sample query..
September 12, 2013 at 6:43 pm
You can try this:
WITH SampleData (col) AS
(
SELECT 'Foodhold USA,mlc.'
UNION ALL SELECT 'Beverage Partners Worldwide (North canada)......'
UNION ALL SELECT 'Bread World 8'''
),
Tally (n) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns
)
SELECT col
,NewCol=
(
SELECT SUBSTRING(col, n, 1)
FROM SampleData b
CROSS APPLY (SELECT n FROM Tally WHERE n BETWEEN 1 AND LEN(col)) c
WHERE PATINDEX('%[a-zA-Z]%', SUBSTRING(col, n, 1)) <> 0 AND b.col = a.col
ORDER BY n
FOR XML PATH('')
)
FROM SampleData a
Note that you may have issues if the same "dirty" string appears in more than one row.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 13, 2013 at 5:15 am
Thank you for the reply and will post the output soon.
September 13, 2013 at 8:15 am
Are the numbers and special characters present at the last of the string only?
_______________________________________________________________
To get quick answer follow this link:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
September 13, 2013 at 11:46 am
I am done with writing function to remove the special characters and number from string.
Thanks everyone who responded to my question.
September 13, 2013 at 1:36 pm
born2achieve (9/13/2013)
I am done with writing function to remove the special characters and number from string.Thanks everyone who responded to my question.
Nice. Let's see it, please. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 13, 2013 at 2:26 pm
Here is the code i tried and works fine the only minute draback of using this function take few milliseconds delay than using the sql query. but writing function uses in many by calling it
CREATE FUNCTION [dbo].[fn_GetAlphabetsOnly](@input VARCHAR(50))
RETURNS VARCHAR(1000)
AS
BEGIN
WHILE PATINDEX('%[^a-z]%',@input) > 0
SET @input = STUFF(@input,PATINDEX('%[^a-z]%',@input),1,'')
RETURN @input
END
September 13, 2013 at 2:39 pm
born2achieve (9/13/2013)
Here is the code i tried and works fine the only minute draback of using this function take few milliseconds delay than using the sql query. but writing function uses in many by calling it
CREATE FUNCTION [dbo].[fn_GetAlphabetsOnly](@input VARCHAR(50))
RETURNS VARCHAR(1000)
AS
BEGIN
WHILE PATINDEX('%[^a-z]%',@input) > 0
SET @input = STUFF(@input,PATINDEX('%[^a-z]%',@input),1,'')
RETURN @input
END
My recommendation is to remove the "limits" you have built into it to give it a broader range of utility. For example, change @Input to VARCHAR(8000) and the RETURNS to VARCHAR(8000).
While (no pun intended) this is a good fast "memory only" method for doing such a thing, it's still a scalar function. I'll see if I can come up with an "iSF" later tonight with the understanding that it'll need to be used in a CROSS APPLY rather than in the SELECT list.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 13, 2013 at 7:09 pm
Dear Jeff,
Thanks for your time on this and it will be more helpful if you have much better choice than my approach as it is function.
September 14, 2013 at 9:18 pm
Sorry... deleted this post. I made a testing error. I'll be back.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 15, 2013 at 10:21 am
Ok, here we go. First, we need some test data. I normally test functions with a million rows of data to ensure scalability so here’s a million row test table. I used the three strings you provided and duplicated them over a million rows.
--=======================================================================================
-- Create a Million row test table. This is not a part of the solution.
-- We're just building a test table here.
--=======================================================================================
--===== Conditionally drop the test table to make reruns easier in SSMS
IF OBJECT_ID('tempdb..#TestTable') IS NOT NULL
DROP TABLE #TestTable
;
--===== Create and populate the test table on-the-fly.
SELECT TOP 1000000
d.SomeString
INTO #TestTable
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
CROSS APPLY
(
SELECT 'Foodhold USA,mlc.' UNION ALL
SELECT 'Beverage Partners Worldwide (North canada)......' UNION ALL
SELECT 'Bread World 8'
) d (SomeString)
;
The original function couldn’t handle anything over 50 characters because of the VARCHAR(50) input. I changed that in the original function but that’s all. Here’s the new copy for the original function.
ALTER FUNCTION dbo.fn_GetAlphabetsOnly
(@input VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
WHILE PATINDEX('%[^a-z]%',@input) > 0
SET @input = STUFF(@input,PATINDEX('%[^a-z]%',@input ),1,'');
RETURN @input;
END
;
I also saw a major performance flaw (other than the WHILE loop ;-)) in the original function. As tight as it is, it executes PATINDEX twice for each row. I created a new function that has longer code but the execution time is cut about a third because it only executes PATINDEX once per row. Here’s that new function.
CREATE FUNCTION dbo.fn_GetAlphabetsOnlyModified
(@input VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @P INT
SELECT @P = PATINDEX('%[^a-z]%',@input)
WHILE @P > 0
SELECT @input = STUFF(@input,@P,1,''),
@P = PATINDEX('%[^a-z]%',@input)
RETURN @input;
END
;
Of course, there are other useful changes such as adding COLLATE LATIN1_GENERAL_BIN and checking for both upper/lower case letters to overcome the slower default collations people may be using but really wanted to test the original code, so I didn’t make those changes. I also didn’t add those changes into the set-based function below because I wanted to compare apples to apples.
CREATE FUNCTION dbo.GetAlphaOnly
/*****************************************************************************************
Purpose:
Given a string of up to 8000 characters, return only the alphabetic characters.
Usage:
SELECT ao.AlphaOnly
FROM dbo.SomeTable st
CROSS APPLY GetAlphaOnly(st.SomeStringColumn) ao
;
Alternate Usage:
SELECT AlphaOnly = (SELECT AlphaOnly FROM dbo.GetAlpha_XML(st.SomeString))
FROM dbo.SomeTable st
;
------------------------------------------------------------------------------------------
Revision History:
Rev 00 - 14 Sep 2013 - Jeff Moden
- Rework an existing function for performance purposes.
*****************************************************************************************/
(@pString VARCHAR(8000))
RETURNS TABLE AS
RETURN
WITH
--===== Generate up to 10,000 rows ("En" indicates the power of 10 produced)
E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
E4(N) AS (SELECT 1 FROM E1 a,E1 b,E1 c,E1 d),
cTally AS (SELECT TOP (LEN(@pString)) N = ROW_NUMBER() OVER (ORDER BY N) FROM E4)
SELECT AlphaOnly = CAST(
(
SELECT SUBSTRING(@pString,t.N,1)
FROM cTally t
WHERE SUBSTRING(@pString,t.N,1) LIKE '[a-z]'
ORDER BY N
FOR XML PATH('')
)
AS VARCHAR(8000))
;
Of course, before we do any performance testing, we have to make sure that they all work correctly.
--===== Test to make sure that all 3 functions work correctly
SELECT Top 3
SomeString
,dbo.fn_GetAlphabetsOnly(SomeString)
FROM #TestTable
;
SELECT Top 3
SomeString
,dbo.fn_GetAlphabetsOnlyModified (SomeString)
FROM #TestTable
;
SELECT Top 3
SomeString
,AlphaOnly
FROM #TestTable
CROSS APPLY dbo.GetAlphaOnly(SomeString)
;
Results from above:
SomeString
------------------------------------------------ -------------------------------------
Foodhold USA, mlc. FoodholdUSAmlc
Beverage Partners Worldwide (North canada)...... BeveragePartnersWorldwideNorthcanada
Bread World 8 BreadWorld
(3 row(s) affected)
SomeString
------------------------------------------------ -------------------------------------
Foodhold USA, mlc. FoodholdUSAmlc
Beverage Partners Worldwide (North canada)...... BeveragePartnersWorldwideNorthcanada
Bread World 8 BreadWorld
(3 row(s) affected)
SomeString AlphaOnly
------------------------------------------------ -------------------------------------
Foodhold USA, mlc. FoodholdUSAmlc
Beverage Partners Worldwide (North canada)...... BeveragePartnersWorldwideNorthcanada
Bread World 8 BreadWorld
(3 row(s) affected)
We’re ready to rumble for performance testing. We can’t use SET STATISTICS TIME, IO ON for these tests because it makes it look like Scalar UDF’s are about 6 or 7 times slower than they really are (see this link on that subject http://www.sqlservercentral.com/articles/T-SQL/91724/ ). Instead, I used SQL Profiler, which will also show the RBAR nature of the Scalar UDFs in terms of ROWCOUNTs generated internally.
With SQL Profiler all setup to capture the tests (I did it “Client Side” because it’s easy and I’m on a private machine), here’s the code I ran to perform the tests. Each query runs over the whole table. The @Bitbucket variable is used as a target for the results to take display times out of the picture so that we’re mostly measuring the performance of the functions rather than the display times. I also run each query 3 times just to see if there are any performance anomalies.
--===== dbo.fn_GetAlphabetsOnly ==========================================================
DECLARE @Bitbucket VARCHAR(8000);
SELECT @Bitbucket = dbo.fn_GetAlphabetsOnly(SomeString)
FROM #TestTable
;
GO 3
--===== dbo.GetAlphabetsOnlyModified ====================================================
DECLARE @Bitbucket VARCHAR(8000);
SELECT @Bitbucket = dbo.fn_GetAlphabetsOnlyModified (SomeString)
FROM #TestTable
;
GO 3
--===== dbo.GetAlphaOnly =================================================================
DECLARE @Bitbucket VARCHAR(8000);
SELECT @Bitbucket = AlphaOnly
FROM #TestTable
CROSS APPLY dbo.GetAlphaOnly(SomeString)
;
GO 3
And, here’s the SQL Profiler results. As you can see, as tight as that nice little WHILE loop is, it never stood a chance.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply