get only the alphabets from string

Question

Post reply

get only the alphabets from string

born2achieve

Hall of Fame

Points: 3182
More actions
September 12, 2013 at 6:08 pm

#395472

i have productname column in my table and the query i need to get only the alphabets.
column contains the special characters (!@#$%^&*():'"{}[]\|?-+=,) and numbers(0-9), space
sample data
Foodhold USA,mlc.
Beverage Partners Worldwide (North canada)......
Bread World 8
my expected output will be
FoodholdUSAmlc
BeveragePartnersWorldwide(Northcanada)
BreadWorld
SELECT productname,
SUBSTRING(Name,1,ISNULL(NULLIF(PATINDEX('%[^A-Za-z.''0-9]%',LTRIM(RTRIM(productname))),0)-1,LEN(productname))) AS noSpecials
FROM Manufacturer
but it is not working. can anyone please show me sample query..

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Dwain Camps SSC Guru Points: 86978 More actions · Answer 1

You can try this:

WITH SampleData (col) AS

(

SELECT 'Foodhold USA,mlc.'

UNION ALL SELECT 'Beverage Partners Worldwide (North canada)......'

UNION ALL SELECT 'Bread World 8'''

),

Tally (n) AS

(

SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns

)

SELECT col

,NewCol=

(

SELECT SUBSTRING(col, n, 1)

FROM SampleData b

CROSS APPLY (SELECT n FROM Tally WHERE n BETWEEN 1 AND LEN(col)) c

WHERE PATINDEX('%[a-zA-Z]%', SUBSTRING(col, n, 1)) <> 0 AND b.col = a.col

ORDER BY n

FOR XML PATH('')

)

FROM SampleData a

Note that you may have issues if the same "dirty" string appears in more than one row.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

born2achieve Hall of Fame Points: 3182 More actions · Answer 2

Thank you for the reply and will post the output soon.

kapil_kk SSC-Insane Points: 21316 More actions · Answer 3

Are the numbers and special characters present at the last of the string only?

_______________________________________________________________
To get quick answer follow this link:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

born2achieve Hall of Fame Points: 3182 More actions · Answer 4

I am done with writing function to remove the special characters and number from string.

Thanks everyone who responded to my question.

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 5

born2achieve (9/13/2013)
I am done with writing function to remove the special characters and number from string.
Thanks everyone who responded to my question.

Nice. Let's see it, please. Thanks.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

born2achieve Hall of Fame Points: 3182 More actions · Answer 6

Here is the code i tried and works fine the only minute draback of using this function take few milliseconds delay than using the sql query. but writing function uses in many by calling it

CREATE FUNCTION [dbo].[fn_GetAlphabetsOnly](@input VARCHAR(50))

RETURNS VARCHAR(1000)

AS

BEGIN

WHILE PATINDEX('%[^a-z]%',@input) > 0

SET @input = STUFF(@input,PATINDEX('%[^a-z]%',@input),1,'')

RETURN @input

END

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 7

born2achieve (9/13/2013)
Here is the code i tried and works fine the only minute draback of using this function take few milliseconds delay than using the sql query. but writing function uses in many by calling it
CREATE FUNCTION [dbo].[fn_GetAlphabetsOnly](@input VARCHAR(50))
RETURNS VARCHAR(1000)
AS
BEGIN
WHILE PATINDEX('%[^a-z]%',@input) > 0
SET @input = STUFF(@input,PATINDEX('%[^a-z]%',@input),1,'')
RETURN @input
END

My recommendation is to remove the "limits" you have built into it to give it a broader range of utility. For example, change @Input to VARCHAR(8000) and the RETURNS to VARCHAR(8000).

While (no pun intended) this is a good fast "memory only" method for doing such a thing, it's still a scalar function. I'll see if I can come up with an "iSF" later tonight with the understanding that it'll need to be used in a CROSS APPLY rather than in the SELECT list.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

born2achieve Hall of Fame Points: 3182 More actions · Answer 8

Dear Jeff,

Thanks for your time on this and it will be more helpful if you have much better choice than my approach as it is function.

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 9

Sorry... deleted this post. I made a testing error. I'll be back.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 10

Ok, here we go. First, we need some test data. I normally test functions with a million rows of data to ensure scalability so here’s a million row test table. I used the three strings you provided and duplicated them over a million rows.

--=======================================================================================

-- Create a Million row test table. This is not a part of the solution.

-- We're just building a test table here.

--=======================================================================================

--===== Conditionally drop the test table to make reruns easier in SSMS

IF OBJECT_ID('tempdb..#TestTable') IS NOT NULL

DROP TABLE #TestTable

;

--===== Create and populate the test table on-the-fly.

SELECT TOP 1000000

d.SomeString

INTO #TestTable

FROM master.sys.all_columns ac1

CROSS JOIN master.sys.all_columns ac2

CROSS APPLY

(

SELECT 'Foodhold USA,mlc.' UNION ALL

SELECT 'Beverage Partners Worldwide (North canada)......' UNION ALL

SELECT 'Bread World 8'

) d (SomeString)

;

The original function couldn’t handle anything over 50 characters because of the VARCHAR(50) input. I changed that in the original function but that’s all. Here’s the new copy for the original function.

ALTER FUNCTION dbo.fn_GetAlphabetsOnly

(@input VARCHAR(8000))

RETURNS VARCHAR(8000)

AS

BEGIN

WHILE PATINDEX('%[^a-z]%',@input) > 0

SET @input = STUFF(@input,PATINDEX('%[^a-z]%',@input ),1,'');

RETURN @input;

END

;

I also saw a major performance flaw (other than the WHILE loop ;-)) in the original function. As tight as it is, it executes PATINDEX twice for each row. I created a new function that has longer code but the execution time is cut about a third because it only executes PATINDEX once per row. Here’s that new function.

CREATE FUNCTION dbo.fn_GetAlphabetsOnlyModified

(@input VARCHAR(8000))

RETURNS VARCHAR(8000)

AS

BEGIN

DECLARE @P INT

SELECT @P = PATINDEX('%[^a-z]%',@input)

WHILE @P > 0

SELECT @input = STUFF(@input,@P,1,''),

@P = PATINDEX('%[^a-z]%',@input)

RETURN @input;

END

;

Of course, there are other useful changes such as adding COLLATE LATIN1_GENERAL_BIN and checking for both upper/lower case letters to overcome the slower default collations people may be using but really wanted to test the original code, so I didn’t make those changes. I also didn’t add those changes into the set-based function below because I wanted to compare apples to apples.

CREATE FUNCTION dbo.GetAlphaOnly

/*****************************************************************************************

Purpose:

Given a string of up to 8000 characters, return only the alphabetic characters.

Usage:

SELECT ao.AlphaOnly

FROM dbo.SomeTable st

CROSS APPLY GetAlphaOnly(st.SomeStringColumn) ao

;

Alternate Usage:

SELECT AlphaOnly = (SELECT AlphaOnly FROM dbo.GetAlpha_XML(st.SomeString))

FROM dbo.SomeTable st

;

------------------------------------------------------------------------------------------

Revision History:

Rev 00 - 14 Sep 2013 - Jeff Moden

- Rework an existing function for performance purposes.

*****************************************************************************************/

(@pString VARCHAR(8000))

RETURNS TABLE AS

RETURN

WITH

--===== Generate up to 10,000 rows ("En" indicates the power of 10 produced)

E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),

E4(N) AS (SELECT 1 FROM E1 a,E1 b,E1 c,E1 d),

cTally AS (SELECT TOP (LEN(@pString)) N = ROW_NUMBER() OVER (ORDER BY N) FROM E4)

SELECT AlphaOnly = CAST(

(

SELECT SUBSTRING(@pString,t.N,1)

FROM cTally t

WHERE SUBSTRING(@pString,t.N,1) LIKE '[a-z]'

ORDER BY N

FOR XML PATH('')

)

AS VARCHAR(8000))

;

Of course, before we do any performance testing, we have to make sure that they all work correctly.

--===== Test to make sure that all 3 functions work correctly

SELECT Top 3

SomeString

,dbo.fn_GetAlphabetsOnly(SomeString)

FROM #TestTable

;

SELECT Top 3

SomeString

,dbo.fn_GetAlphabetsOnlyModified (SomeString)

FROM #TestTable

;

SELECT Top 3

SomeString

,AlphaOnly

FROM #TestTable

CROSS APPLY dbo.GetAlphaOnly(SomeString)

;

Results from above:

SomeString

------------------------------------------------ -------------------------------------

Foodhold USA, mlc. FoodholdUSAmlc

Beverage Partners Worldwide (North canada)...... BeveragePartnersWorldwideNorthcanada

Bread World 8 BreadWorld

(3 row(s) affected)

SomeString

------------------------------------------------ -------------------------------------

Foodhold USA, mlc. FoodholdUSAmlc

Beverage Partners Worldwide (North canada)...... BeveragePartnersWorldwideNorthcanada

Bread World 8 BreadWorld

(3 row(s) affected)

SomeString AlphaOnly

------------------------------------------------ -------------------------------------

Foodhold USA, mlc. FoodholdUSAmlc

Beverage Partners Worldwide (North canada)...... BeveragePartnersWorldwideNorthcanada

Bread World 8 BreadWorld

(3 row(s) affected)

We’re ready to rumble for performance testing. We can’t use SET STATISTICS TIME, IO ON for these tests because it makes it look like Scalar UDF’s are about 6 or 7 times slower than they really are (see this link on that subject http://www.sqlservercentral.com/articles/T-SQL/91724/ ). Instead, I used SQL Profiler, which will also show the RBAR nature of the Scalar UDFs in terms of ROWCOUNTs generated internally.

With SQL Profiler all setup to capture the tests (I did it “Client Side” because it’s easy and I’m on a private machine), here’s the code I ran to perform the tests. Each query runs over the whole table. The @BitBucket variable is used as a target for the results to take display times out of the picture so that we’re mostly measuring the performance of the functions rather than the display times. I also run each query 3 times just to see if there are any performance anomalies.

--===== dbo.fn_GetAlphabetsOnly ==========================================================

DECLARE @BitBucket VARCHAR(8000);

SELECT @BitBucket = dbo.fn_GetAlphabetsOnly(SomeString)

FROM #TestTable

;

GO 3

--===== dbo.GetAlphabetsOnlyModified ====================================================

DECLARE @BitBucket VARCHAR(8000);

SELECT @BitBucket = dbo.fn_GetAlphabetsOnlyModified (SomeString)

FROM #TestTable

;

GO 3

--===== dbo.GetAlphaOnly =================================================================

DECLARE @BitBucket VARCHAR(8000);

SELECT @BitBucket = AlphaOnly

FROM #TestTable

CROSS APPLY dbo.GetAlphaOnly(SomeString)

;

GO 3

And, here’s the SQL Profiler results. As you can see, as tight as that nice little WHILE loop is, it never stood a chance.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)