More than 1 alphanumeric chars in a string

Question

More than 1 alphanumeric chars in a string

shan-422658

SSC Enthusiast

Points: 130
More actions
March 12, 2014 at 8:25 am

#289232

I have an employee table
Surname GivenName
ABC x.yz
A.BC X.*YZ
A*.BC xyz
The query needs to get me the surname and givenname which have more than 1 non alphanumeric characters. In this case the output will be
a.bc x.*yz
a*.BC xyz
I know how to find non alphanumeric chars in a string but not sure if I can how to find the strings with minimum 2 non alphanumeric chars?

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Keith Tate SSCrazy Eights Points: 9938 More actions · Answer 1

Now look for one of the SQL Ninja's on the forum to give you something cleaner and better, but this should work for now.with Employee as

(

select 'ABC' as Surname, 'x.yz' as GivenName union all

select 'A.BC' as Surname, 'X.*YZ' as GivenName union all

select 'A*.BC' as Surname, 'xyz' as GivenName

)

SELECT * FROM Employee

WHERE substring(Surname, patindex('%[^a-zA-Z0-9]%',Surname)+1,len(Surname)+1) like '%[^a-zA-Z0-9]%'

or substring(GivenName, patindex('%[^a-zA-Z0-9]%',GivenName)+1,len(GivenName)+1) like '%[^a-zA-Z0-9]%'

Microsoft Certified Master - SQL Server 2008
Follow me on twitter: @keith_tate

Forum Etiquette: How to post data/code on a forum to get the best help[/url]

Sean Lange SSC Guru Points: 286590 More actions · Answer 2

shan-422658 (3/12/2014)
I have an employee table
Surname GivenName
ABC x.yz
A.BC X.*YZ
A*.BC xyz
The query needs to get me the surname and givenname which have more than 1 non alphanumeric characters. In this case the output will be
a.bc x.*yz
a*.BC xyz
I know how to find non alphanumeric chars in a string but not sure if I can how to find the strings with minimum 2 non alphanumeric chars?

Hi and welcome to the forums. This is a rather strange requirement for an employee table. It would seem to me that you should find any rows that have characters that don't make sense for a name. The only values that make sense in a name are any alphanumeric values (that depends on if you allow accent characters), a space, hyphen and apostrophe.

If you really need to allow characters like * in your employee name and you want to find any where there are more than 1 of those you could use a nested replace. It looks ugly but it should be really fast.

with employee as

(

select 'ABC' as Surname, 'x.yz' as GivenName union all

select 'A.BC', 'X.*YZ' union all

select 'A*.BC', 'xyz'

)

, AlphaRemoved as

(

select *,

len(

REPLACE(

Surname, 'A', ''),

'B', ''),

'C', ''),

'D', ''),

'E', ''),

'F', ''),

'G', ''),

'H', ''),

'I', ''),

'J', ''),

'K', ''),

'L', ''),

'M', ''),

'N', ''),

'O', ''),

'P', ''),

'Q', ''),

'R', ''),

'S', ''),

'T', ''),

'U', ''),

'V', ''),

'W', ''),

'X', ''),

'Y', ''),

'Z', '')

) as NonAlphaCountSurname,

len(

REPLACE(

GivenName, 'A', ''),

'B', ''),

'C', ''),

'D', ''),

'E', ''),

'F', ''),

'G', ''),

'H', ''),

'I', ''),

'J', ''),

'K', ''),

'L', ''),

'M', ''),

'N', ''),

'O', ''),

'P', ''),

'Q', ''),

'R', ''),

'S', ''),

'T', ''),

'U', ''),

'V', ''),

'W', ''),

'X', ''),

'Y', ''),

'Z', '')

) as NonAlphaCountGivenName

from employee

)

select Surname, GivenName

from AlphaRemoved

where NonAlphaCountSurname > 1

or NonAlphaCountGivenName > 1

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Keith Tate SSCrazy Eights Points: 9938 More actions · Answer 3

BTW Shan, Sean is one of the SQL Ninjas 😀

Microsoft Certified Master - SQL Server 2008
Follow me on twitter: @keith_tate

Forum Etiquette: How to post data/code on a forum to get the best help[/url]

Sean Lange SSC Guru Points: 286590 More actions · Answer 4

Keith Tate (3/12/2014)
BTW Shan, Sean is one of the SQL Ninjas 😀

Thanks Keith. I would not consider myself a SQL Ninja by any means but thanks for the compliment. I am truly honored and humbled.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Dohsan SSCrazy Points: 2972 More actions · Answer 5

Sean Lange (3/12/2014)
Keith Tate (3/12/2014)
BTW Shan, Sean is one of the SQL Ninjas 😀
Thanks Keith. I would not consider myself a SQL Ninja by any means but thanks for the compliment. I am truly honored and humbled.

and well known for his love of a good nested replace 😛

Sean Lange SSC Guru Points: 286590 More actions · Answer 6

In an effort to evaluate which of the two posted methods would perform better I put together a quick test with 1000 random names. It is kind of nasty to post all the details here but I will certainly put it together if anybody wants to see what I did. I have a couple of random name tables that I used for this. With 1,000 rows the two performed so close that there is no clear winner. When I used a cross join between my two tables (which means we are now hitting 2 million rows) the nested replace has a VERY slight edge on performance. They are both actually quite fast with even 2 million rows.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

shan-422658 SSC Enthusiast Points: 130 More actions · Answer 7

Thanks for your reply.

Just a version 2 of the above query…

Surname GivenName

ABC x.yz

ABcx.y.z

ABC X.*YZ

A*.BC xyz

The query needs to get me the surname and givenname which have more than 1 non alphanumeric characters together(one after the other). In this case the output will be

abc x.*yz

a*.BC xyz

How would I find just the names that have 2 non alphanumeric chars together(one after the other)

Sean Lange SSC Guru Points: 286590 More actions · Answer 8

Dohsan (3/12/2014)
Sean Lange (3/12/2014)
Keith Tate (3/12/2014)
BTW Shan, Sean is one of the SQL Ninjas 😀
Thanks Keith. I would not consider myself a SQL Ninja by any means but thanks for the compliment. I am truly honored and humbled.
and well known for his love of a good nested replace 😛

😛 They are certainly ugly to look at but they are also usually super fast.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Sean Lange SSC Guru Points: 286590 More actions · Answer 9

shan-422658 (3/12/2014)
Thanks for your reply.
Just a version 2 of the above query…
Surname GivenName
ABC x.yz
ABcx.y.z
ABC X.*YZ
A*.BC xyz
The query needs to get me the surname and givenname which have more than 1 non alphanumeric characters together(one after the other). In this case the output will be
abc x.*yz
a*.BC xyz
How would I find just the names that have 2 non alphanumeric chars together(one after the other)

What are you really after? It seems like you are doing some analysis on your data. Is this something you are going to be doing routinely? Trying to find only those values where there are at least 2 non alphanumeric characters in a row is going to suck rotten tomatoes. You are going to have to look at each and every character in every value for all rows in the entire table. This takes RBAR to a new level. It is like RBAR cross join CBAC (character by agonizing character).

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Luis Cazares SSC Guru Points: 183706 More actions · Answer 10

I might be trying to kill a fly with a cannon but this could be an option. It could be turned into an inLine Table-valued Function if needed to avoid writing everything if this code is needed somewhere else.

CREATE TABLE #employee(Surname varchar(50), GivenName varchar(50)) ;

WITH Employee as

(

select 'ABC' as Surname, 'x.yz' as GivenName union all

select 'ABC', 'X.*YZ' union all

select 'A*.BC', 'xyz' union all

select 'ABc', 'x.y.z'

)

INSERT INTO #employee

SELECT *

FROM Employee;

WITH E1(N) AS(

SELECT N FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(N)

),

E2(N) AS(

SELECT a.N FROM E1 a, E1 b

),

E4(N) AS(

SELECT a.N FROM E2 a, E2 b

)

SELECT e.*

FROM #employee e

CROSS APPLY( SELECT TOP( SELECT MAX(strLen)

FROM( VALUES(LEN( e.Surname)), (LEN( e.Surname))) x(strLen))

ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

FROM E4) Tally(N)

WHERE (SUBSTRING( e.Surname, N, 1) LIKE '[^0-9a-zA-z]'

AND SUBSTRING( e.Surname, N + 1, 1) LIKE '[^0-9a-zA-z]')

OR (SUBSTRING( e.GivenName, N, 1) LIKE '[^0-9a-zA-z]'

AND SUBSTRING( e.GivenName, N + 1, 1) LIKE '[^0-9a-zA-z]') ;

GO

DROP TABLE #employee;

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Sean Lange SSC Guru Points: 286590 More actions · Answer 11

Luis Cazares (3/12/2014)
I might be trying to kill a fly with a sledgehammer but this could be an option. It could be turned into an inLine Table-valued Function if needed to avoid writing everything if this code is needed somewhere else.
CREATE TABLE #employee(Surname varchar(50), GivenName varchar(50)) ;
WITH Employee as
(
select 'ABC' as Surname, 'x.yz' as GivenName union all
select 'ABC', 'X.*YZ' union all
select 'A*.BC', 'xyz' union all
select 'ABc', 'x.y.z'
)
INSERT INTO #employee
SELECT *
FROM Employee;
WITH E1(N) AS(
SELECT N FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(N)
),
E2(N) AS(
SELECT a.N FROM E1 a, E1 b
),
E4(N) AS(
SELECT a.N FROM E2 a, E2 b
)
SELECT e.*
FROM #employee e
CROSS APPLY( SELECT TOP( SELECT MAX(strLen)
FROM( VALUES(LEN( e.Surname)), (LEN( e.Surname))) x(strLen))
ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM E4) Tally(N)
WHERE (SUBSTRING( e.Surname, N, 1) LIKE '[^0-9a-zA-z]'
AND SUBSTRING( e.Surname, N + 1, 1) LIKE '[^0-9a-zA-z]')
OR (SUBSTRING( e.GivenName, N, 1) LIKE '[^0-9a-zA-z]'
AND SUBSTRING( e.GivenName, N + 1, 1) LIKE '[^0-9a-zA-z]') ;
GO
DROP TABLE #employee;

That is pretty slick Luis.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Luis Cazares SSC Guru Points: 183706 More actions · Answer 12

Sean Lange (3/12/2014)

That is pretty slick Luis.

Thanks Sean, it came after my third cup of coffee, I might need more now. 😀

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Keith Tate SSCrazy Eights Points: 9938 More actions · Answer 13

Sean Lange (3/12/2014)
In an effort to evaluate which of the two posted methods would perform better I put together a quick test with 1000 random names. It is kind of nasty to post all the details here but I will certainly put it together if anybody wants to see what I did. I have a couple of random name tables that I used for this. With 1,000 rows the two performed so close that there is no clear winner. When I used a cross join between my two tables (which means we are now hitting 2 million rows) the nested replace has a VERY slight edge on performance. They are both actually quite fast with even 2 million rows.

Thanks for doing that Sean very interesting results.

Microsoft Certified Master - SQL Server 2008
Follow me on twitter: @keith_tate

Forum Etiquette: How to post data/code on a forum to get the best help[/url]

Dwain Camps SSC Guru Points: 86948 More actions · Answer 14

Sean Lange (3/12/2014)
shan-422658 (3/12/2014)
Thanks for your reply.
Just a version 2 of the above query…
Surname GivenName
ABC x.yz
ABcx.y.z
ABC X.*YZ
A*.BC xyz
The query needs to get me the surname and givenname which have more than 1 non alphanumeric characters together(one after the other). In this case the output will be
abc x.*yz
a*.BC xyz
How would I find just the names that have 2 non alphanumeric chars together(one after the other)
What are you really after? It seems like you are doing some analysis on your data. Is this something you are going to be doing routinely? Trying to find only those values where there are at least 2 non alphanumeric characters in a row is going to suck rotten tomatoes. You are going to have to look at each and every character in every value for all rows in the entire table. This takes RBAR to a new level. It is like RBAR cross join CBAC (character by agonizing character).

I have to agree with Sean that what you're asking for sounds rather nasty. But then again I like a good challenge:

with Employee as

(

select 'ABC' as Surname, 'x.yz' as GivenName union all

select 'A.BC' as Surname, 'X.*YZ' as GivenName union all

select 'A.D.*C' as Surname, 'X.*YZ' as GivenName union all

select 'A*.BC.*.%.%.%' as Surname, 'xyz' as GivenName

),

Tally (n) AS

(

SELECT TOP (SELECT CASE WHEN MAX(LEN(Surname)) > MAX(LEN(GivenName)) THEN MAX(LEN(Surname)) ELSE MAX(LEN(GivenName)) END FROM Employee)

ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns

)

SELECT a.Surname, a.GivenName

FROM Employee a

WHERE EXISTS

(

SELECT 1

FROM Tally

WHERE n < LEN(Surname) - 1 AND

PATINDEX('%[^a-zA-Z0-9]%', SUBSTRING(Surname, n, 1)) > 0 AND

PATINDEX('%[^a-zA-Z0-9]%', SUBSTRING(Surname, n+1, 1)) > 0

)

OR EXISTS

(

SELECT 1

FROM Tally

WHERE n < LEN(GivenName) - 1 AND

PATINDEX('%[^a-zA-Z0-9]%', SUBSTRING(GivenName, n, 1)) > 0 AND

PATINDEX('%[^a-zA-Z0-9]%', SUBSTRING(GivenName, n+1, 1)) > 0

);

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St