string manipulation help

Question

Post reply

string manipulation help

is250sp

SSCarpal Tunnel

Points: 4361
More actions
February 6, 2009 at 12:16 pm

#77250

I have the following values..
IT
IT.Control.Bing
IT.Control.Bing.AC Review
IT.FldSvc.ARTS.Admin
IT.FldSvc.ARTS.Hardware.Printers/Copiers/Fax
IT.ProdSupp.UPMB.OASYS
IT.ProdSuppE.Atlanta.Accounting.Validation
I need to parse the string and return all values after the 3rd period. The return values would be.....
AC Review
Admin
Hardware.Printers/Copiers/Fax
OASYS
Accounting.Validation
I also want to grouped these values to the characters before the 2nd period. In other words....
AC Review will be grouped under IT.Control
Admin and Hardware.Printers/Copiers/Fax will be grouped under IT.FldSvc
OASYS will be grouped under IT.ProdSupp
Accounting.Validation will be grouped under IT.ProdSuppE
Help please.....

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

charlesz Old Hand Points: 381 More actions · Answer 1

You can create two user defined functions. One retrieves the names (after two dot) and one retrieves the group names.

Here is an example of user defined function. It can be run using Northwind database of SQL Server 2000.

CREATE FUNCTION TestFun (@input nvarchar(100) )

RETURNS nvarchar(30)

AS

BEGIN

return substring(@input, 1, 2)

END

GO

select dbo.TestFun(productname), count(*)

from products

group by dbo.TestFun(productname)

By the way, you need to use "CHARINDEX" to find the "dot" in your user defined functions.

http://www.speedydb.com

Charles Zhang

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 2

You haven't provided your table schema so I have used the following in my query below:

CREATE TABLE Items (

name varchar(100) NOT NULL

)

GO

INSERT INTO Items

SELECT 'IT'

UNION ALL SELECT 'IT.Control.Bing'

UNION ALL SELECT 'IT.Control.Bing.AC Review'

UNION ALL SELECT 'IT.FldSvc.ARTS.Admin'

UNION ALL SELECT 'IT.FldSvc.ARTS.Hardware.Printers/Copiers/Fax'

UNION ALL SELECT 'IT.ProdSupp.UPMB.OASYS'

UNION ALL SELECT 'IT.ProdSuppE.Atlanta.Accounting.Validation'

Here's the query.

SELECT LEFT(I.name, I.P2 - 1), RIGHT(I.name, LEN(I.name) - I.P3)

FROM (

SELECT I3.name, I3.P2, P3 = CHARINDEX('.', I3.name, I3.P2 + 1)

FROM (

SELECT I2.name, P2 = CHARINDEX('.', I2.name, I2.P1 + 1)

FROM (

SELECT I1.name, P1 = CHARINDEX('.', I1.name)

FROM Items I1

) I2

) I3

) I

WHERE (I.P3 > 0)

ORDER BY 1

Here are the results of this query using the data above. I'm not sure how well the query will perform on a very large table.

[font="Courier New"]IT.Control AC Review

IT.FldSvc Admin

IT.FldSvc Hardware.Printers/Copiers/Fax

IT.ProdSupp OASYS

IT.ProdSuppE Accounting.Validation[/font]

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 3

The following variation has a level of SELECT nesting that is one fewer, but gives the same results.

SELECT LEFT(I.name, I.P2 - 1), RIGHT(I.name, LEN(I.name) - CHARINDEX('.', I.name, I.P2 + 1))

FROM (

SELECT I2.name, P2 = CHARINDEX('.', I2.name, I2.P1 + 1)

FROM (

SELECT I1.name, P1 = CHARINDEX('.', I1.name)

FROM Items I1

) I2

) I

WHERE (CHARINDEX('.', I.name, I.P2 + 1) > 0)

ORDER BY 1

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

SELECT Name,

SUBSTRING(Name,NULLIF(CHARINDEX('.',Name,NULLIF(CHARINDEX('.',Name,NULLIF(CHARINDEX('.',Name),0)+1),0)+1),0)+1,100)

FROM dbo.ITEMS

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Aram Koukia SSCommitted Points: 1521 More actions · Answer 5

hi

i think smith is right

[font="Courier New"]Aram Koukia: http://www.koukia.ca[/font]

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 6

aram_golbaghi (2/8/2009)
hi
i think smith is right

About what and why?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 7

Who can say if I'm right other than the OP, it depends on the OP's precise requirements.

OP does state the following which Jeff's query doesn't do.

I also want to grouped these values to the characters before the 2nd period.

However, my queries do not return rows corresponding to the following strings because they do not contain at least 3 '.' characters. Only the OP knows how these should be dealt with.

IT

IT.Control.Bing

I've tried comparing the performance of 4 different variations of my query where the number of nested select statements varies from 1 to 4. These 4 queries produce identical output. My testing used an Items table with 294,407 rows. There were no indexes defined on the table.

The results of this testing shows that the number of nested SELECT statements make little difference to the execution time (< 5% difference in both CPU time and elapsed time between best and worst cases), and that the actual execution plans appear to be the same. It seems that the query optimizer is able to recognise that all 4 queries only require a single pass (in this case a table scan).

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 8

Here are the 4 queries I used for the testing mentioned in my previous post. Unlike my original queries, they do return a row corresponding to every row in the Items table. This is achieved by using CASE WHEN clauses in the SELECT list and removing the WHERE clause.

PRINT ''

PRINT '--------------------------'

PRINT '4 nested select statements'

SET STATISTICS TIME ON

SELECT

CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,

CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END

FROM (

SELECT I3.name, I3.P2, P3 = CHARINDEX('.', I3.name, I3.P2 + 1)

FROM (

SELECT I2.name, P2 = CHARINDEX('.', I2.name, I2.P1 + 1)

FROM (

SELECT I1.name, P1 = CHARINDEX('.', I1.name)

from Items I1

) I2

) I3

) I

ORDER BY 1

SET STATISTICS TIME OFF

GO

PRINT ''

PRINT '--------------------------'

PRINT '3 nested select statements'

SET STATISTICS TIME ON

SELECT

CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,

CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END

FROM (

SELECT I3.name, I3.P2, P3 = CHARINDEX('.', I3.name, I3.P2 + 1)

FROM (

SELECT I2.name, P2 = CHARINDEX('.', I2.name, CHARINDEX('.', I2.name) + 1)

FROM Items I2

) I3

) I

ORDER BY 1

SET STATISTICS TIME OFF

GO

PRINT ''

PRINT '--------------------------'

PRINT '2 nested select statements'

SET STATISTICS TIME ON

SELECT

CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,

CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END

FROM (

SELECT

I3.name,

P2 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1),

P3 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1) + 1)

FROM Items I3

) I

ORDER BY 1

SET STATISTICS TIME OFF

GO

PRINT ''

PRINT '--------------------------'

PRINT 'Single select statement'

SET STATISTICS TIME ON

SELECT

CASE WHEN (CHARINDEX('.', name, CHARINDEX('.', name) + 1) > 0) THEN

LEFT(name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) - 1)

ELSE name END,

CASE WHEN (CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1) > 0) THEN

RIGHT(name, LEN(name) - CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1))

ELSE NULL END

FROM Items

ORDER BY 1

SET STATISTICS TIME OFF

GO

Here are my timing statistics for the 4 queries.

[font="Courier New"]--------------------------

4 nested select statements

(294407 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 5180 ms, elapsed time = 5586 ms.

--------------------------

3 nested select statements

(294407 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 5117 ms, elapsed time = 5450 ms.

--------------------------

2 nested select statements

(294407 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 5054 ms, elapsed time = 5507 ms.

--------------------------

Single select statement

(294407 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 5102 ms, elapsed time = 5498 ms.[/font]

Given that the performance of the 4 queries are equivalent, we are free to consider other factors such as readability when choosing between them. I personally find the 3rd query with a single derived table the most transparent of the four.

SELECT

CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,

CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END

FROM (

SELECT

I3.name,

P2 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1),

P3 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1) + 1)

FROM Items I3

) I

ORDER BY 1

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 9

Incidently, avoiding the use of the NULLIF function in these queries does give a significant performance improvement. The following compares Jeff's query with a functionally equivalent query that does not use the NULLIF function.

PRINT ''

PRINT '--------------------------'

PRINT 'Without NULLIF'

SET STATISTICS TIME ON

SELECT

name,

CASE WHEN (CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1) > 0) THEN

RIGHT(name, LEN(name) - CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1))

ELSE NULL END

FROM dbo.Items

SET STATISTICS TIME OFF

GO

PRINT ''

PRINT '--------------------------'

PRINT 'With NULLIF'

SET STATISTICS TIME ON

SELECT

name,

SUBSTRING(name,NULLIF(CHARINDEX('.',name,NULLIF(CHARINDEX('.',name,NULLIF(CHARINDEX('.',name),0)+1),0)+1),0)+1,100)

FROM dbo.Items

SET STATISTICS TIME OFF

GO

Here are my timing statistics comparing the two queries:

[font="Courier New"]--------------------------

Without NULLIF

(294407 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 2621 ms, elapsed time = 3561 ms.

--------------------------

With NULLIF

(294407 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 5070 ms, elapsed time = 5461 ms.[/font]

Sergiy SSC Guru Points: 110209 More actions · Answer 10

The task itself is stupid enough not to bother to run any benchmark tests.

SQL Server is a RDMS, which stands for Relational Database Management System.

Can you show me any piece of relational database here?

Andrew, VB Script running against a flat file will beat any of your SQL options by times and times.

So, I don't see any point to enter the losers' competition.

If you mean to do it in SQL you need to create relational database first.

Then you need to populate that database by parsing incoming strings.

After that you'll get really effective output.

_____________
Code for TallyGenerator

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 11

Sergiy,

The task itself is stupid enough not to bother to run any benchmark tests.

I have to admit getting a bit carried away with this string manipulation problem. The only thing I'd say is that sometimes you are forced by circumstances to do things in SQL Server that it wasn't designed for.

is250sp SSCarpal Tunnel Points: 4361 More actions · Answer 12

Thanks Andrew. Your script is exactly what I needed. The data is already in a sql table and I am using SRS to create the report.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 13

andrewd.smith (2/8/2009)
Who can say if I'm right other than the OP, it depends on the OP's precise requirements.
OP does state the following which Jeff's query doesn't do.
I also want to grouped these values to the characters before the 2nd period.
However, my queries do not return rows corresponding to the following strings because they do not contain at least 3 '.' characters. Only the OP knows how these should be dealt with.
IT
IT.Control.Bing
I've tried comparing the performance of 4 different variations of my query where the number of nested select statements varies from 1 to 4. These 4 queries produce identical output. My testing used an Items table with 294,407 rows. There were no indexes defined on the table.
The results of this testing shows that the number of nested SELECT statements make little difference to the execution time (< 5% difference in both CPU time and elapsed time between best and worst cases), and that the actual execution plans appear to be the same. It seems that the query optimizer is able to recognise that all 4 queries only require a single pass (in this case a table scan).

Yep... you're definitely correct on that... I forgot the grouping. Sorry about that.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110209 More actions · Answer 14

andrewd.smith (2/9/2009)
The only thing I'd say is that sometimes you are forced by circumstances to do things in SQL Server that it wasn't designed for.

Unless you're a professional.

_____________
Code for TallyGenerator