February 6, 2009 at 12:16 pm
I have the following values..
IT
IT.Control.Bing
IT.Control.Bing.AC Review
IT.FldSvc.ARTS.Admin
IT.FldSvc.ARTS.Hardware.Printers/Copiers/Fax
IT.ProdSupp.UPMB.OASYS
IT.ProdSuppE.Atlanta.Accounting.Validation
I need to parse the string and return all values after the 3rd period. The return values would be.....
AC Review
Admin
Hardware.Printers/Copiers/Fax
OASYS
Accounting.Validation
I also want to grouped these values to the characters before the 2nd period. In other words....
AC Review will be grouped under IT.Control
Admin and Hardware.Printers/Copiers/Fax will be grouped under IT.FldSvc
OASYS will be grouped under IT.ProdSupp
Accounting.Validation will be grouped under IT.ProdSuppE
Help please.....
February 6, 2009 at 1:15 pm
You can create two user defined functions. One retrieves the names (after two dot) and one retrieves the group names.
Here is an example of user defined function. It can be run using Northwind database of SQL Server 2000.
CREATE FUNCTION TestFun (@input nvarchar(100) )
RETURNS nvarchar(30)
AS
BEGIN
return substring(@input, 1, 2)
END
GO
select dbo.TestFun(productname), count(*)
from products
group by dbo.TestFun(productname)
By the way, you need to use "CHARINDEX" to find the "dot" in your user defined functions.
Charles Zhang
February 6, 2009 at 2:21 pm
You haven't provided your table schema so I have used the following in my query below:
CREATE TABLE Items (
name varchar(100) NOT NULL
)
GO
INSERT INTO Items
SELECT 'IT'
UNION ALL SELECT 'IT.Control.Bing'
UNION ALL SELECT 'IT.Control.Bing.AC Review'
UNION ALL SELECT 'IT.FldSvc.ARTS.Admin'
UNION ALL SELECT 'IT.FldSvc.ARTS.Hardware.Printers/Copiers/Fax'
UNION ALL SELECT 'IT.ProdSupp.UPMB.OASYS'
UNION ALL SELECT 'IT.ProdSuppE.Atlanta.Accounting.Validation'
Here's the query.
SELECT LEFT(I.name, I.P2 - 1), RIGHT(I.name, LEN(I.name) - I.P3)
FROM (
SELECT I3.name, I3.P2, P3 = CHARINDEX('.', I3.name, I3.P2 + 1)
FROM (
SELECT I2.name, P2 = CHARINDEX('.', I2.name, I2.P1 + 1)
FROM (
SELECT I1.name, P1 = CHARINDEX('.', I1.name)
FROM Items I1
) I2
) I3
) I
WHERE (I.P3 > 0)
ORDER BY 1
Here are the results of this query using the data above. I'm not sure how well the query will perform on a very large table.
[font="Courier New"]IT.Control AC Review
IT.FldSvc Admin
IT.FldSvc Hardware.Printers/Copiers/Fax
IT.ProdSupp OASYS
IT.ProdSuppE Accounting.Validation[/font]
February 6, 2009 at 2:36 pm
The following variation has a level of SELECT nesting that is one fewer, but gives the same results.
SELECT LEFT(I.name, I.P2 - 1), RIGHT(I.name, LEN(I.name) - CHARINDEX('.', I.name, I.P2 + 1))
FROM (
SELECT I2.name, P2 = CHARINDEX('.', I2.name, I2.P1 + 1)
FROM (
SELECT I1.name, P1 = CHARINDEX('.', I1.name)
FROM Items I1
) I2
) I
WHERE (CHARINDEX('.', I.name, I.P2 + 1) > 0)
ORDER BY 1
February 7, 2009 at 12:14 pm
SELECT Name,
SUBSTRING(Name,NULLIF(CHARINDEX('.',Name,NULLIF(CHARINDEX('.',Name,NULLIF(CHARINDEX('.',Name),0)+1),0)+1),0)+1,100)
FROM dbo.ITEMS
--Jeff Moden
Change is inevitable... Change for the better is not.
February 8, 2009 at 3:45 am
February 8, 2009 at 9:01 am
aram_golbaghi (2/8/2009)
hii think smith is right
About what and why?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 8, 2009 at 2:43 pm
Who can say if I'm right other than the OP, it depends on the OP's precise requirements.
OP does state the following which Jeff's query doesn't do.
I also want to grouped these values to the characters before the 2nd period.
However, my queries do not return rows corresponding to the following strings because they do not contain at least 3 '.' characters. Only the OP knows how these should be dealt with.
IT
IT.Control.Bing
I've tried comparing the performance of 4 different variations of my query where the number of nested select statements varies from 1 to 4. These 4 queries produce identical output. My testing used an Items table with 294,407 rows. There were no indexes defined on the table.
The results of this testing shows that the number of nested SELECT statements make little difference to the execution time (< 5% difference in both CPU time and elapsed time between best and worst cases), and that the actual execution plans appear to be the same. It seems that the query optimizer is able to recognise that all 4 queries only require a single pass (in this case a table scan).
February 8, 2009 at 3:59 pm
Here are the 4 queries I used for the testing mentioned in my previous post. Unlike my original queries, they do return a row corresponding to every row in the Items table. This is achieved by using CASE WHEN clauses in the SELECT list and removing the WHERE clause.
PRINT ''
PRINT '--------------------------'
PRINT '4 nested select statements'
SET STATISTICS TIME ON
SELECT
CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,
CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END
FROM (
SELECT I3.name, I3.P2, P3 = CHARINDEX('.', I3.name, I3.P2 + 1)
FROM (
SELECT I2.name, P2 = CHARINDEX('.', I2.name, I2.P1 + 1)
FROM (
SELECT I1.name, P1 = CHARINDEX('.', I1.name)
from Items I1
) I2
) I3
) I
ORDER BY 1
SET STATISTICS TIME OFF
GO
PRINT ''
PRINT '--------------------------'
PRINT '3 nested select statements'
SET STATISTICS TIME ON
SELECT
CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,
CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END
FROM (
SELECT I3.name, I3.P2, P3 = CHARINDEX('.', I3.name, I3.P2 + 1)
FROM (
SELECT I2.name, P2 = CHARINDEX('.', I2.name, CHARINDEX('.', I2.name) + 1)
FROM Items I2
) I3
) I
ORDER BY 1
SET STATISTICS TIME OFF
GO
PRINT ''
PRINT '--------------------------'
PRINT '2 nested select statements'
SET STATISTICS TIME ON
SELECT
CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,
CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END
FROM (
SELECT
I3.name,
P2 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1),
P3 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1) + 1)
FROM Items I3
) I
ORDER BY 1
SET STATISTICS TIME OFF
GO
PRINT ''
PRINT '--------------------------'
PRINT 'Single select statement'
SET STATISTICS TIME ON
SELECT
CASE WHEN (CHARINDEX('.', name, CHARINDEX('.', name) + 1) > 0) THEN
LEFT(name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) - 1)
ELSE name END,
CASE WHEN (CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1) > 0) THEN
RIGHT(name, LEN(name) - CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1))
ELSE NULL END
FROM Items
ORDER BY 1
SET STATISTICS TIME OFF
GO
Here are my timing statistics for the 4 queries.
[font="Courier New"]--------------------------
4 nested select statements
(294407 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 5180 ms, elapsed time = 5586 ms.
--------------------------
3 nested select statements
(294407 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 5117 ms, elapsed time = 5450 ms.
--------------------------
2 nested select statements
(294407 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 5054 ms, elapsed time = 5507 ms.
--------------------------
Single select statement
(294407 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 5102 ms, elapsed time = 5498 ms.[/font]
Given that the performance of the 4 queries are equivalent, we are free to consider other factors such as readability when choosing between them. I personally find the 3rd query with a single derived table the most transparent of the four.
SELECT
CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,
CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END
FROM (
SELECT
I3.name,
P2 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1),
P3 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1) + 1)
FROM Items I3
) I
ORDER BY 1
February 8, 2009 at 5:24 pm
Incidently, avoiding the use of the NULLIF function in these queries does give a significant performance improvement. The following compares Jeff's query with a functionally equivalent query that does not use the NULLIF function.
PRINT ''
PRINT '--------------------------'
PRINT 'Without NULLIF'
SET STATISTICS TIME ON
SELECT
name,
CASE WHEN (CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1) > 0) THEN
RIGHT(name, LEN(name) - CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1))
ELSE NULL END
FROM dbo.Items
SET STATISTICS TIME OFF
GO
PRINT ''
PRINT '--------------------------'
PRINT 'With NULLIF'
SET STATISTICS TIME ON
SELECT
name,
SUBSTRING(name,NULLIF(CHARINDEX('.',name,NULLIF(CHARINDEX('.',name,NULLIF(CHARINDEX('.',name),0)+1),0)+1),0)+1,100)
FROM dbo.Items
SET STATISTICS TIME OFF
GO
Here are my timing statistics comparing the two queries:
[font="Courier New"]--------------------------
Without NULLIF
(294407 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 2621 ms, elapsed time = 3561 ms.
--------------------------
With NULLIF
(294407 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 5070 ms, elapsed time = 5461 ms.[/font]
February 8, 2009 at 7:08 pm
The task itself is stupid enough not to bother to run any benchmark tests.
SQL Server is a RDMS, which stands for Relational Database Management System.
Can you show me any piece of relational database here?
Andrew, VB Script running against a flat file will beat any of your SQL options by times and times.
So, I don't see any point to enter the losers' competition.
If you mean to do it in SQL you need to create relational database first.
Then you need to populate that database by parsing incoming strings.
After that you'll get really effective output.
_____________
Code for TallyGenerator
February 9, 2009 at 3:05 am
Sergiy,
The task itself is stupid enough not to bother to run any benchmark tests.
I have to admit getting a bit carried away with this string manipulation problem. The only thing I'd say is that sometimes you are forced by circumstances to do things in SQL Server that it wasn't designed for.
February 9, 2009 at 9:41 am
Thanks Andrew. Your script is exactly what I needed. The data is already in a sql table and I am using SRS to create the report.
February 9, 2009 at 10:46 am
andrewd.smith (2/8/2009)
Who can say if I'm right other than the OP, it depends on the OP's precise requirements.OP does state the following which Jeff's query doesn't do.
I also want to grouped these values to the characters before the 2nd period.
However, my queries do not return rows corresponding to the following strings because they do not contain at least 3 '.' characters. Only the OP knows how these should be dealt with.
IT
IT.Control.Bing
I've tried comparing the performance of 4 different variations of my query where the number of nested select statements varies from 1 to 4. These 4 queries produce identical output. My testing used an Items table with 294,407 rows. There were no indexes defined on the table.
The results of this testing shows that the number of nested SELECT statements make little difference to the execution time (< 5% difference in both CPU time and elapsed time between best and worst cases), and that the actual execution plans appear to be the same. It seems that the query optimizer is able to recognise that all 4 queries only require a single pass (in this case a table scan).
Yep... you're definitely correct on that... I forgot the grouping. Sorry about that.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 9, 2009 at 1:29 pm
andrewd.smith (2/9/2009)
The only thing I'd say is that sometimes you are forced by circumstances to do things in SQL Server that it wasn't designed for.
Unless you're a professional.
_____________
Code for TallyGenerator
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply