June 24, 2009 at 10:00 pm
Comments posted to this topic are about the item Split string using XML
--Divya
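The article's code isn't reproduced in this thread. A minimal sketch of the XML-based split being discussed (assuming the usual pattern: replace each delimiter with an XML tag boundary, then shred with nodes(); the article's exact code may differ):

```sql
-- Hypothetical sketch of the XML-split technique; not the article's exact code.
DECLARE @str varchar(max) = 'raja,ravi,prabhu';
DECLARE @xml xml = CAST('<i>' + REPLACE(@str, ',', '</i><i>') + '</i>' AS xml);

SELECT x.i.value('.', 'varchar(max)') AS SplitString
FROM @xml.nodes('/i') AS x(i);
```

Note that the CAST will fail if the input contains special XML characters such as & or <, which is the issue raised in the comments below.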
June 25, 2009 at 12:09 am
Very useful.
June 25, 2009 at 1:52 am
what about performance?
June 25, 2009 at 2:43 am
Good. Different approach.
June 25, 2009 at 3:50 am
Hi all,
It's a new way to split the string using XML. I have written a SQL function to split the string without using XML. Please refer to the code below...
CREATE function Split_fn
(
    @split_string varchar(max),
    @deli_char varchar(3)
)
returns @list table
(
    SeqNo int,
    SplitString varchar(max)
)
as
begin
    declare @from_loc int
    declare @to_loc int
    if charindex(@deli_char, @split_string, 0) > 0
    begin
        select @from_loc = 0
        select @to_loc = charindex(@deli_char, @split_string, 0)
    end
    if charindex(@deli_char, @split_string, 0) <= 0
    begin
        select @to_loc = null
    end
    while @to_loc is not null
    begin
        -- add the piece before the next delimiter, skipping empty elements
        if substring(@split_string, @from_loc, @to_loc - @from_loc) <> ''
        begin
            insert into @list (SeqNo, SplitString)
            select isnull(max(SeqNo), 0) + 1,
                   substring(@split_string, @from_loc, @to_loc - @from_loc)
            from @list
        end
        select @from_loc = charindex(@deli_char, @split_string, @from_loc + len(@deli_char)) + len(@deli_char)
        select @to_loc = charindex(@deli_char, @split_string, @from_loc)
        if @to_loc = 0
        begin
            -- no more delimiters: add the trailing piece, if any
            if substring(@split_string, @from_loc, (len(@split_string) - @from_loc) + len(@deli_char)) <> ''
            begin
                insert into @list (SeqNo, SplitString)
                select isnull(max(SeqNo), 0) + 1,
                       substring(@split_string, @from_loc, (len(@split_string) - @from_loc) + len(@deli_char))
                from @list
            end
            select @to_loc = null
        end
    end
    return
end
go
select * from dbo.Split_fn('raja,ravi,prabhu', ',')
Rafidheen.M
June 25, 2009 at 4:07 am
This technique was documented by Erland Sommarskog, SQL Server MVP in 2004. The XML and other methods can be found in the classic article "Arrays and Lists in SQL Server".
For SQL Server 2000, see http://www.sommarskog.se/arrays-in-sql-2000.html and for SQL Server 2005, see http://www.sommarskog.se/arrays-in-sql-2005.html
SQL = Scarcely Qualifies as a Language
June 25, 2009 at 4:43 am
What about using Tally tables to do the same:
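(The tally-table code did not survive in this comment. A minimal sketch of the technique, assuming a dbo.Tally table holding sequential integers starting at 1:)

```sql
-- Assumes dbo.Tally(N) holds sequential integers 1, 2, 3, ...
-- Wrap the string in delimiters, then every position holding a comma
-- marks the start of the next element.
DECLARE @str varchar(8000) = ',' + 'raja,ravi,prabhu' + ',';

SELECT SUBSTRING(@str, t.N + 1,
                 CHARINDEX(',', @str, t.N + 1) - t.N - 1) AS SplitString
FROM dbo.Tally t
WHERE t.N < LEN(@str)
  AND SUBSTRING(@str, t.N, 1) = ',';
```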
June 25, 2009 at 6:35 am
We did performance and scaling tests splitting a comma-separated string with both the XML query and the SQL function.
The plan was better with the XML split than with the SQL function; however, XML query performance degraded as the number of concurrent users increased. The SQL function did better in the scalability test.
June 25, 2009 at 6:47 am
Very handy - thanks for the article and examples Divya. And thanks for the performance info Senthilnathan.
June 25, 2009 at 7:55 am
For splitting delimited lists I really like Jeff Moden's approach.
http://www.sqlservercentral.com/articles/T-SQL/63003/
Depending on how you use XML, as was previously mentioned, the special XML characters can cause an issue. You would probably want to do an initial SELECT to get the data entity-encoded properly, possibly nesting it inside your code. Here is an example showing the characters getting entity encoded:
DECLARE @data table(
someData varchar(255) NOT NULL PRIMARY KEY);
INSERT INTO @data (someData) VALUES ('SpecialChars, <, & >');
INSERT INTO @data (someData) VALUES ('just, regular');
SELECT d.someData
FROM @data d
FOR XML PATH(''), TYPE
June 25, 2009 at 8:19 am
Cool concept and well written article, but this is not the best way to split a string.
I've run performance tests on this and on the SQL while loop.
The XML version seems slick at first, but slows down terribly as the size of the string increases.
And Jeff's tally method leaves the while loop far behind.
June 25, 2009 at 10:07 am
The fastest way to do this is to:
1. use a CLR scalar function to convert the delimited string to a fixed-width string
2. split it using a tally table and SUBSTRING
CREATE FUNCTION [dbo].[FnSplitQuick](@str nvarchar(max), @maxwidth int, @delimitter varchar(1))
RETURNS TABLE
AS
RETURN
(
    WITH dt as (
        select
            Data,
            Offset = case when @maxwidth > 0 then 0 else 10 end,
            Width  = case when @maxwidth > 0 then @maxwidth else cast(left(data, 10) as int) end,
            MaxN   = (len(data) - 2 - case when @maxwidth > 0 then 0 else 10 end)
                     / case when @maxwidth > 0 then @maxwidth else cast(left(data, 10) as int) end
        from ( select data = dbsystem.dbo.[fnConvertToFixedWidth](isnull(@str, '') + @delimitter + 'a', @maxwidth, @delimitter) ) d
    )
    select
        Idx   = n.N,
        Value = rtrim(substring(data, n.N * Width + 1 + Offset, Width))
    from dt
    inner join dbsystem..tally n
        on n.N <= MaxN
)
and the clr
[Microsoft.SqlServer.Server.SqlFunction()]
public static SqlString ConvertToFixedFn(SqlString str, int padWidth, SqlString delimitter) {
// split
string[] arrStr = str.Value.Split(delimitter.Value[0]);
StringBuilder sb = new StringBuilder();
bool addWidth = (padWidth == 0);
if (addWidth) {
padWidth = 1;
for (int i = 0; i < arrStr.Length; i++) {
if (arrStr[i].Length > padWidth) {
padWidth = arrStr[i].Length;
}
}
}
if (addWidth) {
sb.Append(padWidth.ToString().PadRight(10));
}
foreach (string item in arrStr) {
sb.Append(item.PadRight(padWidth));
}
return new SqlString(sb.ToString());
}
June 25, 2009 at 10:23 am
Very cool concept ....
For all those interested in the performance aspect: XML data types are powerful but expensive. For XML queries, the execution plan means very little. As the XML payload/node count or node iteration increases, LOB parsing becomes extremely heavy. Processing raw XML without indexes or a schema applied to it is similar in concept to a heap, but worse, because XML parsing adds a heavy layer. A schema helps with read-aheads and IOPS; adding indexes helps with lookups. The caveat with the indexes is that they need sufficient headroom for growth. In benchmarks from a prior project of mine, the index size was roughly 10-12 times the data size for a 500+ node XML. As a sample benchmark, a 200-node XML of roughly 69K had an 810K index size. The size of the data matters as well.
Without taking away the spotlight from Divya's cool technique, if there are numerous XML nodes I would
1. pin the XML data to physical table and column
2. create a schema and apply it on the XML column
3. make the table transient to save on disk space. This means maintenance to defrag
4. apply primary and at least secondary PATH index
5. maintain a separate LUN for the table, if using the second example to split numerous rows in the table
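Steps 2 and 4 above correspond roughly to DDL like the following (the table name and schema collection are hypothetical, and the schema collection must already exist; a secondary XML index requires the primary one, which in turn requires a clustered primary key):

```sql
-- Hypothetical staging table with the XML pinned to a typed column (steps 1-2)
CREATE TABLE dbo.XmlStage
(
    Id int NOT NULL PRIMARY KEY,            -- XML indexes require a clustered PK
    Payload xml(dbo.MyXmlSchemaCollection)  -- typed via a schema collection
);

-- Primary XML index, then a secondary PATH index built on it (step 4)
CREATE PRIMARY XML INDEX PXI_XmlStage_Payload
    ON dbo.XmlStage (Payload);

CREATE XML INDEX SXI_XmlStage_Payload_Path
    ON dbo.XmlStage (Payload)
    USING XML INDEX PXI_XmlStage_Payload FOR PATH;
```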