December 14, 2012 at 9:02 pm
dwain.c (12/14/2012)
Jeff has said (he can correct me if I'm wrong) that DelimitedSplit8K is optimized for VARCHAR(8000) so change it to VARCHAR(MAX) with that in mind.I think although I don't have rigid testing results that PatternSplitCM is not significantly affected if you change to VARCHAR(MAX).
It is, indeed, optimized for 8K. You also have to remember that VARCHAR(MAX) doesn't like to be joined to even by a Tally Table and will cause at least a 2 to 1 slowdown just by changing the function variables from VARCHAR(8000) to VARCHAR(MAX) even if the data stays under 8K.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 14, 2012 at 9:03 pm
Hi Jeff - Sorry those things are not pairs/dont increment by one. I just made that example in excel.
December 14, 2012 at 9:05 pm
Yes, DelimitedSplit8K is miraculously optimized for VARCHAR(8000). It runs very fast!
December 14, 2012 at 9:09 pm
Hi - Quick question. Do you know why VARCHAR(MAX) doesnt work in this query?
It returns the following error:
Msg 240, Level 16, State 1, Line 110
Types don't match between the anchor and the recursive part in column "rowstrt" of recursive query "cte".
Thanks!
nigelrivett (12/11/2012)
This will do something like that.It caters for any number of fields - just amend the final select for the number required
declare @s-2 varchar(8000)
select @s-2 = '30;38469;1|31;38470;1|32;38471;1|33;38472;1|34;38473;1|35;38474;1|36;38475;1|37;38476;1|38;38477;1|'
select @s-2 = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'
declare @rowterm varchar(1) = '|'
declare @fieldterm varchar(1) = ';'
;with cte as
(
select rowstrt = 1, rowend = charindex(@rowterm,@s)-1, seq = 1
union all
select rowstrt = rowend+2, rowend = charindex(@rowterm,@s,rowend+2)-1, seq = seq + 1
from cte where charindex(@rowterm,@s,rowend+2)<>0
) ,
cte2 as
(
select s = substring(@s, rowstrt,rowend-rowstrt+1), seq
from cte
) ,
cte3 as
(
select fldstrt = 1, fldend = charindex(@fieldterm,s)-1, seq, s, fldseq = 1 from cte2
union all
select fldstrt = fldend+2,
fldend = case when charindex(@fieldterm,s,fldend+2) <> 0 then charindex(@fieldterm,s,fldend+2)-1 else len(s) end,
seq, s, fldseq = fldseq+1
from cte3 where fldend < len(s)
)
select s1.s, s2.s, s3.s
from
(select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 1) s1
join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 2) s2 on s1.seq = s2.seq
left join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 3) s3 on s1.seq = s3.seq
December 14, 2012 at 9:17 pm
Yes... both variables and any/all string literals must be defined as VARCHAR(MAX) including the ones that hold the single character delimiters. It's one of the great joys of recursive CTE's. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
December 14, 2012 at 9:21 pm
murthyvs (12/14/2012)
Hi Jeff - Sorry those things are not pairs/dont increment by one. I just made that example in excel.
Like I said... remarkable coincidence. Thanks for the feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 14, 2012 at 9:26 pm
dwain.c (12/11/2012)
Here's another way:
DECLARE @MyString VARCHAR(8000) =
'30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'
SELECT col1=MAX(CASE ItemNumber%4 WHEN 1 THEN Item END)
,col2=MAX(CASE ItemNumber%4 WHEN 3 THEN Item END)
FROM PatternSplitCM(@MyString, '[0-9]')
WHERE [Matched] = 1
GROUP BY ItemNumber/4
PatternSplitCM is described here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/
Not sure it will be faster than using DelimitedSplit8K but since you're calling that function twice, it might give it a run for its money.
Since this is posted in a SQL 2005 forum, you'll need to note that the Tally Table CTE used in PatternSplitCM in the article requires SQL 2008. However it is easily replaced by an Itzik Ben-Gan style CTE tally table like the one that appears (I think) in PatternSplitQU (also in the article).
Edit: Revised (simplified) my vector of attack slightly.
Hi - Just exploring some VARCHAR(MAX) options. Will patternsplitcm work for this dataset?
dataset1:
A123W1;EESLGKNR912|
A12W2;EESLGKNR913|
A3W3;EESLGR914|
123W4;12EESLGKNR915|
123W5;EESL12NR916|
A123W;E31LGKNR917|
A123W7;EE18|
A123W8;1234|
1238;1234|
dataset2:
A123W1EESLGKNR912
A12W2EESLGKNR913
A3W3EESLGR914
123W412EESLGKNR915
123W5EESL12NR916
A123WE31LGKNR917
A123W7EE18
A123W81234
12381234
Thanks
December 15, 2012 at 5:53 pm
Here's a way that uses XML, which you could generate in the web application before passing the data to SQL, or even just convert to XML and pump straight into a SQL table all within the web app...
Anyway, this is the SQL version:
-- set up some sample input data
declare @input varchar(max);
set @input = '30;38469|
31;38470|
32;38471|
33;38472|
34;38473|
35;38474|
36;38475|
37;38476|
38;38477|'
-- convert to xml
-- of course, this could be done in your web app and may be quicker.
declare @xml xml;
SET @xml =
CONVERT(xml,
'<row><col>'+
REPLACE(
REPLACE(
REPLACE(
REPLACE(
@input,
CHAR(13),
''
),
CHAR(10),
''
),
'|',
'</col></row><row><col>'
),
';',
'</col><col>'
)
+'</col></row>'
);
-- and select the values out as rows/columns
select
nd.value('(col/text())[1]','varchar(200)') as field1,
nd.value('(col/text())[2]','varchar(200)') as field2
from @xml.nodes('row') as x(nd)
where not nd.value('(col/text())[1]','varchar(200)') is null
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 16, 2012 at 5:10 pm
murthyvs (12/14/2012)
dwain.c (12/11/2012)
Here's another way:
DECLARE @MyString VARCHAR(8000) =
'30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'
SELECT col1=MAX(CASE ItemNumber%4 WHEN 1 THEN Item END)
,col2=MAX(CASE ItemNumber%4 WHEN 3 THEN Item END)
FROM PatternSplitCM(@MyString, '[0-9]')
WHERE [Matched] = 1
GROUP BY ItemNumber/4
PatternSplitCM is described here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/
Not sure it will be faster than using DelimitedSplit8K but since you're calling that function twice, it might give it a run for its money.
Since this is posted in a SQL 2005 forum, you'll need to note that the Tally Table CTE used in PatternSplitCM in the article requires SQL 2008. However it is easily replaced by an Itzik Ben-Gan style CTE tally table like the one that appears (I think) in PatternSplitQU (also in the article).
Edit: Revised (simplified) my vector of attack slightly.
Hi - Just exploring some VARCHAR(MAX) options. Will patternsplitcm work for this dataset?
dataset1:
A123W1;EESLGKNR912|
A12W2;EESLGKNR913|
A3W3;EESLGR914|
123W4;12EESLGKNR915|
123W5;EESL12NR916|
A123W;E31LGKNR917|
A123W7;EE18|
A123W8;1234|
1238;1234|
dataset2:
A123W1EESLGKNR912
A12W2EESLGKNR913
A3W3EESLGR914
123W412EESLGKNR915
123W5EESL12NR916
A123WE31LGKNR917
A123W7EE18
A123W81234
12381234
Thanks
For dataset1 the answer is yes. Use pattern: '[0-9A-Z]' or if the database is case sensitive: '[0-9A-Za-z]'
For dataset2 it is possible the same pattern will yield the results you seek, except it is unclear what the row split character is. CHAR(10) or CHAR(15) perhaps?
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 16, 2012 at 10:03 pm
Hi - this a nice method. didnt know this before.
I have created this function and using it in a stored proc..
CREATE FUNCTION [dbo].[FnXmlSplit](@input_string VARCHAR(max))
RETURNS @temptable TABLE (
field1 VARCHAR(max)
,field2 VARCHAR(max))
AS
BEGIN
DECLARE @xml XML;
SET @xml = CONVERT(XML, '<row><col>'
+ REPLACE(REPLACE(@input_string, CHAR(13), '</col></row><row><col>'), CHAR(9), '</col><col>' )
+ '</col></row>');
INSERT @temptable
select
Ltrim(Rtrim(replace(replace(field1,char(10),''),char(9),''))) as field1, Ltrim(Rtrim(replace(replace(field2,char(10),''),char(9),''))) as field2
from (
SELECT nd.value('(col/text())[1]', 'varchar(200)') AS field1
,nd.value('(col/text())[2]', 'varchar(200)') AS field2
FROM @xml.nodes('row') AS x(nd)
WHERE NOT nd.value('(col/text())[1]', 'varchar(200)') IS NULL) as a;
RETURN;
END;
Function call in the stored proc: select * from dbo.FnXmlSplit (@input_string).
For some reason, this method when used inside a stored proc, it runs slow. Any idea why? Thanks!
mister.magoo (12/15/2012)
Here's a way that uses XML, which you could generate in the web application before passing the data to SQL, or even just convert to XML and pump straight into a SQL table all within the web app...Anyway, this is the SQL version:
-- set up some sample input data
declare @input varchar(max);
set @input = '30;38469|
31;38470|
32;38471|
33;38472|
34;38473|
35;38474|
36;38475|
37;38476|
38;38477|'
-- convert to xml
-- of course, this could be done in your web app and may be quicker.
declare @xml xml;
SET @xml =
CONVERT(xml,
'<row><col>'+
REPLACE(
REPLACE(
REPLACE(
REPLACE(
@input,
CHAR(13),
''
),
CHAR(10),
''
),
'|',
'</col></row><row><col>'
),
';',
'</col><col>'
)
+'</col></row>'
);
-- and select the values out as rows/columns
select
nd.value('(col/text())[1]','varchar(200)') as field1,
nd.value('(col/text())[2]','varchar(200)') as field2
from @xml.nodes('row') as x(nd)
where not nd.value('(col/text())[1]','varchar(200)') is null
Edited: My input is two columns of excel rows
December 16, 2012 at 10:13 pm
Hi dwain.c - I tried changing VARCHAR(8000) to VARCHAR(MAX). Ran the query with new pattern [0-9A-Za-z]. My input string is a very long string.
sample:
3038469
3138470
The results were as expected BUT the query returns fewer than expected rows. I believe somewhere in the function its hitting the 8000 limit. Any ideas? Thanks!
December 17, 2012 at 2:45 am
murthyvs (12/16/2012)
Hi - this a nice method. didnt know this before.I have created this function and using it in a stored proc..
Function call in the stored proc: select * from dbo.FnXmlSplit (@input_string).
For some reason, this method when used inside a stored proc, it runs slow. Any idea why? Thanks!
Edited: My input is two columns of excel rows
Yeah, it's not the fastest way to do it.
If you want speed, split the data at source or persist with dwain's patter split or Jeff's DelimitedSplit8K
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 17, 2012 at 1:32 pm
murthyvs (12/16/2012)
Hi dwain.c - I tried changing VARCHAR(8000) to VARCHAR(MAX). Ran the query with new pattern [0-9A-Za-z]. My input string is a very long string.sample:
3038469
3138470
The results were as expected BUT the query returns fewer than expected rows. I believe somewhere in the function its hitting the 8000 limit. Any ideas? Thanks!
The Delimittedsplit8k only counts up to E4's max which is 10,000. If you have more than 10,000 characters in the string you'll need to increase the size of the tally based CTE by creating an E8 which would cross 10,000 by 10,000 and handle
100,000,000 columns of data for searching for delimitters.
December 17, 2012 at 5:48 pm
mtassin (12/17/2012)
murthyvs (12/16/2012)
Hi dwain.c - I tried changing VARCHAR(8000) to VARCHAR(MAX). Ran the query with new pattern [0-9A-Za-z]. My input string is a very long string.sample:
3038469
3138470
The results were as expected BUT the query returns fewer than expected rows. I believe somewhere in the function its hitting the 8000 limit. Any ideas? Thanks!
The Delimittedsplit8k only counts up to E4's max which is 10,000. If you have more than 10,000 characters in the string you'll need to increase the size of the tally based CTE by creating an E8 which would cross 10,000 by 10,000 and handle
100,000,000 columns of data for searching for delimitters.
The same applies to the Tally tables generated within PatternSplitCM.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 18, 2012 at 10:29 am
I know Jeff Moden's splitter function has already been discussed. But this is how I would do it with no aggregates or case statements:
DECLARE
@InputString VARCHAR(8000)
,@Delimiter1 CHAR(1)
,@Delimiter2 CHAR(1)
SET @InputString = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'
SET @Delimiter1 = '|'
SET @Delimiter2 = ';'
--to get rid of the extra '|' character at the end
SET @InputString = LEFT(@InputString,LEN(@InputString)-1)
SELECT
(SELECT Item FROM dbo.DelimitedSplit8K(r1.Item,@Delimiter2) WHERE ItemNumber = 1) AS Col1
,(SELECT Item FROM dbo.DelimitedSplit8K(r1.Item,@Delimiter2) WHERE ItemNumber = 2) AS Col2
FROM
(
SELECT
dsk1.Item
FROM
dbo.DelimitedSplit8K(@InputString,@Delimiter1) AS dsk1
) r1
Output:
Col1Col2
3038469
3138470
3238471
3338472
3438473
3538474
3638475
3738476
3838477
Viewing 15 posts - 16 through 30 (of 57 total)
You must be logged in to reply to this topic. Login to reply