Catch Data between symbols in SQL

Question

Post reply

Catch Data between symbols in SQL

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply

vinu512 SSCoach Points: 15729 More actions · Answer 1

hi Chris, it definitely runs like a Rocket. It would be really nice if you could explain the following code :

SELECT df = LEFT(b.AllLevels, CHARINDEX('»',b.AllLevels,1)-1)

UNION ALL

SELECT df = SUBSTRING(b.AllLevels,N+1,

ISNULL(NULLIF(CHARINDEX('»',b.AllLevels,n+1),0)-(n+1),8000))

FROM dbo.Tally As a

WHERE a.N < = DATALENGTH(b.AllLevels)

AND SUBSTRING(b.AllLevels,N,1) = '»'

It seems as if you are getting the first string before the first '»' using the Left function. Then you are splitting the rest using the Tally Table. It is almost as I was doing in my query.

Then, why does your query take less time??....Could you please throw a little light on that. Would be really nice. I'm just learning, would love to know.

PS: By the way, it was my first big attempt at Cross Apply. Haven't worked with it much. But seems

like its very very handy.

Vinu Vijayan

For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

Dwain Camps SSC Guru Points: 86978 More actions · Answer 2

vinu512 (2/7/2013)
I worked it out. This is the query that would get you the results you want:
--Creating Table
Create Table Ex1
(
Id Int Identity(1,1),
AllLevels Varchar(MAX)
)
--Inserting Sample Data
Insert Into Ex1
Select 'Analysis»Letters»Numbers»Detail'
Union ALL
Select 'Description»Letters»Numbers»Detail'
Union ALL
Select 'Planning»Letters»Symbols»Detail'
--Query for your requirement
Select Id,
MAX(Case When rn = 'Level1' Then df Else '' End) As Level1,
MAX(Case When rn = 'Level2' Then df Else '' End) As Level2,
MAX(Case When rn = 'Level3' Then df Else '' End) As Level3,
MAX(Case When rn = 'Level4' Then df Else '' End) As Level4
From
(
Select Id, AllLevels, a.df, 'Level' + Cast(ROW_NUMBER() Over(Partition By Id Order By Id) As Varchar) As rn From Ex1 As b
Cross Apply
(
SELECT SUBSTRING('»' + b.AllLevels + '»',N+1,CHARINDEX('»','»' + b.AllLevels + '»',N+1)-N-1) As df
FROM dbo.Tally As a
WHERE a.N < LEN('»' + b.AllLevels + '»')
AND SUBSTRING('»' + b.AllLevels + '»',N,1) = '»'
) a
) As p
Group By Id

But, it takes a 8 looooong seconds to get the result for this teeeny sample data. So, for tables with huge amounts of data this would definitely not be a good one.
This means that your design is not good. Inserting Delimited strings and pivoting them is not what SQL Server is meant to do.
You should rather do it with a reporting tool.
This is what I could come up with...may be someone might come up with something better.

8 seconds? What are you using? A netbook with an Atom processor?

This runs pretty instantaneously.

Create Table #Ex1

(

Id Int Identity(1,1),

AllLevels Varchar(MAX)

)

--Inserting Sample Data

Insert Into #Ex1

Select 'Analysis»Letters»Numbers»Detail'

Union ALL

Select 'Description»Letters»Numbers»Detail'

Union ALL

Select 'Planning»Letters»Symbols»Detail'

SELECT Id

,Col1=MAX(CASE ItemNumber WHEN 1 THEN Item END)

,Col2=MAX(CASE ItemNumber WHEN 2 THEN Item END)

,Col3=MAX(CASE ItemNumber WHEN 3 THEN Item END)

,Col4=MAX(CASE ItemNumber WHEN 4 THEN Item END)

FROM #Ex1

CROSS APPLY DelimitedSplit8K(AllLevels, '»')

GROUP BY Id

DROP TABLE #Ex1

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86978 More actions · Answer 3

Sean Lange (2/7/2013)
How about using the DelimitedSplit8k? This seems to be super fast to me.
Select Id,
MAX(Case When ItemNumber = 1 Then Item Else '' End) As Level1,
MAX(Case When ItemNumber = 2 Then Item Else '' End) As Level2,
MAX(Case When ItemNumber = 3 Then Item Else '' End) As Level3,
MAX(Case When ItemNumber = 4 Then Item Else '' End) As Level4
From
(
select *
from Ex1
cross apply dbo.DelimitedSplit8K(ex1.AllLevels, '»')
) As p
Group By Id

Aaack! Didn't see the second page of responses before my last post. Mine's pretty close to your Sean.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

Sean Lange (2/7/2013)
How about using the DelimitedSplit8k? This seems to be super fast to me.
Select Id,
MAX(Case When ItemNumber = 1 Then Item Else '' End) As Level1,
MAX(Case When ItemNumber = 2 Then Item Else '' End) As Level2,
MAX(Case When ItemNumber = 3 Then Item Else '' End) As Level3,
MAX(Case When ItemNumber = 4 Then Item Else '' End) As Level4
From
(
select *
from Ex1
cross apply dbo.DelimitedSplit8K(ex1.AllLevels, '»')
) As p
Group By Id

Cascaded CROSS APPLY would be faster for a four-element string. However, OP says

davdam8 (2/6/2013)
Thanks Guys.
Lowell, yes there are more levels indeed.
Regards

so this is probably as fast as it's going to get. Lowell's function would be equally as fast converted to an iTVF (it's halfway there already) because it's functionally the same as DelimitedSplit8K.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

vinu512 (2/7/2013)
hi Chris, it definitely runs like a Rocket. It would be really nice if you could explain the following code :
SELECT df = LEFT(b.AllLevels, CHARINDEX('»',b.AllLevels,1)-1)
UNION ALL
SELECT df = SUBSTRING(b.AllLevels,N+1,
ISNULL(NULLIF(CHARINDEX('»',b.AllLevels,n+1),0)-(n+1),8000))
FROM dbo.Tally As a
WHERE a.N < = DATALENGTH(b.AllLevels)
AND SUBSTRING(b.AllLevels,N,1) = '»'

It seems as if you are getting the first string before the first '»' using the Left function. Then you are splitting the rest using the Tally Table. It is almost as I was doing in my query.
Then, why does your query take less time??....Could you please throw a little light on that. Would be really nice. I'm just learning, would love to know.
PS: By the way, it was my first big attempt at Cross Apply. Haven't worked with it much. But seems
like its very very handy.

Hi Vinu

Did you ever work through the DelimitedSplit8k test harness set up by Jeff Moden et al? Quite a few folks did, and they contributed a ton of experimental findings on one thread or another. One of the early findings was that prefixing and suffixing the string with delimiters in order to shape the first and last elements the same as the rest was an expensive operation (and tricky too - try resolving an empty first element). I also know from experimentation that using LEFT() to collect the first element is very cheap, though not quite as fast as the final DelimitedSplit8k code.

If you plan to experiment with this, then beware of directly comparing execution times from plans. It doesn't work like that. You might compare a slight modification of DelimitedSplit8k with the original and find the mod coming out at 1% of the batch, DelimitedSplit8k coming out at 99%. Test both with a decent sample data set and the timings are completely different!

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

dwain.c (2/8/2013)

vinu512 (2/7/2013)
...
8 seconds? What are you using? A netbook with an Atom processor?
............

It must have been. That or a wristwatch. It resolved something like 100000 rows in 8 seconds on this pc.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

vinu512 SSCoach Points: 15729 More actions · Answer 7

dwain.c (2/8/2013)
vinu512 (2/7/2013)
I worked it out. This is the query that would get you the results you want:
--Creating Table
Create Table Ex1
(
Id Int Identity(1,1),
AllLevels Varchar(MAX)
)
--Inserting Sample Data
Insert Into Ex1
Select 'Analysis»Letters»Numbers»Detail'
Union ALL
Select 'Description»Letters»Numbers»Detail'
Union ALL
Select 'Planning»Letters»Symbols»Detail'
--Query for your requirement
Select Id,
MAX(Case When rn = 'Level1' Then df Else '' End) As Level1,
MAX(Case When rn = 'Level2' Then df Else '' End) As Level2,
MAX(Case When rn = 'Level3' Then df Else '' End) As Level3,
MAX(Case When rn = 'Level4' Then df Else '' End) As Level4
From
(
Select Id, AllLevels, a.df, 'Level' + Cast(ROW_NUMBER() Over(Partition By Id Order By Id) As Varchar) As rn From Ex1 As b
Cross Apply
(
SELECT SUBSTRING('»' + b.AllLevels + '»',N+1,CHARINDEX('»','»' + b.AllLevels + '»',N+1)-N-1) As df
FROM dbo.Tally As a
WHERE a.N < LEN('»' + b.AllLevels + '»')
AND SUBSTRING('»' + b.AllLevels + '»',N,1) = '»'
) a
) As p
Group By Id

But, it takes a 8 looooong seconds to get the result for this teeeny sample data. So, for tables with huge amounts of data this would definitely not be a good one.
This means that your design is not good. Inserting Delimited strings and pivoting them is not what SQL Server is meant to do.
You should rather do it with a reporting tool.
This is what I could come up with...may be someone might come up with something better.
8 seconds? What are you using? A netbook with an Atom processor?
This runs pretty instantaneously.
Create Table #Ex1
(
Id Int Identity(1,1),
AllLevels Varchar(MAX)
)
--Inserting Sample Data
Insert Into #Ex1
Select 'Analysis»Letters»Numbers»Detail'
Union ALL
Select 'Description»Letters»Numbers»Detail'
Union ALL
Select 'Planning»Letters»Symbols»Detail'
SELECT Id
,Col1=MAX(CASE ItemNumber WHEN 1 THEN Item END)
,Col2=MAX(CASE ItemNumber WHEN 2 THEN Item END)
,Col3=MAX(CASE ItemNumber WHEN 3 THEN Item END)
,Col4=MAX(CASE ItemNumber WHEN 4 THEN Item END)
FROM #Ex1
CROSS APPLY DelimitedSplit8K(AllLevels, '»')
GROUP BY Id
DROP TABLE #Ex1

Yes, actually my system is quiet slow. Even though its a P4 2.66GHZ with 2 GB RAM.....but Hard Disk is very small. Only 40GB....so sometimes it just takes a long while to do things.....

Vinu Vijayan

For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉