Catch Data between symbols in SQL

  • hi Chris, it definitely runs like a Rocket. It would be really nice if you could explain the following code :

    SELECT df = LEFT(b.AllLevels, CHARINDEX('»',b.AllLevels,1)-1)

    UNION ALL

    SELECT df = SUBSTRING(b.AllLevels,N+1,

    ISNULL(NULLIF(CHARINDEX('»',b.AllLevels,n+1),0)-(n+1),8000))

    FROM dbo.Tally As a

    WHERE a.N < = DATALENGTH(b.AllLevels)

    AND SUBSTRING(b.AllLevels,N,1) = '»'

    It seems as if you are getting the first string before the first '»' using the Left function. Then you are splitting the rest using the Tally Table. It is almost as I was doing in my query.

    Then, why does your query take less time??....Could you please throw a little light on that. Would be really nice. I'm just learning, would love to know.

    PS: By the way, it was my first big attempt at Cross Apply. Haven't worked with it much. But seems

    like its very very handy.

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

  • vinu512 (2/7/2013)


    I worked it out. This is the query that would get you the results you want:

    --Creating Table

    Create Table Ex1

    (

    Id Int Identity(1,1),

    AllLevels Varchar(MAX)

    )

    --Inserting Sample Data

    Insert Into Ex1

    Select 'Analysis»Letters»Numbers»Detail'

    Union ALL

    Select 'Description»Letters»Numbers»Detail'

    Union ALL

    Select 'Planning»Letters»Symbols»Detail'

    --Query for your requirement

    Select Id,

    MAX(Case When rn = 'Level1' Then df Else '' End) As Level1,

    MAX(Case When rn = 'Level2' Then df Else '' End) As Level2,

    MAX(Case When rn = 'Level3' Then df Else '' End) As Level3,

    MAX(Case When rn = 'Level4' Then df Else '' End) As Level4

    From

    (

    Select Id, AllLevels, a.df, 'Level' + Cast(ROW_NUMBER() Over(Partition By Id Order By Id) As Varchar) As rn From Ex1 As b

    Cross Apply

    (

    SELECT SUBSTRING('»' + b.AllLevels + '»',N+1,CHARINDEX('»','»' + b.AllLevels + '»',N+1)-N-1) As df

    FROM dbo.Tally As a

    WHERE a.N < LEN('»' + b.AllLevels + '»')

    AND SUBSTRING('»' + b.AllLevels + '»',N,1) = '»'

    ) a

    ) As p

    Group By Id

    But, it takes a 8 looooong seconds to get the result for this teeeny sample data. So, for tables with huge amounts of data this would definitely not be a good one.

    This means that your design is not good. Inserting Delimited strings and pivoting them is not what SQL Server is meant to do.

    You should rather do it with a reporting tool.

    This is what I could come up with...may be someone might come up with something better.

    8 seconds? What are you using? A netbook with an Atom processor?

    This runs pretty instantaneously.

    Create Table #Ex1

    (

    Id Int Identity(1,1),

    AllLevels Varchar(MAX)

    )

    --Inserting Sample Data

    Insert Into #Ex1

    Select 'Analysis»Letters»Numbers»Detail'

    Union ALL

    Select 'Description»Letters»Numbers»Detail'

    Union ALL

    Select 'Planning»Letters»Symbols»Detail'

    SELECT Id

    ,Col1=MAX(CASE ItemNumber WHEN 1 THEN Item END)

    ,Col2=MAX(CASE ItemNumber WHEN 2 THEN Item END)

    ,Col3=MAX(CASE ItemNumber WHEN 3 THEN Item END)

    ,Col4=MAX(CASE ItemNumber WHEN 4 THEN Item END)

    FROM #Ex1

    CROSS APPLY DelimitedSplit8K(AllLevels, '»')

    GROUP BY Id

    DROP TABLE #Ex1


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Sean Lange (2/7/2013)


    How about using the DelimitedSplit8k? This seems to be super fast to me.

    Select Id,

    MAX(Case When ItemNumber = 1 Then Item Else '' End) As Level1,

    MAX(Case When ItemNumber = 2 Then Item Else '' End) As Level2,

    MAX(Case When ItemNumber = 3 Then Item Else '' End) As Level3,

    MAX(Case When ItemNumber = 4 Then Item Else '' End) As Level4

    From

    (

    select *

    from Ex1

    cross apply dbo.DelimitedSplit8K(ex1.AllLevels, '»')

    ) As p

    Group By Id

    Aaack! Didn't see the second page of responses before my last post. Mine's pretty close to your Sean.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Sean Lange (2/7/2013)


    How about using the DelimitedSplit8k? This seems to be super fast to me.

    Select Id,

    MAX(Case When ItemNumber = 1 Then Item Else '' End) As Level1,

    MAX(Case When ItemNumber = 2 Then Item Else '' End) As Level2,

    MAX(Case When ItemNumber = 3 Then Item Else '' End) As Level3,

    MAX(Case When ItemNumber = 4 Then Item Else '' End) As Level4

    From

    (

    select *

    from Ex1

    cross apply dbo.DelimitedSplit8K(ex1.AllLevels, '»')

    ) As p

    Group By Id

    Cascaded CROSS APPLY would be faster for a four-element string. However, OP says

    davdam8 (2/6/2013)


    Thanks Guys.

    Lowell, yes there are more levels indeed.

    Regards

    so this is probably as fast as it's going to get. Lowell's function would be equally as fast converted to an iTVF (it's halfway there already) because it's functionally the same as DelimitedSplit8K.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • vinu512 (2/7/2013)


    hi Chris, it definitely runs like a Rocket. It would be really nice if you could explain the following code :

    SELECT df = LEFT(b.AllLevels, CHARINDEX('»',b.AllLevels,1)-1)

    UNION ALL

    SELECT df = SUBSTRING(b.AllLevels,N+1,

    ISNULL(NULLIF(CHARINDEX('»',b.AllLevels,n+1),0)-(n+1),8000))

    FROM dbo.Tally As a

    WHERE a.N < = DATALENGTH(b.AllLevels)

    AND SUBSTRING(b.AllLevels,N,1) = '»'

    It seems as if you are getting the first string before the first '»' using the Left function. Then you are splitting the rest using the Tally Table. It is almost as I was doing in my query.

    Then, why does your query take less time??....Could you please throw a little light on that. Would be really nice. I'm just learning, would love to know.

    PS: By the way, it was my first big attempt at Cross Apply. Haven't worked with it much. But seems

    like its very very handy.

    Hi Vinu

    Did you ever work through the DelimitedSplit8k test harness set up by Jeff Moden et al? Quite a few folks did, and they contributed a ton of experimental findings on one thread or another. One of the early findings was that prefixing and suffixing the string with delimiters in order to shape the first and last elements the same as the rest was an expensive operation (and tricky too - try resolving an empty first element). I also know from experimentation that using LEFT() to collect the first element is very cheap, though not quite as fast as the final DelimitedSplit8k code.

    If you plan to experiment with this, then beware of directly comparing execution times from plans. It doesn't work like that. You might compare a slight modification of DelimitedSplit8k with the original and find the mod coming out at 1% of the batch, DelimitedSplit8k coming out at 99%. Test both with a decent sample data set and the timings are completely different!

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • dwain.c (2/8/2013)


    vinu512 (2/7/2013)


    ...

    8 seconds? What are you using? A netbook with an Atom processor?

    ............

    It must have been. That or a wristwatch. It resolved something like 100000 rows in 8 seconds on this pc.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • dwain.c (2/8/2013)


    vinu512 (2/7/2013)


    I worked it out. This is the query that would get you the results you want:

    --Creating Table

    Create Table Ex1

    (

    Id Int Identity(1,1),

    AllLevels Varchar(MAX)

    )

    --Inserting Sample Data

    Insert Into Ex1

    Select 'Analysis»Letters»Numbers»Detail'

    Union ALL

    Select 'Description»Letters»Numbers»Detail'

    Union ALL

    Select 'Planning»Letters»Symbols»Detail'

    --Query for your requirement

    Select Id,

    MAX(Case When rn = 'Level1' Then df Else '' End) As Level1,

    MAX(Case When rn = 'Level2' Then df Else '' End) As Level2,

    MAX(Case When rn = 'Level3' Then df Else '' End) As Level3,

    MAX(Case When rn = 'Level4' Then df Else '' End) As Level4

    From

    (

    Select Id, AllLevels, a.df, 'Level' + Cast(ROW_NUMBER() Over(Partition By Id Order By Id) As Varchar) As rn From Ex1 As b

    Cross Apply

    (

    SELECT SUBSTRING('»' + b.AllLevels + '»',N+1,CHARINDEX('»','»' + b.AllLevels + '»',N+1)-N-1) As df

    FROM dbo.Tally As a

    WHERE a.N < LEN('»' + b.AllLevels + '»')

    AND SUBSTRING('»' + b.AllLevels + '»',N,1) = '»'

    ) a

    ) As p

    Group By Id

    But, it takes a 8 looooong seconds to get the result for this teeeny sample data. So, for tables with huge amounts of data this would definitely not be a good one.

    This means that your design is not good. Inserting Delimited strings and pivoting them is not what SQL Server is meant to do.

    You should rather do it with a reporting tool.

    This is what I could come up with...may be someone might come up with something better.

    8 seconds? What are you using? A netbook with an Atom processor?

    This runs pretty instantaneously.

    Create Table #Ex1

    (

    Id Int Identity(1,1),

    AllLevels Varchar(MAX)

    )

    --Inserting Sample Data

    Insert Into #Ex1

    Select 'Analysis»Letters»Numbers»Detail'

    Union ALL

    Select 'Description»Letters»Numbers»Detail'

    Union ALL

    Select 'Planning»Letters»Symbols»Detail'

    SELECT Id

    ,Col1=MAX(CASE ItemNumber WHEN 1 THEN Item END)

    ,Col2=MAX(CASE ItemNumber WHEN 2 THEN Item END)

    ,Col3=MAX(CASE ItemNumber WHEN 3 THEN Item END)

    ,Col4=MAX(CASE ItemNumber WHEN 4 THEN Item END)

    FROM #Ex1

    CROSS APPLY DelimitedSplit8K(AllLevels, '»')

    GROUP BY Id

    DROP TABLE #Ex1

    Yes, actually my system is quiet slow. Even though its a P4 2.66GHZ with 2 GB RAM.....but Hard Disk is very small. Only 40GB....so sometimes it just takes a long while to do things.....

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply