Split string in two or three columns based on character count

  • Hi All

    I have a table that contains one column called Symbol. The symbol consists of two or three parts. They are ticker, indicator and dimension or in some cases only ticker and indicator. They are split by an underscore. I have written a script to split these parts into different columns and it works fine for the symbol values that contain two underscore but not for the one that contains 1 underscore as it places the indicator value in the dimension column. I was hoping somebody could help me with fixing this. So for example the symbol value of AAPL_DEBT_MRQ gets split between Ticker Indicator and Dimension as follow AAPL, DEBT,MRQ. This is how it should be. But for AAPL_DIVYIELD it gets split between Ticker and Dimension putting no value in indicator which is where it actually should be.

    Below is my code: Sorry I did try to format this but couldn't work out how.

    SELECT TOP 100 [Symbol],

    LEFT([Symbol], CHARINDEX('_', [Symbol])-1) AS [Ticker],

    LEFT(SUBSTRING([Symbol], CHARINDEX('', [Symbol])+1, LEN([Symbol])), CHARINDEX('', SUBSTRING([Symbol], CHARINDEX('_', [Symbol])+2, LEN([Symbol])))) AS Indicators,

    RIGHT([Symbol], (CHARINDEX('_', REVERSE([Symbol]), 0))-1) AS Dimension

    FROM [dbo].[STG_Quandl_Core_US_Fundamentals]

    WHERE LEFT([Symbol], CHARINDEX('_', [Symbol])-1) = 'AAPL'

    ORDER BY LEFT(SUBSTRING([Symbol], CHARINDEX('', [Symbol])+1, LEN([Symbol])), CHARINDEX('', SUBSTRING([Symbol], CHARINDEX('_', [Symbol])+2, LEN([Symbol])))) ASC,

    Dimension ASC;

    Thanks for your help!

  • oaucamp (9/9/2016)


    Hi All

    I have a table that contains one column called Symbol. The symbol consists of two or three parts. They are ticker, indicator and dimension or in some cases only ticker and indicator. They are split by an underscore. I have written a script to split these parts into different columns and it works fine for the symbol values that contain two underscore but not for the one that contains 1 underscore as it places the indicator value in the dimension column. I was hoping somebody could help me with fixing this. So for example the symbol value of AAPL_DEBT_MRQ gets split between Ticker Indicator and Dimension as follow AAPL, DEBT,MRQ. This is how it should be. But for AAPL_DIVYIELD it gets split between Ticker and Dimension putting no value in indicator which is where it actually should be.

    Below is my code: Sorry I did try to format this but couldn't work out how.

    SELECT TOP 100 [Symbol],

    LEFT([Symbol], CHARINDEX('_', [Symbol])-1) AS [Ticker],

    LEFT(SUBSTRING([Symbol], CHARINDEX('', [Symbol])+1, LEN([Symbol])), CHARINDEX('', SUBSTRING([Symbol], CHARINDEX('_', [Symbol])+2, LEN([Symbol])))) AS Indicators,

    RIGHT([Symbol], (CHARINDEX('_', REVERSE([Symbol]), 0))-1) AS Dimension

    FROM [dbo].[STG_Quandl_Core_US_Fundamentals]

    WHERE LEFT([Symbol], CHARINDEX('_', [Symbol])-1) = 'AAPL'

    ORDER BY LEFT(SUBSTRING([Symbol], CHARINDEX('', [Symbol])+1, LEN([Symbol])), CHARINDEX('', SUBSTRING([Symbol], CHARINDEX('_', [Symbol])+2, LEN([Symbol])))) ASC,

    Dimension ASC;

    Thanks for your help!

    The cleanest code you might get (IMO) would be by using the DelimitedSplit8k and then pivot your data.

    Here's an example:

    SELECT Symbol,

    MAX( CASE WHEN split.ItemNumber = 1 THEN split.Item ELSE '' END) AS Ticker,

    MAX( CASE WHEN split.ItemNumber = 2 THEN split.Item ELSE '' END) AS Indicators,

    MAX( CASE WHEN split.ItemNumber = 3 THEN split.Item ELSE '' END) AS Dimension

    FROM (VALUES('AAPL_DEBT_MRQ'), ('AAPL_DIVYIELD'))x(Symbol)

    CROSS APPLY dbo.DelimitedSplit8K( Symbol, '_') split

    GROUP BY Symbol;

    You can get the DelimitedSplit8k from here: http://www.sqlservercentral.com/articles/Tally+Table/72993/

    You can use the IFCode tags on the left of the textbox to maintain formatted code.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Here's an approach that builds off of your original query. You just need to make sure that all of the records match the pattern that you're expecting, so it adds an underscore to 'AAPL_DIVYIELD' to produce 'AAPL_DIVYIELD_' (the CROSS APPLY) and then uses that value in the calculations.

    SELECT v.Symbol

    , LEFT(v.Symbol, CHARINDEX('_', v.Symbol)-1) AS [Ticker],

    LEFT(SUBSTRING(v.Symbol, CHARINDEX('_', v.Symbol)+1, LEN(v.Symbol)), CHARINDEX('_', SUBSTRING(v.Symbol, CHARINDEX('_', v.Symbol)+2, LEN(v.Symbol)))) AS Indicators,

    RIGHT(v.Symbol, (CHARINDEX('_', REVERSE(v.Symbol), 0))-1) AS Dimension

    FROM (

    VALUES('AAPL_DIVYIELD'), ('AAPL_DEBT_MRQ')

    ) orig(Symbol)

    CROSS APPLY (

    VALUES( orig.Symbol + CASE WHEN orig.Symbol NOT LIKE '%[_]%[_]%' THEN '_' ELSE '' END )

    ) v(Symbol)

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Another option.

    Note that I'm including 2 additional possibilities to show a problem. The other 2 solutions handle it differently, but not necessarily correctly.

    SELECT Symbol,

    LEFT(Symbol, CHARINDEX('_', Symbol + '_')-1) AS [Ticker],

    SUBSTRING( Symbol, CHARINDEX('_', Symbol + '_')+1, LEN(Symbol) - CHARINDEX('_', Symbol, CHARINDEX('_', Symbol + '_')+1) + 1) AS Indicators,

    SUBSTRING( Symbol, CHARINDEX('_', Symbol + '__', CHARINDEX('_', Symbol + '_')+1) + 1, LEN(Symbol)) AS Dimension

    FROM (

    VALUES('AAPL_DIVYIELD'), ('AAPL_DEBT_MRQ'), ('AAPL'), ('AAPL_DEBT_MRQ_PLUS'))orig(Symbol);

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks for the help! The DelimitedSplit8K is so fast!

  • Thanks Drew. There is only about 8 of them that shows the same pattern so I could just apply this logic to all of them. Thanks!

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply