Split string of concatenated words

  • I wonder if anyone could help me with a problem I'm having.

    I have some data in a table that contains many joined words e.g. ThisIsASample

    What I would like to to is change it to read This Is A Sample

    I have about 5,000 records and don't want to do it manually but cannot figure out how to do it ๐Ÿ™

  • Tim-24860 (12/2/2010)


    I wonder if anyone could help me with a problem I'm having.

    I have some data in a table that contains many joined words e.g. ThisIsASample

    What I would like to to is change it to read This Is A Sample

    I have about 5,000 records and don't want to do it manually but cannot figure out how to do it ๐Ÿ™

    Tim, this could be anywhere between trivial and impossible, depending upon the distribution of the words. Can you provide more info? A sample would be awesome.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris, thanks for the reply. Here's a small sample

    QualificationsLength

    DepthMustBeNumeric

    HeightMustBeNumeric

    WidthMustBeNumeric

    DepthNotValid

    HeightNotValid

    QuantityIsMandatory

    QuantityNotValid

    WidthNotValid

    FactorNotValid

    DimensionNotNumeric

    DimensionNotValid

    StartDateEarlierThanCurrentDate

    CostCodeIsMandatory

    QuantityMustBeNumeric

    InvalidRatioNumeric

    MissingRatio

  • Try this, Tim:

    DROP TABLE #Sentences

    CREATE TABLE #Sentences (Sentence VARCHAR(100))

    INSERT INTO #Sentences (Sentence)

    SELECT 'QualificationsLength' UNION ALL

    SELECT 'DepthMustBeNumeric' UNION ALL

    SELECT 'HeightMustBeNumeric' UNION ALL

    SELECT 'WidthMustBeNumeric' UNION ALL

    SELECT 'DepthNotValid' UNION ALL

    SELECT 'HeightNotValid' UNION ALL

    SELECT 'QuantityIsMandatory' UNION ALL

    SELECT 'QuantityNotValid' UNION ALL

    SELECT 'WidthNotValid' UNION ALL

    SELECT 'FactorNotValid' UNION ALL

    SELECT 'DimensionNotNumeric' UNION ALL

    SELECT 'DimensionNotValid' UNION ALL

    SELECT 'StartDateEarlierThanCurrentDate' UNION ALL

    SELECT 'CostCodeIsMandatory' UNION ALL

    SELECT 'QuantityMustBeNumeric' UNION ALL

    SELECT 'InvalidRatioNumeric' UNION ALL

    SELECT 'MissingRatio'

    SELECT s.Sentence,

    NewSentence = LTRIM(STUFF(STUFF(STUFF(STUFF(STUFF(s.Sentence, S1, 0, ' '), S2, 0, ' '), S3, 0, ' '), S4, 0, ' '), S5, 0, ' '))

    FROM #Sentences s

    CROSS APPLY (

    SELECT

    s1 = ISNULL(MAX(CASE WHEN Seq = 1 THEN d.n END), 1),

    s2 = ISNULL(MAX(CASE WHEN Seq = 2 THEN d.n END), 1),

    s3 = ISNULL(MAX(CASE WHEN Seq = 3 THEN d.n END), 1),

    s4 = ISNULL(MAX(CASE WHEN Seq = 4 THEN d.n END), 1),

    s5 = ISNULL(MAX(CASE WHEN Seq = 5 THEN d.n END), 1)

    FROM (

    SELECT n.n, Seq = ROW_NUMBER() OVER (ORDER BY n DESC)

    FROM (SELECT TOP 100 n = ROW_NUMBER() OVER (ORDER BY Name) FROM master.dbo.syscolumns) n

    WHERE ASCII(SUBSTRING(s.Sentence, n.n, 1)) BETWEEN 65 AND 90) d

    ) Splitter

    Edit:

    -- Rough Stats:

    -- 1,419,874 rows of test data fixed to max six words

    -- all results returned to client in 27 seconds

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hello you an try the following split script

    create table Exceptions (ShortText nvarchar(100))

    insert into Exceptions values (

    'QualificationsLength'),(

    'DepthMustBeNumeric'),(

    'HeightMustBeNumeric'),(

    'WidthMustBeNumeric'),(

    'DepthNotValid'),(

    'HeightNotValid'),(

    'QuantityIsMandatory'),(

    'QuantityNotValid'),(

    'WidthNotValid'),(

    'FactorNotValid'),(

    'DimensionNotNumeric'),(

    'DimensionNotValid'),(

    'StartDateEarlierThanCurrentDate'),(

    'CostCodeIsMandatory'),(

    'QuantityMustBeNumeric'),(

    'InvalidRatioNumeric'),(

    'MissingRatio')

    select * from Exceptions

    cross apply dbo.CaseSensitiveSQLSplitFunction(Exceptions.ShortText)

    You can find the sql source of case sensitive sql split string function here

  • Thank you for your replies.

    Eralper - I would like the function to return one record not a record per split word.

  • Hello Tim,

    Sorry for misunderstanding

    Here is an alternative script

    CREATE FUNCTION udfSample

    (

    @STR nvarchar(400)

    )

    returns nvarchar(400)

    begin

    declare @i int, @j-2 int, @w nvarchar(400)

    select @i = 1, @j-2 = len(@str)

    while @i <= @j-2

    begin

    if substring(@str,@i,1) = UPPER(substring(@str,@i,1)) collate Latin1_General_CS_AS

    begin

    set @STR = STUFF(@str,@i,0,' ')

    select @i = @i + 1, @j-2 = len(@str)

    end

    set @i = @i + 1

    end

    return ltrim(@str)

    end

    GO

    select dbo.udfSample('StartDateEarlierThanCurrentDate')

    You could also use a sql string concatenation function to bring pieces back as a sentence too.

  • My attempt at this one.

    declare @v-2 varchar(50)='StartDateEarlierThanCurrentDate'

    declare @splitvalues varchar(50)=''

    ;with cte

    as

    (

    select row_number()over(order by (select 1))id,SUBSTRING(@v,number,1)num,number from master..spt_values

    where type='p' and number between 1 and LEN(@v)and SUBSTRING(@v,number,1)=UPPER(SUBSTRING(@v,number,1))

    collate Latin1_General_CS_AS_WS

    )

    select @splitvalues=@splitvalues + ' ' +

    substring(@v,c2.number,isnull(c1.number-c2.number,LEN(@v))) from cte c1 right join cte c2 on c1.id=c2.id+1

    select @splitvalues

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • Here's another solution using a slightly modified version of Chis' test table data.

    Solution:

    ;WITH TrimmedSentences(SentenceID, Sentence) AS

    ( -- Trim preceding white space of all sentences

    SELECT SentenceID, LTRIM(Sentence) FROM #Sentences

    ),

    Tally(N) AS

    ( -- Generate a numbers table to split the sentences (max. 2048) into characters

    SELECT number FROM master.dbo.spt_values WHERE type = 'P'

    ),

    SplittedCharacters(SentenceID, CharNo, C) AS

    ( -- Generate a row for each character in a sentence including its position

    SELECT

    S.SentenceID, T.N, SUBSTRING(S.Sentence, T.N, 1)

    FROM

    TrimmedSentences S

    CROSS JOIN

    Tally T

    WHERE

    T.N BETWEEN 1 AND LEN(S.Sentence)

    )

    SELECT

    L.Sentence,

    -- Use a FOR XML PATH subquery to concatenate characters back into a single sentence

    (

    SELECT

    CASE

    -- Prefix uppercase characters with a space (except the first one)

    WHEN C.CharNo > 1 AND ASCII(C) BETWEEN 65 AND 90 THEN

    ' ' + C

    ELSE

    C

    END

    FROM

    SplittedCharacters C

    WHERE

    C.SentenceID = L.SentenceID

    ORDER BY

    CharNo

    FOR XML PATH(''), TYPE

    ).value('./text()[1]', 'varchar(100)')

    FROM

    TrimmedSentences L

    Test data:

    DROP TABLE #Sentences

    CREATE TABLE #Sentences (SentenceID INT IDENTITY PRIMARY KEY, Sentence VARCHAR(100))

    INSERT INTO #Sentences (Sentence)

    SELECT 'Qualifications&Length' UNION ALL

    SELECT ' DepthMustBeNumeric' UNION ALL

    SELECT 'HeightMustBeNumericAndNotAlphanumericOrDateTime' UNION ALL

    SELECT 'WidthMustBeNumeric' UNION ALL

    SELECT 'DepthNotValid' UNION ALL

    SELECT 'HeightNotValid' UNION ALL

    SELECT 'QuantityIsMandatory' UNION ALL

    SELECT 'QuantityNotValid' UNION ALL

    SELECT 'WidthNotValid' UNION ALL

    SELECT 'FactorNotValid' UNION ALL

    SELECT 'DimensionNotNumeric' UNION ALL

    SELECT 'DimensionNotValid' UNION ALL

    SELECT 'StartDateEarlierThanCurrentDate' UNION ALL

    SELECT 'CostCodeIsMandatory' UNION ALL

    SELECT 'QuantityMustBeNumeric' UNION ALL

    SELECT 'InvalidRatioNumeric' UNION ALL

    SELECT 'MissingRatio'

  • Here's a reasonably performant version. Assumes you have a Numbers table. If not, search for Numbers table and Tally table, and you'll find how to create one, and a ton of different uses for them.

    IF OBJECT_ID(N'tempdb..#T') IS NOT NULL

    DROP TABLE #T;

    CREATE TABLE #T (

    ID INT IDENTITY PRIMARY KEY,

    String VARCHAR(100));

    INSERT INTO #T (String)

    SELECT 'QualificationsLength' FROM dbo.Numbers UNION ALL

    SELECT 'DepthMustBeNumeric' FROM dbo.Numbers UNION ALL

    SELECT 'HeightMustBeNumeric' FROM dbo.Numbers UNION ALL

    SELECT 'WidthMustBeNumeric' FROM dbo.Numbers UNION ALL

    SELECT 'DepthNotValid' FROM dbo.Numbers UNION ALL

    SELECT 'HeightNotValid' FROM dbo.Numbers UNION ALL

    SELECT 'QuantityIsMandatory' FROM dbo.Numbers UNION ALL

    SELECT 'QuantityNotValid' FROM dbo.Numbers UNION ALL

    SELECT 'WidthNotValid' FROM dbo.Numbers UNION ALL

    SELECT 'FactorNotValid' FROM dbo.Numbers UNION ALL

    SELECT 'DimensionNotNumeric' FROM dbo.Numbers UNION ALL

    SELECT 'DimensionNotValid' FROM dbo.Numbers UNION ALL

    SELECT 'StartDateEarlierThanCurrentDate' FROM dbo.Numbers UNION ALL

    SELECT 'CostCodeIsMandatory' FROM dbo.Numbers UNION ALL

    SELECT 'QuantityMustBeNumeric' FROM dbo.Numbers UNION ALL

    SELECT 'InvalidRatioNumeric' FROM dbo.Numbers UNION ALL

    SELECT 'MissingRatio ' FROM dbo.Numbers;

    SELECT

    (SELECT

    CASE WHEN ASCII(SUBSTRING(String, Number, 1)) BETWEEN 65 AND 90 AND Number > 1

    THEN ' ' + SUBSTRING(String, Number, 1)

    ELSE SUBSTRING(String, Number, 1)

    END

    FROM DBA.dbo.Numbers

    WHERE Number <= LEN(String)

    ORDER BY Number

    FOR XML PATH(''), TYPE).value('(./text())[1]','varchar(200)')

    FROM #T;

    Took an average of about 2 seconds to split a 170,000 row table on my desktop computer at work.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Everybody - thank you very much for the great suggestions.

    Eralper - I went with your one because it suited my requirements the closest.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply