Replace uppercase words with bold

  • I need to replace in given string all uppercase strings longer than two characters with lowercase and set html bold tags around them. Any leads?

    For example convert this:

    Lorem ipsum dolor sit amet, CONSECtetur adipiSCIng elit. PelleNTesque in ELIT ut ex accumsan sagittis in a urna.

    into this:

    Lorem ipsum dolor sit amet, <b>consec</b>tetur adipi<b>sci</b>ng elit. PelleNTesque in <b>elit</b> ut ex accumsan sagittis in a urna.

  • I think it would be quite a complicated solution in T-SQL, so much so that you may consider using a different language, but if you want to do it, here is an outline:

    (1) Split the string[/url] into individual characters

    (2) Learn about gaps and islands[/url] to identify consecutive upper case characters. You'll need to use a case-sensitive collation

    (3) Add the HTML tags before and after each island of consecutive upper case characters

    (4) Re-concatenate[/url] into a single string

    John

  • dothedev (11/15/2016)


    I need to replace in given string all uppercase strings longer than two characters with lowercase and set html bold tags around them. Any leads?

    For example convert this:

    Lorem ipsum dolor sit amet, CONSECtetur adipiSCIng elit. PelleNTesque in ELIT ut ex accumsan sagittis in a urna.

    into this:

    Lorem ipsum dolor sit amet, <b>consec</b>tetur adipi<b>sci</b>ng elit. PelleNTesque in <b>elit</b> ut ex accumsan sagittis in a urna.

    Here is a quick solution which works for the English character set, must be tweaked for other languages.

    😎

    USE TEEST;

    GO

    SET NOCOUNT ON;

    DECLARE @SAMPLE_STRING VARCHAR(2000) = 'Lorem ipsum dolor sit amet, CONSECtetur adipiSCIng elit. PelleNTesque in ELIT ut ex accumsan sagittis in a urna.';

    DECLARE @DESIRED_OUTPUT VARCHAR(2000) = 'Lorem ipsum dolor sit amet, <b>consec</b>tetur adipi<b>sci</b>ng elit. PelleNTesque in <b>elit</b> ut ex accumsan sagittis in a urna.';

    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))

    , NUMS(N) AS (SELECT TOP(LEN(@SAMPLE_STRING)) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N FROM T T1,T T2,T T3,T T4)

    ,BASE_DATA AS

    (

    SELECT

    NM.N

    ,SUBSTRING(@SAMPLE_STRING,NM.N,1) AS CHR_VAL

    ,CASE

    WHEN ASCII(SUBSTRING(@SAMPLE_STRING,NM.N,1)) BETWEEN 65 AND 90 THEN 1

    ELSE 0

    END AS IS_CAPS

    FROM NUMS NM

    )

    ,CHANGE_IDENTIFICATION AS

    (

    SELECT

    BD.N

    ,BD.CHR_VAL

    ,BD.IS_CAPS

    ,CASE WHEN BD.IS_CAPS

    = LAG(BD.IS_CAPS,1,BD.IS_CAPS) OVER

    (

    ORDER BY BD.N

    ) THEN 0 ELSE 1 END AS IS_CHANGED

    FROM BASE_DATA BD

    )

    ,GROUPED_SET AS

    (

    SELECT

    CI.N

    ,CI.CHR_VAL

    ,CI.IS_CAPS

    ,SUM(CI.IS_CHANGED) OVER

    (

    ORDER BY CI.N

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND CURRENT ROW

    ) AS GRP_ID

    FROM CHANGE_IDENTIFICATION CI

    )

    ,GROUPS_AND_SIZES AS

    (

    SELECT

    GS.GRP_ID

    ,MAX(GS.IS_CAPS) AS IS_CAPS

    ,COUNT(GS.N) AS GRP_SIZE

    ,(SELECT

    '' + SGS.CHR_VAL

    FROM GROUPED_SET SGS

    WHERE GS.GRP_ID = SGS.GRP_ID

    FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(2000)') AS GR_STR

    FROM GROUPED_SET GS

    GROUP BY GS.GRP_ID

    )

    SELECT 'INPUT' AS STR_NAME, @SAMPLE_STRING AS OUTPUT_STRING UNION ALL

    SELECT 'OUTPUT' AS STR_NAME,

    (

    SELECT

    CASE

    WHEN GAS.IS_CAPS = 1 AND GAS.GRP_SIZE > 2 THEN '<b>' + LOWER(GAS.GR_STR) + '</b>'

    ELSE GAS.GR_STR

    END

    FROM GROUPS_AND_SIZES GAS

    FOR XML PATH(''),TYPE

    ).value('(./text())[1]','VARCHAR(2000)') AS OUTPUT_STR

    UNION ALL

    SELECT 'DESIRED' AS STR_NAME, @DESIRED_OUTPUT AS OUTPUT_STR;

    Output

    STR_NAME OUTPUT_STRING

    -------- --------------------------------------------------------------------------------------------------------------------------------------

    INPUT Lorem ipsum dolor sit amet, CONSECtetur adipiSCIng elit. PelleNTesque in ELIT ut ex accumsan sagittis in a urna.

    OUTPUT Lorem ipsum dolor sit amet, <b>consec</b>tetur adipi<b>sci</b>ng elit. PelleNTesque in <b>elit</b> ut ex accumsan sagittis in a urna.

    DESIRED Lorem ipsum dolor sit amet, <b>consec</b>tetur adipi<b>sci</b>ng elit. PelleNTesque in <b>elit</b> ut ex accumsan sagittis in a urna.

  • Wow, thanks! :w00t:

    Just needed to add some (finnish) characters in the IS_CAPS case and that's it!

    It's going to take a while to understand though 🙂

  • dothedev (11/15/2016)


    Wow, thanks! :w00t:

    Just needed to add some (finnish) characters in the IS_CAPS case and that's it!

    It's going to take a while to understand though 🙂

    You are very welcome.

    😎

    To extend the character range, add to the case statement in the BASE_DATA CTE

    SELECT

    NM.N

    ,SUBSTRING(@SAMPLE_STRING,NM.N,1) AS CHR_VAL

    ,CASE

    WHEN ASCII(SUBSTRING(@SAMPLE_STRING,NM.N,1)) BETWEEN 65 AND 90 THEN 1

    -- Add additional characters here

    ELSE 0

    END AS IS_CAPS

    FROM NUMS NM

  • Here is one more way:

    declare @Org varchar(150) = 'Lorem ipsum dolor sit amet, CONSECtetur adipiSCIng elit. PelleNTesque in ELIT ut ex accumsan sagittis in a urna'

    declare @Results varchar(200) = ''

    declare @ChIndex int

    --Getting the first occurence of more then 1 capital leter

    WHILE patindex('%[A-Z][A-Z]%', @Org) > 0

    BEGIN

    --Get the location of the first place with more then 1 capital letter

    select @ChIndex = patindex('%[A-Z][A-Z]%', @Org)

    --Adding to the results string the part that doesn't need to be modified

    set @Results = @Results + substring(@Org,1, @ChIndex - 1) + '

    '

    --Deleting the part that was added to the results from the original string

    set @Org = substring(@Org,@ChIndex,len(@Org))

    --Getting the end location of the capital letters

    select @ChIndex = patindex('%[^A-Z]%',@Org)

    --Adding the needed part to the @results string

    SELECT @Results = @Results + lower(substring(@Org,1,@ChIndex - 1)) + '</BR>'

    --Deleting the part from the @Org string

    set @Org = substring(@Org,@ChIndex,len(@Org))

    END

    SET @Results = @Results + @Org

    select @Results

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Adi Cohn-120898 (11/15/2016)


    Here is one more way:

    declare @Org varchar(150) = 'Lorem ipsum dolor sit amet, CONSECtetur adipiSCIng elit. PelleNTesque in ELIT ut ex accumsan sagittis in a urna'

    declare @Results varchar(200) = ''

    declare @ChIndex int

    --Getting the first occurence of more then 1 capital leter

    WHILE patindex('%[A-Z][A-Z]%', @Org) > 0

    BEGIN

    --Get the location of the first place with more then 1 capital letter

    select @ChIndex = patindex('%[A-Z][A-Z]%', @Org)

    --Adding to the results string the part that doesn't need to be modified

    set @Results = @Results + substring(@Org,1, @ChIndex - 1) + '

    '

    --Deleting the part that was added to the results from the original string

    set @Org = substring(@Org,@ChIndex,len(@Org))

    --Getting the end location of the capital letters

    select @ChIndex = patindex('%[^A-Z]%',@Org)

    --Adding the needed part to the @results string

    SELECT @Results = @Results + lower(substring(@Org,1,@ChIndex - 1)) + '</BR>'

    --Deleting the part from the @Org string

    set @Org = substring(@Org,@ChIndex,len(@Org))

    END

    SET @Results = @Results + @Org

    select @Results

    Adi

    Throws an error if the substring doesn't contain the pattern.

    😎

    Msg 537, Level 16, State 2, Line 23

    Invalid length parameter passed to the LEFT or SUBSTRING function.

    Further, the output doesn't seem to match the requirements

    lorem</BR>

    ipsum</BR>

    dolor</BR>

    sit</BR>

    amet</BR>,

    consectetur</BR>

    adipiscing</BR>

    elit</BR>.

    pellentesque</BR>

    in</BR>

    elit</BR>

    ut</BR>

    ex</BR>

    accumsan</BR>

    sagitti

  • Here's an option using a modified version of the pattern splitter explained in here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/

    The only modification was to make it CASE sensitive.

    ALTER FUNCTION [dbo].[PatternSplitCM](@List [varchar](8000) = NULL, @Pattern [varchar](50))

    RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    WITH numbers AS (

    SELECT TOP(ISNULL(DATALENGTH(@List), 0))

    n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n))

    SELECT

    ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)),

    Item = SUBSTRING(@List,MIN(n),1+MAX(n)-MIN(n)),

    [Matched]

    FROM (

    SELECT n, y.[Matched], Grouper = n - ROW_NUMBER() OVER(ORDER BY y.[Matched],n)

    FROM numbers

    CROSS APPLY (

    SELECT [Matched] = CASE WHEN SUBSTRING(@List,n,1) LIKE @Pattern COLLATE Latin1_General_Bin THEN 1 ELSE 0 END

    ) y

    ) d

    GROUP BY [Matched], Grouper;

    Then the solution becomes a bit easier.

    DECLARE @String varchar(1000);

    SET @String = 'Lorem ipsum dolor sit amet, CONSECtetur adipiSCIng elit. PelleNTesque in ELIT ut ex accumsan sagittis in a urna. '

    SELECT ( SELECT CASE WHEN ItemNumber > 1 AND Matched = 1

    THEN '<b>' + LOWER(Item) + '</b>'

    ELSE Item END

    FROM dbo.PatternSplitCM(@String, '%[A-Z]%' )x

    FOR XML PATH(''), TYPE).value('./text()[1]', 'varchar(max)')

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I've done few mistakes. Should have checked it better. The first mistake was that for some reason without noticing it, I've done a small modification in my script. Instead of adding the BR as requested, I added carriage return + new line (enter key). This is the reason that in my script you got few lines instead of one. The second mistake was that I checked my script on a server that is configured as case sensitive and forgot to test it on a server that is not case sensitive. I've modified the script so it will work on none case sensitive servers as well. Sorry for the mistakes, I've should have checked it better.

    declare @Org varchar(150) = 'Lorem ipsum dolor sit amet, CONSECtetur adipiSCIng elit. PelleNTesque in ELIT ut ex accumsan sagittis in a urna' COLLATE Latin1_General_BIN

    declare @Results varchar(200) = '' COLLATE Latin1_General_BIN

    declare @ChIndex int

    --Getting the first occurence of more then 1 capital leter

    WHILE patindex('%[A-Z][A-Z]%' COLLATE Latin1_General_BIN, @Org) > 0

    BEGIN

    --Get the location of the first place with more then 1 capital letter

    select @ChIndex = patindex('%[A-Z][A-Z]%' COLLATE Latin1_General_BIN, @Org)

    --Adding to the results string the part that doesn't need to be modified

    set @Results = @Results + substring(@Org,1, @ChIndex - 1) + '

    '

    --Deleting the part that was added to the results from the original string

    set @Org = substring(@Org,@ChIndex,len(@Org))

    --Getting the end location of the capital letters

    select @ChIndex = patindex('%[^A-Z]%' COLLATE Latin1_General_BIN,@Org)

    IF @ChIndex = 0

    set @ChIndex = len(@Org)+1

    --Adding the needed part to the @results string

    SELECT @Results = @Results + lower(substring(@Org,1,@ChIndex - 1)) + '</BR>'

    --Deleting the part from the @Org string

    set @Org = substring(@Org,@ChIndex,len(@Org))

    END

    SET @Results = @Results + @Org

    select @Results

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Adi Cohn-120898 (11/15/2016)


    I've done few mistakes. Should have checked it better. The first mistake was that for some reason without noticing it, I've done a small modification in my script. Instead of adding the BR as requested, I added carriage return + new line (enter key). This is the reason that in my script you got few lines instead of one. The second mistake was that I checked my script on a server that is configured as case sensitive and forgot to test it on a server that is not case sensitive. I've modified the script so it will work on none case sensitive servers as well. Sorry for the mistakes, I've should have checked it better.

    declare @Org varchar(150) = 'Lorem ipsum dolor sit amet, CONSECtetur adipiSCIng elit. PelleNTesque in ELIT ut ex accumsan sagittis in a urna' COLLATE Latin1_General_BIN

    declare @Results varchar(200) = '' COLLATE Latin1_General_BIN

    declare @ChIndex int

    --Getting the first occurence of more then 1 capital leter

    WHILE patindex('%[A-Z][A-Z]%' COLLATE Latin1_General_BIN, @Org) > 0

    BEGIN

    --Get the location of the first place with more then 1 capital letter

    select @ChIndex = patindex('%[A-Z][A-Z]%' COLLATE Latin1_General_BIN, @Org)

    --Adding to the results string the part that doesn't need to be modified

    set @Results = @Results + substring(@Org,1, @ChIndex - 1) + '

    '

    --Deleting the part that was added to the results from the original string

    set @Org = substring(@Org,@ChIndex,len(@Org))

    --Getting the end location of the capital letters

    select @ChIndex = patindex('%[^A-Z]%' COLLATE Latin1_General_BIN,@Org)

    IF @ChIndex = 0

    set @ChIndex = len(@Org)+1

    --Adding the needed part to the @results string

    SELECT @Results = @Results + lower(substring(@Org,1,@ChIndex - 1)) + '</BR>'

    --Deleting the part from the @Org string

    set @Org = substring(@Org,@ChIndex,len(@Org))

    END

    SET @Results = @Results + @Org

    select @Results

    Adi

    You still got some work to do, the requirements are to wrap the upper case portions in <b></b>, not

    . Further, it still adds new line and hence is returning multiple lines.

    😎

  • Luis Cazares (11/15/2016)


    Here's an option using a modified version of the pattern splitter explained in here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/

    The only modification was to make it CASE sensitive.

    Nice one Luis!

    😎

  • dothedev (11/15/2016)


    Wow, thanks! :w00t:

    Just needed to add some (finnish) characters in the IS_CAPS case and that's it!

    It's going to take a while to understand though 🙂

    Can you list the characters so the guys can add it to they're solutions?

    😎

  • Eirikur Eiriksson (11/15/2016)


    Luis Cazares (11/15/2016)


    Here's an option using a modified version of the pattern splitter explained in here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/

    The only modification was to make it CASE sensitive.

    Nice one Luis!

    😎

    Thank you, but the credit should go to Chris Morris and Dwain Camps for creating the function and sharing it with everyone.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I read it as BR and have to modify it to B, but this is something that the original poster can modify. I also think that there is a bug that modified my code. It still shows that I add carriage return and new line instead of

    . This time I've made sure in my code that I don't add the enter key. In any case I changed it to <B> and hope that it won't change it again.

    declare @Org varchar(150) = 'Lorem ipsum dolor sit amet, CONSECtetur adipiSCIng elit. PelleNTesque in ELIT ut ex accumsan sagittis in a urna' COLLATE Latin1_General_BIN

    declare @Results varchar(200) = '' COLLATE Latin1_General_BIN

    declare @ChIndex int

    --Getting the first occurence of more then 1 capital leter

    WHILE patindex('%[A-Z][A-Z]%' COLLATE Latin1_General_BIN, @Org) > 0

    BEGIN

    --Get the location of the first place with more then 1 capital letter

    select @ChIndex = patindex('%[A-Z][A-Z]%' COLLATE Latin1_General_BIN, @Org)

    --Adding to the results string the part that doesn't need to be modified

    set @Results = @Results + substring(@Org,1, @ChIndex - 1) + '<B>'

    --Deleting the part that was added to the results from the original string

    set @Org = substring(@Org,@ChIndex,len(@Org))

    --Getting the end location of the capital letters

    select @ChIndex = patindex('%[^A-Z]%' COLLATE Latin1_General_BIN,@Org)

    IF @ChIndex = 0

    set @ChIndex = len(@Org)+1

    --Adding the needed part to the @results string

    SELECT @Results = @Results + lower(substring(@Org,1,@ChIndex - 1)) + '</B>'

    --Deleting the part from the @Org string

    set @Org = substring(@Org,@ChIndex,len(@Org))

    END

    SET @Results = @Results + @Org

    select @Results

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • The characters I needed to add are: Å,Ä,Ö / char(196),char(197),char(214)

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply