ProperCase Function

Question

Post reply

ProperCase Function

Dohsan

SSCrazy

Points: 2972
More actions
January 16, 2014 at 8:42 am

#302855

I'm having an issue converting a scalar function into Table-Valued function and was hoping some of the minds on this forum would be able to help.
Here is the original Function:
CREATE FUNCTION [Utility].[f_ProperCase]
(
@strIn VARCHAR(255)
)
RETURNS VARCHAR(255)
AS
BEGIN
IF @strIn IS NULL
RETURN NULL
DECLARE
@strOut VARCHAR(255),
@i INT,
@Up BIT,
@C VARCHAR(2)
SELECT
@strOut = '',
@i = 0,
@Up = 1
WHILE @i <= DATALENGTH(@strIn)
BEGIN
SET @C = SUBSTRING(@strIn,@i,1)
IF @C IN (' ','-','''')
BEGIN
SET @strOut = @strOut + @C
SET @Up = 1
END
ELSE
BEGIN
IF @up = 1
SET @C = UPPER(@c)
ELSE
SET @C = LOWER(@c)
SET @strOut = @strOut + @C
SET @Up = 0
END
SET @i = @i + 1
END
RETURN @strOut
END
Here is my attempt at a Table Valued Function to perform the same task:
CREATE FUNCTION [dbo].[ProperCase]
(
@StrIn VARCHAR(255)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
SELECTProperCaseString = (
SELECTCASE
WHEN GN.N = 1 THEN UPPER(SUBSTRING(@StrIn,GN.N,1))
WHEN SUBSTRING(@StrIn,GN.N-1,1) IN (' ','-','''') THEN UPPER(SUBSTRING(@StrIn,GN.N,1))
ELSE LOWER(SUBSTRING(@StrIn,GN.N,1))
END AS [text()]
FROMdbo.GetNums(1,ISNULL(LEN(@StrIn),1)) AS GN
FOR XML PATH('')
);
GO
Part of the issue is that the XML concatenation is converting the blanks spaces into "& # x 2 0;" . I think I've found a way to correct this and get the output i would like (Shown in the below code):
DECLARE @StringTest TABLE
(
StrIn VARCHAR(255)
)
INSERT INTO @StringTest(StrIn)
VALUES ('thIS iS A StRing'),
('ANOTHEr STRING'),
('again another string')
--Converts special characters
SELECTCA1.ProperCase AS ProperCase
FROM@StringTest AS ST
CROSS APPLY (
SELECTCASE
WHEN GN.N = 1 THEN UPPER(SUBSTRING(ST.StrIn,GN.N,1))
WHEN SUBSTRING(ST.StrIn,GN.N-1,1) IN (' ','-','''') THEN UPPER(SUBSTRING(ST.StrIn,GN.N,1))
WHEN SUBSTRING(ST.StrIn,GN.N,1) = ' ' THEN CHAR(32)
ELSE LOWER(SUBSTRING(ST.StrIn,GN.N,1))
END AS [text()]
FROMdbo.GetNums(1,ISNULL(LEN(ST.StrIn),1)) AS GN
FOR XML PATH('')
) AS CA1 (ProperCase)
--Shows special characters i.e. space as a space
SELECTCA1.ProperCase.value('/MyString[1]','varchar(8000)') AS ProperCase
FROM@StringTest AS ST
CROSS APPLY (
SELECTCASE
WHEN GN.N = 1 THEN UPPER(SUBSTRING(ST.StrIn,GN.N,1))
WHEN SUBSTRING(ST.StrIn,GN.N-1,1) IN (' ','-','''') THEN UPPER(SUBSTRING(ST.StrIn,GN.N,1))
WHEN SUBSTRING(ST.StrIn,GN.N,1) = ' ' THEN CHAR(32)
ELSE LOWER(SUBSTRING(ST.StrIn,GN.N,1))
END AS [text()]
FROMdbo.GetNums(1,ISNULL(LEN(ST.StrIn),1)) AS GN
FOR XML PATH(''), root('MyString'), type
) AS CA1 (ProperCase)
The main part of the issue though, and the long winded point of this post, is that i cannot seem to reflect this in a function
CREATE FUNCTION [dbo].[ProperCase]
(
@StrIn VARCHAR(255)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
SELECTProperCaseString = (
SELECT SQ1.[text()].value('/MyString[1]','varchar(8000)')
FROM
(SELECTCASE
WHEN GN.N = 1 THEN UPPER(SUBSTRING(@StrIn,GN.N,1))
WHEN SUBSTRING(@StrIn,GN.N-1,1) IN (' ','-','''') THEN UPPER(SUBSTRING(@StrIn,GN.N,1))
WHEN SUBSTRING(@StrIn,GN.N,1) = ' ' THEN ' '
ELSE LOWER(SUBSTRING(@StrIn,GN.N,1))
END AS [text()]
FROMdbo.GetNums(1,ISNULL(LEN(@StrIn),1)) AS GN
FOR XML PATH(''), root('MyString'), type
) AS SQ1
);
GO
I get the follow error when I try to create the function:
Msg 8155, Level 16, State 2, Procedure ProperCase, Line 24
No column name was specified for column 1 of 'SQ1'.
Msg 207, Level 16, State 1, Procedure ProperCase, Line 13
Invalid column name 'text()'.
Any suggestions/Improvements welcome!
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Dohsan SSCrazy Points: 2972 More actions · Answer 1

Think I may have figured it out:

CREATE FUNCTION [dbo].[ProperCase]

(

@StrIn VARCHAR(255)

)

RETURNS TABLE WITH SCHEMABINDING

AS

RETURN

SELECTProperCaseString = (

(SELECTCASE

WHEN GN.N = 1 THEN UPPER(SUBSTRING(@StrIn,GN.N,1))

WHEN SUBSTRING(@StrIn,GN.N-1,1) IN (' ','-','''') THEN UPPER(SUBSTRING(@StrIn,GN.N,1))

ELSE LOWER(SUBSTRING(@StrIn,GN.N,1))

END AS [text()]

FROMdbo.GetNums(1,ISNULL(LEN(@StrIn),1)) AS GN

FOR XML PATH(''), TYPE

).value('.','varchar(8000)')

);

Now to see if it actually is an improvement!

And if anyone has anything to add please do.

Ed Wagner SSC Guru Points: 287026 More actions · Answer 2

You have definitely peaked my interest with this one. I've been playing with adapting Jeff Moden's DelimitedSplit8K function covered in the article at http://www.sqlservercentral.com/articles/Tally+Table/72993, but I don't have your solution yet. I'm running into the case where multiple trailing spaces is killing it. Then there's the matter of allowing for more than one delimiter at once. :w00t: I don't know if I'm running down a rabbit hole with no hope here, but it is definitely cool stuff. 😀

I'm curious as to the performance when using the XML approach versus the tally table approach, so please post your performance testing results.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Luis Cazares SSC Guru Points: 183706 More actions · Answer 3

Hi,

I'm not sure what your GetNums function looks like. I used the DelimitedSplit8K which you can find in the following article which explains how it works. http://www.sqlservercentral.com/articles/Tally+Table/72993

Here's my option for you that seems to be doing what you need.

CREATE FUNCTION [dbo].[ProperCase]

(

@StrIn VARCHAR(255)

)

RETURNS TABLE WITH SCHEMABINDING

AS

RETURN

SELECT STUFF((SELECT ' ' + UPPER( LEFT( split.Item, 1)) + LOWER(SUBSTRING( split.Item, 2, 255)) word

FROM dbo.DelimitedSplit8K( @StrIn, ' ') split

FOR XML PATH(''),TYPE).value('.', 'varchar(255)'), 1, 1, '') AS ProperCased

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Sean Lange SSC Guru Points: 286590 More actions · Answer 4

Can you post GetNums? There have been a few different proper case functions around here over the years. They are always a challenge because sometime you want every word capitalized and sometimes you want Title Case.

Parade Of The Horribles

Parade of the Horribles

Subtle difference but makes a huge difference.

What about things like Washington D.C.?

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Dohsan SSCrazy Points: 2972 More actions · Answer 5

dbo.GetNums is as per below, straight from one of Jeff's discussions/articles I believe:

CREATE FUNCTION [dbo].[GetNums]

(

@low AS BIGINT,

@high AS BIGINT

)

RETURNS TABLE WITH SCHEMABINDING

AS

RETURN

WITH

L0 AS (SELECT c FROM (VALUES(1),(1)) AS D(c)),

L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

L4 AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

L5 AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),

Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum

FROM L5)

SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n

FROM Nums

ORDER BY rownum;

For our purposes capitalising each Letter of a word is acceptable.

Dohsan SSCrazy Points: 2972 More actions · Answer 6

Trying to piece together some test data, some Uppercase letters with some spaces inserted:

IF OBJECT_ID('tempdb..#ProperTest','U') IS NOT NULL

DROP TABLE #ProperTest;

SELECTSTUFF(STUFF(CHAR(ABS(CHECKSUM(NEWID()))%26+65)

+ CHAR(ABS(CHECKSUM(NEWID()))%26+65)

+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),N%16+1,1,' '),N%3+1,1,' ') AS String

INTO #ProperTest

FROMdbo.GetNums(1,300000);

Run the Code

PRINT 'Old Method Scalar Function'

SELECTPT.String,

Utility.f_ProperCase(PT.String) AS Proper

FROM#ProperTest AS PT;

PRINT 'Dohsan'

SELECTPT.String,

CA1.ProperCaseString

FROM#ProperTest AS PT

CROSS

APPLYdbo.ProperCase(PT.String) AS CA1;

PRINT 'Luis'

SELECTPT.String,

CA1.ProperCased

FROM#ProperTest AS PT

CROSS

APPLYdbo.ProperCase1(PT.String) AS CA1;

DECLARE @TestVar VARCHAR(16);

DECLARE @StartTime DATETIME;

PRINT 'Old Method Scalar Function'

SELECT @StartTime = GETDATE();

SELECT@TestVar = Utility.f_ProperCase(PT.String)

FROM#ProperTest AS PT;

PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

PRINT 'Dohsan'

SELECT @StartTime = GETDATE();

SELECT@TestVar = CA1.ProperCaseString

FROM#ProperTest AS PT

CROSS

APPLYdbo.ProperCase(PT.String) AS CA1;

PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

PRINT 'Luis'

SELECT @StartTime = GETDATE();

SELECT@TestVar = CA1.ProperCased

FROM#ProperTest AS PT

CROSS

APPLYdbo.ProperCase1(PT.String) AS CA1;

PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

Results:

Old Method Scalar Function

47843 MilliSeconds

Dohsan

18906 MilliSeconds

Luis

4710 MilliSeconds

Luis Cazares SSC Guru Points: 183706 More actions · Answer 7

That's nice Dohsan,

Just make sure my function gets the correct results as it won't check for characters different than spaces when trying to capitalize the first letter of each word.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Dohsan SSCrazy Points: 2972 More actions · Answer 8

Yes, I'll have to play around with it a little more when I have some time tomorrow.

I must say i hadn't thought about using the splitter to separate the words out, so +1 for you and ed wagner for having that idea!

Ed Wagner SSC Guru Points: 287026 More actions · Answer 9

Luis, your function is much simpler than mine was working out to be. I was working on creating a derivative on the DelimitedSplit8K function and was most of the way there, but the presence of multiple trailing spaces was giving me fits during testing. Also, the multiple delimiters were definitely impacting performance.

I like the way you used the unaltered and well-tested function and manipulated the output of it. Mine would have required testing to make sure all bases were covered.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Luis Cazares SSC Guru Points: 183706 More actions · Answer 10

Ed, you gave me an idea an here's what I came out with after playing with the Jeff's splitter. Dohsan would be able to change the word delimiters as he considers correct without much problem.

It's a simple query but can be easily converted in a iTVF.

DECLARE @pString VARCHAR(255) = 'this is a 3d-printer from wahsington d.c.';

--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

-- enough to cover NVARCHAR(4000)

WITH E1(N) AS (

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

), --10E+1 or 10 rows

E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

-- for both a performance gain and prevention of accidental "overruns"

SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

),

cteStart(N1) AS (--==== This returns N+1 (starting position of each word)

SELECT 1 UNION ALL

SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) IN(' ', '.', '-', '''')

)

SELECT (SELECT CASE WHEN s.N1 IS NULL

THEN LOWER(SUBSTRING(@pString,t.N,1))

ELSE UPPER(SUBSTRING(@pString,t.N,1)) END

FROM cteTally t

LEFT JOIN cteStart s ON t.N = s.N1

ORDER BY N

FOR XML PATH(''),TYPE).value('.', 'varchar(255)') ProperCaseString

;

EDIT: I reduced the datatypes for GREAT performance improvement.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

mister.magoo SSC-Forever Points: 47174 More actions · Answer 11

Here's one similar to Luis, but handles detecting the start of a word differently and works on nvarchar(4000) input.

create function [dbo].[propercase](@text nvarchar(4000))

returns table with schemabinding

as

return (

with seed1 (a)

as

(

select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all

select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1

),

numbers (n) as

(

select top (datalength(@text)) row_number() over (order by (select null))

from seed1 s1, seed1 s2, seed1 s3

)

select a.b.value('(./text())[1]', 'nvarchar(4000)') as [text]

from (

select

case

when n = 1 then upper(substring(@text, n, 1))

when substring(@text, n - 1, 2) like N'[^a-z][a-z]' collate Latin1_General_CI_AI then upper(substring(@text, n, 1))

else lower(substring(@text, n, 1))

end

from numbers

for xml path (''), type

) a (b)

)

On the small test strings it performs about 50% slower than Luis original one, but that is just because of the NVARCHAR(4000) compatibility.

Below is a varchar(255) version that is comparable to Luis, but with the extra word start checks.

create function [dbo].[propercase](@text varchar(255))

returns table with schemabinding

as

return (

with seed1 (a)

as

(

select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1

),

numbers (n) as

(

select top (datalength(@text)) row_number() over (order by (select null))

from seed1 s1, seed1 s2, seed1 s3

)

select a.b.value('(./text())[1]', 'varchar(255)') as [text]

from (

select

case

when n = 1 then upper(substring(@text, n, 1))

when substring(@text, n - 1, 2) like '[^a-z][a-z]' collate Latin1_General_CI_AI then upper(substring(@text, n, 1))

else lower(substring(@text, n, 1))

end

from numbers

for xml path (''), type

) a (b)

)

EDIT: modified to cope with Accénted characters...Damn you Luis! I wanted to sleep:-D

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Luis Cazares SSC Guru Points: 183706 More actions · Answer 12

I just love when people continue to post to get better solutions.

Mr. Magoo, it's really nice how you handle the start of a word. However, it might generate problems with accentuated words (which are uncommon in English but might occur with words "imported" from other languages). It seems that changing the collation to Latin1_General_CI_AI can solve the problem but might reduce the performance.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

mister.magoo SSC-Forever Points: 47174 More actions · Answer 13

Luis Cazares (1/16/2014)
I just love when people continue to post to get better solutions.
Mr. Magoo, it's really nice how you handle the start of a word. However, it might generate problems with accentuated words (which are uncommon in English but might occur with words "imported" from other languages). It seems that changing the collation to Latin1_General_CI_AI can solve the problem but might reduce the performance.

Thanks Luis, I agree there are compromises, as mine will handle O'Brien as a name, where others posted so far don't. I agree though that accented characters should probably be handled - even in English - we are a very multicultural society these days and it's not uncommon to get almost any name possible. It's just a bit late now for me to be bothered :w00t:

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Dwain Camps SSC Guru Points: 86948 More actions · Answer 14

Perhaps something like this might help?

CREATE FUNCTION [dbo].[ProperCase]

(

@MyStr VARCHAR(8000) = NULL

)

RETURNS TABLE WITH SCHEMABINDING AS

RETURN

SELECT ProperCaseStr=

(

SELECT CASE WHEN [Matched] = 1 THEN STUFF(Item, 1, 1, UPPER(LEFT(Item, 1))) ELSE Item END

FROM dbo.PatternSplitCM(@MyStr, '[a-zA-Z]') b

ORDER BY ItemNumber

FOR XML PATH(''), TYPE

).value('.', 'varchar(8000)')

;

My ProperCase iTVF uses PatternSplitCM which can be found in the 4th article in my signature links.

WITH SampleData (MyStr) AS

(

SELECT 'The cat in the hat'

UNION ALL SELECT 'the mouse is in the parlour. but the dog is out to lunch'

)

SELECT MyStr, ProperCaseStr

FROM SampleData

CROSS APPLY dbo.ProperCase(MyStr);

You'd need to verify of course that it handles all of the cases you want.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St