May 14, 2012 at 1:02 pm
Also found this neat little function, in case you are interested:
create function ProperCase(@Text as varchar(8000))
returns varchar(8000)
as
begin
declare @Reset bit;
declare @Ret varchar(8000);
declare @i int;
declare @C char(1);
select @Reset = 1, @i=1, @Ret = '';
while (@i <= len(@Text))
select @C= substring(@Text,@i,1),
@Ret = @Ret + case when @Reset=1 then UPPER(@c) else LOWER(@c) end,
@Reset = case when @C like '[a-zA-Z]' then 0 else 1 end,
@i = @i +1
return @Ret
end
select dbo.ProperCase('this,my friends, is a test.wHat DO you think?i like shaquile o''neal')
May 14, 2012 at 1:17 pm
Ooof, I suspect that will be awfully slow when applied to a non-trivial amount of data. Have you tested it on a table with more than a few thousand rows?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
May 14, 2012 at 1:57 pm
opc.three (5/14/2012)
Calling my example function like this:
SELECT dbo.ToTitleCase('The quick brown fox JUMPS over the lazy dog.');
Yields:
The Quick Brown Fox JUMPS Over The Lazy Dog.
JUMPS stays uppercase. I may decide to send everything to lowercase before passing the string into the .Net method. I think that would make the function more useful in general but I wonder how that would affect performance.
Maybe the .Net architects named the TextInfo method ToTitleCase() to avoid confusion with "sentence case". Naming is so important with basic functionality which is why I wanted it open for discussion. I suppose anyone can pickup the function definition and simply rename it if they like, per their interpretation. There is plenty here for an article 😀
What about a case-sensitive collation?
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 14, 2012 at 2:00 pm
opc.three (5/14/2012)
bitbucket-25253 (5/12/2012)
opc.three (5/11/2012)
Jeff Moden (5/11/2012)
Do you have any performance metrics for this, Orlando?I do not, but I would be happy to mock up some test data and run some tests. I used the TextInfo class extensively in my .Net dev days, but never in the SQLCLR space. The weather turned cold here so golf became a no-go meaning I may have some time to look at this tomorrow AM.
Having this "hidden gem" and if you go to the trouble of testing same .... I think it is worthy enough to be an article here on SCC, or if not wanting to go that far ... at the very least add it to the SQL SPACKLE entries.
I never gave it much thought, but now that you mention it I think you are spot on with adding it to the SQL Spackle series. I will give it a few more paces to see what quirks I can find so I can document "known issues" ahead of time. Provided it makes sense after running it down I agree it would be a nice complement to UPPER() and LOWER(). We are forced to reference UDFs using a schema so what do you think of dbo.PROPER() or dbo.TITLE()? Any other suggestions on naming, schema or object?
I'll send you the script for the UDF I wrote so you can do your own speed comparisons in the article. Actually, I have two scripts. One uses a While Loop and the other a Tally CTE. Guess which one wins? It's not the one you might be thinking.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2012 at 2:03 pm
WayneS (5/14/2012)
opc.three (5/14/2012)
Calling my example function like this:
SELECT dbo.ToTitleCase('The quick brown fox JUMPS over the lazy dog.');
Yields:
The Quick Brown Fox JUMPS Over The Lazy Dog.
JUMPS stays uppercase. I may decide to send everything to lowercase before passing the string into the .Net method. I think that would make the function more useful in general but I wonder how that would affect performance.
Maybe the .Net architects named the TextInfo method ToTitleCase() to avoid confusion with "sentence case". Naming is so important with basic functionality which is why I wanted it open for discussion. I suppose anyone can pickup the function definition and simply rename it if they like, per their interpretation. There is plenty here for an article 😀
What about a case-sensitive collation?
Oooh, another wrinkle. I hadn't considered it but my initial thoughts say collation may not matter since .Net is doing the re-casing, outside of any database context, outside of the database engine for that matter. The culture returned by CultureInfo.CurrentCulture. in the .Net function will likely be the determining factor. I think that would take on a value per the OS' language settings. I will add it to the list of items to research. Thanks for raising it.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
May 14, 2012 at 2:10 pm
Jeff Moden (5/14/2012)
opc.three (5/14/2012)
bitbucket-25253 (5/12/2012)
opc.three (5/11/2012)
Jeff Moden (5/11/2012)
Do you have any performance metrics for this, Orlando?I do not, but I would be happy to mock up some test data and run some tests. I used the TextInfo class extensively in my .Net dev days, but never in the SQLCLR space. The weather turned cold here so golf became a no-go meaning I may have some time to look at this tomorrow AM.
Having this "hidden gem" and if you go to the trouble of testing same .... I think it is worthy enough to be an article here on SCC, or if not wanting to go that far ... at the very least add it to the SQL SPACKLE entries.
I never gave it much thought, but now that you mention it I think you are spot on with adding it to the SQL Spackle series. I will give it a few more paces to see what quirks I can find so I can document "known issues" ahead of time. Provided it makes sense after running it down I agree it would be a nice complement to UPPER() and LOWER(). We are forced to reference UDFs using a schema so what do you think of dbo.PROPER() or dbo.TITLE()? Any other suggestions on naming, schema or object?
I'll send you the script for the UDF I wrote so you can do your own speed comparisons in the article. Actually, I have two scripts. One uses a While Loop and the other a Tally CTE. Guess which one wins? It's not the one you might be thinking.
Thanks, Jeff. That will help set the board.
My initial thought in thinking of a T-SQL solution was to use a Tally table but your last statement raises suspicions about the WHILE-loop outperforming it. In the WHILE are you inch-worming through using CHARINDEX looking for spaces?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
May 14, 2012 at 6:21 pm
Nope... even "worse". PATINDEX.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2012 at 11:19 pm
Here's a solution that may work if you won't or can't use a CLR:
DECLARE @t TABLE (strcol VARCHAR(100))
INSERT INTO @t
SELECT 'here is some text'
UNION ALL SELECT ' and HERE is SOME more'
UNION ALL SELECT 'with, some. punctuation'
;WITH ProperCase AS (
SELECT CAST(UPPER(SUBSTRING(strcol, 1, 1)) AS VARCHAR(100)) AS prev
,UPPER(SUBSTRING(strcol, 1, 1)) AS strcol
,LOWER(SUBSTRING(strcol, 2, LEN(strcol))) AS rest
,ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) as m
,1 AS n
FROM @t
UNION ALL
SELECT CAST(prev + CASE WHEN strcol = ' '
THEN UPPER(SUBSTRING(rest, 1, 1))
ELSE SUBSTRING(rest, 1, 1) END AS VARCHAR(100))
,CASE WHEN strcol = ' '
THEN UPPER(SUBSTRING(rest, 1, 1))
ELSE SUBSTRING(rest, 1, 1) END
,CAST(SUBSTRING(rest, 2, LEN(rest)) AS VARCHAR(100))
,m
,n + 1
FROM ProperCase WHERE LEN(rest) > 0
),
Rows AS (
SELECT strcol, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) as m
FROM @t
)
SELECT strcol
,(SELECT prev
FROM ProperCase s2
WHERE s1.m = s2.m and s2.n = LEN(s1.strcol)) AS ProperCaseString
FROM Rows s1
If it is well formed sentences, it should work OK. Punctuation not followed by a space will cause issues.
Forgive me oh experts for I have sinned. I have not compared performance against other suggestions. 🙂
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
May 15, 2012 at 7:15 am
tburk 5368, something others haven't mentioned is that you might consider just storing the data as is, then converting it for the output. For example, in SSRS reports, see this link (http://social.msdn.microsoft.com/forums/en-US/sqlreportingservices/thread/81fd7199-4979-4f7f-96ae-2fe5f42e4eb8/[/url])
---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]
"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
May 15, 2012 at 7:36 am
dwain.c (5/14/2012)
Forgive me oh experts for I have sinned. I have not compared performance against other suggestions. 🙂
Since you used a recursive CTE for this task, that may be a bigger problem than you imagined. 😉 Of course, only a test will tell for sure.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 15, 2012 at 8:15 am
--edit-- Fixed a bug 😛
DECLARE @Table TABLE (iD INT, AttrVal1 VARCHAR(100));
INSERT INTO @Table
SELECT 1 , 'exec Proc myproc'
UNION ALL SELECT 2 , 'Exec Proc Myproc2'
UNION ALL SELECT 3 , 'CREATE FUNCTION A'
UNION ALL SELECT 4 , 'bulk upload all files'
UNION ALL SELECT 5 , ' dir /*';
SELECT finalSubQuery.iD, finalSubQuery.xmlList.value('.', 'VARCHAR(100)')
FROM (SELECT iD, (SELECT piece
FROM (SELECT iD,
ISNULL(STUFF(a.b.value('.', 'varchar(100)'), 1, 1, UPPER(LEFT(a.b.value('.','varchar(100)'), 1))), ' ')
FROM (SELECT iD,
CAST('<xmlList><element item="' +
REPLACE(AttrVal1, ' ', ' " /><element item="') +
'" /></xmlList>' AS XML) AS AttrVal1
FROM @Table) data
CROSS APPLY AttrVal1.nodes('xmlList/element/@item') a(b)
) a(iD, piece)
WHERE a.iD = tbl.iD
FOR XML PATH(''), TYPE) AS xmlList
FROM @Table tbl
GROUP BY iD) finalSubQuery;
Returns: -
iD
----------- ----------------------------------------------------------------------------------------------------
1 Exec Proc Myproc
2 Exec Proc Myproc2
3 CREATE FUNCTION A
4 Bulk Upload All Files
5 Dir /*
Fancy running it through your test Jeff?
May 15, 2012 at 9:17 am
Here's an all-TSQL version:
;
WITH Seeds(Seed)
AS (SELECT *
FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS V (C)),
Numbers(Number)
AS (SELECT ROW_NUMBER() OVER (ORDER BY S1.Seed)
FROM Seeds AS S1
CROSS JOIN Seeds AS S2
CROSS JOIN Seeds AS S3)
SELECT C AS Before,
[After]
FROM ( VALUES ( 1, 'This song is just six words long'), ( 2, 'Not as simple. as the prior test'),
( 3, 'even-more-complex TEST'), ( 4, 'F16A') ) AS V (Seq, C)
CROSS APPLY (SELECT (
SELECT ''
+ CASE WHEN SUBSTRING(C,
Number - 1, 1) LIKE '[a-zA-Z]'
THEN LOWER(SUBSTRING(C,
Number, 1))
ELSE UPPER(SUBSTRING(C,
Number, 1))
END
FROM Numbers
WHERE Number BETWEEN 1 AND LEN(C)
ORDER BY Number
FOR XML PATH(''),
TYPE).value('.[1]',
'varchar(1000)')) AS Splitter ([After])
ORDER BY Seq ;
I use a table-value constructor for the initial values, but a real table or temp table or CTE or table variable would work there just as well. The Case statement that decides whether to upper or lower a character can deal with case-sensitive colations.
Would be interesting to extend it to accept standard accronyms as exceptions to the title-casing. So "USA" wouldn't be turned into "Usa". But I think I'd go CLR and use REGEX functionality to do that kind of thing.
Another flaw is that "MacGuiver" would end up "Macguiver", but that again requires an exceptions library and would be better done in CLR.
Would also mess up "dbo.MyTable" into "Dbo.Mytable". Same comment as above about exceptions.
Because of all of those kinds of exceptions, and dozens more, I don't recommend doing this in T-SQL, and don't recommend doing this at all without serious research into what it's for. But this will do basic Title Casing.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 15, 2012 at 10:40 am
GSquared (5/15/2012)
Because of all of those kinds of exceptions, and dozens more, I don't recommend doing this in T-SQL, and don't recommend doing this at all without serious research into what it's for. But this will do basic Title Casing.
Exception dictionaries says "full-scale data quality framework" to me. I wonder what Data Quality Services might offer in 2012.
The ToTitleCase function can be torn apart on many fronts, as you have shown. A simple use-case is all it will satisfy. One required customization of the case-rules and the whole approach must change. I am thinking the CLR function will simply do the title-casing. If the caller has strings in all caps to begin with they can use the built-in LOWER function on the value before passing it into the CLR function. If that does not satisfy their use-case then they'll be forced to find an alternate, custom solution.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
May 15, 2012 at 11:12 am
opc.three (5/15/2012)
GSquared (5/15/2012)
Because of all of those kinds of exceptions, and dozens more, I don't recommend doing this in T-SQL, and don't recommend doing this at all without serious research into what it's for. But this will do basic Title Casing.Exception dictionaries says "full-scale data quality framework" to me. I wonder what Data Quality Services might offer in 2012.
The ToTitleCase function can be torn apart on many fronts, as you have shown. A simple use-case is all it will satisfy. One required customization of the case-rules and the whole approach must change. I am thinking the CLR function will simply do the title-casing. If the caller has strings in all caps to begin with they can use the built-in LOWER function on the value before passing it into the CLR function. If that does not satisfy their use-case then they'll be forced to find an alternate, custom solution.
Proper-case, especially with names, is another one of those things that seems like it should be really simple, but turns out horrifically complex. Like name-and-address data. Seems straightforward, till you start dealing with anything outside of your default culture.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 15, 2012 at 11:41 am
GSquared (5/15/2012)
opc.three (5/15/2012)
GSquared (5/15/2012)
Because of all of those kinds of exceptions, and dozens more, I don't recommend doing this in T-SQL, and don't recommend doing this at all without serious research into what it's for. But this will do basic Title Casing.Exception dictionaries says "full-scale data quality framework" to me. I wonder what Data Quality Services might offer in 2012.
The ToTitleCase function can be torn apart on many fronts, as you have shown. A simple use-case is all it will satisfy. One required customization of the case-rules and the whole approach must change. I am thinking the CLR function will simply do the title-casing. If the caller has strings in all caps to begin with they can use the built-in LOWER function on the value before passing it into the CLR function. If that does not satisfy their use-case then they'll be forced to find an alternate, custom solution.
Proper-case, especially with names, is another one of those things that seems like it should be really simple, but turns out horrifically complex. Like name-and-address data. Seems straightforward, till you start dealing with anything outside of your default culture.
I agree. It can turn into a major effort in a hurry. The exceptions are endless, which is why I am going to leave ToTitleCase to stand on its own two feet. For name and address data I have relied on tools like MelissaData and have been quite happy with the results. For other data it is unlikely that a simple solution like ToTitleCase would suffice completely. However sprinkling in a few calls to REPLACE to handle some instances (e.g. dbo not Dbo) important to the line-of-business may be all that is needed to arrive at an acceptable solution. Performance obviously must be a consideration as well. To me the real key is ensuring all data is run through a single set of logic, i.e. AppA cannot be using different casing-logic from AppB if both are writing to the same database.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 15 posts - 16 through 30 (of 40 total)
You must be logged in to reply to this topic. Login to reply