Tame Those Strings Part 7 - Be Proper With Your Data

This is a continuation of string manipulation techniques. If you are interested, you can read the other articles,
though you do not need to read it before this one. These are mostly beginning
programming articles, but advanced T-SQL programmers may still find something useful here.
Part 1 deals with SUBSTRING and how it can be used to extract some information
from a field of data
Part 2 deals with CHARINDEX and how it can be used to extract some information
from a field of data when the data is delimited rather than stored in a particular format.
Part 3 deals with REPLACE and how it can be used to remove unwanted information
from a field of data when the data is not in a known format.
Part 4 deals with numeric conversions.
Part 5 deals with STUFF.
Part 6 works with concatenating results.

Introduction

I recently saw someone ask a question about how to proper case a field of data. I thought

this would be a simple update statement, but when I actually tried it, I realized there was

a bit more work involved. In the instance that I saw, the person wanted to handle a field

that contained names. The solution was a parsing solution that used cursors to find a space

before each name and then use substrings to reformat the string. This works fine in a procedural

process, but doesn't take advantage of the power of SQL Server. I decided to develop a more set

oriented approach that should handle large result sets.

The Problem

The basic problem is to turn one string into another, with the second string being formatted

in proper case. An example would be:

Source                Result
------------          ---------------
steve jones           Steve Jones
tia jones             Tia Jones
delaney steven jones  Delaney Steven Jones
  kyle                  Kyle

Nothing special here, but notice that there are a few different cases with differing numbers

of words in each field. This is important because a simple solution that depended on all names

being two words would fail. This was what I ran into when building my solution.

Setup

I am including my script file here:ProperCase.sql. This is

the file I used to develop this solution. If you want to practice yourself, this little script

will build you a test file:

if object_id('ProperTest') Is not null
 drop table ProperTest
go
create table ProperTest
( OldName varchar( 80),
  NewName varchar( 80)
)
go
insert ProperTest
 select lower(contactname), lower( contactname)
 from Customers
insert ProperTest
select 'my old friend bob', 'my old friend bob'
insert ProperTest
select '  my old friend bill', 'my old friend bill'
-- reset values
-- update propertest set newname = oldname
select * from ProperTest

This script will build a test table and populate it using data from the Northwind database.

Work with the Newname field for your manipulations. The last line in the script "fixes" the

Newname data to its original form.

The Solution

My first attempt at reformatting data resulted in a set-oriented solution that worked great

for two word names, but failed after that. Here is the first thing I quickly came up with:

update ProperTest
 set NewName = upper( substring( NewName, 1, 1))
+ substring( NewName, 2, charindex( ' ', NewName)-1)
+ upper( substring( NewName, charindex( ' ', Newname) + 1, 1))
+ substring( NewName, charindex( ' ', Newname) + 2, 80)

This works great if there are only two names in the field, but most likely your data

will be like this data and there are some names with 3 words such as "José Pedro freyre".

Note that this is the result using my first solution.

My next attempt looked at somehow looping through the result set, but how many times would I

know how to loop? This was a tough one, but I finally came up with a solution. I would

add a "flag" to my data. This flag would be used to find the beginning of each word. Since

the basic pattern that I am looking for is word, I need to flag this.

When building a set-oriented solution, you have to search for common patterns among all items in

the set. When you build a WHERE clause, you often specify some common pattern such as LIKE 'SQL%'.

I decide to use the same approach. To do this, you have to choose a special character in whatever

code page or language you are using that will not occur in your data. For my purposes, I choose the

'@' symbol. I could have easily chosen &, ~ or !.

So my first step was to set my flag, which I did by replacing the spaces with '@'.

update ProperTest
 set NewName = replace( NewName, ' ', '@')

This sets all my data to look like this:

Newname
-------------
maria@anders
ana@trujillo
antonio@moreno
josé@pedro@freyre

Now that I have flags, I can use these to mark the position in each string that I am trying

to process.

The basic process for each string is now:

to find the '@' symbol
set the next character to upper case
remove the '@' symbol since this word has been reformatted

Now in order to keep processing until all words are reformatted, I merely need to check

for the existence of rows that contain a '@'. All my processing is then limited to these rows.

The script to perform this processing is below:

-- Replace spaces with the "special" character
update ProperTest
 set NewName = replace( NewName, ' ', '@')
-- Handle case 1 - First item
update ProperTest
 set NewName = upper( substring( ltrim( NewName), 1, 1)) + substring( ltrim( NewName), 2, 80)
-- loop while there are rows with the flag
while exists(
select *
 from ProperTest
 where NewName like '%@%'
)
 begin
    -- Proper case the word after the flag.
update ProperTest
 set NewName = substring( NewName, 1, charindex( '@', NewName)) + 
upper( substring( NewName, charindex( '@', NewName)+1, 1 )) + 
substring(NewName, charindex( '@', Newname)+2, 80)
 where Newname like '%@%'
-- Remove the first flag encountered in each row
update ProperTest
 set NewName = substring( NewName, 1, charindex( '@', Newname)-1) +
' ' + substring( NewName, charindex( '@', Newname) + 1, 80)
 where Newname like '%@%'
 end

Conclusions

This was an interesting exercise for me and forced my to spend some time integrating a

simple algorithm into a set-oriented approach. With a little adaptation, this could be easily

adapted to proper case sentences (look for period-space) or any other patterning structure.

I hope this has proven useful for some of you and as always, I welcome feedback. Please click the

"Your Opinion" link below and add your comments.

Steve Jones

Return to Steve Jones Home

Tame Those Strings! Part 7 - Proper Casing Strings

Tame Those Strings Part 7 - Be Proper With Your Data

Introduction

The Problem

Setup

The Solution

Conclusions

Rate

Share

Categories

Share

Rate

Tame Those Strings! Part 7 - Proper Casing Strings

Tame Those Strings Part 7 - Be Proper With Your Data

Introduction

The Problem

Setup

The Solution

Conclusions

Rate

Share

Categories

Share

Rate

Related content

Tame Those Strings - Using SUBSTRING

Tame Those Strings! Part 2 - Using CHARINDEX

Tame Those Strings! Part 5 - Using STUFF

SP_MSFOREACHDB - Getting a list of databases via TSQL.

SP_MSForeachtable - Life Without Cursors.