So here I am, straight home from work and ready to relax for the evening when a mysterious message from an old client turns up on my answering machine. It felt very "noir".
"Help!" he says. "I've got a problem. There are strings lurking in my database that are breaking the rules. They're all hanging about in upper case, mocking me and my attempts to fix them. I need all the capital letters in the correct place with lower case letters in the rest of the word. But something's sabotaging my efforts. I can't seem to get these hooligans in line."
He emails me the script below:
-- Code Generator for expression DECLARE @x INT, @c CHAR(1), @sql VARCHAR(8000) SET @x = 0
SET @sql = '@str' -- actual variable/column you want to replace WHILE @x < 26 BEGIN SET @c = CHAR(ASCII('a') + @x) SET @sql = 'REPLACE(' + @sql + ', '' ' + @c+ ''', '' ' + UPPER(@c) + ''')'
SET @x = @x + 1 END PRINT @sql
He wants to have it read directly from his table but can't figure out how to make the variable read correctly. My first instinct is that he's been bamboozled. This script isn't what he wants, especially when I see the PRINT statement. So I take the case. Just for a look-see, I substitute a word into the place of the @str (BRANDIE) as an example and what do I get?
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(BRANDIE, ' a', ' A'), ' b', ' B'), ' c', ' C'), ' d', ' D'), ' e', ' E'), ' f', ' F'), ' g', ' G'), ' h', ' H'), ' i', ' I'), ' j', ' J'), ' k', ' K'), ' l', ' L'), ' m', ' M'), ' n', ' N'), ' o', ' O'), ' p', ' P'), ' q', ' Q'), ' r', ' R'), ' s', ' S'), ' t', ' T'), ' u', ' U'), ' v', ' V'), ' w', ' W'), ' x', ' X'), ' y', ' Y'), ' z', ' Z')
I blink, look at the script again. Sure enough. The only thing it does is generate a REPLACE function to be cut-n-pasted into a T-SQL statement. And if the unwary doesn't look at the script closely enough, they'll end up with all lower case letters instead of a mixed case string.
When I break the news to him, he's crushed. He admits he doesn't know enough about SQL functions to write his own code and couldn't find anything on the internet that would solve the problem. So that's when I get thinking. How many other people have had this problem and can't find exactly what they need?
So here it is, my solution. Granted, it's based on words being separated by spaces. If you have a string that could be separated out by other characters, you'll have to tweak it a little, but I think this solution is a little more elegant than the above REPLACE function.
First create yourself a table and populate it with some values. In this example we're going to use location names.
Create Table Import_Data_Filter (MyID Int Identity(1,1) NOT NULL, Location varchar(100))
GO Insert into Import_Data_Filter (Location)
(Select Upper('Mandarin') UNION ALL Select Upper('San Jose') UNION ALL Select Upper('Baymeadows') UNION ALL
Select Upper('My FH Locale') UNION ALL Select Upper('St. Augustine') UNION ALL Select Upper('Test For Three Spaces')
UNION ALL Select Upper('Test for being Four Spaces') UNION ALL Select Upper('Test for being Five More Spaces')
UNION ALL Select Upper('Baymeadows') UNION ALL Select Upper('St. Augustine'))
GO
Notice that I have a couple of duplicates. I did this deliberately to make sure my code would do the final update properly.
Next, let's populate a temp table with the DISTINCT values of the real data table. This way, we're only looping through the values once. Since all the values are upper case, I'm going to set them all to lower case. If you have all lower case values, you can ignore this step. Then I go back through the table and set the very first character of all strings to upper case while appending the rest of the lower case string to it. Please note the SUBSTRING functions very carefully. They are important if you don't want to accidently end up with all lower or all upper case again. Finally, I add a bit column called "Done" so I can check off a row once I've finished iterating through it. Now my temp table is ready for a real fix.
If (Select Object_ID('tempdb..#MyTemp1')) is NOT NULL
Drop table #MyTemp1; -- Drop temp table if it already exists Select Distinct Location into dbo.#MyTemp1 from dbo.Import_Data_Filter; --Get distinct values for all locations Update mt1 Set Location = Lower(mt1.Location) from dbo.#MyTemp1 mt1; -- Set entire string to lower case letters Update mt1 Set Location = UPPER(Substring(mt1.Location,1,1)) + Substring(mt1.Location,2,Len(mt1.Location))
from dbo.#MyTemp1 mt1; -- Set very first character in string to Upper case and add in lower case rest of string --Select * from dbo.#MyTemp1; -- For troubleshooting only Alter Table dbo.#MyTemp1
Add Done bit NOT NULL Default 0; GO
We have several variables we want to create at this point. The string holder (@pInput), the string position counter (@vIndex), the string length holder (@vInputLength), the number of strings to check (@LocCount) and the dynamical SQL string for updating our Temp Table (@MySQL). The @vCount string is only for counting the number of spaces I have in my string. I used it for troubleshooting, then commented it out when I was done with it.
I set my intial variables and create two nested WHILE Loops. The outer loop will iterate through the individual Locations while the inner loop will iterate through a single location string until it's reached the end of the string. Here's the loop code:
DECLARE @vInputLength INT, @vIndex INT, @pInput VARCHAR(100), @LocCount int, @MySQL varchar(300) --,@vCount
-- Needed variables for below loops. @vCount is used for troubleshooting only --SET @vCount = 0; -- Sets the count of spaces to zero
SET @vIndex = 1; -- Sets the starting position of the string to 1
Select @LocCount = Count(Location) from dbo.#MyTemp1; -- Counts the # of distinct locations in temp table for outer loop While @LocCount > 0 -- While we still have locations to browse Begin Set @pInput = (Select Top 1 Location from dbo.#MyTemp1 where Done = 0); -- Get the first "unfixed" location SET @vInputLength = LEN(@pInput); --Get the total string length WHILE @vIndex <= @vInputLength -- While we still have characters to search
BEGIN
IF SUBSTRING(@pInput, @vIndex, 1) = ' ' --If the current character is a space
Begin
--SET @vCount = @vCount + 1; --Add one to the spaces count. Use only if verifying code. -- Update mt1
-- Set NumSpaces = @vCount, Pos = ISNULL(mt1.Pos,'') + ' ' + Convert(varchar(2),@vIndex)
-- from dbo.#MyTemp1 mt1 where Location = @pInput;
--This table is for testing purposes only, to verify space count code is working correctly. Set @MySQL = 'Update mt1 Set Location = Substring(mt1.Location,1,' + Convert(varchar(3),@vIndex) + ') '
+ '+ UPPER(Substring(mt1.Location,' + Convert(varchar(3),(@vIndex+1)) +',1))'
+ ' + Substring(mt1.Location,' + Convert(varchar(3),(@vIndex+2)) + ',Len(mt1.Location)) from dbo.#MyTemp1 mt1'
+ ' Where Location = ''' + @pInput + ''' and Done = 0;';
-- Dynamic SQL. Will update the character AFTER the current space to an Upper Case character
--Select @MySQL; --For testing & verification of SQL String only
Exec (@MySQL); -- Excecute the update code set above End -- END of IF statement SET @vIndex = @vIndex + 1 --Increment the position counter
--Select @pInput as Loc, @vIndex as Pos, @vCount as Cnt --For testing & verification of variables only. END -- End of singular Location string search / Inner WHILE loop Set @LocCount = @LocCount - 1; --Decrement the # of locations to search
--Set @vCount = 0; --Reset the spaces counter
Set @vIndex = 1; --Reset the position counter Update dbo.#MyTemp1
Set Done = 1
where Location = @pInput; --Update current location as fixed
END -- End of Outer While loop. All Locations fixed
GO
I've put extensive notes in each part of the script to explain their purpose. As you can see, I'm a big believer in keeping my troubleshooting code as part of the script. That way, if something breaks later on, I can check the values of everything without having to re-write the troubleshooting stuff. If this bothers you, feel free to cut it all out.
Basically, the inner loop goes through looking for spaces in a string. If there are no spaces, nothing happens. However, if the loop encounters a space, it'll update the temp table, changing the NEXT character in line to an upper case character while pre-pending and appending the rest of the string to that upper case character. DON'T FORGET TO PREPEND. Otherwise, you loose part of your string. Also, if 100 characters is not enough for your string or 300 not enough for your update statement, change the sizes of the @pInput and @MySQL variables.
Also, the way this script is written, there are no limits to the number of spaces you can have in a string. Of course, the one problem I haven't fixed is if there are multiple spaces in a row, the code will try to update a space as an upper case character, but since you can't see "upper case spaces", I didn't bother with trying to fix that part. Yes, I'm lazy. @=) I'm assuming my client will only have 1 space between words.
Now, the only thing we have left is to update the real table from the temp table.
Select * from dbo.#MyTemp1
-- Shows fixed Temp table Update idf
Set Location = mt1.Location
from dbo.Import_Data_Filter idf
join dbo.#MyTemp1 mt1
on idf.Location = mt1.Location;
--Updates Real Table Select * from dbo.Import_Data_Filter;
--Shows results of update
Of course there are other ways of writing this code. You could use "the dreaded cursor". So, to find out if a cursor would be better than my above code, I wrote one. After 1 minute and 30 seconds, the darned thing was still running, so I killed it. For 8 distinct names, 1:30+ is way too long for an update. This is why I'm sticking with the nested WHILE Loops. Much faster and much more efficient. I did try the Tally Table method, which was a quick update, but got stuck when trying to put the string back together. If I figure that one out, I'll post it.
If, of course, you have shorter / easier code that is just as efficient, I'd love to hear from you. After all, we code detectives have to stick together. Who else will defend databases from the dirty rotten data scoundrels?