February 27, 2008 at 5:26 pm
Okidoki... first things first... If you don't already have a Tally table, now is the time to add one to your arsenal of SQL prestidigitation... a Tally table (also known as a "Numbers" table) contains nothing more than a single column of well indexed numbers starting at 0 or 1 and counting up to some other number. A Tally table has many uses mostly related to replacing WHILE loops with a JOIN. It makes for some very fast, very easy to write code. Here's how to make one...
--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
Next, as Matt said, the splitting of the LastName is not only obvious, but is a given. It's always going to be followed by a comma according to what you said. That allows us to pick up even on composit last names like "ST VINCENT", for example.
The hard part is normalizing the rest of the data and splitting what turns out to be an unknown quantity of name parts... that's where the Tally table comes in. The following code is a self sufficient demonstration of how to do all of it and maybe give you some ideas for more... as usual, the comments in the code explain all...
DROP TABLE #Name
GO
CREATE TABLE #Name (CustNumber INT,FullName VARCHAR(50))
INSERT INTO #Name
(CustNumber,FullName)
SELECT 1510,'NYSTROM, GRAHAM, A' UNION ALL
SELECT 23,'ST CLOSSON, SEAN T' UNION ALL
SELECT 524,'MOORE, AMANDA' UNION ALL
SELECT 18,'ST VINCENT, MARY ANN T' UNION ALL
SELECT 357,'EASTWOOD, CLINT HARRY'
--===== Split the names and show the work...
-- The names are reassembled on each line using simple "cross-tab" technology based on the name
-- part number that we determined during the split in derived table "s"/.
SELECT CustNumber = n.CustNumber,
FullName = n.FullName,
LastName = LEFT(n.FullName,CHARINDEX(',',n.FullName)-1), --Handles even composite last names
FirstName = MAX(CASE WHEN s.PartNumber = 1 THEN s.NamePart ELSE NULL END),
Initial = MAX(CASE WHEN s.PartNumber = 2 THEN s.NamePart ELSE '' END),
Error1 = MAX(CASE WHEN s.PartNumber = 2
AND LEN(s.NamePart) > 1 THEN 'Composite first name or full middle name'
ELSE '' END),
Error2 = MAX(CASE WHEN s.PartNumber = 3 THEN 'Composite first name or too many parts'
ELSE '' END)
FROM #Name N
INNER JOIN
(--==== Derived table "s" does the split with a count to number each "partnumber"
-- This may be run separately (include all the way up to ")s") to see what it does
-- Tally table "steps" through each string 1 word at a time based on where the commas
-- in the inherent join are found... it really does turn out to be a bit of a cross join
-- but is very very fast and does not repeat itself.
SELECT CustNumber = f.CustNumber,
NamePart = SUBSTRING(f.TheRest, t.N+1, CHARINDEX(',', f.TheRest, t.N+1)-t.N-1),
PartNumber = LEN(LEFT(f.TheRest,t.N)) - LEN(REPLACE(LEFT(f.TheRest,t.N), ',', ''))
FROM dbo.Tally t
INNER JOIN
(--==== Derived table "f" formats the rest of the name for a split
-- This cleans up commas and spaces "auto-magically" but will stumble on composite names
-- This may be run separately to see what it does
SELECT CustNumber = CustNumber,
TheRest = REPLACE(REPLACE(SUBSTRING(FullName, CHARINDEX(',',FullName),8000),', ',','),' ',',')+','
FROM #Name
) f
ON SUBSTRING(f.TheRest, t.N, 1) = ','
AND t.N < LEN(f.TheRest)
)s
ON n.CustNumber = s.CustNumber
GROUP BY n.CustNumber, n.FullName
ORDER BY LastName, FirstName
I made no attempt to resolve every possible name combination in the world... but it's enough to get you started on what you need. The error columns provide hints as to which names did not successfully resolve.
Part of the speed comes from the fact that there is no UDF involved in this at all. The join is at the character level between the full name column and the Tally table.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 4, 2008 at 8:21 pm
Ya know... I busted a bit of a hump for ya here... and no one on the other forum came close... the least you could do, MH, is let me know if the code did what you wanted...
--Jeff Moden
Change is inevitable... Change for the better is not.
August 20, 2012 at 4:58 am
Jeff, I know it's almost 5 years since you posted this solution but if it's any consolation then let me thank you for this code.
I'm learning so much from your coding skills.
Thank you so much!
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 20, 2012 at 5:10 am
Abu Dina (8/20/2012)
Jeff, I know it's almost 5 years since you posted this solution but if it's any consolation then let me thank you for this code.I'm learning so much from your coding skills.
Thank you so much!
Gosh... Thanks, Abu. I really appreciate the kind words.
My skin has thickened a lot since that post. I still think it's rude not to reply to folks that tried to help but most of the people do believe in some form of dialog and that makes up for it.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 4 posts - 16 through 18 (of 18 total)
You must be logged in to reply to this topic. Login to reply