April 14, 2009 at 12:44 pm
Todd, the problem with it is that a human being has to go through, row by row, and identify what format each name has, then record that somewhere so that the function can be run.
It would almost certainly be cheaper to hire a bunch of temps and set them up at data-entry stations.
Run the list through a simple algorithm that can spot all the obvious names, but make sure it doesn't treat "Dr. Jones" as first name = "Dr.", or anything like that. Then take all the exceptions and give them to the data-entry temps and have them blast through it.
Otherwise, it's going to be someone going through and having to go, "'Col. John W Smith, PhD', let's see, what format is that? Oh heck, we don't have a template for that one. Joe, gotta build a template for someone with a rank, a three-part-name, and a suffix." for each name format.
Even human entry will run into issues with unfamiliar names. Any American should be able to handle "Col. John W Smith, PhD", but how many will be able to figure out "Padma Vibhushan Subrahmanyan Chandrasekhar"? (That's a real name, by the way.) How about "Chan Kong Sang"? How many would realize that "Chan" is the "last name" (family name)? (That's Jackie Chan's birth-name. Chan being first is pretty standard in that part of the world.)
With names like that, how can you possibly expect an algorithm to get it right?
So, somebody came up with an algorith that requires that a person tell the computer which piece of the string is which name. That's what that function does. You have to tell it "f f F L" on the Indian name I used as an example, or whatever the proper sequence of codes is for "not really his first name, not the first name he goes by, first name he goes by, family name".
By the time you've figured that out by eyeballing it, you may as well have typed the thing in yourself. Overall, it'll save time.
That's why I call that function a toy. It was, I'm sure, fun to build. It shows off some useful features of SQL. But in the real world, the purpose of software is to do things that people can't do themselves more efficiently, since that's the sole and only purpose of tools.
Would we have knives if we had fingernails that could cut things the way knives do? Nope. Would we have calculators if people could do 10-digit multiplication, etc., in their heads? Nope.
So why have a software function that requires so much hand-holding that it's just as easy, if not easier, to just do it yourself? No practical reason at all. That means it's not a tool. If it's not a tool, that makes it a toy.
I'm not trying to insult anyone. I'm just assessing the value I see in the thing.
Make sense?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 14, 2009 at 1:39 pm
I am not insulted. I think you that haven't evaluated the function correctly. The function returns the name you supply in the format that you ask for. There is no doubt that it definately makes some huge assumptions about the way the original full name is entered. But it doesn't ask you how the entry is formated. I sure that It falls short in many instances where the assumptions the code makes about the entry are wrong.
However, I am sure that if you did have a person go row by row to check each name. It would much faster if you let the code take a stab at it first. I am also sure that even after the person checked the name there would still be some errors. I have a cousin who is named James Todd (I am named after my mothers whole side of the family.) If his name was eneterd 'Todd James', The code says his first name is Todd and his last is James. I doubt any person, except those that know my cousin, could find reason to challenge the result. However if it were enterd Todd, James the code would get it right.
Unfortunately for us Data often comes to us in a terrible mess, but we can't just throw up our hands and give up, I still see this as a valuable tool. I admit it is not the tool that does everything or guarenties 100% accuracy. In the end given all the possible ways to enter a name, you have to put some one on the row by row check. But I still say I would have found this function useful when doing data conversions for the Utility Billing Systems, even though in the end I would have to check the results row by row.
April 14, 2009 at 2:33 pm
I guess I don't see it that way. I've had to process millions of names (used to work for a couple of direct mail marketing companies) in mail lists, and in looking over this function, the fact that you have to manually assign a format to each row eliminates all value for what I was doing. Doesn't reduce the value, eliminates it.
Here's what I recommend: create a list of 100 names, with a percentage of complex ones in it (things like "Dr. Jones, MD", with no first name, and "Billy-Bob James Smith VI"), and insert it into a table. Then use the function on that table, and see how long it takes to set up the run on the function, including adding all the flags it requires in order to parse the names.
Then, with the exact same list, see how long it takes to eyeball them and type them into a simple data entry form.
If you find that running them through the function, including adding flags to it, is easier and faster, well and good. You've got something useful. If you find that it isn't, that's fine too.
Then take the time that was spent on each, and work out how that translates in your salary (for the more technical version) vs the minimum wage that a data entry clerk will probably cost. This is where I think the difference will show up. If you want to take it one step further, check with temp agencies and see what they are charging for data entry clerks in your area. It's usually pretty cheap, and with the labor market right now it's almost certainly even cheaper than usual.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 14, 2009 at 3:53 pm
You don't have to manually inspect each name pass in a flag to get the function to run.
Given a table of full names in my earlier post.
[Code]
SELECT
FULLNAME,
dbo.FormatName([FullName],'F') AS FirstName --Returns best guess of FistName
dbo.FormatName([FullName],'L') AS LastName --Returns best guess of LastName
dbo.FormatName([FullName],'m') AS Middle Initial --Returns best guess of middle initial
[/code]
The time savings of this function is that it makes a best guess based on common name patterns used to enter names. If I had a table of mostly irregular names, or of names from other countries where the patterns are different the code wouldn't help me very much. And I am still left inspecting each row after I run the names through the parser to catch the errors. I my experience I typically had 3000 names to process for the Utility Billing data conversions, and my boss had written some access vba code to parse the names into parts. It didn't look for many Common patterns just put the first word it found in the First Name field and last word it found in the Last Name Field and left everything else in the Middle. I don't know how many times I had to cut Mac from the middle name and paste it into the last name, or cut the first part of the middle name and paste it into the first name field for things Like Mr. Todd P Payne. However, I still found his tool a time saver. I would say it got at least 90% of the names parsed correctly. It is kind of like the spell checkers, they help us but can't do it all. My boss always gave them two options... We will run it through the tool for $xxx and if you want to pay more we'll have Todd go through it row by row for $xxx a record. They smart ones choose just the tool and went through the records themselves, as they knew their data far better than any one else. Maybe I was just trying to pass the work off to some one else...
Thanks for listening. I don't know what I would do with 1,000,0000 names except put more people on it. If it was something I had to do every day to keep the business alive and well I would probably have some of those people coding parsers to help. At some point you reach a point of dimishing returns as I can't ever see not having to inspect the results for good quality control.
April 15, 2009 at 7:45 am
Your boss came up with pretty much the same solution we did. Thus, most lists arrived already formatted.
I must be looking at a different function than you are. The one you linked to, that I looked at, requires a second input with codes for each part of the name. For example, "Mr Joe Jones" would have a code like "T F L", for "Title First Last", while "Bobby Smith Jr" would have "F L S" for "First Last Suffix". It got more complicated from there, with upper-case letters meaning different things than lower-case letters. Each name run through the function would have to have that data added to it before it could run. That's what I'm looking at when I say each name would have to be handled by a person anyway, since someone would have to go through and figure out what combination of letters to put on it.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 15, 2009 at 9:20 am
We must be looking at two different function here is the link again. http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=56499 If we are I apoligize. I hate to see some ones code called a "Useless Toy" based on false assertions. I do agree creating a name parser that required the name and some format code that described the structure of the name wouldn't be very useful. And the task of determining the format string and typing it in for each record would pretty much eliminate any usefulness.
Here is what I read in the Authors inline documentation at the begining of the code he posted. If this isn't what you read I am sorry.
CREATE function FormatName(@NameString varchar(100), @NameFormat varchar(20))
returns varchar(100) as
begin
--blindman, 11/04
--FormatName parses a NameString into its component parts and returns it in a requested format.
--
--@NameString is the raw value to be parsed.
--@NameFormat is a string that defines the output format. Each letter in the string represents
--a component of the name in the order that it is to be returned.
--[H] = Full honorific
--[h] = Abbreviated honorific
--[F] = First name
--[f] = First initial
--[M] = Middle name
--[m] = Middle initial
--[L] = Last name
--[l] = Last initial
-- = Full suffix
-- = Abbreviated suffix
--[.] = Period
--[,] = Comma
--[ ] = Space
--Example: select dbo.Formatname('Reverend Gregory Robert Von Finzer Junior', 'L, h. F m. s.')
--Result: 'Von Finzer, Rev. Gregory R. Jr.'
You are correct that the function requires two arguments. But the format code is used to Define the Output format of the string the function returns not the structure of the original input. The first part of the authors code parses the @NameString variable into it parts and stores them in variables (@Honorific, @FirsName, @LastName, @MiddleName and @Suffix). This part of the code doesn't use the @NameFormat arguement, as far as I can tell. I have looked at the code a few times, but I have been wrong before, I think It basically follows your suggestion (I think it was your suggestion) to break the name into parts and look for patterns.
The second part of his code uses the parsed name to reformats the Original Name to the Output format defined by the @NameFormat argument. So If we use his function it is possible to get the last name. using the select statement.
select dbo.Formatname('Reverend Gregory Robert Von Finzer Junior','L')
It would be very easy to take the first part of his code and produce as stored procedure the returns the nameparts as output parameters.
April 16, 2009 at 12:13 am
amaizing this was really helpfull, thanks guys
Viewing 7 posts - 16 through 21 (of 21 total)
You must be logged in to reply to this topic. Login to reply