October 28, 2008 at 12:18 pm
Greetings all. I have a situation where I save an Excel file as tab delimited text, then BCP it to a temp table. No problem there. The problem is that there is a field for a job number, that is often missing one digit, or it is fat fingered, and I have to search my database for likely values, then change it to what it is supposed to be before I save it. Today I came across one I could not easily identify, so I made a procedure to do it more efficiently. The logic is that the wrong job number in question will be entered to the procedure as a string. If there is simply 1 digit missing, no problem. If there are any adjacent extra digits, then the number of extra digits will have to be entered as a parameter as well. So, here it is.
First, create test table and populate the test values
IF OBJECT_ID('TempDB..test','u') IS NOT NULL
DROP TABLE test
CREATE TABLE test
(
string VARCHAR(20)
)
INSERT INTO test
SELECT 'ABC1234' UNION ALL
SELECT 'ABC2345' UNION ALL
SELECT 'ABC3456' UNION ALL
SELECT 'ABC5678'
Now, the procedure.
CREATE PROCEDURE spTest
@string VARCHAR(100),--This is the bad job number
@Len SMALLINT,--this should be the length of @string
@NumberOff TINYINT--this is the number of "fat finger" digits, 0 of 1 digit missing
AS
--SELECT @Len = LEN(@String) can't figure out where to put this
--First part selects values where @string is missing 1 digit.
--@NumberOff does not affect this part.
--It basically breaks test.string into two parts, removing sequential
--digits, to match @string
SELECT DISTINCT
string
FROM test t1,
tally t2
WHERE SUBSTRING(string,0,n)+ SUBSTRING(string,n+1,LEN(string)-n) = @string
AND t2.n < = LEN(String)
UNION ALL
--The second part finds matches where @string has been fat fingered, and
--will be off by the number of characters specified in @NumberOff.
--I realize this only works if the fat fingered digits are adjacent, but
--this covers the vast majority of my problem records.
--This part breaks @string into two parts, lopping off the specified
--number of digits, to try to match test.string.
SELECT DISTINCT
string
FROM test t1,
tally t2
WHERE SUBSTRING(@String,0,n)+SUBSTRING(@String,n+@NumberOff,@len+1) = string
AND t2.n < = LEN(@String) + @NumberOff
Now see the results
EXEC spTest 'ABC134',6,1
Now, try it with one that has been fat fingered
EXEC spTest 'ABC122234',8,2
My trouble is that I can not figure out how to set @Len = LEN(@String), once I pass @String. I do not want to have to enter this value manually. I have tried putting SELECT @Len = LEN(@String) at various places to no avail. Also, I can not figure out how to make @NumberOff default to 0 if no value is supplied. I don't want to have to manually enter it unless I am working with a fat fingered record. I know it is a simple sytax thing, but I can't find it. Any ideas would be greatly appreciated.
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
October 28, 2008 at 7:06 pm
Greg,
A couple things. The first question is, how important is this data? I normally try to avoid this type of manipulation, as you are just about as likely in any given situation to replace a "fat fingered" data value with an "incorrect" data value that actually matches a record as you are the correct data value. Missing or obviously bad (because the length is too long) data is usually a lot easier to find / fix than data that has modified to be simply inaccurate. It is usually safer from what I've seen as well. IE. I would rather be missing an invoice for a customer than assign the invoice to the wrong customer / try to bill them. Once you manipulate this data and attempt to "figure out" which row they mean, you will have little chance of knowing that the data is wrong later. Beyond this, if your users ever somehow figure out that they can "fat finger" said data and have it still go in correctly, your problem just grew immensely. If you're going to do this type of manipulation, at the very least I would insert their original data into the table as well, so if/when things come up wonky, you at least have somewhere to start looking. (IE. "Oh, ABC122334 was supposed to be ABC1234 NOT ABC1223!" ) Personally, I would probably try to pull these rows of bad data aside and have an actual person look at them / correct them, rather than trying to build it all into a script. No matter how many scenarios you think you have covered, some user will almost always figure out a way that completely defies your comprehension to circumvent your rules.
Now that that warning is out of the way, let me ask... why do you even need a @Len parameter? I only see it being used in one place, and if all it is supposed to be is the length of the string you passed in, why can't you simply replace it with LEN(@String) ? Perhaps I'm not understanding that.
Also, does that sp actually run for you? I'm getting invalid length parameter passed to substring function based on the second substring in the first select every time I run it (unless I comment that piece out). Haven't figured out precisely why yet. It is somehow feeding a -1 in for the length which is what causes that error, but I haven't been able to track down exactly why it's getting to -1 yet. My tally table on the tempdb is only 12 rows, how many is yours?
October 28, 2008 at 8:25 pm
Seth
Thanks for the reply. Probably I should have mentioned that I am using the result set only as a starting set of job numbers. Certainly there are going to be several "real" job numbers that could be a match for a fat fingered one. So, let's say I enter in a string, and get three matches. I know that out of around 50,000 job numbers in my jobs table, I only need to investigate three, by referencing other information in the data file, such as dates, and contractor names. I never intend to use this as an automated tool, but only to make my investigation easier. Now, you are right about the @Len parameter. You would not believe the convoluted mess I was working with to get it this far. I first had only the top portion, then realized I would need to have the ability to check for fat fingers as well, and I can't say exactly why, but there was a point that I actually needed it. I started to combine them, taking out stuff until it broke. Maybe some further explanation will help.
Let's take the first part first, the scenarios where the given job number is missing one character. And, I realize it was not wise to name the field in question 'string', so I have renamed it 'Job', so I'll begin with the test set up again, and simplified procedure, without the comments, and without @Len.
IF OBJECT_ID('test','u') IS NOT NULL
DROP TABLE test
CREATE TABLE test
(
Job VARCHAR(20)
)
INSERT INTO test
SELECT 'ABC1234' UNION ALL
SELECT 'ABC2345' UNION ALL
SELECT 'ABC3456' UNION ALL
SELECT 'ABC5678'
GO
ALTER PROCEDURE spTest
@string VARCHAR(100),--This is the bad job number
@NumberOff TINYINT--this is the number of "fat finger" digits, 0 of 1 digit missing
AS
SELECT DISTINCT
Job
FROM test t1,
tally t2
WHERE SUBSTRING(Job,0,n)+ SUBSTRING(Job,n+1,LEN(Job)-n) = @string
AND t2.n < = LEN(Job)
UNION ALL
SELECT DISTINCT
Job
FROM test t1,
tally t2
WHERE SUBSTRING(@String,0,n)+SUBSTRING(@String,n+@NumberOff,LEN(@String)+1) = Job
AND t2.n < = LEN(@String) + @NumberOff
And now an attempt to make sense out of it
--Ok, assuming you have the test table set up, this is what's happening.
--The procedure is saying, "ok, I have this job number, 'ABC134' that is missing
--one character." What it does is take away each character from each job
--number in the table, one position at a time, and returns the job numbers where,
--if you take away any one character, equals the invalid job number
--passed as the @string variable. The below is what it is doing, with
--hard coded values instead of parameters. I added the underscorre to represent
--the character that is being taken away for each value of n. You can see the underscore
--moving along, which represents the character taken away. In reality,
--there would not be the underscore, just a missing character from the
--real job number
SELECT SUBSTRING(Job,0,1)+ '_' + SUBSTRING(Job,1,6) FROM test
SELECT SUBSTRING(Job,0,2)+ '_' + SUBSTRING(Job,2,5) FROM test
SELECT SUBSTRING(Job,0,3)+ '_' + SUBSTRING(Job,3,4) FROM test
SELECT SUBSTRING(Job,0,4)+ '_' + SUBSTRING(Job,4,3) FROM test
SELECT SUBSTRING(Job,0,5)+ '_' + SUBSTRING(Job,5,2) FROM test
SELECT SUBSTRING(Job,0,6)+ '_' + SUBSTRING(Job,6,1) FROM test
SELECT SUBSTRING(Job,0,7)+ '_' + SUBSTRING(Job,7,0) FROM test
--Now see for yourself
EXEC spTest 'ABC124',0
Now for the second part, I just hope I can explain it well
--Now the second part. This is kind of a reversal of the first part, exept that
--it removes characters from the fat fingered job number, passed as @string, and
--returns all the job numbers in the table that equal @string minus any specified
--quantity of sequential characters. I added the underscore to represent two character
--positions that are being removed, and you can see it move along.
DECLARE @string VARCHAR(20)
SELECT @string = 'ABC122234'
SELECT SUBSTRING(@string,0,1)+ '__' + SUBSTRING(@string,3,8)
SELECT SUBSTRING(@string,0,2)+ '__' + SUBSTRING(@string,4,7)
SELECT SUBSTRING(@string,0,3)+ '__' + SUBSTRING(@string,5,6)
SELECT SUBSTRING(@string,0,4)+ '__' + SUBSTRING(@string,6,5)
SELECT SUBSTRING(@string,0,5)+ '__' + SUBSTRING(@string,7,4)
SELECT SUBSTRING(@string,0,6)+ '__' + SUBSTRING(@string,8,3)
--Now try it and see
EXEC spTest 'ABC122234',2
I hope I have explained it better, and I am sorry for leaving in an unnecessary variable which confused things. Maybe it is more clear now, and my only problem is that I want @NumberOff to default to 0 unless I specify a value. I guess that was a lot of time wasted to ask how to make the parameter optional, when I should be able to find the answer myself, but it's not my fault...I was confused. So, maybe another question is, how do you seach a column for a value that is either missing charactors, or has extra ones? Thanks for your input.
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
October 28, 2008 at 8:46 pm
Garadin (10/28/2008)
The first question is, how important is this data?
Also, does that sp actually run for you? I'm getting invalid length parameter passed to substring function based on the second substring in the first select every time I run it (unless I comment that piece out). Haven't figured out precisely why yet. It is somehow feeding a -1 in for the length which is what causes that error, but I haven't been able to track down exactly why it's getting to -1 yet. My tally table on the tempdb is only 12 rows, how many is yours?
I just realized I did not address two of your questions.
Firstly, I am the sole owner of the data, so no one else ever sees it, nor do any reports come from it, nor can anyone else touch it except for the DBA, and he does not mess with my stuff.
Secondly, I did get that error several times, but I was not getting it when I posted, and anyway, I think it should go away now that @Len is gone.
Lastly, this one is just because I was bored at work, so it was more of an academic exercise for me rather than need driven. This stuff is not even my job. So, with that being said, I am still interested in finding an all inclusive solution to my dilema.
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
October 28, 2008 at 11:17 pm
Greg Snidow (10/28/2008)
Firstly, I am the sole owner of the data, so no one else ever sees it, nor do any reports come from it, nor can anyone else touch it except for the DBA, and he does not mess with my stuff.Lastly, this one is just because I was bored at work, so it was more of an academic exercise for me rather than need driven. This stuff is not even my job. So, with that being said, I am still interested in finding an all inclusive solution to my dilema.
Good. Between that and what you're actually doing with the list you're generating, my first point is moot in this case... however, I feel it is important that it be here for people who see this and think it might be a solution to a real production issue with typos.
Secondly, I did get that error several times, but I was not getting it when I posted, and anyway, I think it should go away now that @Len is gone.
That's not where the error was coming from. It was coming from your n-1 in your first query. I've put in code to prevent it from happening in my updates to your query below.
I guess that was a lot of time wasted to ask how to make the parameter optional, when I should be able to find the answer myself, but it's not my fault...I was confused.
It's not wasted time. This is a good excersize that many people might learn from now that it is posted here.
A few additional notes. Since you are currently in the debugging stage of this operation, might I suggest you set up your environment like this:
[font="Courier New"]DECLARE @Test TABLE
(
Job VARCHAR(20)
)
INSERT INTO @test
SELECT 'ABC1234' UNION ALL
SELECT 'ABC2345' UNION ALL
SELECT 'ABC3456' UNION ALL
SELECT 'ABC5678'
--ALTER PROCEDURE Testsp
--/*
--DEBUG
DECLARE
@string VARCHAR(100), --This is the bad job number
@NumberOff TINYINT --=NULL--this is the number of "fat finger" digits, 0 of 1 digit missing
SET @String = 'ABC122234'
SET @NumberOff = 2 -- If you comment this out, it will use 0
-- */
--AS
IF @NumberOff IS NULL SET @NumberOff = 0
SELECT DISTINCT
Job
FROM @test t1,
tally t2
WHERE SUBSTRING(Job,0,n)+ SUBSTRING(Job,n+1,CASE WHEN LEN(Job)-n < 0 THEN 0 ELSE LEN(Job)-n END) = @string
AND t2.n < = LEN(Job)
UNION ALL
SELECT DISTINCT
Job
FROM @test t1,
tally t2
WHERE SUBSTRING(@String,0,n)+SUBSTRING(@String,n+@NumberOff,LEN(@String)+1) = Job
AND t2.n < = LEN(@String) + @NumberOff
[/font]
This setup has several advantages. You do not have to worry about dropping your temp table(s) every time. You can change the data that is in them extremely easily. You can run the entire script without having to worry about highlighting crap every time. You can change input parameters on the fly. You don't have to worry about whether or not you ran the changes you just made to the script before you tried to execute the sp again, etc. etc. Even if you don't like the idea of the table variables, commenting out the ALTER / CREATE SP line and putting a little debug section in the top of your SP can save you a lot of time while you're trying to get things running. Perhaps you are already doing something like this, but I figured I'd mention it anyways.
There are two other things of note about the above. One is:
ALTER PROCEDURE ...
@NumberOff TINYINT =NULL
AS
IF @NumberOff IS NULL SET @NumberOff = 0
This removes the requirement for that input parameter to be submit when running the query. You can still submit it if you like, but you don't have to. One thing to keep in mind with optional parameters though is that you may need to start using declarative parameter input once you start making them optional. IE. Instead of exec testsp 'ABC1234',3 It would be exec testsp @string='ABC1234', @NumberOff=2.
The other is the CASE I added to the substring. I'm finding that you can stick a Case statement prettymuch anywhere(I'm kinda curious as to where they can't be used, I've stuck them in at least 3 different places today that I really didn't expect them to work... this being one of them), and it's one of the only ways to ensure that the optimizer doesn't hose this code.
WHERE SUBSTRING(Job,0,n)+ SUBSTRING(Job,n+1,CASE WHEN LEN(Job)-n < 0 THEN 0 ELSE LEN(Job)-n END) = @string
I think I've covered everything. Sorry if I've missed any questions in all that. Thanks for giving such thorough answers to all my questions.
October 29, 2008 at 6:30 am
Seth
Thank you so much for the time and thought you put into this. I tested it against a production table, and it purred like a kitten. It never occured to me to use an IF statement to deal with setting a varible value. Just one more concept to add to the list. Thanks again.
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply