October 7, 2010 at 1:04 pm
Hi guys,
can you help pls.
I have a "person" table with the following fields: title, forename, surname, fullname, address1, address2, postcode etc.
Data is loaded into this table daily via a file (with the same field names)received from an external source. Unfortunately, the source data was corrupted for a while and the data that was sent under the "fullname" field was actually the "address1" and the "address1" was under "address2" - with all the fields after shifting 1 place forward. The source data has been corrected but we need to fix the ones already loaded into the person table. I can easily fix this by doing an UPDATE on the error rows.
The problem now is how do i find them? There's no logic to this so the only thing i can think of is a query that does a "partial" comparison of the surname and fullname fields i.e
--Check "fullname" to see if some of the sting inside it matches the "surname". This will give me all the ones with the error fullname.
I tried the queries below but they didn't do exactly what i expect. Can you help pls. Can you even use LIKE to compare 2 fields?
SELECT * FROM person
WHERE fullname NOT LIKE '%[surname]%'
-----------------------------------------
SELECT * FROM person a
WHERE a.fullname not like '%(SELECT surname
FROM person
WHERE personID = a.personID)%'
Thanks in advance.
k
October 7, 2010 at 5:28 pm
I do not think comparing fullname to surname like that would be wise, because even if the data was correct it would come up as true. For example 'John Smith' would match for '%Smith%' because John is not like Smith.
The source file that you are using to supply your data, does it always fill in every field? Is it possible that when the data was corrupt and shifted over one column that a field did not get inserted like it should? That would allow you to be able to do a search where a particular column was null. It is probably not the best solution, but it is most likely easier than doing a substring comparison.
Joie Andrew
"Since 1982"
October 7, 2010 at 7:41 pm
Nothing much to do this evening so I will post your table structure the way you should have done it in order to receive a tested answer.
CREATE TABLE #T (title VARCHAR(20) NULL,
forename VARCHAR(20) NULL, surname VARCHAR(20) NULL,
fullname VARCHAR(20) NULL
, address1 VARCHAR(20) NULL, address2 VARCHAR(20) NULL, postcode VARCHAR(20) NULL)
Ditto with some sample data:
INSERT INTO #T
SELECT 'Mgr','Ron','Smith','Ron Smith','361 1st St',' ','99087' UNION ALL
SELECT 'DBA','Sam','Jones','Sam Jones','366 15th St',' ','99088' UNION ALL
SELECT 'DBA','Alice','Jones','366 15th St',' ','99088',' ' UNION ALL
INSERT INTO #T
SELECT 'DBA','Alice','Brown','Alice Brown', '1 33rd St','99 5th St','88765'
Now 2 statements that may assist you in locating incorrect data:
SELECT * FROM #T WHERE forename + ' ' + surname <> fullname
SELECT * FROM #T WHERE LEN(postcode)= 0
Both will return:
DBAAliceJones366 15th St 9908
Now once you have tested the select statements, I will leave it up to you to determine the proper sequence for updating each field. Remember the update sequence must not destroy data in the column being updated, before that columns data has updated the column it should have been in had the data been submitted correctly.
Here is one way to perform the update. (Warning before using any of this code, test, test and test again)
UPDATE #T SET postcode = Address2,
address2 = address1,
address1 = fullname,fullname = forename + ' ' + surname
WHERE forename + ' ' + surname <> fullname
But if you have data like this:
'DBA', 'John','Bullock','J. Bulllock','261 Ave 77',' ', '11123'
Where the fullname is NOT the forename + surname - the method(s) given above will not work properly.
October 8, 2010 at 4:39 am
Thanks so much for taking the time to answer this guys – I really appreciate it. Unfortunately the field that is left empty when the rest of fields are shifted 1 place forward is not one that is always populated anyway - so doing a "where field is null” or "len() = 0" will not work – it will bring lots of legitimate rows up.
Also, the “fullname” is NOT always a combination of all the name fields i.e
SELECT * FROM person
WHERE fullname <> (title + ' '+ forename + ' '+ surname)
sometimes 1 or 2 of these name fields can be missing so the "fullname" is just a combination of what’s left - or sometimes it's a manually entered as title & surname. Not exactly sure how this fullname is generated at source, but the only thing I am sure of is that it always contains the surname in it.
This is why I was thinking of something to check each “fullname” against the corresponding “surname” - if the surname does not exist as a string in the fullname, then most likely it contains the address and hence one of the error records.
Thanks
October 8, 2010 at 6:31 am
Couldnt you try to find which record have the postcode in the wrong position? Depends a bit offcourse on how your postcode looks like. But here in Sweden an IsNumeric would work since ours contains just numbers.
That would also tell you how far the data has been shifted and you could use that information to correct the data.
So building on BitBuckets exampelcode you would wind up with something like
update #T
set PostCode = Address2,
Address2 = Address1,
Address1 = FullName,
FullName = ForeName + ' ' + SurName
where IsNumeric(Address2) = 1
update #T
set PostCode = Address1,
Address2 = FullName,
Address1 = '',
FullName = ForeName + ' ' + SurName
where IsNumeric(Address1) = 1
Would offcourse only work if you know all entries have that data and that its numeric.
/T
October 8, 2010 at 7:08 am
askquestions (10/8/2010)
This is why I was thinking of something to check each “fullname” against the corresponding “surname” - if the surname does not exist as a string in the fullname, then most likely it contains the address and hence one of the error records.
If that is the case then
WHERE fullname NOT LIKE '%'+surname'
should work
otherwise you could check for surname and either title or forename
WHERE NOT (fullname LIKE '%'+surname
AND (fullname LIKE '%'+title+'%' OR fullname LIKE '%'+forename+'%'))
Far away is close at hand in the images of elsewhere.
Anon.
October 8, 2010 at 8:20 am
I am getting close guys! David's code below:
select * from #T WHERE fullname NOT LIKE '%'+ surname
seems to do what i wanted, but threw me a suprise when I tested it further by adding an extra row where the fullname is not a combination of the name fields but has been manually entered as initial + surname (using bitbucket's code below:)
insert into #T
select 'DBA', 'John','Bullock','J. Bulllock','261 Ave 77',' ', '11123'
Now, to see what the output is
select * from #T
select * from #T WHERE fullname NOT LIKE '%'+ surname
select * from #T
WHERE NOT (fullname LIKE '%'+surname
AND (fullname LIKE '%'+title+'%' OR fullname LIKE '%'+forename+'%'))
titleforenamesurnamefullnameaddress1address2postcode
MgrRonSmithRon Smith361 1st St 99087
DBASamJonesSam Jones366 15th St 99088
DBAAliceJones366 15th St 99088
DBAAliceBrownAlice Brown1 33rd St99 5th St88765
DBAJohnBullockJ. Bulllock261 Ave 77 11123
titleforenamesurnamefullnameaddress1address2postcode
DBAAliceJones366 15th St 99088
DBAJohnBullockJ. Bulllock261 Ave 77 11123
titleforenamesurnamefullnameaddress1address2postcode
DBAAliceJones366 15th St 99088
DBAJohnBullockJ. Bulllock261 Ave 77 11123
I was execting to see "Alice Jones" in the result because her surname doesn't match what's in fullname. However, manually entered "J. Bullock" is a suprise! His surname matches a string in fullname - so why is he in the result?
HELP PLEASE
October 8, 2010 at 9:40 am
Askquestions - You have been starring at this for far too long, Now note what you entered.
DBA John Bullock J. Bulllock
See the different spelling for the surname and the surname in the full name field. Here let me point it out again
bu ll ock vs Bu lll ock
Do not feel too bad, it took me about 5 minutes to see the difference.
October 9, 2010 at 5:05 am
omg! I have definitely been starring at this TOO long! I was suddenly obvious AFTER you pointed it out.
I guess this case is closed then.
Many thanks for all your help guys, much appreciated.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply