September 9, 2014 at 12:00 pm
Hi all, I have a new challenge for you!
I have written a simple routine to do some address checking between sources. The business owner wanted the check done to the address as a whole and to report the address' that did not return a match. The comparison works perfectly fine.
Well, now the customer has come back wanting to know what element(s) of the address make the match fail. I'm having a booger of a time coming up with an efficient solution.
Here's the basic address compare:
SELECT *
FROM Full_Address_Map fam
WHERE fam.StreetName + ' ' + fam.RoadType + ' ' + fam.CityName + ' ' + fam.StateName + ' ' + fam.Country COLLATE Cyrillic_General_CI_AI NOT IN
(
SELECT s.base_name + ' ' + s.road_type + ' ' + p.name + ' ' + v.name + ' ' + v.country_name COLLATE Cyrillic_General_CI_AI
FROM tbl_street s
INNER JOIN tbl_place p
ON s.parent_id = p.place_id
INNER JOIN state_list_view v
ON s.state_id = v.place_id
WHERE s.base_name COLLATE Cyrillic_General_CI_AI = fam.StreetName COLLATE Cyrillic_General_CI_AI
AND s.road_type COLLATE Cyrillic_General_CI_AI = fam.RoadType COLLATE Cyrillic_General_CI_AI
AND p.name COLLATE Cyrillic_General_CI_AI = fam.CityName COLLATE Cyrillic_General_CI_AI
AND v.name COLLATE Cyrillic_General_CI_AI = fam.StateName COLLATE Cyrillic_General_CI_AI
AND v.country_name COLLATE Cyrillic_General_CI_AI = fam.Country COLLATE Cyrillic_General_CI_AI
)
You can see that we are building the address by sticking the 5 columns together with a space in between to create a single string. We are then doing the same thing with the other source data and simply comparing the string. This worked perfectly and was nice and simple... until the customer decided he wanted to see the element(s) causing the issue.
For example, let's say there is an address of '123 Main ST Dallas Texas United States' in one source, but it's '123 Main Dallas Texas United States' in the other. The customer wants to know that it is the road type (ST) that caused the mismatch.
Now, I wrote a bunch of queries to try and single out each element in order to determine if it is the one causing the issue or not. It seems to work partially, but I don't believe it is the best approach and it doesn't work if more than one column is the culprit.
Any ideas? I'm not opposed to changing the way I do the initial address compare, if that will prove to be the better approach.
September 9, 2014 at 7:25 pm
I don't have access to a pc at the moment otherwise I would give a better example...
You could accomplish this using a splitter and splitting the strings using the space as a delimiter; note the splitter in my signature line.
I will put an example together when I am at a pc (likely not tonight)
-- Itzik Ben-Gan 2001
September 9, 2014 at 8:34 pm
Ok, I had a few minutes to kill. There will be a little more work to get this to work with what you have done thus far.
The Function:
CREATE FUNCTION dbo.stringDiff(@s1 varchar(100), @s2 varchar(100))
RETURNS TABLE AS
RETURN
WITH
s1 AS
(
SELECT s = 1, *
FROM dbo.DelimitedSplit8K(@s1,' ') s1
),
s2 AS
(
SELECT s = 2, *
FROM dbo.DelimitedSplit8K(@s2,' ') s1
)
SELECT TOP 1
s1 = @s1,
s2 = @s2,
Diff_Position = s1.ItemNumber,
s1_txt = isnull(s1.Item,''),
s2_txt = isnull(s2.Item,'')
FROM s1
LEFT JOIN s2 ON s1.Item = s2.Item
WHERE isnull(s1.Item,'') <> isnull(s2.Item,'')
ORDER BY s1.ItemNumber;
GO
Use:
DECLARE @s1 varchar(100) = '123 Main ST Dallas Texas United States',
@s2 varchar(100) = '123 Main Dallas Texas United States';
SELECT * FROM dbo.stringDiff(@s1, @s2);
-- Itzik Ben-Gan 2001
September 10, 2014 at 7:59 am
-- If you change your correlated NOT IN subquery to a standard NOT EXISTS subquery like this:
SELECT *
FROM Full_Address_Map fam
WHERE NOT EXISTS (
SELECT 1
FROM tbl_street s
INNER JOIN tbl_place p
ON s.parent_id = p.place_id
INNER JOIN state_list_view v
ON s.state_id = v.place_id
WHERE s.base_name COLLATE Cyrillic_General_CI_AI = fam.StreetName COLLATE Cyrillic_General_CI_AI
AND s.road_type COLLATE Cyrillic_General_CI_AI = fam.RoadType COLLATE Cyrillic_General_CI_AI
AND p.name COLLATE Cyrillic_General_CI_AI = fam.CityName COLLATE Cyrillic_General_CI_AI
AND v.name COLLATE Cyrillic_General_CI_AI = fam.StateName COLLATE Cyrillic_General_CI_AI
AND v.country_name COLLATE Cyrillic_General_CI_AI = fam.Country COLLATE Cyrillic_General_CI_AI
)
-- which is very nearly the logical equivalent, then you can easily eliminate predicates from the WHERE clause.
-- You will probably have to decide which columns you can safely mis-match on, or you could end up with tens of thousands of
-- false positives: for instance, mismatches on StreetName + RoadType would give everybody in the same city/state/country.
-- Google "fuzzy matching" to learn about how this exercise is usually done.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply