February 14, 2006 at 7:30 am
Im looking at ways of keep duplicates on our system to a low. one way that I wanted to try was to match all the data entered in a week to the current data. and then score the data each time.
For example my query below is what ive started but realise its much harder and involved at teh same time.
SELECT CX.ROW_ID, CX.FST_NAME, CX.LAST_NAME, F.NAME ,'PARENT' =CASE WHEN CX.CREATED > '2006-02-01 00:00:00.000' THEN 'X' ELSE '' END
FROM S_CONTACT CX
JOIN FIRM F ON F.ID = CX.FIRM_ID
WHERE lower(CX.LAST_NAME) in (
SELECT Lower(C.LAST_NAME)
FROM CONTACT C
WHERE C.CREATED>'2006-02-01 00:00:00.000'
AND C.STATUS <> 'Marked For Archive')
AND CX.STATUS <> 'Marked For Archive'
Order by CX.LAST_NAME ASC,CX.CREATED desc, CX.FST_NAME
This gives me a nice list of Contacts listing the newest one at the top and ordering the rest in a relatively easyt to use list
Id like this to be a little smarter , ie showing only where
A) the initial is the same
B) if the firm name is similar (i wanted to do some kind of score based on the characters in the firm names
any ideas on how I could do this
February 14, 2006 at 8:38 am
To find out the duplicates names you should use this query:-
select CX.FST_NAME+CX.LAST_NAME,count(*) num
from table_name
group by CX.FST_NAME+CX.LAST_NAME
having count(*) > 1
February 14, 2006 at 8:42 am
Try looking at the SOUNDEX and DIFFERENCE functions.
--Jeff
February 14, 2006 at 8:46 am
Thanks Amit unfortunately it never is that easy, suppose you have two people entered at the Smith & Co Company
The First is Abbey Smith, the second is A B Smith
This would mean that your query wouln't see that
Further to the query I posted though I was looking at Just joining as a subquery but I can't do that either as if someone is entered on to the database and someone else with the surname in the week, the report will return 2x the number of people sharing that name
Jeff, thanks for your note, on a much early post I did include Soundex and difference as part of the query but that seemed to hit the CPU much harder as the Names will have non Alpha numeric characters and inorder for the Query to work positively I would first strip down all the Characters in the Last/First name
and then when it scans the whole contact table I will need to remove the characters from each of the names in turn inorder to ensure that the SOUNDEX / DIFFERENCE worked
February 14, 2006 at 11:33 am
One thing to do is to store the SOUNDEX of each row as a column. Also, store the stripped down names as colums. It's easier to amortize the cost of manipulating each name over each insert/update than doing it everytime a query is done for all records.
--Jeff
February 15, 2006 at 2:49 am
Intresting, how do you mean store the value of soundex as a column ?(or did you mean store the value in a column)
unfortunately our DBA doesn't like us writing to the database unless its through the front end so just adding a column would take at least 2 months before it actually gets done.
time to create everything in excel then
February 15, 2006 at 8:42 am
What I mean is that as part of the code that inserts a new customer, populate a column with the SOUNDEX of the stripped down customer name. An example script would be:
create table Customer
(
CustomerID int identity(1,1),
--REST OF CUSTOMER FIELDS
CustomerSoundex char(4)
)
On the insert:
insert customer (...column declaration...)
values (...rest of customer data..., SOUNDEX(dbo.fnStripCustomerName(...))
You would also have to write the function dbo.fnStripCustomerName() to take the customer name arguements and strip and concatentate them according to the rule you mentioned.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply