Duplicates

Question

Duplicates

faulknerwilliam2

SSCrazy

Points: 2299
More actions
March 26, 2015 at 1:27 am

#302520

Hello
I have 4 tables:
Customer address, name, telephone number and email address:
SELECT A.Contact_id, A.first_name, A.last_name,
B.Contact_id, B.Address_1, B.Address_2, B.Address_3, B.zipcode
C.Contact_id, C.Telephone,
D.Contact_Id_D.Email,
FROM Contact AS A,
Addresses AS B,
Telephones AS C,
Emails as D,
INNER JOIN A.contact_id = B.contact_id
INNER JOIN c.contact_id = a.contact_id;
INNER JOIN d.contact_id = c.contact_id
I want to find out if I have duplications in name (surname) and address. However I also want to show details like the customer's telephone number in my report.
So I have basically linked the 4 tables together with inner joins. Next step is trying to find where surname and address 1 and zip code are duplicates.
Do you have any suggestions on how I can achieve this?
Many thanks.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286590 More actions · Answer 1

faulknerwilliam2 (3/26/2015)
Hello
I have 4 tables:
Customer address, name, telephone number and email address:
SELECT A.Contact_id, A.first_name, A.last_name,
B.Contact_id, B.Address_1, B.Address_2, B.Address_3, B.zipcode
C.Contact_id, C.Telephone,
D.Contact_Id_D.Email,
FROM Contact AS A,
Addresses AS B,
Telephones AS C,
Emails as D,
INNER JOIN A.contact_id = B.contact_id
INNER JOIN c.contact_id = a.contact_id;
INNER JOIN d.contact_id = c.contact_id
I want to find out if I have duplications in name (surname) and address. However I also want to show details like the customer's telephone number in my report.
So I have basically linked the 4 tables together with inner joins. Next step is trying to find where surname and address 1 and zip code are duplicates.
Do you have any suggestions on how I can achieve this?
Many thanks.

Your question is pretty confusing. Can you post ddl, sample data and desired output?

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

faulknerwilliam2 SSCrazy Points: 2299 More actions · Answer 2

Hello Sean,

Yes, it was a hopeless entry. I reposted with this, which I hope is clearer:

Hello, I posted earlier with a poor script. Below is a better one.

It pulls through all customers, plus their addresses, email and phone numbers.

However some have duplicate surnames and addresses.

Please can you advise an amendment to the script that would show which records were duplicates? Ideally I'd like the output to show where the surname, address 1 and the zipcode were identical - and restrict the output to show duplicate rows only.

Many thanks.

SELECT

Reference.ReferenceNumber AS [Reference.ReferenceNumber)

Reference.LastName

Reference.FirstName

Address.ReferenceNumber AS [Address.ReferenceNumber]

Address.Address1

Address.Address2

Address.Address3

Address.Address4

Address.ZipCode

Telephone.ReferenceNumber AS [Telephone.ReferenceNumber]

Telephone.TelephoneNumber

Email.ReferenceNumber AS [Email.ReferenceNumber]

Email.EmailAddress

FROM

Reference

INNER JOIN Address

ON Reference.ReferenceNumber = Reference.ContactNumber

LEFT OUTER JOIN Telephone

ON Reference.ReferenceNumber = Telephone.ReferenceNumber

LEFT OUTER Join Email

ON Reference.ReferenceNumber = Email.ReferenceNumber

Sean Lange SSC Guru Points: 286590 More actions · Answer 3

faulknerwilliam2 (3/26/2015)
Hello Sean,
Yes, it was a hopeless entry. I reposted with this, which I hope is clearer:
Hello, I posted earlier with a poor script. Below is a better one.
It pulls through all customers, plus their addresses, email and phone numbers.
However some have duplicate surnames and addresses.
Please can you advise an amendment to the script that would show which records were duplicates? Ideally I'd like the output to show where the surname, address 1 and the zipcode were identical - and restrict the output to show duplicate rows only.
Many thanks.
SELECT
Reference.ReferenceNumber AS [Reference.ReferenceNumber)
Reference.LastName
Reference.FirstName
Address.ReferenceNumber AS [Address.ReferenceNumber]
Address.Address1
Address.Address2
Address.Address3
Address.Address4
Address.ZipCode
Telephone.ReferenceNumber AS [Telephone.ReferenceNumber]
Telephone.TelephoneNumber
Email.ReferenceNumber AS [Email.ReferenceNumber]
Email.EmailAddress
FROM
Reference
INNER JOIN Address
ON Reference.ReferenceNumber = Reference.ContactNumber
LEFT OUTER JOIN Telephone
ON Reference.ReferenceNumber = Telephone.ReferenceNumber
LEFT OUTER Join Email
ON Reference.ReferenceNumber = Email.ReferenceNumber

Starting with a query that actually works would be a big improvement.

Why do you want to return the reference number over and over. They will either be NULL or the same value because that is your join criteria.

And since you still didn't post ddl, sample data or desired output this is completely untested.

with Duplicates as

(

select r.LastName

, a.Address1

, a.ZipCode

, COUNT(*) as DuplicateCount

FROM Reference r

INNER JOIN Address a ON a.ReferenceNumber = r.ReferenceNumber --?? not sure what is right here

LEFT OUTER JOIN Telephone t ON r.ReferenceNumber = t.ReferenceNumber

LEFT OUTER Join Email e ON r.ReferenceNumber = e.ReferenceNumber

group by r.LastName

, a.Address1

, a.ZipCode

having COUNT(*) > 1

)

SELECT

r.ReferenceNumber

, r.LastName

, r.FirstName

--Address.ReferenceNumber AS [Address.ReferenceNumber] why bother? This is the same value

, a.Address1

, a.Address2

, a.Address3

, a.Address4

, a.ZipCode

--, t.ReferenceNumber AS [Telephone.ReferenceNumber]

, t.TelephoneNumber

--, e.ReferenceNumber AS [Email.ReferenceNumber]

, e.EmailAddress

, d.DuplicateCount

FROM Reference r

INNER JOIN Address a ON a.ReferenceNumber = r.ReferenceNumber --?? not sure what is right here

LEFT OUTER JOIN Telephone t ON r.ReferenceNumber = t.ReferenceNumber

LEFT OUTER Join Email e ON r.ReferenceNumber = e.ReferenceNumber

join Duplicates d on d.LastName = r.LastName

AND d.Address1 = a.Address1

AND d.ZipCode = a.ZipCode

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

faulknerwilliam2 SSCrazy Points: 2299 More actions · Answer 4

Thanks for taking the time and your patience; I'm relatively new to actual scripting, as is probably obvious.

The query worked fine, in that it pulled through all our customers: their names, addresses, phone numbers and email addresses. The first join that you question, the INNER, is to link the address table to the reference table. Should I be doing that in some other way?

My ultimate aim is to return duplicate values only.

Hope this is clear(er)?

Sean Lange SSC Guru Points: 286590 More actions · Answer 5

faulknerwilliam2 (3/26/2015)Should I be doing that in some other way?

Not necessarily but your join condition did not reference the address table.

INNER JOIN Address
ON Reference.ReferenceNumber = Reference.ContactNumber

If there is something else not working in your query you need to provide ddl, sample data and desired output because I have no idea what that might be. Please take a few minutes and read the first link in my signature.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

faulknerwilliam2 SSCrazy Points: 2299 More actions · Answer 6

Dear Sean

The script you suggested above worked, but (due to my poor explanation of what I actually wanted) the data that's come out isn't quite what I want.

Basically our contact centre sometimes logs the same person without checking if they have contacted us before and so the customer ends up with two (sometimes more!) reference numbers. We want to cleanse this, so:

I would like to output instances where the customer's surname, address1 and zipcode are duplicated but only if the customer has different reference numbers.

This is the type of data that I'd like to see output:

Ref Number Surname Address1 Zip

1875 Faulkner 10 Smith Street 08540

1876 Faulkner 10 Smith Street 08540

If we had input John Smith, 11 Smith Street, 08976 twice, but using the same reference number (as should be happening) then I wouldn't want to see him being output in the report.

So basically, your script suggestion returns all duplicates, not those that duplicate except for the reference number.

Do you have any advice on how I can achieve this?

Many thanks.

pietlinden SSC Guru Points: 63503 More actions · Answer 7

Could you please post the scripts for the two tables in the problem and enough inserts statements to recreate the problem you are having? If you ask a question and include code to create an populate the tables in your problem, you are more likely to get tested solutions instead of haphazard guesses. Go ahead and obfuscate your data if you need to, but if we have nothing to work with, it's really hard to come up with a solution that will actually work.

faulknerwilliam2 SSCrazy Points: 2299 More actions · Answer 8

Will do. Apologies for my many egregious breaches of forum etiquette: I will do better next time.

faulknerwilliam2 SSCrazy Points: 2299 More actions · Answer 9

As per the request above, please see:

create table Reference(referencenumber int primary key identity(1,1),lastname varchar(20),firstname varchar(20))

create table [Address](addressid int primary key identity(1,1),referencenumber int references reference(referencenumber),address1 varchar(20),

city varchar(20),state varchar(2),country char(2),zipcode varchar(20))

go

insert into reference values('lincoln','sam'),('wright','mike'),('lincoln','will'),('lincoln','sam'),('lincoln','sam')

go

insert into [Address] values(1,'east avenue','atlanta','GA','US',12345),

(2,'south avenue','atlanta','GA','US',12345),

(3,'west avenue','atlanta','GA','US',12345),

(4,'east avenue','atlanta','GA','US',12345),

(5,'east avenue','atlanta','GA','US',12345)

So basically in my query I would only want to see reference numbers 4 and 5. Basically surname, address 1, zip code should = but the reference number should be <> , not =.

I hope this is clearer and that the test data is workable.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 10

I'd be tempted to work on each table in turn, logging the duplicated reference numbers. There are loads of ways of detecting dupes. Here's one of them which works nicely on your sample set:

SELECT a.*

FROM #Address a

WHERE EXISTS (

SELECT 1

FROM #Address b

WHERE b.address1 = a.address1

AND b.city = a.city

AND b.state = a.state

AND b.country = a.country

AND b.zipcode = a.zipcode

AND b.referencenumber <> a.referencenumber

)

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Sean Lange SSC Guru Points: 286590 More actions · Answer 11

Why not just add this to the join predicates to the Duplicates cte in the script I posted yesterday.

AND d.ReferenceNumber <> r.ReferenceNumber

I am more than a little concerned that you don't understand the code very well because that should be a pretty minor tweak.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

faulknerwilliam2 SSCrazy Points: 2299 More actions · Answer 12

Why not just add this to the join predicates to the Duplicates cte in the script I posted yesterday.

AND d.ReferenceNumber <> r.ReferenceNumber

I did try that Sean, but the output is currently doing two things. It IS showing me where the fields are identical apart from the reference number but confusingly it is still bringing through records where all the fields are matching, including the reference number.

I am slightly confused about this as I thought using <> would stop records coming through where the reference number was the same.

Yours (or anyone else's) thoughts about this would be appreciated.

Re your second point, which I accept completely:

I am more than a little concerned that you don't understand the code very well because that should be a pretty minor tweak.

Up till now I have written fairly basic scripts then used filters, IIF statements etc in SSRS to get the data I want. The above is an attempt to get the data in a relatively presentable state prior to using SSRS. However, as I soon realised, I was probably trying to run before I could walk.

Thanks.

faulknerwilliam2 SSCrazy Points: 2299 More actions · Answer 13

To ChrisM@Work

Thanks - your solution worked well and WHERE EXISTS was not something I had encountered before.