March 26, 2015 at 1:27 am
Hello
I have 4 tables:
Customer address, name, telephone number and email address:
SELECT A.Contact_id, A.first_name, A.last_name,
B.Contact_id, B.Address_1, B.Address_2, B.Address_3, B.zipcode
C.Contact_id, C.Telephone,
D.Contact_Id_D.Email,
FROM Contact AS A,
Addresses AS B,
Telephones AS C,
Emails as D,
INNER JOIN A.contact_id = B.contact_id
INNER JOIN c.contact_id = a.contact_id;
INNER JOIN d.contact_id = c.contact_id
I want to find out if I have duplications in name (surname) and address. However I also want to show details like the customer's telephone number in my report.
So I have basically linked the 4 tables together with inner joins. Next step is trying to find where surname and address 1 and zip code are duplicates.
Do you have any suggestions on how I can achieve this?
Many thanks.
March 26, 2015 at 7:18 am
faulknerwilliam2 (3/26/2015)
HelloI have 4 tables:
Customer address, name, telephone number and email address:
SELECT A.Contact_id, A.first_name, A.last_name,
B.Contact_id, B.Address_1, B.Address_2, B.Address_3, B.zipcode
C.Contact_id, C.Telephone,
D.Contact_Id_D.Email,
FROM Contact AS A,
Addresses AS B,
Telephones AS C,
Emails as D,
INNER JOIN A.contact_id = B.contact_id
INNER JOIN c.contact_id = a.contact_id;
INNER JOIN d.contact_id = c.contact_id
I want to find out if I have duplications in name (surname) and address. However I also want to show details like the customer's telephone number in my report.
So I have basically linked the 4 tables together with inner joins. Next step is trying to find where surname and address 1 and zip code are duplicates.
Do you have any suggestions on how I can achieve this?
Many thanks.
Your question is pretty confusing. Can you post ddl, sample data and desired output?
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 26, 2015 at 8:06 am
Hello Sean,
Yes, it was a hopeless entry. I reposted with this, which I hope is clearer:
Hello, I posted earlier with a poor script. Below is a better one.
It pulls through all customers, plus their addresses, email and phone numbers.
However some have duplicate surnames and addresses.
Please can you advise an amendment to the script that would show which records were duplicates? Ideally I'd like the output to show where the surname, address 1 and the zipcode were identical - and restrict the output to show duplicate rows only.
Many thanks.
SELECT
Reference.ReferenceNumber AS [Reference.ReferenceNumber)
Reference.LastName
Reference.FirstName
Address.ReferenceNumber AS [Address.ReferenceNumber]
Address.Address1
Address.Address2
Address.Address3
Address.Address4
Address.ZipCode
Telephone.ReferenceNumber AS [Telephone.ReferenceNumber]
Telephone.TelephoneNumber
Email.ReferenceNumber AS [Email.ReferenceNumber]
Email.EmailAddress
FROM
Reference
INNER JOIN Address
ON Reference.ReferenceNumber = Reference.ContactNumber
LEFT OUTER JOIN Telephone
ON Reference.ReferenceNumber = Telephone.ReferenceNumber
LEFT OUTER Join Email
ON Reference.ReferenceNumber = Email.ReferenceNumber
March 26, 2015 at 8:26 am
faulknerwilliam2 (3/26/2015)
Hello Sean,Yes, it was a hopeless entry. I reposted with this, which I hope is clearer:
Hello, I posted earlier with a poor script. Below is a better one.
It pulls through all customers, plus their addresses, email and phone numbers.
However some have duplicate surnames and addresses.
Please can you advise an amendment to the script that would show which records were duplicates? Ideally I'd like the output to show where the surname, address 1 and the zipcode were identical - and restrict the output to show duplicate rows only.
Many thanks.
SELECT
Reference.ReferenceNumber AS [Reference.ReferenceNumber)
Reference.LastName
Reference.FirstName
Address.ReferenceNumber AS [Address.ReferenceNumber]
Address.Address1
Address.Address2
Address.Address3
Address.Address4
Address.ZipCode
Telephone.ReferenceNumber AS [Telephone.ReferenceNumber]
Telephone.TelephoneNumber
Email.ReferenceNumber AS [Email.ReferenceNumber]
Email.EmailAddress
FROM
Reference
INNER JOIN Address
ON Reference.ReferenceNumber = Reference.ContactNumber
LEFT OUTER JOIN Telephone
ON Reference.ReferenceNumber = Telephone.ReferenceNumber
LEFT OUTER Join Email
ON Reference.ReferenceNumber = Email.ReferenceNumber
Starting with a query that actually works would be a big improvement.
Why do you want to return the reference number over and over. They will either be NULL or the same value because that is your join criteria.
And since you still didn't post ddl, sample data or desired output this is completely untested.
with Duplicates as
(
select r.LastName
, a.Address1
, a.ZipCode
, COUNT(*) as DuplicateCount
FROM Reference r
INNER JOIN Address a ON a.ReferenceNumber = r.ReferenceNumber --?? not sure what is right here
LEFT OUTER JOIN Telephone t ON r.ReferenceNumber = t.ReferenceNumber
LEFT OUTER Join Email e ON r.ReferenceNumber = e.ReferenceNumber
group by r.LastName
, a.Address1
, a.ZipCode
having COUNT(*) > 1
)
SELECT
r.ReferenceNumber
, r.LastName
, r.FirstName
--Address.ReferenceNumber AS [Address.ReferenceNumber] why bother? This is the same value
, a.Address1
, a.Address2
, a.Address3
, a.Address4
, a.ZipCode
--, t.ReferenceNumber AS [Telephone.ReferenceNumber]
, t.TelephoneNumber
--, e.ReferenceNumber AS [Email.ReferenceNumber]
, e.EmailAddress
, d.DuplicateCount
FROM Reference r
INNER JOIN Address a ON a.ReferenceNumber = r.ReferenceNumber --?? not sure what is right here
LEFT OUTER JOIN Telephone t ON r.ReferenceNumber = t.ReferenceNumber
LEFT OUTER Join Email e ON r.ReferenceNumber = e.ReferenceNumber
join Duplicates d on d.LastName = r.LastName
AND d.Address1 = a.Address1
AND d.ZipCode = a.ZipCode
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 26, 2015 at 8:40 am
Thanks for taking the time and your patience; I'm relatively new to actual scripting, as is probably obvious.
The query worked fine, in that it pulled through all our customers: their names, addresses, phone numbers and email addresses. The first join that you question, the INNER, is to link the address table to the reference table. Should I be doing that in some other way?
My ultimate aim is to return duplicate values only.
Hope this is clear(er)?
March 26, 2015 at 9:14 am
faulknerwilliam2 (3/26/2015)Should I be doing that in some other way?
Not necessarily but your join condition did not reference the address table.
INNER JOIN Address
ON Reference.ReferenceNumber = Reference.ContactNumber
If there is something else not working in your query you need to provide ddl, sample data and desired output because I have no idea what that might be. Please take a few minutes and read the first link in my signature.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 27, 2015 at 12:47 am
Dear Sean
The script you suggested above worked, but (due to my poor explanation of what I actually wanted) the data that's come out isn't quite what I want.
Basically our contact centre sometimes logs the same person without checking if they have contacted us before and so the customer ends up with two (sometimes more!) reference numbers. We want to cleanse this, so:
I would like to output instances where the customer's surname, address1 and zipcode are duplicated but only if the customer has different reference numbers.
This is the type of data that I'd like to see output:
Ref Number Surname Address1 Zip
1875 Faulkner 10 Smith Street 08540
1876 Faulkner 10 Smith Street 08540
If we had input John Smith, 11 Smith Street, 08976 twice, but using the same reference number (as should be happening) then I wouldn't want to see him being output in the report.
So basically, your script suggestion returns all duplicates, not those that duplicate except for the reference number.
Do you have any advice on how I can achieve this?
Many thanks.
March 27, 2015 at 12:50 am
Could you please post the scripts for the two tables in the problem and enough inserts statements to recreate the problem you are having? If you ask a question and include code to create an populate the tables in your problem, you are more likely to get tested solutions instead of haphazard guesses. Go ahead and obfuscate your data if you need to, but if we have nothing to work with, it's really hard to come up with a solution that will actually work.
March 27, 2015 at 12:57 am
Will do. Apologies for my many egregious breaches of forum etiquette: I will do better next time.
March 27, 2015 at 1:18 am
As per the request above, please see:
create table Reference(referencenumber int primary key identity(1,1),lastname varchar(20),firstname varchar(20))
create table [Address](addressid int primary key identity(1,1),referencenumber int references reference(referencenumber),address1 varchar(20),
city varchar(20),state varchar(2),country char(2),zipcode varchar(20))
go
insert into reference values('lincoln','sam'),('wright','mike'),('lincoln','will'),('lincoln','sam'),('lincoln','sam')
go
insert into [Address] values(1,'east avenue','atlanta','GA','US',12345),
(2,'south avenue','atlanta','GA','US',12345),
(3,'west avenue','atlanta','GA','US',12345),
(4,'east avenue','atlanta','GA','US',12345),
(5,'east avenue','atlanta','GA','US',12345)
So basically in my query I would only want to see reference numbers 4 and 5. Basically surname, address 1, zip code should = but the reference number should be <> , not =.
I hope this is clearer and that the test data is workable.
March 27, 2015 at 7:40 am
I'd be tempted to work on each table in turn, logging the duplicated reference numbers. There are loads of ways of detecting dupes. Here's one of them which works nicely on your sample set:
SELECT a.*
FROM #Address a
WHERE EXISTS (
SELECT 1
FROM #Address b
WHERE b.address1 = a.address1
AND b.city = a.city
AND b.state = a.state
AND b.country = a.country
AND b.zipcode = a.zipcode
AND b.referencenumber <> a.referencenumber
)
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 27, 2015 at 7:43 am
Why not just add this to the join predicates to the Duplicates cte in the script I posted yesterday.
AND d.ReferenceNumber <> r.ReferenceNumber
I am more than a little concerned that you don't understand the code very well because that should be a pretty minor tweak.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 29, 2015 at 8:31 am
Why not just add this to the join predicates to the Duplicates cte in the script I posted yesterday.
AND d.ReferenceNumber <> r.ReferenceNumber
I did try that Sean, but the output is currently doing two things. It IS showing me where the fields are identical apart from the reference number but confusingly it is still bringing through records where all the fields are matching, including the reference number.
I am slightly confused about this as I thought using <> would stop records coming through where the reference number was the same.
Yours (or anyone else's) thoughts about this would be appreciated.
Re your second point, which I accept completely:
I am more than a little concerned that you don't understand the code very well because that should be a pretty minor tweak.
Up till now I have written fairly basic scripts then used filters, IIF statements etc in SSRS to get the data I want. The above is an attempt to get the data in a relatively presentable state prior to using SSRS. However, as I soon realised, I was probably trying to run before I could walk.
Thanks.
March 29, 2015 at 2:23 pm
To ChrisM@Work
Thanks - your solution worked well and WHERE EXISTS was not something I had encountered before.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply