July 5, 2014 at 3:14 pm
Query Help Please
I have included DDL scripts to facilitate the process.
I have a table called Duplicate_Cust_Numbers_to_DeDupe.
This table was created as a result of a script to detect duplicate Cust Numbers.
The following explains each column.
CustSGID is a system generated id for the customer when that customer is created in the database.
CUSTein is a taxid for that company, but the numbers you see have been made up and are not real and the column name is made up.
Cust_Name is the name of the Customer but again these are made up names and any similarity to real names is purely by accident.
Cust_Number this is a number provided by the outside source that is providing the data that is inserted into the database originally.
The Cust_Number is supposed to be unique by customer but as you can see that is not always the case. The outside source violates their own rule very often.
Has_Account means a user has come into the system and created an account for that customer.
Has_Cert this means a person has certified that the customer is who they say they are.
PrimaryCustid is the primary CustSGID that will be used as the final CustSGID once the de duplication process is complete.
SecondaryCustid is the CustID that will be blended into the primary CustSGID.
Goal
I need to populate the PrimaryCUSTID and the SecondaryCUSTID based on the criteria below, 1 being top and 3 being lowest criteria.
1.) Has an account
2.) Has a CustEIN
3.) Has a Cert.
In the data provided
In the actual table there are over 27,000 records.
The Customer Bissell inc is the PrimaryCustSGID because it has a CUSTEIN, no account and no cert.
The customer Bissell International becomes the SecondatyCustID
The Customer JAMES BLDG PRODUC is the PrimaryCustSGID because it has an account and there is no CustEIN or Cert.
The Customer JAMES BLDG PR becomes the SecondaryCustSGID because there is no account eventhough there is a CustEIN.
The Customer DOORCRAFT has no account, or CustEIN, or Cert so in this case the lowest CustSGID(399108) will become the PrimaryCustSGID
and the highest(1059533) will become the secondaryCustSGID
The Customer AMERDALE with the account becomes the primaryCUSTSGID and the second instance of AMERDALE becomes the secondaryCustID
even though both have a CUSTein. The instance with the Accounts wins in this case.
I have struggled over a week on this and have gotten nowhere, and now I think I am too close to the situation.
I need a different set of eyes or brains to look at this a show me how to do this.
I really appreciate the help I have received in the past.
This is a forum and the people who respond are a great resource.
Thanks in Advance
Gary
DDL
----CREATE TABLE [dbo].[Duplicate_Cust_Numbers_to_DeDupe](
----
----[CustSGID] [int] NOT NULL,
----[CUSTein] [varchar](25) NULL,
----[Cust_Name] [varchar](70) NOT NULL,
----[Cust_Number] [nvarchar](50) NOT NULL,
----[Has_Account] [varchar](3) NOT NULL,
----[Has_Cert] [varchar](3) NOT NULL,
----
----
----[PrimaryCUSTID] [int] NULL,
----[SecondaryCUSTID] [int] NULL,
----
----
----) ON [PRIMARY]
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(399011,'123456789','BISSELL INC','100140','No','No',0,0)
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(1060533,'','BISSELL Inernational','100140','No','No',0,0)
--
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(395524,'','JAMES BLDG PRODUC','12092','Yes','No',0,0)
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(1059069,'999999990','JAMES BLDG PR','12092','No','No',0,0)
--
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(399108,'','DOORCRAFT','12723','No','No',0,0)
--
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(1059533,'','DOORCRAFT','12723','No','No',0,0)
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(401860,'999999909','AMERDALE','14537','Yes','No',0,0)
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(1060335,'999999009','AMERDALE','14537','No','No',0,0)
July 5, 2014 at 5:28 pm
And what code have you developed so far in an attempt to solve your problem?
July 5, 2014 at 6:06 pm
While waiting for what you tried I thought I'd give it a shot. Is this basically what you were looking for?
select
*,
rn = row_number() over (partition by Cust_Number order by
case when Has_Account = 'Yes' then 1
when CUSTein > '' then 2
when Has_Cert = 'Yes' then 3
else 4
end, CustSGID)-- as SortOrder
from
[dbo].[Duplicate_Cust_Numbers_to_DeDupe];
with dedup_id as (
select
*,
rn = row_number() over (partition by Cust_Number order by
case when Has_Account = 'Yes' then 1
when CUSTein > '' then 2
when Has_Cert = 'Yes' then 3
else 4
end, CustSGID)-- as SortOrder
from
[dbo].[Duplicate_Cust_Numbers_to_DeDupe]
)
select
Cust_Number,
max(case rn when 1 then Cust_Name else '' end) as Cust_Name,
max(Has_Account) as Has_Account,
max(Has_Cert) as Has_Cert,
max(case rn when 1 then CustSGID else '' end) as PrimaryCustSGID,
max(case rn when 2 then CustSGID else '' end) as SecondaryCustSGID
from
dedup_id
where
rn <= 2 -- Is it possible there may be more than 2 duplicate Cust_Number records?
group by
Cust_Number
July 5, 2014 at 6:15 pm
Updated the code above.
July 6, 2014 at 7:37 am
Hopefully what I posted works for you, unfortunately I just noticed that you had posted in a SQL Server 7/2000 forum and the code I provided will work in SQL Server 2005 and later, not in SQL Server 2000.
July 8, 2014 at 5:42 pm
Thank you Lynn for your response.
I have been offline for a few days and have not been able to try your solution.
What I had been trying was using a series of case statements checking for each condition and that was working about as well as tits on a bore hog, hence my post.
Yes, this is on SQL 2000, but since you suggested CTEs maybe I can use your suggestion as a guide to put something together in SQL 2000.
Thanks Again.
Gary
July 8, 2014 at 6:08 pm
Using SQL Server 2000 will make things more difficult. You don't have access to the windowing function row_number() for one.
I will think if I can come up with another possible answer.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply