Recursion Problem

Question

Post reply

Recursion Problem

hawaiianrebel

Old Hand

Points: 343
More actions
July 11, 2011 at 12:37 am

#389943

Here is the scenario, we have a Customer DB that has duplicate Customers with different CUSTID values. We plan on keeping the oldest CUSTOMER Record and merge any newer records into the Oldest one and cancel the newer records:
tblDUPS:
CUSTNAME CUSTID CUSTID2 MatchField
Richard Smith 101 102 NAME
Dick Smith 102 105 NAME
Robert Bug 103 104 CITY
Dickie Smith 105 106 CITY
The objective is relate all three rows to the smallest CUSTID and PIVOT the MatchField values:
RESULT SET
CUSTID1 CUSTID2 NAME DLN CITY
101 102 Y N N
101 105 N Y N
101 106 N N Y
103 104 N N Y
In the example above, I have already identified the duplicate rows in the the Customer Table and populated table tblDUPS. I'm just having mental block on rendering the last result set. There is no number to the amount of duplicates that can be in the table. In the example above CUSTID's 101,102,105, and 106 all belong to the same customer. Eventually, we will roll all purchase history into the smallest (oldest) CUSTID's (101,103) and Cancel the high CUSTID's (102,104,105,106).
Any idea's?

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Eralper SSCarpal Tunnel Points: 4438 More actions · Answer 1

Hi hawaiian,

Can you try following code

select

isnull(tt.custid, t.custid) custid,

t.custid2,

NAME = CASE when t.MatchField = 'NAME' then 'Y' else 'N' end,

DLN = CASE when t.MatchField = 'CITY' then 'Y' else 'N' end,

CITY = CASE when t.MatchField = 'CITY' then 'Y' else 'N' end

from tblDUPS t

left join tblDUPS tt on t.custid = tt.custid2

Sorry for the DLN column, I have no idea how to populate it

Eralper
SQL Server and T-SQL Tutorials and Articles
Microsoft Certification and Certification Exams

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 2

Provide DDL, DML for sample data and expected results relative to the sample data please.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

hawaiianrebel Old Hand Points: 343 More actions · Answer 3

Below is the DDL. I should explain that the matched field column means the to customers matched on either their Drivers License Number (DLN); Last 4 of their Social Security Number (SSN); or some combination of their name.

You will notice after the table build out that Customer 102 matches to customer 105 based on a Name Combination Match. Customer 105 also matches to customer 106 on a Name Combination Match. The Name Combination Match query compares the First letter of the first name and some combination of the Last Name.

Here is a real life example:

Customer ID First Name Last Name

102 Mary Elizabeth Del La Rosa

105 Mary Rosa

106 M Elizabeth Rosa

The customer table is compared to itsself to find duplicates, but to keep the customer ID from showig up on both sides of the results there is a WHERE clause that states WHERE CustID1 < CustID2. So the problem is 102 will relate to 105 on a name match and 105 will relate to 106 on a name match. The solution should:

Origianl Data

101102DLN

101103SSN

102105NAME

105106NAME

Pass1

Relate 101 to 102

Relate 101 to 103

Relate 101 to 105

Relate 102 to 106

Pass2

Relate 101 to 102

Relate 101 to 103

Relate 101 to 105

Relate 101 to 106

The code below will get me to Pass 1. I tried it with an Update Statement and a Loop but that had its own problem. Hence, a recursion problem.

USE [SQLDEVL]

GO

/****** Object: Table [dbo].[tblCustomerIDs] Script Date: 07/13/2011 22:32:37 ******/

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[tblDUPS]') AND type in (N'U'))

DROP TABLE [dbo].[tblDUPS]

GO

USE [SQLDEVL]

GO

/****** Object: Table [dbo].[tblCustomerIDs] Script Date: 07/13/2011 22:32:37 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[tblDUPS](

[custid] [int] NULL,

[custid2] [int] NULL,

[MatchField] [CHAR](4)

) ON [PRIMARY]

GO

INSERT INTO tblDUPS (custid,custid2,MatchField)

VALUES(101,102,'DLN')

GO

INSERT INTO tblDUPS (custid,custid2,MatchField)

VALUES(101,103,'SSN')

GO

INSERT INTO tblDUPS (custid,custid2,MatchField)

VALUES(102,105,'NAME')

GO

INSERT INTO tblDUPS (custid,custid2,MatchField)

VALUES(105,106,'NAME')

GO

SELECT * FROM tblDUPS

GO

SELECT

ISNULL(tt.custid, t.custid) AS custid,

t.custid2,

NAME = CASE

WHEN t.MatchField = 'NAME' THEN 'Y' ELSE 'N'

END,

DLN = CASE

WHEN t.MatchField = 'DLN' THEN 'Y' ELSE 'N'

END,

CITY = CASE

WHEN t.MatchField = 'CITY' THEN 'Y' ELSE 'N'

END

FROM

tblDUPS AS t LEFT JOIN tblDUPS AS tt ON t.custid = tt.custid2;

hawaiianrebel Old Hand Points: 343 More actions · Answer 4

hawaiianrebel

Old Hand

Points: 343

July 15, 2011 at 1:53 am

#1353233

BTT

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 5

See if this approach will work for you. If you have questions about my choices let me know.

There may be more efficient ways to do this too...others may drop in and contribute new or improve on this.

USE tempdb

GO

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'[dbo].[tblDUPS]')

AND type IN (N'U') )

DROP TABLE [dbo].[tblDUPS]

GO

CREATE TABLE [dbo].[tblDUPS]

(

[custid] [int] NULL,

[custid2] [int] NULL,

[MatchField] [CHAR](4)

)

GO

INSERT INTO tblDUPS

(custid, custid2, MatchField)

VALUES (101, 102, 'DLN'),

(102, 105, 'NAME'),

(105, 106, 'NAME'),

(101, 103, 'SSN'),

(109, 110, 'NAME'),

(110, 112, 'NAME'),

(112, 113, 'NAME') ;

GO

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'dbo.get_leaves')

AND type IN (N'FN', N'IF', N'TF', N'FS', N'FT') )

DROP FUNCTION dbo.get_leaves ;

GO

CREATE FUNCTION dbo.get_leaves (@custid INT)

RETURNS TABLE

AS

RETURN

(WITH cte(custid, custid2, [level])

AS (

SELECT DISTINCT

t.custid AS custid,

t.custid2 AS custid2,

0 AS [level]

FROM dbo.tblDUPS t

WHERE custid = @custid

UNION ALL

SELECT t.custid,

t.custid2,

cte.[level] + 1 AS [level]

FROM cte

JOIN dbo.tblDUPS t ON cte.custid2 = t.custid

),

cte2

AS (

SELECT custid,

custid2,

level,

ROW_NUMBER() OVER (PARTITION BY custid2 ORDER BY level DESC) AS row_num

FROM cte

)

SELECT DISTINCT

@custid AS custid,

custid2

FROM cte2

WHERE row_num = 1 ) ;

GO

SELECT DISTINCT

t.custid,

gl.custid2

FROM dbo.tblDUPS t

CROSS APPLY dbo.get_leaves(t.custid) gl

WHERE NOT EXISTS ( SELECT *

FROM dbo.tblDUPS

WHERE custid2 = t.custid )

GO

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

hawaiianrebel Old Hand Points: 343 More actions · Answer 6

opc.three you got it; hence, you got the problem. Two things I want to stress:

I did come up with a solution but it was very ... ugly and worked for small groups of data. When I process all 6k ROWS, it failed. So it wasn’t really much of a solution.

Secondly, I made up this scenario. There is no Customer table that begins with tbl. A record is composed of one or more rows from one or more tables. A tuple is a row but you never hear it being used today.

Anyway, there was no way I could post my customers data DDL. What is true is the fact that the database has multiple "customers" because people can register 200 times in a day with 200 different names or veriations of their name. I was tasked with finding duplicate registrations, reporting on them, and prevent any new duplicates to enter the system.

A human being has to actually verify if the two customers are in fact the same person. They can then use a merge program to merge the newer customer registration and their history into the oldest customer ID. It would be very helpful if the entire related customer ID's were group as opc has shown me.

Out of 2.1 million customers there are 6 thousand duplicate pairs. Identifying them was easy, reporting on them was a challenge.

Thank you again opc!!!

Have a great day

Hawaiian

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 7

You're welcome Hawaiian! I am happy it will work for you...enjoy the weekend 🙂

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

hawaiianrebel Old Hand Points: 343 More actions · Answer 8

Heheeeee. I got mad props from my customer 🙂

I couldn't in good conscience take all the credit