September 14, 2011 at 11:22 am
To All:
I need a performant approach to merging multiple records into one. This will need to merge about 3 mill records. I thought of a cursor but is there a better way?
INSERT INTO #CustomerInfo
( CustRowID
,CustFuzzyGroup_Key_Out_ID
,CustLocID
,CustNo
,FirstName
,MiddleName
,LastName
,AddressLine1
,City
,State
,ZipCode
,CellPhone
,BusinessPhone
,EmailAddress
,CreateDate
)
VALUES
( 25, 3, 204, '127489', 'Tommy', 'A', 'James', '123 Any Street', 'New York', 'NY', '10011', '2121231234', '', '', '2011-08-09 16:23:45' ),
( 18, 3, 255, '36987', 'Tommy', '', 'James', '123 Any Street', 'New York', 'NY', '10011', '', '2124441212
', 'corpemail@somecorp.com', '2011-09-09 00:00:00' ),
( 1, 3,173, '68974', 'Tommy', '', 'James', '123 Any Street', 'New York', 'NY', '10011', '', '2124155000', 'email@somemail.com', '2011-07-09 23:23:23' ),
( 31, 4, 89, '789657', 'Paula', 'P', 'Lady', '45 Main Street', 'Milwaukee', 'WI', '53111', '4148889797', '4148256000', 'email@somemail.com', '2011-08-07 23:00:00' ),
( 22, 4,58, '9897865T', 'Paula', '', 'Lady', '45 Main Street', 'Milwaukee', 'WI', '53112', '4148011234', '', '', '2011-09-03 23:00:00' )
GO
SELECT * FROM [#CustomerInfo]
GO
--DROP TABLE [#CustomerInfo]
-- Desired Results
/*
MasterCustIDFirstNameMiddleNameLastNameAddressLine1CityStateZipCodeCellPhoneBusinessPhoneEmailAddress CreateDate
1TommyA James123 Any StreetNew YorkNY1001121211112122124441212corpemail@somecorp.com9/9/2011
2PaulaPLady45 Main StreetMilwaukeeWI5311241480112344148256000email@somemail.com 9/3/2011
*/
/*
--CustInfoKeyLU
MasterCustIDCustRowIDCustLocIDCustNoCustFuzzyGroup_Key_Out_ID
1252041274893
118255369873
11173689743
231897896574
221589897865T4
*/
Notice I want the MAX NonNULL values based on the CreateDate if there is a possibility of 2 or more values.
Any ideas as to the best approach? I can use TSQL, SSIS or both.
Thanks in advance.
September 14, 2011 at 1:07 pm
First of all, please provide the table def for #CustomerInfo so your script can actually be used (we would have to guess the data type you're using).
Also, please note that an empty string is different than NULL. In the given example, "'Tommy', '', 'James'" would lead to an empty middle name since '' is not equal null.
Regarding the business logic itself: How do you know the two persons T. James and T. A. James are not two different people just sharing the same address? How do you guarantee to correctly identify rows that can be merged?
September 14, 2011 at 1:16 pm
Sorry About that. Bad pasting job
CREATE TABLE [#CustomerInfo](
[CustRowID] [int] NULL,
[CustFuzzyGroup_Key_Out_ID] [int] NULL,
[CustLocID] [int] NULL,
[CustNo] [varchar](17) NULL,
[FirstName] [varchar](25) NULL,
[MiddleName] [varchar](25) NULL,
[LastName] [varchar](40) NULL,
[AddressLine1] [varchar](45) NULL,
[City] [varchar](35) NULL,
[State] [varchar](3) NULL,
[ZipCode] [varchar](10) NULL,
[CellPhone] [varchar](10) NULL,
[BusinessPhone] [varchar](15) NULL,
[EmailAddress] [varchar](40) NULL,
[CreateDate] [datetime] NULL
) ON [PRIMARY]
GO
September 14, 2011 at 1:20 pm
September 14, 2011 at 1:24 pm
First of all you are correct, the '' in the middle name should be nulls.
In regards to identifying an individual I have to make some choices here. So if it is the same name with the same address I have to assume that it is the same person. I have spoken with the business and they are ok with this.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply