June 1, 2016 at 2:23 pm
Email addresses are modified over the period of time and i'm trying write a query to get initial emailaddress and the last modified emailaddress along with last modified date
something like this. Any help will be appreciated.
Email_FromEmail_ToMoDt
A D1/1/2013
P T 12/12/2013
CREATE TABLE [dbo].[Email](
[Email_From] [varchar](50) NULL,
[Email_To] [varchar](50) NULL,
[MoDt] [date] NULL
) ON [PRIMARY]
INSERT INTO [Email]
([Email_From]
,[Email_To]
,[MoDt])
VALUES
('A','B','2/1/2012'),
('B','C','3/1/2012'),
('B','C','4/5/2012'),
('C','D','1/1/2013'),
('P','Q','8/1/2013'),
('Q','R','9/1/2013'),
('R','S','10/10/2013'),
('S','T','12/12/2013')
GO
Thanks
June 1, 2016 at 3:05 pm
A "recursive CTE" will parse through that kind of data. For your example, this works:
WITH rCTE
AS (SELECT
Email_From,
Email_To,
MoDt,
Email_From AS RNode
FROM
dbo.Email
WHERE
Email_From NOT IN (SELECT
Email_To
FROM
dbo.Email)
UNION ALL
SELECT
E2.Email_From,
E2.Email_To,
E2.MoDt,
rCTE.RNode
FROM
dbo.Email AS E2
INNER JOIN rCTE
ON rCTE.Email_To = E2.Email_From
),
Parser
AS (SELECT
rCTE.Email_From,
rCTE.Email_To,
rCTE.MoDt,
rCTE.RNode,
ROW_NUMBER() OVER (PARTITION BY rCTE.RNode ORDER BY rCTE.MoDt DESC) AS R
FROM
rCTE
)
SELECT
Parser.RNode AS SourceMail,
Parser.Email_To AS CurrentMail,
Parser.MoDt
FROM
Parser
WHERE
Parser.R = 1;
I'm not sure how that will do on your real data, but it works on the sample.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 1, 2016 at 3:30 pm
Thank you GSquared. I really appreciate your help on this. I'll see if i can implement this with real data.
June 1, 2016 at 3:38 pm
However, there are problems when the data is even a little more complex. For instance, a person might switch from one email to another and then switch back to their original. I added one record to indicate this, and it's no longer producing a row for that person.
('B', 'A', '5/1/2011')
Also, there are issues in the given data. For instance, B is changed to C on 3/1/2012 and then again on 4/5/2012. There is no indication of how B got back into the system, so it's not clear whether this second B should be treated as a duplicate (likely given that the change is to the same email) or treated as a completely new instance of B (if the second change was to another email).
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 1, 2016 at 4:38 pm
Yes drew.allen. I also considered the scenario where customer started with one emailaddress and changed to couple more different emailaddreses and changed back to same emailaddress he/she started with. In that case i'm not looking have that customer in the result set. I have really complex data. Data is pooled into one table from different sources and and have lot of duplicates. we are in the process of cleaning up..
Thanks for thinking along with me 🙂
June 1, 2016 at 7:58 pm
neela.teki (6/1/2016)
Yes drew.allen. I also considered the scenario where customer started with one emailaddress and changed to couple more different emailaddreses and changed back to same emailaddress he/she started with. In that case i'm not looking have that customer in the result set. I have really complex data. Data is pooled into one table from different sources and and have lot of duplicates. we are in the process of cleaning up..Thanks for thinking along with me 🙂
But, they're not duplicates. They have dates associated with them. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply