May 24, 2011 at 3:13 pm
A third party vendor has given us read access to an audit table, allowing us to review historical changes for a given person. In this case we wish to show email address changes. Here is some sample data.
if OBJECT_ID('TempDB..#audit_Person','U') IS NOT NULL
drop table #audit_Person
create table #audit_Person
(
ind_id int,
first_name varchar(10),
last_name varchar(10),
email_addr varchar(255),
created_dt datetime,
updated_dt datetime
)
Insert into #audit_Person (ind_id, first_name, last_name, email_addr, created_dt, updated_dt)
SELECT 1, 'Joe', 'Smith', 'nobody@nobody.com', '2010-02-27 17:40:03.117','2010-10-19 15:24:53.253' UNION ALL
SELECT 1, 'Joseph', 'Smith', 'nobody@nobody.com', '2010-02-27 17:40:03.117', '2010-12-02 11:23:09.430' UNION ALL
SELECT 1, 'Josephina', 'Smith', 'nobody@nowhere.com', '2010-02-27 17:40:03.117', '2010-12-31 11:59:59.055' UNION ALL
SELECT 1, 'Joseph', 'Smith', 'somebody@somebody.com', '2010-02-27 17:40:03.117', '2011-05-02 12:09:34.443'
The desired output:
[font="Courier New"]ind_id Old_Email_Addr New_Email_Addr updated_dt
----------- ------------------------- ------------------------- -----------------------
1 nobody@nobody.com nobody@nowhere.com 2010-12-31 11:59:59.057
1 nobody@nowhere.com somebody@somebody.com 2011-05-02 12:09:34.443[/font]
The actual table contains approximately 2.3 million rows, hence my interest in a fast performing set based solution for this. I have played with ROW_NUMBER() but it isn't exactly correct.
Any help is appreciated!
May 24, 2011 at 7:32 pm
How's this? (comments in the code)
;
WITH cte AS
(
-- get the columns we actually need
-- add a row number that restarts whenever the ind_id changes
-- order the row number by the updated_dt
SELECT ind_id,
email_addr,
updated_dt,
RN = ROW_NUMBER() OVER (PARTITION BY ind_id ORDER BY updated_dt)
FROM #audit_Person p
)
-- get the records where the email_addr has changed
-- show the old/new email addr, and the date changed
SELECT c1.ind_id,
Old_Email_Addr = c1.email_addr,
New_Email_Addr = c2.email_addr,
c2.updated_dt
FROM cte c1
LEFT JOIN cte c2
ON c1.ind_id = c2.ind_id -- same ind_id
AND c1.RN = c2.RN - 1 -- next change
WHERE c1.email_addr <> c2.email_addr -- email_addr changed
ORDER BY c1.ind_id, c1.RN;
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 25, 2011 at 3:52 pm
Great solution, it works well. It's taking me a bit to thoroughly understand how the join on c2.RN-1 works. At first I thought that would be an issue where no update to the email was made. I am assuming because a CTE is used the non-email change rows are filtered in the same pass when the row number is created. I need to do more reading on how SQL Server logically processes a select statement.
Thanks for the help!
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply