January 9, 2007 at 12:07 pm
I have a table two colums, SSN and date. There are also several other unimportant columns.
For each SSN there are going to be several rows, but each row will be unique per date . I am trying to figure out how I'd write a query to return me all of the unique SSNs (only one row per SSN) with the most recent date. And I'd like to return all columns as a result. The goal is just to get the most recent record for each ssn/person.
Is this possible w/o a cursor? I'd like to only return the Fired row for Bob Joe, and the Fired Row for jane doe. Those are the rows with the highest dates for that SSN.
Example
SSN, Date, Name, Age
1111, 1/1/2007, Bob Joe, 23, Hired
1111, 1/15/2007, Bob Joe, 23, Fired
2222, 1/14/2007, Jane Doe, 18, Hired
2222, 2/1/2007, Jane Doe, 18, Fired
Thanks for any direction,
Patrick
Patrick Allmond
batteryfuel.com
For all of your laptop, camera, power tool and portable battery needs.
January 9, 2007 at 12:12 pm
Select Main.* FROM dbo.Tbl Main INNER JOIN (SELECT SSN, MAX(Date) AS Date from dbo.Tbl GROUP BY SSN) dtLast ON Main.SSN = dtLast.SSN AND Main.Date = dtLast.Date
January 9, 2007 at 2:59 pm
Thanks. Testing it but it is returning nothing. I'll play with this for a little bit.
Patrick Allmond
batteryfuel.com
For all of your laptop, camera, power tool and portable battery needs.
January 9, 2007 at 3:14 pm
Please post your version of the query... we might be able to figure out why.
January 9, 2007 at 3:24 pm
Select
Main.* FROM dbo.empdata Main
INNER
JOIN (SELECT SSN, MAX(dateid) AS Date2 from dbo.empdata GROUP BY SSN) dtLast
ON
Main.SSN = dtLast.SSN AND Main.dateid = dtLast.date2
Would the date datatype matter? In my data the date is an INT - 20060521 for example. The date column is called dateid for my data
The query goes on to 300K rows before I cancel it. That cannot be right since there are only 267329 distinct ssns in the table.
Patrick Allmond
batteryfuel.com
For all of your laptop, camera, power tool and portable battery needs.
January 9, 2007 at 4:55 pm
And where in your script are you narrowing it down to ONLY DISTINCT SSNs????
Run JUST the inner select "SELECT SSN, MAX(dateid) AS Date2 from dbo.empdata GROUP BY SSN" and see what it returns. I'll bet it will return row with the SSN and include the max(dateid) for the SSN.
For example:
1111 1/15/2007
1111 1/15/2007
2222 2/1/2007
2222 2/1/2007
-SQLBill
January 9, 2007 at 4:56 pm
Compare:
SELECT SSN, MAX(dateid) AS Date2 from dbo.empdata GROUP BY SSN
to:
SELECT DISTINCT SSN, MAX(dateid) AS Date2 from dbo.empdata GROUP BY SSN
See if that helps....
-SQLBill
January 9, 2007 at 9:16 pm
How the heck is that supposed to change anything???
BTW, I hope you have an index of both SSN (clustered if possible) and also on DateID. Without those 2 you can be sure to wait quite a while...
January 10, 2007 at 9:04 am
Select
Main.* FROM dbo.empdata Main
INNER
JOIN (SELECT SSN, MAX(dateid) AS Date2 from dbo.empdata GROUP BY SSN) dtLast
ON
Main.SSN = dtLast.SSN AND Main.dateid = dtLast.date2
When I run the subselect it works OK. I am getting what I expect. But why in the world I get too much when I run the whole thing is beyond me. I just stopped the query at 350K records. Yet the subselect only returns 267K records. I see the SSN duplicated in there multiple times too.
Patrick Allmond
batteryfuel.com
For all of your laptop, camera, power tool and portable battery needs.
January 10, 2007 at 9:20 am
Is it possible that you have more than one record per day per SSN and that you would be in need of some sort of tie breaker?
Can you run this and tell us the results :
SELECT SSN, DateCol, COUNT(*) FROM dbo.YourTable GROUP BY SSN, DateCol HAVING COUNT(*) > 1 ORDER BY COUNT(*) DESC, SSN, DateCol
January 10, 2007 at 10:57 am
Thanks SSC - that was the problem. The file is cumulative so there are duplicate rows where every field is the same.
Sorry to drag everybody down with this.
Patrick Allmond
batteryfuel.com
For all of your laptop, camera, power tool and portable battery needs.
January 10, 2007 at 11:11 am
So what did you do to solve the remaining of the problem?
January 10, 2007 at 12:07 pm
Oh... I haven't yet.
So now I need to look at that query and try to get the same information out. Because each SSN does have date records with different dates as well as date records with the same date. So now I need to figure out how to get the most recent record even when there are duplicate records.
Patrick Allmond
batteryfuel.com
For all of your laptop, camera, power tool and portable battery needs.
January 10, 2007 at 1:57 pm
Do you have a secondary date on the same record, an auditing table or an identity column?
January 10, 2007 at 2:07 pm
No identity column. There is a FY and Month which looks like when a record was added to the larger dataset. But even as I eyeball that I see duplicate records in there for a given SSN, FY and month combo. So at a minimum there will one duplicate record where EVERY column is duplicate.
Patrick Allmond
batteryfuel.com
For all of your laptop, camera, power tool and portable battery needs.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply