Not sure how to write this

  • I have a table two colums, SSN and date. There are also several other unimportant columns.

    For each SSN there are going to be several rows, but each row will be unique per date . I am trying to figure out how I'd write a query to return me all of the unique SSNs (only one row per SSN) with the most recent date. And I'd like to return all columns as a result. The goal is just to get the most recent record for each ssn/person.

    Is this possible w/o a cursor? I'd like to only return the Fired row for Bob Joe, and the Fired Row for jane doe. Those are the rows with the highest dates for that SSN.

     

    Example

    SSN, Date, Name, Age

    1111, 1/1/2007, Bob Joe, 23, Hired

    1111, 1/15/2007, Bob Joe, 23, Fired

    2222, 1/14/2007, Jane Doe, 18, Hired

    2222, 2/1/2007, Jane Doe, 18, Fired

     

    Thanks for any direction,

    Patrick

     


    Kindest Regards,

    Patrick Allmond
    batteryfuel.com
    For all of your laptop, camera, power tool and portable battery needs.

  • Select Main.* FROM dbo.Tbl Main INNER JOIN (SELECT SSN, MAX(Date) AS Date from dbo.Tbl GROUP BY SSN) dtLast ON Main.SSN = dtLast.SSN AND Main.Date = dtLast.Date

  • Thanks. Testing it but it is returning nothing. I'll play with this for a little bit.

     


    Kindest Regards,

    Patrick Allmond
    batteryfuel.com
    For all of your laptop, camera, power tool and portable battery needs.

  • Please post your version of the query... we might be able to figure out why.

  • Select

    Main.* FROM dbo.empdata Main

    INNER

    JOIN (SELECT SSN, MAX(dateid) AS Date2 from dbo.empdata GROUP BY SSN) dtLast

    ON

    Main.SSN = dtLast.SSN AND Main.dateid = dtLast.date2

     

    Would the date datatype matter? In my data the date is an INT - 20060521 for example. The date column is called dateid for my data

     

    The query goes on to 300K rows before I cancel it. That cannot be right since there are only 267329 distinct ssns in the table.

     


    Kindest Regards,

    Patrick Allmond
    batteryfuel.com
    For all of your laptop, camera, power tool and portable battery needs.

  • And where in your script are you narrowing it down to ONLY DISTINCT SSNs????

    Run JUST the inner select "SELECT SSN, MAX(dateid) AS Date2 from dbo.empdata GROUP BY SSN" and see what it returns. I'll bet it will return row with the SSN and include the max(dateid) for the SSN.

    For example:

    1111 1/15/2007

    1111 1/15/2007

    2222 2/1/2007

    2222 2/1/2007

    -SQLBill

  • Compare:

    SELECT SSN, MAX(dateid) AS Date2 from dbo.empdata GROUP BY SSN

    to:

    SELECT DISTINCT SSN, MAX(dateid) AS Date2 from dbo.empdata GROUP BY SSN

    See if that helps....

    -SQLBill

  • How the heck is that supposed to change anything???

     

    BTW, I hope you have an index of both SSN (clustered if possible) and also on DateID.  Without those 2 you can be sure to wait quite a while...

  • Select

    Main.* FROM dbo.empdata Main

    INNER

    JOIN (SELECT SSN, MAX(dateid) AS Date2 from dbo.empdata GROUP BY SSN) dtLast

    ON

    Main.SSN = dtLast.SSN AND Main.dateid = dtLast.date2

     

    When I run the subselect it works OK. I am getting what I expect. But why in the world I get too much when I run the whole thing is beyond me. I just stopped the query at 350K records. Yet the subselect only returns 267K records. I see the SSN duplicated in there multiple times too.


    Kindest Regards,

    Patrick Allmond
    batteryfuel.com
    For all of your laptop, camera, power tool and portable battery needs.

  • Is it possible that you have more than one record per day per SSN and that you would be in need of some sort of tie breaker?

     

    Can you run this and tell us the results :

     

    SELECT SSN, DateCol, COUNT(*) FROM dbo.YourTable GROUP BY SSN, DateCol HAVING COUNT(*) > 1 ORDER BY COUNT(*) DESC, SSN, DateCol

  • Thanks SSC - that was the problem. The file is cumulative so there are duplicate rows where every field is the same.

    Sorry to drag everybody down with this.


    Kindest Regards,

    Patrick Allmond
    batteryfuel.com
    For all of your laptop, camera, power tool and portable battery needs.

  • So what did you do to solve the remaining of the problem?

  • Oh... I haven't yet.

     

    So now I need to look at that query and try to get the same information out. Because each SSN does have date records with different dates as well as date records with the same date. So now I need to figure out how to get the most recent record even when there are duplicate records.


    Kindest Regards,

    Patrick Allmond
    batteryfuel.com
    For all of your laptop, camera, power tool and portable battery needs.

  • Do you have a secondary date on the same record, an auditing table or an identity column?

  • No identity column. There is a FY and Month which looks like when a record was added to the larger dataset. But even as I eyeball that I see duplicate records in there for a given SSN, FY and month combo. So at a minimum there will one duplicate record where EVERY column is duplicate.


    Kindest Regards,

    Patrick Allmond
    batteryfuel.com
    For all of your laptop, camera, power tool and portable battery needs.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply