Need to select rows with the latest date only

  • In the attached  image, i need to change the query to return only the rows with latest RecruitmentDate (highlighted in green)
    What would such query be?

    thank you.

    Likes to play Chess

  • Simple:

    WITH BaseData as (
    SELECT
        AgentID
        , WC
        , RecruitmentDate
        , [Agent preferred gender]
        , [Agent Gender at Birth]
        , [rn] = ROW_NUMBER() OVER (PARTITION BY [AgentID] ORDER BY [RecruimentDate] DESC)
    FROM
        [MWDatesPR]
    )
    ELECT
        AgentID
        , WC
        , RecruitmentDate
        , [Agent preferred gender]
        , [Agent Gender at Birth]
    FROM
        BaseData
    WHERE
        [rn] = 1;

  • For others, here is that image again:

    Select from a CTE which contains ROW_NUMBER(), partitioned by AgentId and ordered by RecruitmentDate Desc, where ROW_NUMBER() = 1.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Lynn Pettis - Thursday, October 11, 2018 11:31 AM

    Simple:

    Grrr!

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin - Thursday, October 11, 2018 11:32 AM

    Lynn Pettis - Thursday, October 11, 2018 11:31 AM

    Simple:

    Grrr!

    Hey, after the troll I needed to post something usable.  😀

  • Sorry, I did not complete my question.
    The only way I know is this:

    select * from MWDatesPR M1
    where AgentID = '303844611' and RecruitmentDate= (select Max(RecruitmentDate) from MWDatesPR M2 where M1.agentID =  m2.agentID) 

    My question:
    On huge data-sets against 500 million and more rows tables,
    what would be a more efficient performance wise way to write this query?

    Thank you.

    Likes to play Chess

  • Thank you.
    I think the above CTE is my answer.

    Likes to play Chess

  • So what all have you tried?  Do you have the actually execution plans of the various options? What indexing exists on these large tables?

  • VoldemarG - Thursday, October 11, 2018 11:34 AM

    Sorry, I did not complete my question.
    The only way I know is this:

    select * from MWDatesPR M1
    where AgentID = '303844611' and RecruitmentDate= (select Max(RecruitmentDate) from MWDatesPR M2 where M1.agentID =  m2.agentID) 

    My question:
    On huge data-sets against 500 million and more rows tables,
    what would be a more efficient performance wise way to write this query?

    Thank you.

    If you have more than about 5 rows per AgentId then this method will be the fastest providing you have the following index:
    CREATE INDEX IX_MWDatesPR_1 ON MWDatesPR (AgentId, RecruitmentDate)
    ;WITH CTE AS
    (
        SELECT DISTINCT AgentID
         FROM MWDatesPR
    )
    SELECT m.*
    FROM CTE
    CROSS APPLY(SELECT TOP(1) *
                    FROM MWDatesPR m
                   WHERE m.AgentId = CTE.AgentId
                   ORDER BY m.RecruitmentDate DESC) AS m

    This will get the most recent row for all AgentId's
    If you are only interested in AgentId = '303844611'. Then you just need the following query:
    SELECT TOP(1) * FROM MWDatesPR WHERE AgentId = '303844611' ORDER BY RecruitmentDate DESC

  • Jonathan AC Roberts - Thursday, October 11, 2018 3:26 PM

    ...
    ;WITH CTE AS
    (
        SELECT DISTINCT AgentID
         FROM MWDatesPR
    )
    ...

    If you have a table where the number of distinct values out number those that have duplicates, then take a look at Paul White's super-quick rCTE-based DISTINCT equivalent

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply