October 11, 2018 at 11:24 am
In the attached image, i need to change the query to return only the rows with latest RecruitmentDate (highlighted in green)
What would such query be?
thank you.
Likes to play Chess
October 11, 2018 at 11:31 am
Simple:
WITH BaseData as (
SELECT
AgentID
, WC
, RecruitmentDate
, [Agent preferred gender]
, [Agent Gender at Birth]
, [rn] = ROW_NUMBER() OVER (PARTITION BY [AgentID] ORDER BY [RecruimentDate] DESC)
FROM
[MWDatesPR]
)
ELECT
AgentID
, WC
, RecruitmentDate
, [Agent preferred gender]
, [Agent Gender at Birth]
FROM
BaseData
WHERE
[rn] = 1;
October 11, 2018 at 11:32 am
For others, here is that image again:
Select from a CTE which contains ROW_NUMBER(), partitioned by AgentId and ordered by RecruitmentDate Desc, where ROW_NUMBER() = 1.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 11, 2018 at 11:32 am
Lynn Pettis - Thursday, October 11, 2018 11:31 AMSimple:
Grrr!
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 11, 2018 at 11:34 am
Phil Parkin - Thursday, October 11, 2018 11:32 AMLynn Pettis - Thursday, October 11, 2018 11:31 AMSimple:Grrr!
Hey, after the troll I needed to post something usable. 😀
October 11, 2018 at 11:34 am
Sorry, I did not complete my question.
The only way I know is this:
select * from MWDatesPR M1
where AgentID = '303844611' and RecruitmentDate= (select Max(RecruitmentDate) from MWDatesPR M2 where M1.agentID = m2.agentID)
My question:
On huge data-sets against 500 million and more rows tables,
what would be a more efficient performance wise way to write this query?
Thank you.
Likes to play Chess
October 11, 2018 at 11:35 am
Thank you.
I think the above CTE is my answer.
Likes to play Chess
October 11, 2018 at 11:38 am
So what all have you tried? Do you have the actually execution plans of the various options? What indexing exists on these large tables?
October 11, 2018 at 3:26 pm
VoldemarG - Thursday, October 11, 2018 11:34 AMSorry, I did not complete my question.
The only way I know is this:select * from MWDatesPR M1
where AgentID = '303844611' and RecruitmentDate= (select Max(RecruitmentDate) from MWDatesPR M2 where M1.agentID = m2.agentID)My question:
On huge data-sets against 500 million and more rows tables,
what would be a more efficient performance wise way to write this query?Thank you.
If you have more than about 5 rows per AgentId then this method will be the fastest providing you have the following index:CREATE INDEX IX_MWDatesPR_1 ON MWDatesPR (AgentId, RecruitmentDate)
;WITH CTE AS
(
SELECT DISTINCT AgentID
FROM MWDatesPR
)
SELECT m.*
FROM CTE
CROSS APPLY(SELECT TOP(1) *
FROM MWDatesPR m
WHERE m.AgentId = CTE.AgentId
ORDER BY m.RecruitmentDate DESC) AS m
This will get the most recent row for all AgentId's
If you are only interested in AgentId = '303844611'. Then you just need the following query:SELECT TOP(1) * FROM MWDatesPR WHERE AgentId = '303844611' ORDER BY RecruitmentDate DESC
October 12, 2018 at 1:25 am
Jonathan AC Roberts - Thursday, October 11, 2018 3:26 PM...;WITH CTE AS
...
(
SELECT DISTINCT AgentID
FROM MWDatesPR
)
If you have a table where the number of distinct values out number those that have duplicates, then take a look at Paul White's super-quick rCTE-based DISTINCT equivalent
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply