Selecting distinct values from dupes based on latest date

Question

Selecting distinct values from dupes based on latest date

SSCarpal Tunnel

Points: 4302

May 26, 2022 at 3:59 pm

I have a table of patients that has often multiple occurrences of a patient number but with differing dates of which I need to return only the latest date but I'm unsure if I need a subquery, in short I'm stuck as to what might work. Here is my table and some sample data.

The below gives output of:

123548, '2022-01-17'

123548, '2022-02-21'

985254, '2022-03-19'

774589, '2022-01-17'

754512, '2022-04-04'

754512, '2022-02-09'

But I need the output to be:

123548, '2022-02-21'

754512, '2022-04-04'

create table [dbo].[cases](
[case_id] [int] identity(1,1) not null,
[pat_id] [bigint] not null,
[eff_dt] [datetime] null

INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (123548, '2022-01-17')
INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (123548, '2022-02-21')
INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (985254, '2022-03-19')
INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (774589, '2022-01-17')
INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (754512, '2022-04-04')
insert INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (754512, '2022-02-09')

select pat_id, eff_dt         
from dbo.cases

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Thom A SSC Guru Points: 99135 More actions · Answer 1

The standard approach for this is to use a CTE and ROW_NUMBER to get the "top 1" in each group. You can also use a windowed COUNT to allow you to filter to patients with multiple rows:

WITH CTE AS(
    SELECT pat_id,
           eff_dt,
           COUNT(pat_id) OVER (PARTITION BY pat_id) AS C,
           ROW_NUMBER() OVER (PARTITION BY pat_id ORDER BY eff_dt DESC) AS RN
    FROM dbo.cases)
SELECT pat_id,
       eff_dt
FROM CTE
WHERE C > 1
  AND RN = 1;

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

DaveBriCam SSCarpal Tunnel Points: 4302 More actions · Answer 2

DaveBriCam

SSCarpal Tunnel

Points: 4302

May 26, 2022 at 4:24 pm

#4039735

That works perfectly, many thanks!

DaveBriCam SSCarpal Tunnel Points: 4302 More actions · Answer 3

One question, what if I have some dates as NULL values and want them to return?

Ed B Hall of Fame Points: 3674 More actions · Answer 4

DaveBriCam wrote:

One question, what if I have some dates as NULL values and want them to return?

The null date is ordered after the valid dates (in descending order) so will not be returned with row_number = 1 unless there are no other dates. You can use isnull(eff_dt, '1900-0101') if you really wanted to make sure, but applying functions like this could degrade performance.

The code excludes patients with only one row, so if the patient has one row and the eff_dt is null, they will be dropped along with any other patients with a single eff_dt. Your desired output implied that's what you wanted, but if that is not the case you could drop the C > 1 line.

If you have a patient with two rows and both have null eff_dt, or there are duplicate max eff_dt values, one of them will be returned, but the result will be non-deterministic. In this case I would order the row_number by eff_dt desc, case_id desc so that the same rows are always returned. In this example it doesn't matter, but if dupes are possible and you needed to return the case_id of the most recent eff_dt so you could join it to other tables etc, making it deterministic would be a good idea.

This reply was modified 2 years, 8 months ago by Ed B.
This reply was modified 2 years, 8 months ago by Ed B. Reason: typo

DaveBriCam SSCarpal Tunnel Points: 4302 More actions · Answer 5

Sorry, I was wrong with what my needed output should be, I needed this, in other words keeping those with just one date:

123548, '2022-02-21'

985254, '2022-03-19'

774589, '2022-01-17'

754512, '2022-04-04'

Thom A SSC Guru Points: 99135 More actions · Answer 6

DaveBriCam wrote:

Sorry, I was wrong with what my needed output should be, I needed this, in other words keeping those with just one date:
123548, '2022-02-21'
985254, '2022-03-19'
774589, '2022-01-17'
754512, '2022-04-04'

All the information you need to get this answer is still in my code before; you just need to now not just return rows where there are more than 1 instance of a value.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

DaveBriCam SSCarpal Tunnel Points: 4302 More actions · Answer 7

I thought it best to add onto an existing thread since the issue is very similar with a new challenge. I need to return all the insurance numbers that have two different people listed such as:

pat_ID ins_num

111807 100449401H

461261 100449401H

106820 101159201B

35599 101203104H

85116 101859902J

445903 103703602G

199156 103839502A

342322 104206501C

457017 105117401F

98573 105450402E

Note that what would need to be returned would be only the number '100449401H'

Below is some code I hoped would work but too many records are being returned:

with CTE
as (select i.pat_id
         , i.ins_num
         , count(i.pat_id) over (partition by i.pat_id)                       as C
         , row_number() over (partition by i.ins_num order by i.ins_num desc) as RN
    from dbo.tb_pat_ins i
    )
select pat_id
     , ins_num
from CTE
where C > 1
      and RN = 1;

Ed B Hall of Fame Points: 3674 More actions · Answer 8

Does this work?

DROP TABLE IF EXISTS #tb_pat_ins 

CREATE TABLE #tb_pat_ins
(   pat_ID INT, 
    ins_num VARCHAR(20) 
)

INSERT #tb_pat_ins (pat_ID,ins_num)
VALUES (111807 , '100449401H'),
(461261, '100449401H'),
(106820, '101159201B'),
(35599,  '101203104H'),
(85116,' 101859902J'),
(445903, '103703602G'),
(199156, '103839502A'),
(342322, '104206501C'),
(457017, '105117401F'),
(98573, '105450402E')

SELECT i.ins_num
FROM #tb_pat_ins i
GROUP BY i.ins_num
HAVING COUNT(DISTINCT pat_ID) > 1

DaveBriCam SSCarpal Tunnel Points: 4302 More actions · Answer 9

That works until I try to add additional fields to be returned:

select distinct
       i.ins_num
from dbo.tb_pat_ins i
where i.active = 1
      and i.bp_id <> 1
group by i.ins_num
having count(distinct pat_id) > 1
order by i.ins_num desc; -- returns 442 records

select distinct
       i.ins_num
     , i.pat_id
     , i.bp_id
     , i.eff_dt
     , i.end_dt
     , i.card_ln
     , i.card_fn
     , i.card_mn
from dbo.tb_pat_ins i
where i.active = 1
      and i.bp_id <> 1
group by i.ins_num
       , i.pat_id
       , i.bp_id
       , i.eff_dt
       , i.end_dt
       , i.card_ln
       , i.card_fn
       , i.card_mn
having count(distinct pat_id) > 1
order by i.ins_num desc; -- returns no records

Ed B Hall of Fame Points: 3674 More actions · Answer 10

Identify the ins_nums, then join/semi join back to the table.

SELECT i.ins_num
     , i.pat_id
     , i.bp_id
     , i.eff_dt
     , i.end_dt
     , i.card_ln
     , i.card_fn
     , i.card_mn
from dbo.tb_pat_ins as i
where i.active = 1
and i.bp_id <> 1
and exists (select 1
            from dbo.tb_pat_ins as i2
            where i2.ins_num = i.imsnum
            and i2.active = 1
            and i2.bp_id <> 1
            group by i2.ins_num
            having count(i2.distinct pat_id) > 1
            )