September 9, 2022 at 7:31 pm
I'm trying to determine why I'm getting several (about 20%) duplicate values in my 'pt.pat_fullname' field. I would think the DISTINCT would work. Any ideas?
drop table if exists #allclaims;
select distinct cl.clm_hdr_id
, cl.clm_dos
, cl.pat_id
into #allclaims
from dbo.vw_claims cl
where cl.clm_dos
between '2022-08-01' and '2022-08-31'
group by cl.clm_hdr_id
, cl.clm_dos
, cl.pat_id
order by cl.pat_id;
select top 500
*
from
(
select distinct
pt.pat_fullname
, cl.clm_hdr_id
, cl.clm_dos as 'Service Date'
from #allclaims cl
inner join dbo.tb_patients pt
on cl.pat_id = pt.pat_id
) as a
order by newid();
September 9, 2022 at 7:51 pm
What is your definition of duplicates in this scenario? Please provide an actual example of multiple rows that comprise a duplicate (obfuscated if necessary).
There should be only one row for any given set/combination of pat_fullname, clm_hdr_id, & "Service Date" from that query.
But it is perfectly valid for it to return the same pat_fullname for various combinations of clm_hdr_id & "Service Date". Such cases would not be duplicates.
September 9, 2022 at 8:00 pm
September 9, 2022 at 8:00 pm
I see you have both distinct and group by in the same query - use one or the other, and since you are not using any aggregate functions then you would only need the distinct. You also don't need to include an ORDER BY - that is only valid when you have an IDENTITY column and you need the identity to be assigned based on that order.
Ideally, you would have another column in vw_claims that could be used to identify a single row. If so - then it would be better to use that column instead of using distinct. Given that you are looking for each distinct date of service for a claim - then you can and will get multiple rows per claim. It all depends on how that claim is billed out and the dates of service for each procedure performed.
With that said - there is no reason to insert into a temp table - and then select the top 500. It can all be done in a single query - and selecting from a derived table is just extra typing.
If the desired result is a distinct list of claims per patient - then you need to either remove the DOS or get the min/max date of service. If the desired result is all dates of service by claims per patient - then just use a group by.
select top (500)
pt.pat_fullname
, cl.clm_hdr_id
, max(cl.clm_dos) as 'Service Date'
from dbo.vw_claims cl
inner join dbo.tb_patients pt
on cl.pat_id = pt.pat_id
where cl.clm_dos
between '2022-08-01' and '2022-08-31'
group by cl.clm_hdr_id
, pt.pat_fullname
order by newid();
select top (500)
pt.pat_fullname
, cl.clm_hdr_id
, cl.clm_dos as 'Service Date'
from dbo.vw_claims cl
inner join dbo.tb_patients pt
on cl.pat_id = pt.pat_id
where cl.clm_dos
between '2022-08-01' and '2022-08-31'
group by cl.clm_hdr_id
, pt.pat_fullname
, cl.clm_dos
order by newid();
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
September 9, 2022 at 8:02 pm
It's funny how people rail against duplicates in a randomly selected set... it's a part of what "random" is.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 9, 2022 at 8:05 pm
I need for the person's name to be unique. So one of the records for 'Reece, Genais' needs to be eliminated. It doesn't matter which one:
Saw this after I posted above - those are not DISTINCT rows. The patient has multiple claims in the selected period - and stating it doesn't matter which one you gets means neither the DOS or claim header ID are important.
select top (500)
pt.pat_fullname
, clm_hdr = max(cl.clm_hdr_id)
, service_date = max(cl.clm_dos)
from dbo.vw_claims cl
inner join dbo.tb_patients pt
on cl.pat_id = pt.pat_id
where cl.clm_dos
between '2022-08-01' and '2022-08-31'
group by pt.pat_fullname
order by newid();
Of course, this won't work if you have 2 patients with the same name as one of them would be eliminated. If that is the case, change the group by to group by pt.pat_id.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
September 9, 2022 at 8:06 pm
It's funny how people rail against duplicates in a randomly selected set... it's a part of what "random" is.
It's also funny what some people consider duplicates - even when all but one column is different.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply