Help needed with Duplicates within a two day period.

  • Hi,

    I am trying to establish which prescriptions have been prescribed more than once within 2 days of each other as these may have been issued twice, so one is a duplicate.

    The table definition is:

    IF EXISTS (SELECT * FROM sysobjects WHERE name = 'tbl_prescription_dates' AND xtype = 'u')

    DROP TABLE tbl_prescription_dates

    GO

    CREATE table tbl_prescription_dates (

    Prescription_id int,

    Date_Issued datetime)

    The Data to populate the table is:

    INSERT INTO tbl_prescription_dates (Prescription_id, Date_Issued)

    SELECT 1, getdate()

    UNION SELECT 1, DATEADD(d,-1,getdate())

    UNION SELECT 1, DATEADD(d,-2,getdate())

    UNION SELECT 1, DATEADD(d,-3,getdate())

    UNION SELECT 2, getdate()

    UNION SELECT 3, getdate()

    UNION SELECT 3, DATEADD(d,-1,getdate())

    UNION SELECT 3, DATEADD(m,-1,getdate())

    UNION SELECT 3, DATEADD(y,-1,getdate())

    The output I am looking for is.

    Prescription_ID NoOfDuplicates

    ------------- ----------------

    1 2

    2 0

    3 1

    The explanation for this is:

    1. prescription_id 1 was prescribed 4 times in the table but 2 of the dates issued where within 2 days from the original prescription. Which means there was 2 duplicates.

    2. Prescription_id 2 was only prescribed once so no duplicates

    3. Prescription_id 3 was prescribed 5 times in the table and both were within two days of each other so there is one duplicate.

    I am totally stumped how to figure this one out so any help would really be appreciated.

    EDIT - Sorry about the layout of the results can't seem to get it to space out very well.

  • i had a go at this and ran out of time

    but you need to add an identity to your table definition so you don't match a row with itself

    CREATE table tbl_prescription_dates (

    Prescription_id int,

    Date_Issued datetime,

    id int identity(1,1)

    )

    then the basis for my attempt was

    select a.*,b.* from tbl_prescription_dates a

    inner join tbl_prescription_dates b

    on a.Prescription_id=b.Prescription_id

    and a.Date_Issued >= dateadd(dd,-2,b.Date_Issued)

    and a.Date_Issued <=dateadd(dd,+2,b.Date_Issued)

    and a.idb.id

    but the problem here is that we have multiple matches in the way the data is presented -

    let me thinkk more and i'll come up with something

    MVDBA

  • The table uses a composite key for the primary key by using the prescription_id and date_issued to make it unique, this is just a sample of the table loads of other columns are also in there but cut it down to make life a bit easier.

  • another problem here is that if you look at the 1st 4 rows of data (numbered 1 - 4)

    1 was within 2 days of 2 & 3

    2 was within 2 days of 1,3 and 4

    3 was within 2 days of 1,2 and 4

    4 was within 2 days of 2 and 3

    so prescriptionid 1 has potentially up to 4 duplicates - but your sample output shows 2

    MVDBA

  • this looks right to me

    select a.Prescription_id,count(distinct b.id) from tbl_prescription_dates a

    inner join tbl_prescription_dates b

    on a.Prescription_id=b.Prescription_id

    and a.Date_Issued >= dateadd(dd,-2,b.Date_Issued)

    and a.Date_Issued <=dateadd(dd,+2,b.Date_Issued)

    and a.idb.id

    group by a.Prescription_id

    but i needed the distinct row id to do it - you might have to dump your data into a temptable with identity column to do this in your live environment

    MVDBA

  • Are you really using SQL Server 7 or 2000?

  • Heres a solution I have thrown together:

    CREATE table dbo.tbl_prescription_dates (

    Prescription_id int,

    Date_Issued datetime)

    --The Data to populate the table is:

    INSERT INTO dbo.tbl_prescription_dates (Prescription_id, Date_Issued)

    SELECT 1, getdate()

    UNION ALL SELECT 1, DATEADD(d,-1,getdate())

    UNION ALL SELECT 1, DATEADD(d,-2,getdate())

    UNION ALL SELECT 1, DATEADD(d,-3,getdate())

    UNION ALL SELECT 2, getdate()

    UNION ALL SELECT 3, getdate()

    UNION ALL SELECT 3, DATEADD(dd,-1,getdate())

    UNION ALL SELECT 3, DATEADD(mm,-1,getdate())

    UNION ALL SELECT 3, DATEADD(yy,-1,getdate())

    create table #PrescriptionDates(

    PrescriptionID int,

    DateIssued datetime,

    DupRec int null,

    unique clustered (

    PrescriptionID asc,

    DateIssued asc

    ))

    insert into #PrescriptionDates (PrescriptionID, DateIssued)

    select Prescription_id, Date_Issued

    from tbl_prescription_dates

    declare @PrescriptionID int,

    @DateIssued datetime,

    @DupRec int

    select @PrescriptionID = -1, @DateIssued = cast(0 as datetime)

    update #PrescriptionDates set

    @DupRec = DupRec = case when @PrescriptionID pd.PrescriptionID then 0 else datediff(dd, DateIssued, @DateIssued) + 3 end,

    @DateIssued = case when (@PrescriptionID pd.PrescriptionID) or (@DupRec 0

    group by

    PrescriptionID) df

    on pd.PrescriptionID = df.PrescriptionID

    drop table #PrescriptionDates

    drop table dbo.tbl_prescription_dates

    --The output I am looking for is.

    --

    --Prescription_ID NoOfDuplicates

    --------------- ----------------

    --1 2

    --2 0

    --3 1

    --

  • Sorry for the slow reply on this, to answer a couple of questions.

    1. SQL 2000

    2. Each Prescription_ID can have many dates issued but duplicates are per ID only so prescription_id 1 has no relation to the others.

    3. Lynn will have a look at your solution.

    Thanks

  • Lynn,

    Looks good so far need to do a bit more testing but thanks very much for your help.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply