Puzzling SELECT TOP/Group By issue

  • I have a scenario that I KNOW should be fairly simple to write, but I am completely blanking as to a solution. For reference I'm writing this for Server 2k. I'll begin with my initial query and then try to explain what I need to change...

    SELECT a.application_id,

    d.brandname,

    dd.dosage,

    t.renew,

    c.short_name

    FROM pap_application a

    JOIN tracking t ON a.application_id = t.application_id

    JOIN drug d ON a.drug_id = d.drug_id

    JOIN drug_dosage dd ON a.dosage_id = dd.dosage_id

    JOIN clinic c ON a.clinic_id = c.clinic_id

    WHERE a.patient_id = @patient_id

    AND t.renew IS NOT NULL

    AND t.renewed <> 1

    AND t.remove_renew <> 1

    ORDER BY brandname

    The central table is the application table (sorry, the naming convention threw me off too - think of "applying" for something) which contains information concerning patients, doctors, drugs, etc... it's basically a record of foreign keys. My query returns a list of all the applications for a given patient that are eligible for renewal. The information I need is the application_id, the drug name, dosage, and clinic name. Now, the data I'm working will often return several records for the same drug, dosage, and clinic - but I only need one for each combination of drug/dosage/clinic.

    My first instinct is to use TOP - since I want the record with the most recent renew date - but since I also need to return the renewal date, I'd be forced to add it to the GROUP BY clause - thereby undoing the group. Similar issue with application_id - it is a distinct field so I can't include it in a GROUP BY.

    My second attempts revolved around using MAX:

    SELECT a.application_id,

    d.brandname,

    dd.dosage,

    MAX(t.renew) as renew,

    c.short_name

    FROM pap_application a

    JOIN tracking t ON a.application_id = t.application_id

    JOIN drug d ON a.drug_id = d.drug_id

    JOIN drug_dosage dd ON a.dosage_id = dd.dosage_id

    JOIN clinic c ON a.clinic_id = c.clinic_id

    WHERE a.patient_id = @patient_id

    AND t.renew IS NOT NULL

    AND t.renewed <> 1

    AND t.remove_renew <> 1

    GROUP BY brandname, dosage, short_name

    ORDER BY brandname

    but this isn't valid without putting the application_id in the GROUP BY clause - and if I do that every group will have a single record.

    I've come up with a rather complicated way to do it using a cursor, but there's absolutely got to be a better way to do this. My problem here is that I need the application ID of the most recent records for each drug/dosage/clinic, but my unique identifier seems to be getting in the way.

    Any help would be greatly appreciated!

  • Something along these lines will work just fine. You may need to adjust your indexes to get optimal performance.

    SELECT a.application_id,

    d.brandname,

    dd.dosage,

    t.renew,

    c.short_name

    FROM pap_application a

    JOIN tracking t ON a.application_id = t.application_id

    and t.renew = (SELECT TOP(1) t2.Renew

    FROM dbo.tracking t2

    WHERE t2.application_id = a.application_id

    ORDER BY t2.Renew DESC)

    JOIN drug d ON a.drug_id = d.drug_id

    JOIN drug_dosage dd ON a.dosage_id = dd.dosage_id

    JOIN clinic c ON a.clinic_id = c.clinic_id

    WHERE a.patient_id = @patient_id

    AND t.renew IS NOT NULL

    AND t.renewed <> 1

    AND t.remove_renew <> 1

    ORDER BY brandname

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the response!

    I gave it a shot but unfortunately that didn't seem to work. The application and tracking tables are 1-to-1, so selecting the top renewal date for a given application simply grabs THE renewal date for that application.

  • How about ?

    ;WITH cte

    AS

    (

    SELECT a.application_id,

    d.brandname,

    dd.dosage,

    t.renew,

    c.short_name,

    ROW_NUMBER() OVER(PARTITION BY a.application_id, a.drug_id, a.dossage_id, a.clinic_id ORDER BY t.renew DESC) AS rn

    FROM pap_application a

    JOIN tracking t ON a.application_id = t.application_id

    JOIN drug d ON a.drug_id = d.drug_id

    JOIN drug_dosage dd ON a.dosage_id = dd.dosage_id

    JOIN clinic c ON a.clinic_id = c.clinic_id

    WHERE a.patient_id = @patient_id

    AND t.renew IS NOT NULL

    AND t.renewed <> 1

    AND t.remove_renew <> 1

    )

    SELECT application_id,

    brandname,

    dosage,

    renew,

    short_name

    FROM cte

    WHERE rn = 1

    ORDER BY brandname


    * Noel

  • dave (1/6/2009)


    Thanks for the response!

    I gave it a shot but unfortunately that didn't seem to work. The application and tracking tables are 1-to-1, so selecting the top renewal date for a given application simply grabs THE renewal date for that application.

    But... You're using this in your own code

    MAX(t.renew) as renew,

    And if it's one-to-one, how can you have more than one renewal date in the first place?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Sorry, I guess that MAX was a confusingly bad idea

    Perhaps if I show you the result set I'm getting and the result set I'm aiming for it will help to describe my issue:

    Have:

    application_id brandname dosage renew short_name

    -------------- --------- ------- ----------------------- ----------

    100152 Lexapro 20 2008-09-30 00:00:00.000 37

    3875 Seroquel 200 2005-07-11 00:00:00.000 37

    3876 Trileptal 300 2005-07-10 00:00:00.000 37

    15156 Trileptal 300 2005-09-09 00:00:00.000 37

    20990 Trileptal 300 2005-11-23 00:00:00.000 37

    30944 Trileptal 300 2006-04-01 00:00:00.000 37

    36518 Trileptal 300 2006-02-05 00:00:00.000 37

    95814 Trileptal 300 2008-09-02 00:00:00.000 37

    19329 Trileptal 600 2006-02-20 00:00:00.000 37

    31591 Trileptal 600 2006-04-23 00:00:00.000 37

    99826 Vistaril 50 2008-09-02 00:00:00.000 37

    23286 Vistaril 50 2005-12-21 00:00:00.000 37

    28641 Vistaril 50 2006-02-13 00:00:00.000 37

    15683 Vistaril 50 2005-09-27 00:00:00.000 37

    3877 Vistaril 50 2005-07-14 00:00:00.000 37

    Want:

    application_id brandname dosage renew short_name

    -------------- --------- ------- ----------------------- ----------

    100152 Lexapro 20 2008-09-30 00:00:00.000 37

    3875 Seroquel 200 2005-07-11 00:00:00.000 37

    95814 Trileptal 300 2008-09-02 00:00:00.000 37

    31591 Trileptal 600 2006-04-23 00:00:00.000 37

    99826 Vistaril 50 2008-09-02 00:00:00.000 37

  • Noel - It worked! I had to remove application_id from the partition, but that did the trick.

    Thanks everyone for the help!

  • dave (1/6/2009)


    Noel - It worked! I had to remove application_id from the partition, but that did the trick.

    Thanks everyone for the help!

    I am glad you got it!


    * Noel

  • Hmm... didn't you mention originally that you're writing it for SQLS 2000? This won't work on 2k, it uses feature introduced in SQLS 2005.

    In case you are writing it for 2k, but are doing it using 2005, you will get a nasty surprise when you try to implement it. Let us know if this is the case, it can be done in 2k, but in a different way.

  • You're right. I'm developing locally on a 2k5 instance and when I tried to push the changes to our staging server (2k) it didn't work!

    I suppose this is a 2k5 forum - but anyone got any ideas for 2k?

  • Actually, looking at the data, what if you modified the solution I suggested to get the max ApplicationID instead of the max date? Just use the other columns as matching criteria. It ought to work.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Try this, see if it does what you need:

    SELECT a.application_id,

    d.brandname,

    dd.dosage,

    t.renew,

    c.short_name

    FROM pap_application a

    JOIN tracking t ON a.application_id = t.application_id

    JOIN drug d ON a.drug_id = d.drug_id

    JOIN drug_dosage dd ON a.dosage_id = dd.dosage_id

    JOIN clinic c ON a.clinic_id = c.clinic_id

    JOIN

    (SELECT d.brandname as BName, dd.dosage as Dose, max(t.renew) as MaxRenew

    FROM pap_application a

    JOIN tracking t ON a.application_id = t.application_id

    JOIN drug d ON a.drug_id = d.drug_id

    JOIN drug_dosage dd ON a.dosage_id = dd.dosage_id

    JOIN clinic c ON a.clinic_id = c.clinic_id

    WHERE a.patient_id = @patient_id

    AND t.renew IS NOT NULL

    AND t.renewed <> 1

    AND t.remove_renew <> 1

    GROUP BY brandname, dosage) Sub

    ON d.brandname = sub.BName

    AND dd.dosage = sub.Dose

    AND t.renew = sub.MaxRenew

    WHERE a.patient_id = @patient_id

    AND t.renew IS NOT NULL

    AND t.renewed <> 1

    AND t.remove_renew <> 1

    ORDER BY brandname

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • This should be close...

    SELECT a.application_id,

    d.brandname,

    dd.dosage,

    t.renew,

    c.short_name

    FROM pap_application a

    JOIN tracking t ON a.application_id = t.application_id

    JOIN drug d ON a.drug_id = d.drug_id

    JOIN drug_dosage dd ON a.dosage_id = dd.dosage_id

    JOIN clinic c ON a.clinic_id = c.clinic_id

    INNER JOIN (SELECT a.patient_id, d.brandname, dd.dosage, MAX(t.renew) AS RecentRenew

    FROM pap_application a

    INNER JOIN tracking t ON a.application_id = t.application_id

    INNER JOIN drug d ON a.drug_id = d.drug_id

    INNER JOIN drug_dosage dd ON a.dosage_id = dd.dosage_id

    WHERE t.renew IS NOT NULL

    AND t.renewed <> 1

    AND t.remove_renew <> 1) r

    ON r.patient_id = a.patient_id AND r.brandname = d.brandname AND r.dosage = dd.dosage AND r.RecentRenew = t.renew

    WHERE a.patient_id = @patient_id

    AND t.renew IS NOT NULL

    AND t.renewed <> 1

    AND t.remove_renew <> 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • what if you modified the solution I suggested to get the max ApplicationID instead of the max date?

    I thought of that but the renew dates are arbitrarily set so the most recent application (highest application_id) could have a renew date that isn't the latest. Did that make sense?

    Ex.)

    App_id Renew_date

    ------ ------------

    10000 1/2/2009

    10001 1/1/2009

    So if we went purely by application_id, we might not be grabbing the latest renewal date

  • Chris and GSquared - Thanks!

    I had just tried something essentially the same to that approach and it almost worked. What I discovered however was that there were a couple applications that the same renew date so I got this:

    appID drug dose clinic renew

    ------ -------- ---- ----- ------

    104291Lexapro20402009-02-07 00:00:00.000

    7218 Seroquel200372005-07-11 00:00:00.000

    3875 Seroquel200372005-07-11 00:00:00.000

    4003 Seroquel200372005-07-11 00:00:00.000

    37571Seroquel200372005-07-11 00:00:00.000

    99826Vistaril50372008-09-02 00:00:00.000

    95814Trileptal300372008-09-02 00:00:00.000

    instead of this:

    104291Lexapro20402009-02-07 00:00:00.000

    37571Seroquel200372005-07-11 00:00:00.000

    99826Vistaril50372008-09-02 00:00:00.000

    95814Trileptal300372008-09-02 00:00:00.000

    The only thing remaining is to add some sort of tie breaker to grab only one record for any give drug/dosage/clinic. I don't think it really matters - highest or lowest application_id would work, or it could be entirely random just so long as there is only one. I think I know how to do it... I'll post again with either the answer or yet another request for some assistance!

    Thanks again

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply