Difficulty with aggregating values that match the aggregate result

  • I should know how to do this, but it's escaping me. The actual schema I'm working against is proprietary and also adds more complication to the problem I'm trying to solve. So to solve this problem, I created a mock schema that is hopefully representative. See below for the mock schema, test data, my initial attempts at the query and the expected results.

    -- greatly simplified schema that makes as much sense as the real schema

    CREATE TABLE main (keyvalue INT NOT NULL PRIMARY KEY, otherdata VARCHAR(10));

    CREATE TABLE dates (datekeyvalue INT NOT NULL, keyvalue INT NOT NULL, datevalue DATE NULL, PRIMARY KEY(datekeyvalue, keyvalue));

    CREATE TABLE payments (datekeyvalue INT NOT NULL, keyvalue INT NOT NULL, paymentvalue INT NULL, PRIMARY KEY(datekeyvalue, keyvalue));

    -- test data

    INSERT main VALUES (1, 'first row'), (2, 'second row'), (3, 'third row');

    INSERT dates VALUES

    (10, 1, '2015-09-01'), (11, 1, '2015-09-25'), (12, 1, '2015-09-25'), (13, 1, '2015-09-27'), (14, 1, '2015-09-25'),

    (20, 2, NULL), (21, 2, '2015-10-11'), (22, 2, '2015-10-11'), (23, 2, '2015-12-24');

    INSERT payments VALUES

    (10, 1, 1), (11, 1, 17), (12, 1, 13), (13, 1, 100), (14, 1, NULL), (21, 2, 150);

    -- variable for today's date so that this script works in the future

    DECLARE @today date = '2015-09-23';

    -- outer query has this kind of pattern

    SELECT m.keyvalue, otherdata

    FROM main m;

    -- adding next date to outer query, works

    SELECT m.keyvalue, otherdata, min_dates.min_date

    FROM main m

    LEFT JOIN (SELECT keyvalue, MIN(d.datevalue) AS min_date

    FROM dates d

    WHERE d.datevalue > @today

    GROUP BY d.keyvalue) AS min_dates

    ON min_dates.keyvalue = m.keyvalue;

    -- attempt to add next payment to the query, WRONG RESULTS

    SELECT m.keyvalue, otherdata, t.min_date, t.sum_pmt

    FROM main m

    LEFT JOIN (SELECT d.keyvalue, MIN(d.datevalue) AS min_date, SUM(p.paymentvalue) AS sum_pmt

    FROM dates d

    JOIN payments p ON p.datekeyvalue = d.datekeyvalue AND p.keyvalue = d.keyvalue

    WHERE d.datevalue > @today

    GROUP BY d.keyvalue) AS t

    ON t.keyvalue = m.keyvalue;

    -- desired results

    SELECT 1 AS keyvalue, 'first row' AS otherdata, '2015-09-25' AS nextdate, 30 AS next_payment

    UNION ALL

    SELECT 2, 'second row', '2015-10-11', 150

    UNION ALL

    SELECT 3, 'third row', NULL, NULL

    I know I'm doing something wrong in the last query and I believe another sub-query is needed?

    Let me answer a few questions in advance:

    Q: This schema looks horrible!

    A: You don't know the half of it. I cleaned it up considerably for this question.

    Q: Why is this schema designed like this?

    A: Because it's a 3rd-party mainframe file dump being passed off as a relational database. And, no, I can't change it.

    Q: I hope this isn't a frequently-run query against a large, high-activity database in which performance is mission-critical.

    A: Yes, it is, and I left out the part where both the date and the amount are actually characters and have to pass through TRY_CONVERT (because I know how to do that part).

    Q: This is REALLY not going to perform well.

    A: Yup.

  • Here are 2 options to get your desired results. There might be others available but these where the first I could think of.

    --Option 1

    SELECT m.keyvalue,

    otherdata,

    t.min_date,

    t.sum_pmt

    FROM main m

    OUTER APPLY (SELECT d.datevalue AS min_date,

    SUM(p.paymentvalue) AS sum_pmt

    FROM dates d

    JOIN payments p ON p.datekeyvalue = d.datekeyvalue AND p.keyvalue = d.keyvalue

    WHERE d.datevalue = ( SELECT TOP 1 i.datevalue

    FROM dates i

    WHERE i.datevalue > @today

    AND i.keyvalue = m.keyvalue

    ORDER BY i.datevalue)

    AND d.keyvalue = m.keyvalue

    GROUP BY d.datevalue) AS t;

    --Option 2

    WITH CTE AS(

    SELECT m.keyvalue,

    m.otherdata,

    d.datevalue,

    p.paymentvalue,

    RANK() OVER( PARTITION BY m.keyvalue ORDER BY d.datevalue) r

    FROM main m

    LEFT

    JOIN dates d ON m.keyvalue = d.keyvalue

    AND d.datevalue > @today

    LEFT

    JOIN payments p ON d.datekeyvalue = p.datekeyvalue

    AND d.keyvalue = p.keyvalue

    )

    SELECT keyvalue,

    otherdata,

    datevalue,

    SUM( paymentvalue) next_payment

    FROM CTE

    WHERE r = 1

    GROUP BY keyvalue,

    otherdata,

    datevalue

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks, I will use the first one!

    I keep forgetting about CROSS APPLY and OUTER APPLY and try to solve everything with joins...

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply