Grouping, eliminate un-matched pairs?

  • Morning all,

    I think that (subject) describes this scenario???

    task - for a given post date range, find variance in days (post vs dos), and sum(volume), sum(charges), datediff(days, dos,post)

    schema -

    pid (int) , dos (date),  post (date) , part (varchar), volume (int) , charges (money)

    data -

    1 , 1/1/2007 , 1/2/2007, parta , 2, $40.00

    1 , 1/1/2007 , 1/2/2007, parta , 2, $40.00

    1 , 1/1/2007 , 1/4/2007, parta , 1, $20.00

    1 , 1/1/2007 , 1/4/2007, parta , 1, $20.00

    1 , 1/1/2007 , 1/4/2007, parta , 4, $80.00

    1 , 1/1/2007 , 1/6/2007, parta , 3, $60.00

    1 , 1/1/2007 , 1/7/2007, parta , -3, -$60.00

    1 , 1/3/2007 , 1/4/2007, parta , 1, $20.00

    1 , 1/3/2007 , 1/4/2007, parta , 1, $20.00

    output

    pid , dos , post ,  lag(days post-dos ) , part , sum(volume), sum(charges)

    1 , 1/1/2007 , 1/2/2007 , 1, parta , 4 , $ 80.00

    1 , 1/1/2007 , 1/4/2007 , 3, parta , 6 , $100.00

    1 , 1/3/2007 , 1/4/2007 , 1, parta , 2 , $ 40.00

    Easy? Just group by pid, dos, post, part and sum the volume,charges, right? If so, you wouldn't be reading this. I have to drop out post of 1/6 and 1/7 because they cancel. I have the logic to include only when having sum(charges) > $0 and that used to work when reversal posted same day. Now when the pair are offset by a day, The reversal on 1/7 drops out, sum(charges) > 0,  out but leaves the post of 1/6 in.

    Can't use temp tables as this ends up in Crystal pivot also, as well as SQL CSV. Any ideas on how to eliminate the reversal while still including the post date in the grouping?

     

  • This will work on the data given but any other alternative may not

    SELECT a.pid,a.dos,a.post,DATEDIFF(day,a.dos,a.post) AS [lag],a.part,SUM(a.volume),SUM(a.charges)

    FROM

    a

    LEFT JOIN

    b

    ON b.pid=a.pid

      AND b.dos=a.dos

      AND b.part=a.part

      AND (b.post=a.post+1 OR b.post=a.post-1)

      AND (b.volume+a.volume)=0

      AND (b.charges+a.charges)=0

    WHERE b.pid IS NULL

    GROUP BY a.pid,a.dos,a.post,DATEDIFF(day,a.dos,a.post),a.part

    HAVING SUM(a.charges) <> 0

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David;

    Did you want to Left Join

    b , then placing

    b on LEFT side of each pair in clause, and still ask when

    b is null?I'm not familiar with using Left Join that way.

     

     

     

     

     

  • Yes, the left join is intentional.

    I used to find if there was a corresponding entry for the previous/following day with the oposite values. So for the data you posted each row would have null values for b except for

    1 , 1/1/2007 , 1/6/2007, parta , 3, $60.00

    1 , 1/1/2007 , 1/7/2007, parta , -3, -$60.00

    where each would have the corresponding other row.

    By using 'WHERE b.pid IS NULL' I exclude these rows as they are not required as you requested

    As I also stated this will work on the data posted but may not if the data varies from this

    p.s.

    Run this query on the above data

    SELECT a.*,b.*

    FROM

    a

    LEFT JOIN

    b

    ON b.pid=a.pid

      AND b.dos=a.dos

      AND b.part=a.part

      AND (b.post=a.post+1 OR b.post=a.post-1)

      AND (b.volume+a.volume)=0

      AND (b.charges+a.charges)=0

    and you will see

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David;

    Ok, I'm running validations to understand. I've always used 'left join' with the table I want all data from (a) on the left side of the '='  as (a.filed = b.filed) , and check for right side (b.filed = null). Placing the b on the left interested me.

    Would simply changing the (b.post = a.post+1 or b.post = a.post -1) to (b.post <> a.post) work to catch more that one day span?

     

  • quote...with the table I want all data from (a) on the left side of the '='...

    My preference is to put the JOIN'd table alias first but to SQL it makes no difference

    (a.filed = b.filed)  is the same as (b.filed = a.filed)

    quotecatch more that one day span

    Yes, providing the rest of the columns match

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David;

    Thanks. I know I read somewhere in SQL book about the left/right side of the join syntax, and took for granted that impacted the parser.

    I've made those change and validating and it's looking great! I can use this logic elsewhaere and thats the bonus. This pulls out corrections to mistakes that impact financial reports.  Upto now the reversal was always posted same day. Now there can be days/months for the reversal to post.

    Again, thanks for the insight.

    Randy.

     

     

  • Validation turned up missing record that should have been included.

    ID PART  DOS             POST           LAG   VOL         CHARGES

    5  198     03/22/2006  04/03/2007  377   -1.0000    -115.00

    5  198     03/22/2006  04/04/2007  378   -1.0000    -115.00

    5  198     03/22/2006  04/03/2007  377   -1.0000    -115.00

    5  198     03/22/2006  04/06/2007  380    1.0000     115.00

    5  198     03/22/2006  04/06/2007  380    1.0000     115.00

    5  206     03/22/2006  04/04/2007  378   -1.0000    -191.00

    5  206     03/22/2006  04/03/2007  377    1.0000     191.00

    5  206     03/22/2006  04/03/2007  377    1.0000     191.00

    The first 5 lines for part 198 all have DOS of 3/22, even though there are 2 positive posts, but three reversals, the extra reversal should not count and not be included. The last three lines, part 206, have same DOS with 2 posts and 1 reversal, leaving one positive post, so this is a wanted record.

    The code eliminated this from the results. Too bad, so close. If you take the POST date out of the select and group, having only >0 results it's perfect, just can't calculate the lag and show the POST.

     

     

  • You could try this

    SELECT pid,part,dos,MAX(post) AS [post],DATEDIFF(day,dos,MAX(post)) AS [lag],SUM(volume),SUM(charges)

    FROM

    GROUP BY pid,dos,part

    HAVING SUM(charges) <> 0

    It will give you the greatest lag for pid,part,dos and since you have no ordering for the data (unless you failed to mention this) then the lag is imaterial as you cannot guarantee which one anyway

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David;

    Were you listening in? We just discussed this in my team meeting  an hour ago and I'm off running a preview to get signed off, based on max lag!

     

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply