Eliminating duplicate records in join

  • Okay, I know this is child's play for some of you, but I'm going nuts.  Here is a SELECT statement that is returning duplicate rows:

    SELECT  i.POLineKey,

     i.ShipLineKey,

     i.InvoicePostDate,

     i.InvoiceTranID,

     v.VoucherPostDate,

     v.VoucherTranID,

     i.SOLineKey,

     i.QtyShipped,

     i.SOExtCOGS,

     v.VoucherQty,

     v.VoucherUnitCost,

     v.POExtCost

    FROM  tsoDSInvoiceCOGS_SSSI i,

     tsoDSVoucherCOGS_SSSI v

    WHERE  v.POLineKey = i.POLineKey

     AND CAST(i.SOExtCOGS AS Dec(15,2)) <> CAST(v.POExtCost AS Dec(15,2))

    ORDER BY v.VoucherTranID, i.InvoicePostDate

    The reason it is returning duplicate rows is that multiple vouchers and multiple invoices may be linked to through the same POLineKey+SOLineKey combinations.  The unique identifier on the SOLine side is i.ShipLineKey, but there is no corresponding v-table data element to use in the join (nor is there any in the database).  While the records are correlated, they are correlated ONLY by the fact that an SOLineKey may have only one (1) POLineKey associated with it.  (This is true even though multiple AP vouchers and multiple AR invoices may result.)

    For example, with one POLineKey, I have 15 correlated (unique) ShipLineKeys and 17 correlated VoucherLineKeys (with unique VoucherTranID's).  However, when I run the above query with the added constraint WHERE... i.POLineKey = 2266, I get 149 rows in the Result set (multiple rows because of the the multiple combinations of lines around the v.POLineKey = i.POLineKey.  In reality, I should get a MAXIMUM of 17 rows in the result set.

    The true correspondence between the Invoice lines and Voucher lines is related to the proximity of InvoicePostDate and VoucherPostDate.  Typically, they are going to be within a few days of one another.

    Now, I'm convinced that I need to do some kind of thing where I join the i-table to itself and use GROUP BY or somthing to get it done.  When I was working my way through the chapter 6 of The Guru's Guide to Transact-SQL, I thought I understood all that stuff about joining a table to itself through the subqueries to elinate duplicate records, but after struggling with this for several hours, I've decided I do NOT know how to put it in practice.

    I'm sure some of you out there have some brilliant resolution to this dilemma that goes well beyond my present skill level.

    Thanks, in advance, for your help.

     


    Richard D. Cushing
    Sr. Consultant
    SYNGERGISTIC SOFTWARE SOLUTIONS, LLC

  • Unfortunately I know some accounting, so this is very confusing to me. 

    Please post the DDL and some sample data with an example of the desired result set.  If your DDL does not make the relationships between the tables clear, please explain that, too.



    --Jonathan

  • Unfortunately I know some accounting, so this is very confusing to me.

    Hey Jonathan, that's more than the guys at Enron and MCI Worldcom can say

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • that's more than the guys at Enron and MCI Worldcom can say

    They're preparing to say (on the stand): "Unfortunately I don't know any accounting, so this is very confusing to me." 



    --Jonathan

  • SELECT *

    FROM tsoDSInvoiceCOGS_SSSI

    WHERE POLineKey = 2266

    ORDER BY InvoicePostDate

    RESULTS:

    SOLineKey   POLineKey   ShipLineKey QtyShipped         InvcUnitCost      InvoicePostDate                                        InvoiceTranDate                                        InvoiceTranID SOExtCOGS        

    ----------- ----------- ----------- ------------------ ----------------- ------------------------------------------------------ ------------------------------------------------------ ------------- -----------------

           5399        2266        8229         2.00000000          11.38000 2001-08-23 00:00:00.000                                2001-08-23 00:00:00.000                                0000002655-IN             22.76

           5399        2266        8558         1.00000000         111.38000 2001-08-27 00:00:00.000                                2001-08-27 00:00:00.000                                0000002783-IN            111.38

           5399        2266       15934         1.00000000         111.38000 2001-10-19 00:00:00.000                                2001-10-18 00:00:00.000                                0000004485-IN            111.38

           5399        2266       17726         1.00000000 NULL              2001-11-05 00:00:00.000                                2001-11-05 00:00:00.000                                0000004975-IN NULL

           5399        2266       19817         1.00000000 NULL              2001-11-21 00:00:00.000                                2001-11-21 00:00:00.000                                0000005546-IN NULL

           5399        2266       22841         1.00000000         111.38000 2001-12-18 00:00:00.000                                2001-12-18 00:00:00.000                                0000006304-IN            111.38

           5399        2266       27387         1.00000000         111.38000 2002-01-31 00:00:00.000                                2002-01-31 00:00:00.000                                0000007518-IN            111.38

           5399        2266       29781         1.00000000         116.96000 2002-02-14 00:00:00.000                                2002-02-14 00:00:00.000                                0000007922-IN            116.96

           5399        2266       38609         1.00000000         116.96000 2002-04-26 00:00:00.000                                2002-04-26 00:00:00.000                                0000010145-IN            116.96

           5399        2266       42037         1.00000000         116.96000 2002-05-30 00:00:00.000                                2002-05-30 00:00:00.000                                0000011050-IN            116.96

           5399        2266       44142         1.00000000         116.96000 2002-06-18 00:00:00.000                                2002-06-18 00:00:00.000                                0000011581-IN            116.96

           5399        2266       44434         1.00000000         116.96000 2002-06-21 00:00:00.000                                2002-06-21 00:00:00.000                                0000011683-IN            116.96

           5399        2266       46922         1.00000000         116.96000 2002-07-16 00:00:00.000                                2002-07-16 00:00:00.000                                0000012349-IN            116.96

           5399        2266       50893         1.00000000         116.96000 2002-08-26 00:00:00.000                                2002-08-26 00:00:00.000                                0000013374-IN            116.96

           5399        2266       53548         1.00000000         116.96000 2002-09-17 00:00:00.000                                2002-09-17 00:00:00.000                                0000013984-IN            116.96

    (15 row(s) affected)

    ========

    SELECT *

    FROM tsoDSVoucherCOGS_SSSI

    WHERE POLineKey = 2266

    ORDER BY VoucherTranDate

    POLineKey   VoucherUnitCost   VoucherPostDate                                        VoucherTranDate                                        VoucherTranID      DropShip VoucherQty         POExtCost        

    ----------- ----------------- ------------------------------------------------------ ------------------------------------------------------ ------------------ -------- ------------------ -----------------

           2266         111.38000 2001-08-21 00:00:00.000                                2001-07-02 00:00:00.000                                697184FL08-IN             1         1.00000000            111.38

           2266         111.38000 2001-08-21 00:00:00.000                                2001-07-23 00:00:00.000                                697184FL09-IN             1         1.00000000            111.38

           2266         111.38000 2001-08-27 00:00:00.000                                2001-08-14 00:00:00.000                                697184FL10-IN             1         1.00000000            111.38

           2266         111.38000 2001-10-18 00:00:00.000                                2001-10-03 00:00:00.000                                697184FL11-IN             1         1.00000000            111.38

           2266         111.38000 2001-11-05 00:00:00.000                                2001-10-24 00:00:00.000                                697184FL12-IN             1         1.00000000            111.38

           2266         111.38000 2001-11-21 00:00:00.000                                2001-11-14 00:00:00.000                                697184FL13-IN             1         1.00000000            111.38

           2266         111.38000 2001-12-18 00:00:00.000                                2001-12-05 00:00:00.000                                697184FL14-IN             1         1.00000000            111.38

           2266         111.38000 2002-01-18 00:00:00.000                                2002-01-03 00:00:00.000                                697184FL15-IN             1         1.00000000            111.38

           2266         111.38000 2002-01-18 00:00:00.000                                2002-01-09 00:00:00.000                                697184FL16-IN             1         1.00000000            111.38

           2266         111.38000 2002-02-14 00:00:00.000                                2002-01-31 00:00:00.000                                697184FL17-IN             1         1.00000000            111.38

           2266         111.38000 2002-06-18 00:00:00.000                                2002-03-13 00:00:00.000                                697184FL18-IN             1         1.00000000            111.38

           2266         111.38000 2002-04-26 00:00:00.000                                2002-04-11 00:00:00.000                                697184FL19-IN             1         1.00000000            111.38

           2266         111.38000 2002-05-29 00:00:00.000                                2002-05-13 00:00:00.000                                697184FL20-IN             1         1.00000000            111.38

           2266         111.38000 2002-06-21 00:00:00.000                                2002-06-10 00:00:00.000                                697184FL21-IN             1         1.00000000            111.38

           2266         111.38000 2002-07-16 00:00:00.000                                2002-07-09 00:00:00.000                                697184FL22-IN             1         1.00000000            111.38

           2266         111.38000 2002-08-20 00:00:00.000                                2002-08-13 00:00:00.000                                697184FL23-IN             1         1.00000000            111.38

           2266         116.96000 2002-09-17 00:00:00.000                                2002-09-09 00:00:00.000                                697184FL24-IN             1         1.00000000            116.96

    (17 row(s) affected)

    ==========

    Does that help?

    And regarding those Enron guys, I'd just like to be there lawyers:

    Former Enron guy: "How much is it going to cost me for you to defend me?"

    Attorney: "A million dollars a day."

    Former Enron guy: "Well, as you know, I never understood money matters too well, so I guess that's a good deal.  In fact, I can probably make money spending it at that rate.  What do you think?"

     


    Richard D. Cushing
    Sr. Consultant
    SYNGERGISTIC SOFTWARE SOLUTIONS, LLC

  • Actually, that doesn't help much.  That could (with some nasty manual reformatting) be turned into sample data, but we (well, I, anyway) need more than just that.  Script out and post the DDL for the tables, and show the desired result set from the sample data, please.



    --Jonathan

  • Well, here goes.  See if this will get what you need:

    CREATE TABLE [tsoDSVoucherCOGS_SSSI]

    (

     [POLineKey] [int] NOT NULL ,

     [VoucherUnitCost] [decimal](15, 5) ,

     [VoucherPostDate] [datetime] ,

     [VoucherTranDate] [datetime] ,

     [VoucherTranID] [char] (25) ,

     [VoucherQty] [decimal](15, 3)  ,

     [POExtCost] [decimal](15, 2)

    ) ON [PRIMARY]

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '69.62', '7/31/2002', '7/2/2002', '287171FL37-IN     ', '1', '69.62');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '73.1', '8/14/2002', '8/6/2002', '458373FL01-IN     ', '1', '73.1');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '73.1', '9/5/2002', '8/29/2002', '458373FL02-IN     ', '1', '73.1');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '73.1', '10/10/2002', '10/2/2002', '458373FL03-IN     ', '1', '73.1');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '73.1', '11/13/2002', '11/4/2002', '458373FL04-IN     ', '1', '73.1');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '73.1', '12/31/2002', '12/12/2002', '458373FL05-IN     ', '1', '73.1');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '73.1', '1/14/2003', '12/31/2002', '458373FL06-IN     ', '1', '73.1');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '75.3', '2/6/2003', '1/29/2003', '458373FL07-IN     ', '1', '75.3');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '75.3', '3/10/2003', '2/27/2003', '458373FL08-IN     ', '1', '75.3');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '75.3', '4/9/2003', '4/1/2003', '458373FL09-IN     ', '1', '75.3');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '75.3', '5/12/2003', '5/1/2003', '458373FL10-IN     ', '1', '75.3');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '75.3', '6/12/2003', '6/2/2003', '458373FL11-IN     ', '1', '75.3');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '75.3', '7/15/2003', '7/1/2003', '458373FL12-IN     ', '1', '75.3');

    INSERT INTO tsoDSVoucherCOGS_SSSI (POLineKey, VoucherUnitCost, VoucherPostDate, VoucherTranDate, VoucherTranID, VoucherQty, POExtCost) VALUES ('10763', '75.3', '8/8/2003', '7/30/2003', '458373FL13-IN     ', '1', '75.3');

    CREATE TABLE [tsoDSInvoiceCOGS_SSSI]

    (

     [SOLineKey] [int] NOT NULL ,

     [POLineKey] [int] NOT NULL ,

     [ShipLineKey] [int] ,

     [QtyShipped] [decimal](16, 8) ,

     [InvcUnitCost] [decimal](15, 5) ,

     [InvoicePostDate] [datetime] ,

     [InvoiceTranDate] [datetime] ,

     [InvoiceTranID] [char] (13)  ,

     [SOExtCOGS] [decimal](15, 2) 

    ) ON [PRIMARY]

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '48523', '1', '73.1', '7/31/2002', '7/31/2002', '0000012759-IN', '73.1');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '49953', '1', '73.1', '8/14/2002', '8/14/2002', '0000013130-IN', '73.1');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '52302', '1', '73.1', '9/5/2002', '9/5/2002', '0000013691-IN', '73.1');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '56227', '1', '73.1', '10/10/2002', '10/10/2002', '0000014664-IN', '73.1');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '59705', '1', '73.1', '11/18/2002', '11/18/2002', '0000015585-IN', '73.1');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '64236', '1', '73.1', '12/31/2002', '12/31/2002', '0000016718-IN', '73.1');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '65993', '1', '75.3', '1/14/2003', '1/14/2003', '0000017136-IN', '75.3');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '68760', '1', '75.3', '2/6/2003', '2/6/2003', '0000017890-IN', '75.3');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '72137', '1', '75.3', '3/10/2003', '3/10/2003', '0000018726-IN', '75.3');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '75403', '1', '75.3', '4/9/2003', '4/9/2003', '0000019647-IN', '75.3');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '78231', '1', '75.3', '5/12/2003', '5/12/2003', '0000020389-IN', '75.3');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '81564', '1', '75.3', '6/16/2003', '6/16/2003', '0000021286-IN', '75.3');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '84640', '1', '75.3', '7/15/2003', '7/15/2003', '0000021998-IN', '75.3');

    INSERT INTO tsoDSInvoiceCOGS_SSSI (SOLineKey, POLineKey, ShipLineKey, QtyShipped, InvcUnitCost, InvoicePostDate, InvoiceTranDate, InvoiceTranID, SOExtCOGS) VALUES ('37690', '10763', '87115', '1', '75.3', '8/8/2003', '8/8/2003', '0000022713-IN', '75.3');

    This is the DDL and sample data.  I'll do the result set I'd like to see in a subsequent post.


    Richard D. Cushing
    Sr. Consultant
    SYNGERGISTIC SOFTWARE SOLUTIONS, LLC

  • Hmmmmmmmm...

    Your request to have me post the desired result set led me to think through additional factors.  I think I may have figured out how to get where I want to go now.

    I know the precise posting dates (vouchers and invoices) may not match; however, I do want to compare correlated transactions posted in the same calendar month.  Therefore, I am adding the following to my WHERE clause:

    AND  (

      DATEPART(mm,i.InvoicePostDate) = DATEPART(mm,v.VoucherPostDate)

      AND

      DATEPART(yy,i.InvoicePostDate) = DATEPART(yy,v.VoucherPostDate)

            )

    I think it may get me what I'm after.  If not, rest assured I'll be back.

    Thanks for your help!


    Richard D. Cushing
    Sr. Consultant
    SYNGERGISTIC SOFTWARE SOLUTIONS, LLC

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply