Data alignment

  • Hi all,

    I have two tables that I'm trying to reconcile the dollar amounts. However, I'm encountering some issues when the data does not align perfectly based on the fields that I'm using to join the tables.

    The disparity is due to when the datebegin date does not align to the attributing "month" field, and when I join on the month field there is data loss. insight is needed on how I can alter my code to align the data to make sure I capture the total amount accurately from both tables.

    My expected query result is to have $38,786.82 from table A and $38,786.82 from table B.

    I'm getting $8,858.43 for table B.

    Your insight is greatly appreciated, thanks.

    --Problematic aligment in table

    Table A

    MonthSerial NumberDatebeginDateendPaid

    Mar-1500000000000013/16/2015 0:004/10/2015 0:00 $38,786.82

    Apr-1500000000000014/7/2015 0:004/13/2015 0:00#N/A

    Apr-1500000000000014/10/2015 0:004/30/2015 0:00 $35,600.00

    Table B

    MonthSerial NumberDatebeginDateendPaid

    Mar-1500000000000014/7/2015 0:004/10/2015 0:00 $8,838.13

    Mar-1500000000000014/7/2015 0:004/10/2015 0:00 $20.30

    Apr-1500000000000014/7/2015 0:004/10/2015 0:00 $29,928.39

    Apr-1500000000000014/10/2015 0:004/30/2015 0:00 $35,600.00

    --Perfect alignment in table:

    Table A

    MonthSerial NumberDatebeginDateendPaid

    Mar-1500000000000013/20/2015 0:003/23/2015 0:00 $10,721.14

    Table B

    MonthSerial NumberDatebeginDateendPaid

    Mar-1500000000000013/20/2015 0:003/23/2015 0:00 $10,721.14

    query:

    select

    A1.Month,

    sum(A1.Paid)PaidAmount,

    B1.Month,

    sum(B1.Paid)PaidAmount

    from Table A as A1

    left outer join TableB as B1 on

    A1.Serial Number = B1.Serial Number and

    A1.Month = B1. Month

    Group by

    A1.Month,

    B1.Month

  • Your example is a bit confusing, since you are showing different paid amounts. I'm guessing this means these are two separate examples, and not intended to relate to each other. This gets hard to understand when you are showing different sets of data that appear close, but aren't. It is better to show the data in a table (as DDL) and the expected results, as well as the actual results. Testing makes this easier.

    You do have issues here in that you have numbers, which you want to sum, and then a "N/A" value. Are these all string values in your table? It helps to have DDL

    Also, I'm not sure what you mean by align. Do you expect only a single value from the table for a month?

    If I do this for setup, it's easier to see.

    Your code is joining on fields that could be duplicated, so it's possible you won't get the correct values. What is helpful is for you to explain what you want to return from data with multiple rows in one table or the other for the same month.

    CREATE TABLE Payments_A

    ( [Month] VARCHAR(10),

    SerialNumber VARCHAR(20),

    DateBegin DATETIME2,

    DateEnd DATETIME2,

    paid MONEY

    );

    GO

    CREATE TABLE Payments_B

    ( [Month] VARCHAR(10),

    SerialNumber VARCHAR(20),

    DateBegin DATETIME2,

    DateEnd DATETIME2,

    paid MONEY

    );

    INSERT dbo.Payments_A

    ( [Month], SerialNumber, DateBegin, DateEnd, paid)

    VALUES

    ( 'Mar-15', '0000000000001', '3/16/2015 0:00', '4/10/2015 0:00', 5000.01),

    ( 'Apr-15', '0000000000001', '4/7/2015 0:00' , '4/13/2015 0:00', 0),

    ( 'Apr-15', '0000000000001', '4/10/2015 0:00', '4/30/2015 0:00', 15000.00);

    INSERT dbo.Payments_B

    ( [Month], SerialNumber, DateBegin, DateEnd, paid)

    VALUES

    ( 'Mar-15', '0000000000001', '3/16/2015 0:00', '4/10/2015 0:00', 7000.02),

    ( 'Apr-15', '0000000000001', '4/7/2015 0:00', '4/13/2015 0:00', 3000),

    ( 'Apr-15', '0000000000001', '4/10/2015 0:00', '4/30/2015 0:00', 20000.00);

    If you want to do this with testing, it's more repeatable. Here's a test for you

    USE [Sandbox]

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    -- Comments here are associated with the test.

    -- For test case examples, see: http://tsqlt.org/user-guide/tsqlt-tutorial/

    ALTER PROCEDURE [ExampleTests].[test [CheckPaymentAmounts]]]

    AS

    BEGIN

    -- Setup table a

    EXEC tSQLt.FakeTable @TableName = 'Payments_A';

    -- SQL Prompt formatting off

    INSERT dbo.Payments_A

    ( [Month], SerialNumber, DateBegin, DateEnd, paid)

    VALUES

    ( 'Mar-15', '0000000000001', '3/16/2015 0:00', '4/10/2015 0:00', 5000.01),

    ( 'Apr-15', '0000000000001', '4/7/2015 0:00' , '4/13/2015 0:00', 0),

    ( 'Apr-15', '0000000000001', '4/10/2015 0:00', '4/30/2015 0:00', 15000.00);

    -- Setup table b

    EXEC tSQLt.FakeTable @TableName = 'Payments_B';

    INSERT dbo.Payments_B

    ( [Month], SerialNumber, DateBegin, DateEnd, paid)

    VALUES

    ( 'Mar-15', '0000000000001', '3/16/2015 0:00', '4/10/2015 0:00', 7000.02),

    ( 'Apr-15', '0000000000001', '4/7/2015 0:00', '4/13/2015 0:00', 3000),

    ( 'Apr-15', '0000000000001', '4/10/2015 0:00', '4/30/2015 0:00', 20000.00);

    -- SQL Prompt formatting on

    SELECT * FROM dbo.Payments_B

    -- create expected table

    CREATE TABLE #Expected

    ( Month_A VARCHAR(10),

    Paid_A MONEY,

    Month_B VARCHAR(10),

    Paid_B MONEY

    );

    INSERT #Expected

    ( Month_A, Paid_A, Month_B, Paid_B )

    VALUES

    ( 'Mar-15', 5000.01, 'Mar-15', 7000.02),

    ( 'Apr-15', 15000.00, 'Apr-15', 23000.00);

    SELECT

    'Month_A' = A1.[Month],

    'Paid_A' = SUM(A1.paid),

    'Month_B' = B1.[Month],

    'Paid_B' = SUM(B1.paid)

    INTO #Actual

    FROM

    dbo.Payments_A AS A1

    LEFT OUTER JOIN dbo.Payments_B AS B1

    ON A1.SerialNumber = B1.SerialNumber

    AND A1.[Month] = B1.[Month]

    GROUP BY

    A1.[Month],

    B1.[Month];

    EXEC tSQLt.AssertEqualsTable

    @Expected = N'#Expected',

    @Actual = N'#Actual',

    @FailMsg = N'The tables are not equal';

    END;

  • Well, I would I like to apologize if I was not clear in communicating my inquiry.

    I have two different tables that consist of similar data elements. The paid amounts are different due to nuances in the data. This is evident when we have a datebegin field that does not attribute to the same month between Table A and Table B. This causes a discrepancy since I'm joining on the serial number and the month fields.

    Let's take a look at table A and table B from my example. Table B has multiple rows, ideally, I need the paid amount from table B when summed to equal to $38,786.82 from table A. Table A has a datebegin of 3/16/2015 0:00 through dateend 4/10/2015 0:00. Table B only has the month of Mar-15 that matches when joined so it only accounts for dollar amounts $8,838.13 and $20.30, it should be picking up the month of Apr-15 with the dollar amount of $29,928.39.

    Ultimately, the date disconnect between the two tables is the source of my problems.

    The "N/A" should actually be a null. I tagged nulls values with an "N/A", it should have been left as null.

    'Also, I'm not sure what you mean by align. Do you expect only a single value from the table for a month?' Align means a match on the serial number and month field. Yes, I would expect only a single value from the table for a month.

    Ultimately, the date disconnect between the two tables is the source of my problems.

    Steve Jones - SSC Editor (12/31/2016)


    Your example is a bit confusing, since you are showing different paid amounts. I'm guessing this means these are two separate examples, and not intended to relate to each other. This gets hard to understand when you are showing different sets of data that appear close, but aren't. It is better to show the data in a table (as DDL) and the expected results, as well as the actual results. Testing makes this easier.

    You do have issues here in that you have numbers, which you want to sum, and then a "N/A" value. Are these all string values in your table? It helps to have DDL

    Also, I'm not sure what you mean by align. Do you expect only a single value from the table for a month?

    If I do this for setup, it's easier to see.

    Your code is joining on fields that could be duplicated, so it's possible you won't get the correct values. What is helpful is for you to explain what you want to return from data with multiple rows in one table or the other for the same month.

    CREATE TABLE Payments_A

    ( [Month] VARCHAR(10),

    SerialNumber VARCHAR(20),

    DateBegin DATETIME2,

    DateEnd DATETIME2,

    paid MONEY

    );

    GO

    CREATE TABLE Payments_B

    ( [Month] VARCHAR(10),

    SerialNumber VARCHAR(20),

    DateBegin DATETIME2,

    DateEnd DATETIME2,

    paid MONEY

    );

    INSERT dbo.Payments_A

    ( [Month], SerialNumber, DateBegin, DateEnd, paid)

    VALUES

    ( 'Mar-15', '0000000000001', '3/16/2015 0:00', '4/10/2015 0:00', 5000.01),

    ( 'Apr-15', '0000000000001', '4/7/2015 0:00' , '4/13/2015 0:00', 0),

    ( 'Apr-15', '0000000000001', '4/10/2015 0:00', '4/30/2015 0:00', 15000.00);

    INSERT dbo.Payments_B

    ( [Month], SerialNumber, DateBegin, DateEnd, paid)

    VALUES

    ( 'Mar-15', '0000000000001', '3/16/2015 0:00', '4/10/2015 0:00', 7000.02),

    ( 'Apr-15', '0000000000001', '4/7/2015 0:00', '4/13/2015 0:00', 3000),

    ( 'Apr-15', '0000000000001', '4/10/2015 0:00', '4/30/2015 0:00', 20000.00);

    If you want to do this with testing, it's more repeatable. Here's a test for you

    USE [Sandbox]

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    -- Comments here are associated with the test.

    -- For test case examples, see: http://tsqlt.org/user-guide/tsqlt-tutorial/

    ALTER PROCEDURE [ExampleTests].[test [CheckPaymentAmounts]]]

    AS

    BEGIN

    -- Setup table a

    EXEC tSQLt.FakeTable @TableName = 'Payments_A';

    -- SQL Prompt formatting off

    INSERT dbo.Payments_A

    ( [Month], SerialNumber, DateBegin, DateEnd, paid)

    VALUES

    ( 'Mar-15', '0000000000001', '3/16/2015 0:00', '4/10/2015 0:00', 5000.01),

    ( 'Apr-15', '0000000000001', '4/7/2015 0:00' , '4/13/2015 0:00', 0),

    ( 'Apr-15', '0000000000001', '4/10/2015 0:00', '4/30/2015 0:00', 15000.00);

    -- Setup table b

    EXEC tSQLt.FakeTable @TableName = 'Payments_B';

    INSERT dbo.Payments_B

    ( [Month], SerialNumber, DateBegin, DateEnd, paid)

    VALUES

    ( 'Mar-15', '0000000000001', '3/16/2015 0:00', '4/10/2015 0:00', 7000.02),

    ( 'Apr-15', '0000000000001', '4/7/2015 0:00', '4/13/2015 0:00', 3000),

    ( 'Apr-15', '0000000000001', '4/10/2015 0:00', '4/30/2015 0:00', 20000.00);

    -- SQL Prompt formatting on

    SELECT * FROM dbo.Payments_B

    -- create expected table

    CREATE TABLE #Expected

    ( Month_A VARCHAR(10),

    Paid_A MONEY,

    Month_B VARCHAR(10),

    Paid_B MONEY

    );

    INSERT #Expected

    ( Month_A, Paid_A, Month_B, Paid_B )

    VALUES

    ( 'Mar-15', 5000.01, 'Mar-15', 7000.02),

    ( 'Apr-15', 15000.00, 'Apr-15', 23000.00);

    SELECT

    'Month_A' = A1.[Month],

    'Paid_A' = SUM(A1.paid),

    'Month_B' = B1.[Month],

    'Paid_B' = SUM(B1.paid)

    INTO #Actual

    FROM

    dbo.Payments_A AS A1

    LEFT OUTER JOIN dbo.Payments_B AS B1

    ON A1.SerialNumber = B1.SerialNumber

    AND A1.[Month] = B1.[Month]

    GROUP BY

    A1.[Month],

    B1.[Month];

    EXEC tSQLt.AssertEqualsTable

    @Expected = N'#Expected',

    @Actual = N'#Actual',

    @FailMsg = N'The tables are not equal';

    END;

  • What are the rules for matching then? Do you need to somehow keep looking for payments to a serial number and month until they are equivalent in the two tables? That's a complex query here. In 2012+, with Window functions, it would be easier, but still problematic.

    I think you need a better set of sample data, and again, need to show which results you want to have matching in a couple of samples. What you really mean here isn't data alignment, but finding out which rows in one table will sum to the value in another table.

    You have a null in Table_a, what does that mean? Is there any matching here? Or are you trying to sum the values in Table_a for a month and then find this rows in Table_b will produce that sum from the matching or next month?

  • Briceston (12/30/2016)


    Hi all,

    I have two tables that I'm trying to reconcile the dollar amounts. However, I'm encountering some issues when the data does not align perfectly based on the fields that I'm using to join the tables.

    The disparity is due to when the datebegin date does not align to the attributing "month" field, and when I join on the month field there is data loss. insight is needed on how I can alter my code to align the data to make sure I capture the total amount accurately from both tables.

    My expected query result is to have $38,786.82 from table A and $38,786.82 from table B.

    I'm getting $8,858.43 for table B.

    Your insight is greatly appreciated, thanks.

    --Problematic aligment in table

    Table A

    MonthSerial NumberDatebeginDateendPaid

    Mar-1500000000000013/16/2015 0:004/10/2015 0:00 $38,786.82

    Apr-1500000000000014/7/2015 0:004/13/2015 0:00#N/A

    Apr-1500000000000014/10/2015 0:004/30/2015 0:00 $35,600.00

    Table B

    MonthSerial NumberDatebeginDateendPaid

    Mar-1500000000000014/7/2015 0:004/10/2015 0:00 $8,838.13

    Mar-1500000000000014/7/2015 0:004/10/2015 0:00 $20.30

    Apr-1500000000000014/7/2015 0:004/10/2015 0:00 $29,928.39

    Apr-1500000000000014/10/2015 0:004/30/2015 0:00 $35,600.00

    --Perfect alignment in table:

    Table A

    MonthSerial NumberDatebeginDateendPaid

    Mar-1500000000000013/20/2015 0:003/23/2015 0:00 $10,721.14

    Table B

    MonthSerial NumberDatebeginDateendPaid

    Mar-1500000000000013/20/2015 0:003/23/2015 0:00 $10,721.14

    query:

    select

    A1.Month,

    sum(A1.Paid)PaidAmount,

    B1.Month,

    sum(B1.Paid)PaidAmount

    from Table A as A1

    left outer join TableB as B1 on

    A1.Serial Number = B1.Serial Number and

    A1.Month = B1. Month

    Group by

    A1.Month,

    B1.Month

    I can see a LOT of potential problems with this, but I think the solution lies in understanding exactly what the Month field is supposed to actually represent in each table. As an example, one might conclude that a value of Mar-15 means the calendar month of March of 2015. However, it could also mean March 16th, 2015 through April 15th, 2015, or any of a number of similar constructions. There might even be overlap, which would be a serious problem, but lets assume that that particular bug-a-boo is not present here. Once you understand that meaning for both tables, then you need to know the exact detailed meaning of each of the two date fields. What does the Datebegin field actually represent in each table, and what does the Dateend field represent in each? With this information, it MAY be possible to place one or more of those dates into an appropriate range by generating a calendar table to represent all the possible dates that should be included in any given "month", and then matching one or more of those date fields to that calendar table, but again, understanding the data and what it actually represents is the key. It's also possible that you've stumbled into someone's "mistake", and that the divergence of the data in these tables should NOT exist. Find out the info on what the data represents and get back to us...

    Edit: small typo.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Sorry for the delayed response.

    My inquiry can be disregarded the issue is with the underlying data.

    Thanks again for chiming in, it's appreciated.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply