Union

  • I understand the concept of Union and Union All. Union joins two identical tables removing duplicates. Union all joins two identical tables and keeps the all of the data in both. I would like to join two table using 3 fields as checks for duplicates.

    I have two tables. The data is divided into payroll halves. The first half is from the 1st day of the month through the 15th and the second half is from the 16th of the month through the last day of the month.

    The first table is a daily table. Data gets loaded on a daily basis. 3 days after the 15th ,the first half data (the 1st through the 15th) gets loaded from the Daily Table (Daytab) into the History Table (Histtab) but also stays in the daily data table. The next three days (the 16th through the 18th) does not get loaded into the history table since it belongs in the second half. Again, 3 days after the last day of the month, daily data from the 15th through the last day of the month gets loaded into the history table.

    Also when the data gets loaded into the history table, the date changes from the actual date in the daily table to the last day of the half in the hist table such as either the 15th or the 30th (or whatever the last day happens to be).

    I need the tables combined with none of the data duplicated. What I was wondering is if there is a way to union the tables using 3 or 4 fields as the criteria for duplicates;.when Region, State,and Half are the same, the data from the history table remains and the daily data is not retained. If the data is not the same in both tables , then the data is retained in both.

    Below are the examples of the history, daily and combined tables.

    How can I do this? Thanks

    Daily Table

    SourceAcctg Year/MonthRegionStateHalfDateAmount

    Daytab2014012WestCA1st12/5/2014$429.00

    Daytab2014012WestID1st12/14/2015$255.00

    Daytab2014012EastSC1st12/13/2014$358.00

    Daytab2014012EastFL1st12/8/2014$287.00

    Daytab2014012NorthMN1st12/7/2014$153.00

    Daytab2014012WestUT1st12/6/2014$178.00

    Daytab2014012NorthMN1st12/15/2014$178.00

    Daytab2014012NorthWY2nd12/16/2014$170.00

    Daytab2014012EastNH2nd12/18/2014$397.00

    Daytab2014012WestOR2nd12/20/2014$346.00

    Daytab2014012WestWA2nd12/20/2014$224.00

    Daytab2014012EastOH2nd12/22/2014$291.00

    Daytab2014012EastWV2nd12/18/2014$166.00

    History Table

    SourceAcctg Year/MonthRegionStateHalfDateAmount

    Histtab2014012WestCA1st12/15/2014$469.00

    Histtab2014012WestID1st12/15/2014$274.00

    Histtab2014012EastSC1st12/15/2014$365.00

    Histtab2014012EastFL1st12/15/2014$394.00

    Histtab2014012NorthMN1st12/15/2014$185.00

    Histtab2014012WestUT1st12/15/2014$178.00

    Histtab2014012NorthMN1st12/15/2014$187.00

    Combo Table

    SourceAcctg Year/MonthRegionStateHalfDateAmount

    Histtab2014012WestCA1st12/15/2014$469.00

    Histtab2014012WestID1st12/15/2014$274.00

    Histtab2014012EastSC1st12/15/2014$365.00

    Histtab2014012EastFL1st12/15/2014$394.00

    Histtab2014012NorthMN1st12/15/2014$185.00

    Histtab2014012WestUT1st12/15/2014$178.00

    Histtab2014012NorthMN1st12/15/2014$187.00

    Daytab2014012NorthWY2nd12/16/2014$170.00

    Daytab2014012EastNH2nd12/18/2014$397.00

    Daytab2014012WestOR2nd12/20/2014$346.00

    Daytab2014012WestWA2nd12/20/2014$224.00

    Daytab2014012EastOH2nd12/22/2014$291.00

    Daytab2014012EastWV2nd12/18/2014$166.00

  • Instead of doing a UNION operation, what about just a JOIN operation. You have three columns that are going to give you a relationship between the two tables. You only retrieve data from the two tables where those three columns match. If one, or more, of the tables has duplication in those three columns, then do a sub-query with a TOP 1 and an order by (avoid DISTINCT) to get just one of each value.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Why do you need to join these tables anyway? If data is stored in daily table use it from there.

    If you want to use union select all from history table and limit daily table that you only get last 1-15 days.

  • Good question. The numbers from the daily table does not tie with the history tabvle. There is a descrepancy between the tables that I cannot figure out. And I haven't been able to find out who would know.

    The data in history table matches the posted data so I would like to use those numbers in the final report. The daily data is close so I want to use that during the month, then use the actual for the final.

    Also there are other times I can use the code.

  • Maybe you could just select current month from daily table by filtering with date and rest from history table

    WITH Daytab AS

    (

    SELECT 'Daytab' AS src, 2014012 AS acctg,'West' AS region,'UT' AS state, '1st' AS Half, '12/6/2014' AS date, 178.00 AS amount UNION ALL

    SELECT 'Daytab' AS src, 2014012,'North','MN','1st','12/15/2014',178.00 UNION ALL

    SELECT 'Daytab' AS src, 2014012,'North','WY','2nd','12/16/2014',170.00 UNION ALL

    SELECT 'Daytab' AS src, 2014012,'East','NH','2nd','12/18/2014',397.00 UNION ALL

    SELECT 'Daytab' AS src, 2014012,'North','MN','1st','1/15/2015',178.00 UNION ALL

    SELECT 'Daytab' AS src, 2014012,'North','WY','2nd','1/16/2015',170.00 UNION ALL

    SELECT 'Daytab' AS src, 2014012,'East','NH','2nd','1/18/2015',397.00

    )

    , Histtab AS

    (

    SELECT 'Histtab' AS src, 2014012 AS acctg,'East' AS region,'FL' AS state, '1st' AS Half, '12/15/2014' AS date, 394.00 AS amount UNION ALL

    SELECT 'Histtab' AS src, 2014012,'North','MN','1st','12/15/2014',185.00 AS amount UNION ALL

    SELECT 'Histtab' AS src, 2014012,'West','UT','1st','12/15/2014',178.00 AS amount UNION ALL

    SELECT 'Histtab' AS src, 2014012,'North','MN','2nd','12/31/2014',187.00 AS amount

    )

    , CTE AS

    (

    -- select current month data

    SELECT * FROM Daytab WHERE date > DATEADD(DAY, DATEDIFF(DAY, '1900-1-1', GETDATE()) - DAY(GETDATE()) + 1, '1900-1-1')

    UNION ALL

    -- select all of previous months

    SELECT * FROM Histtab WHERE date < DATEADD(DAY, DATEDIFF(DAY, '1900-1-1', GETDATE()) - DAY(GETDATE()) + 1, '1900-1-1')

    )

    SELECT *FROM CTE;

  • Using VPV's setup data:

    WITH Daytab AS

    (

    SELECT 'Daytab' AS src, 2014012 AS acctg,'West' AS region,'UT' AS state, '1st' AS Half, '12/6/2014' AS date, 178.00 AS amount UNION ALL

    SELECT 'Daytab' AS src, 2014012,'North','MN','1st','12/15/2014',178.00 UNION ALL

    SELECT 'Daytab' AS src, 2014012,'North','WY','2nd','12/16/2014',170.00 UNION ALL

    SELECT 'Daytab' AS src, 2014012,'East','NH','2nd','12/18/2014',397.00 UNION ALL

    SELECT 'Daytab' AS src, 2014012,'North','MN','1st','1/15/2015',178.00 UNION ALL

    SELECT 'Daytab' AS src, 2014012,'North','WY','2nd','1/16/2015',170.00 UNION ALL

    SELECT 'Daytab' AS src, 2014012,'East','NH','2nd','1/18/2015',397.00

    )

    , Histtab AS

    (

    SELECT 'Histtab' AS src, 2014012 AS acctg,'East' AS region,'FL' AS state, '1st' AS Half, '12/15/2014' AS date, 394.00 AS amount UNION ALL

    SELECT 'Histtab' AS src, 2014012,'North','MN','1st','12/15/2014',185.00 AS amount UNION ALL

    SELECT 'Histtab' AS src, 2014012,'West','UT','1st','12/15/2014',178.00 AS amount UNION ALL

    SELECT 'Histtab' AS src, 2014012,'North','MN','2nd','12/31/2014',187.00 AS amount

    )

    SELECT acctg, region, half, date, amount

    FROM

    (

    SELECT acctg, region, half, date, amount

    ,rn=ROW_NUMBER() OVER (PARTITION BY region ORDER BY date DESC)

    FROM

    (

    SELECT acctg, region, half, date, amount

    FROM Daytab

    UNION ALL

    SELECT acctg, region, half, date, amount

    FROM Histtab

    ) a

    ) a

    WHERE rn=1;

    I'm not sure this is exactly what you wanted. It would have been helpful to provide consumable sample data so I could have checked it against your desired result. But perhaps it gives you an idea for an alternative.

    The problem statement where you "want to eliminate duplicates based on three columns" lends itself quite well to such a ROW_NUMBER() solution.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply