January 14, 2015 at 11:11 am
I understand the concept of Union and Union All. Union joins two identical tables removing duplicates. Union all joins two identical tables and keeps the all of the data in both. I would like to join two table using 3 fields as checks for duplicates.
I have two tables. The data is divided into payroll halves. The first half is from the 1st day of the month through the 15th and the second half is from the 16th of the month through the last day of the month.
The first table is a daily table. Data gets loaded on a daily basis. 3 days after the 15th ,the first half data (the 1st through the 15th) gets loaded from the Daily Table (Daytab) into the History Table (Histtab) but also stays in the daily data table. The next three days (the 16th through the 18th) does not get loaded into the history table since it belongs in the second half. Again, 3 days after the last day of the month, daily data from the 15th through the last day of the month gets loaded into the history table.
Also when the data gets loaded into the history table, the date changes from the actual date in the daily table to the last day of the half in the hist table such as either the 15th or the 30th (or whatever the last day happens to be).
I need the tables combined with none of the data duplicated. What I was wondering is if there is a way to union the tables using 3 or 4 fields as the criteria for duplicates;.when Region, State,and Half are the same, the data from the history table remains and the daily data is not retained. If the data is not the same in both tables , then the data is retained in both.
Below are the examples of the history, daily and combined tables.
How can I do this? Thanks
Daily Table
SourceAcctg Year/MonthRegionStateHalfDateAmount
Daytab2014012WestCA1st12/5/2014$429.00
Daytab2014012WestID1st12/14/2015$255.00
Daytab2014012EastSC1st12/13/2014$358.00
Daytab2014012EastFL1st12/8/2014$287.00
Daytab2014012NorthMN1st12/7/2014$153.00
Daytab2014012WestUT1st12/6/2014$178.00
Daytab2014012NorthMN1st12/15/2014$178.00
Daytab2014012NorthWY2nd12/16/2014$170.00
Daytab2014012EastNH2nd12/18/2014$397.00
Daytab2014012WestOR2nd12/20/2014$346.00
Daytab2014012WestWA2nd12/20/2014$224.00
Daytab2014012EastOH2nd12/22/2014$291.00
Daytab2014012EastWV2nd12/18/2014$166.00
History Table
SourceAcctg Year/MonthRegionStateHalfDateAmount
Histtab2014012WestCA1st12/15/2014$469.00
Histtab2014012WestID1st12/15/2014$274.00
Histtab2014012EastSC1st12/15/2014$365.00
Histtab2014012EastFL1st12/15/2014$394.00
Histtab2014012NorthMN1st12/15/2014$185.00
Histtab2014012WestUT1st12/15/2014$178.00
Histtab2014012NorthMN1st12/15/2014$187.00
Combo Table
SourceAcctg Year/MonthRegionStateHalfDateAmount
Histtab2014012WestCA1st12/15/2014$469.00
Histtab2014012WestID1st12/15/2014$274.00
Histtab2014012EastSC1st12/15/2014$365.00
Histtab2014012EastFL1st12/15/2014$394.00
Histtab2014012NorthMN1st12/15/2014$185.00
Histtab2014012WestUT1st12/15/2014$178.00
Histtab2014012NorthMN1st12/15/2014$187.00
Daytab2014012NorthWY2nd12/16/2014$170.00
Daytab2014012EastNH2nd12/18/2014$397.00
Daytab2014012WestOR2nd12/20/2014$346.00
Daytab2014012WestWA2nd12/20/2014$224.00
Daytab2014012EastOH2nd12/22/2014$291.00
Daytab2014012EastWV2nd12/18/2014$166.00
January 14, 2015 at 11:48 am
Instead of doing a UNION operation, what about just a JOIN operation. You have three columns that are going to give you a relationship between the two tables. You only retrieve data from the two tables where those three columns match. If one, or more, of the tables has duplication in those three columns, then do a sub-query with a TOP 1 and an order by (avoid DISTINCT) to get just one of each value.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 14, 2015 at 12:05 pm
Why do you need to join these tables anyway? If data is stored in daily table use it from there.
If you want to use union select all from history table and limit daily table that you only get last 1-15 days.
January 14, 2015 at 12:16 pm
Good question. The numbers from the daily table does not tie with the history tabvle. There is a descrepancy between the tables that I cannot figure out. And I haven't been able to find out who would know.
The data in history table matches the posted data so I would like to use those numbers in the final report. The daily data is close so I want to use that during the month, then use the actual for the final.
Also there are other times I can use the code.
January 15, 2015 at 5:07 am
Maybe you could just select current month from daily table by filtering with date and rest from history table
WITH Daytab AS
(
SELECT 'Daytab' AS src, 2014012 AS acctg,'West' AS region,'UT' AS state, '1st' AS Half, '12/6/2014' AS date, 178.00 AS amount UNION ALL
SELECT 'Daytab' AS src, 2014012,'North','MN','1st','12/15/2014',178.00 UNION ALL
SELECT 'Daytab' AS src, 2014012,'North','WY','2nd','12/16/2014',170.00 UNION ALL
SELECT 'Daytab' AS src, 2014012,'East','NH','2nd','12/18/2014',397.00 UNION ALL
SELECT 'Daytab' AS src, 2014012,'North','MN','1st','1/15/2015',178.00 UNION ALL
SELECT 'Daytab' AS src, 2014012,'North','WY','2nd','1/16/2015',170.00 UNION ALL
SELECT 'Daytab' AS src, 2014012,'East','NH','2nd','1/18/2015',397.00
)
, Histtab AS
(
SELECT 'Histtab' AS src, 2014012 AS acctg,'East' AS region,'FL' AS state, '1st' AS Half, '12/15/2014' AS date, 394.00 AS amount UNION ALL
SELECT 'Histtab' AS src, 2014012,'North','MN','1st','12/15/2014',185.00 AS amount UNION ALL
SELECT 'Histtab' AS src, 2014012,'West','UT','1st','12/15/2014',178.00 AS amount UNION ALL
SELECT 'Histtab' AS src, 2014012,'North','MN','2nd','12/31/2014',187.00 AS amount
)
, CTE AS
(
-- select current month data
SELECT * FROM Daytab WHERE date > DATEADD(DAY, DATEDIFF(DAY, '1900-1-1', GETDATE()) - DAY(GETDATE()) + 1, '1900-1-1')
UNION ALL
-- select all of previous months
SELECT * FROM Histtab WHERE date < DATEADD(DAY, DATEDIFF(DAY, '1900-1-1', GETDATE()) - DAY(GETDATE()) + 1, '1900-1-1')
)
SELECT *FROM CTE;
January 15, 2015 at 6:12 pm
Using VPV's setup data:
WITH Daytab AS
(
SELECT 'Daytab' AS src, 2014012 AS acctg,'West' AS region,'UT' AS state, '1st' AS Half, '12/6/2014' AS date, 178.00 AS amount UNION ALL
SELECT 'Daytab' AS src, 2014012,'North','MN','1st','12/15/2014',178.00 UNION ALL
SELECT 'Daytab' AS src, 2014012,'North','WY','2nd','12/16/2014',170.00 UNION ALL
SELECT 'Daytab' AS src, 2014012,'East','NH','2nd','12/18/2014',397.00 UNION ALL
SELECT 'Daytab' AS src, 2014012,'North','MN','1st','1/15/2015',178.00 UNION ALL
SELECT 'Daytab' AS src, 2014012,'North','WY','2nd','1/16/2015',170.00 UNION ALL
SELECT 'Daytab' AS src, 2014012,'East','NH','2nd','1/18/2015',397.00
)
, Histtab AS
(
SELECT 'Histtab' AS src, 2014012 AS acctg,'East' AS region,'FL' AS state, '1st' AS Half, '12/15/2014' AS date, 394.00 AS amount UNION ALL
SELECT 'Histtab' AS src, 2014012,'North','MN','1st','12/15/2014',185.00 AS amount UNION ALL
SELECT 'Histtab' AS src, 2014012,'West','UT','1st','12/15/2014',178.00 AS amount UNION ALL
SELECT 'Histtab' AS src, 2014012,'North','MN','2nd','12/31/2014',187.00 AS amount
)
SELECT acctg, region, half, date, amount
FROM
(
SELECT acctg, region, half, date, amount
,rn=ROW_NUMBER() OVER (PARTITION BY region ORDER BY date DESC)
FROM
(
SELECT acctg, region, half, date, amount
FROM Daytab
UNION ALL
SELECT acctg, region, half, date, amount
FROM Histtab
) a
) a
WHERE rn=1;
I'm not sure this is exactly what you wanted. It would have been helpful to provide consumable sample data so I could have checked it against your desired result. But perhaps it gives you an idea for an alternative.
The problem statement where you "want to eliminate duplicates based on three columns" lends itself quite well to such a ROW_NUMBER() solution.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply