November 14, 2013 at 10:34 am
Hello all,
I am looking to build a query that combines the results of multiple queries into a single result set, where I can then do a calculation in a separate column. I believe I will need to use a CTE but am not sure how to build it from multiple queries.
Below is some sample data, simplified to illustrate the concept I'm trying to achieve.
CREATE Table [dbo].[tempCustomerLedgerEntry]
(
[PostingDate] [DateTime] NOT NULL,
[SalesLCY] [Decimal] (38, 20) NOT NULL,
[SalespersonCode] [varchar] (20) NOT NULL,
[OrderNum] [int] NOT NULL
)
Insert Into [dbo].[tempCustomerLedgerEntry]
(
[PostingDate],[SalesLCY],[SalespersonCode],[OrderNum]
)
VALUES
( '2012-01-09', '1000.09', 'KC', '1234')
,( '2012-01-19', '1109.05', 'KC', '1235')
,( '2012-02-14', '5500.98', 'KC', '1237')
,( '2012-03-18', '10542.88', 'KC', '1238')
,( '2013-01-19', '99000.86', 'KC', '1239')
,( '2013-02-15', '105000.89','KC', '1240')
Select * from [dbo].[tempCustomerLedgerEntry]
CREATE TABLE [dbo].[tempCreditMemo]
(
[PostingDate] [DATETIME] NOT NULL,
[CreditAmount] [decimal](38, 20) NOT NULL,
[CreditMemoNum] [Int] NOT NULL,
[OrderNum] [Int] NOT NULL
)
Insert into [dbo].[tempCreditMemo]
(
[PostingDate],[CreditAmount],[CreditMemoNum], [OrderNum]
)
VALUES
( '2012-01-09', '5.01', '8821', '1234')
,( '2012-01-20', '10.00','8822', '1235')
,( '2012-02-14', '200.01','8823', '1237')
,( '2012-03-25', '300.10','8824', '1238')
,( '2013-01-29', '100.52','8825', '1239')
,( '2013-02-25', '10.99', '8826', '1240')
select * from [dbo].[tempCreditMemo]
CREATE TABLE [dbo].[tempSalesInvoice]
(
[PostingDate] [DATETIME] NOT NULL,
[SaleAmount] [decimal](38, 20) NOT NULL,
[SalesInvNum] [Int] NOT NULL,
[OrderNum] [Int] NOT NULL
)
Insert into [dbo].[tempSalesInvoice]
(
[PostingDate],[SaleAmount],[SalesInvNum], [OrderNum]
)
VALUES
( '2012-01-09', '1500.01', '6624', '1234')
,( '2012-01-20', '1200.00','6625', '1235')
,( '2012-02-14', '5750.00','6626', '1237')
,( '2012-03-25', '10542.88','6627', '1238')
,( '2013-01-29', '99000.86','6628', '1239')
,( '2013-02-25', '105000.89', '6629', '1240')
select * from [dbo].[tempSalesInvoice]
select * from [dbo].[tempCreditMemo]
Select * from [dbo].[tempCustomerLedgerEntry]
/*
Truncate table [dbo].[tempCustomerLedgerEntry]
go
Truncate table [dbo].[tempCreditMemo]
go
Truncate table [dbo].[tempSalesInvoice]
go
*/
--Below are the result sets I am looking to combine
--I will also add an additional column that performs
--the calculation of Amount1-(Amount3-Amount2) as ProfitCalc
select Year(PostingDate) as PostingYear
, SUM(SalesLCY) as Amount1
from [dbo].[tempCustomerLedgerEntry]
group by Year(PostingDate)
order by Year(PostingDate) desc
select Year(PostingDate) as PostingYear
, SUM(CreditAmount) as Amount2
from [dbo].[tempCreditMemo]
group by Year(PostingDate)
order by Year(PostingDate) desc
select Year(PostingDate) as PostingYear
, SUM(SaleAmount) as Amount3
from [dbo].[tempSalesInvoice]
group by Year(PostingDate)
order by Year(PostingDate) desc
The final result would look like the following:
Posting Year   Amount1     Amount2   Amount3     ProfitCalc
2013           204001.75   111.51    204001.75   111.51
2012           18153.00    515.12    18992.89    -324.77
Please let me know your thoughts on the best way to run such a query. Thanks.
November 14, 2013 at 11:21 am
If I'm understanding this correctly, it looks like you probably want to create a single CTE that holds the contents of all 3 of your select statements. You can combine the results of multiple select statements with UNION ALL. Then you can simply do your final grouping and selecting from the CTE.
Does that make sense? Is that the type of solution you are looking for? Do you need to see example code? Please let me know. Thanks! 🙂
November 14, 2013 at 12:14 pm
You can actually just do the UNION ALL stuff and your calculations without a CTE too.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 14, 2013 at 1:41 pm
Yes that does make sense and I will try to create the code tonight. If I run into any problems I'll let you know.
November 14, 2013 at 1:42 pm
TheSQLGuru (11/14/2013)
You can actually just do the UNION ALL stuff and your calculations without a CTE too.
I did originally try with union but I was running into an issue. I'll give it another look soon. Thanks for the reply.
November 14, 2013 at 2:01 pm
Be sure you are clear on the difference between UNION and UNION ALL. Give a holler if you need any more help. 🙂
November 14, 2013 at 9:12 pm
autoexcrement (11/14/2013)
Be sure you are clear on the difference between UNION, and UNION ALL. Give a holler if you need any more help. 🙂
In case you aren't: UNION will do a SORT/DISTINCT under the covers to remove duplicates - a VERY expensive operation. UNION ALL will not do this. In cases where you can't possibly have dupes, or you don't care about them, ALWAYS explicitly use UNION ALL!!
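A quick way to see the difference (trivial example, not from your tables):
SELECT 1 AS n
UNION
SELECT 1;       -- one row comes back: the duplicate is removed (a sort/distinct step in the plan)
SELECT 1 AS n
UNION ALL
SELECT 1;       -- two rows come back: nothing is removed, no extra work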
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 15, 2013 at 10:03 am
UNIONing your queries won't quite get you to where you want to go.
What you want is often called a cross-tab, and here's a way to do it in T-SQL, using your DDL and sample data:
; WITH cte AS (
SELECT YEAR(PostingDate) AS PostingYear
, SUM(SalesLCY) AS Amount
, 1 AS sourceTable
FROM [dbo].[tempCustomerLedgerEntry]
GROUP BY YEAR(PostingDate)
UNION ALL
SELECT YEAR(PostingDate) AS PostingYear
, SUM(CreditAmount) AS Amount
, 2 AS sourceTable
FROM [dbo].[tempCreditMemo]
GROUP BY Year(PostingDate)
UNION ALL
SELECT Year(PostingDate) AS PostingYear
, SUM(SaleAmount) AS Amount
, 3 AS sourceTable
FROM [dbo].[tempSalesInvoice]
GROUP BY Year(PostingDate)
)
SELECT PostingYear
,MAX(CASE WHEN sourceTable = 1 THEN Amount ELSE NULL END) AS Amount1
,MAX(CASE WHEN sourceTable = 2 THEN Amount ELSE NULL END) AS Amount2
,MAX(CASE WHEN sourceTable = 3 THEN Amount ELSE NULL END) AS Amount3
,MAX(CASE WHEN sourceTable = 1 THEN Amount ELSE NULL END) - (MAX(CASE WHEN sourceTable = 3 THEN Amount ELSE NULL END) - MAX(CASE WHEN sourceTable = 2 THEN Amount ELSE NULL END)) AS ProfitCalc
FROM cte
GROUP BY PostingYear
ORDER BY PostingYear DESC
You'll have to do some conversion to decimal types with smaller scale (places to the right of the decimal point) if you don't want all those trailing zeros in the results, which leads to an important question:
Why does your database store what seem to be currency values with only two places to the right of the decimal as decimal(38,20)? Do you actually have values that have 20 significant digits to the right of the decimal point, or are these values all dollars and cents? You could store a value of six times the U.S. GDP for 2012 ($15,684 billion dollars) in dollars and cents as decimal(16, 2). The advantage of this (in addition to not ending up with a bunch of non-significant zeros to the right of the decimal point in the results of simple arithmetic) is that a decimal(16,2) value requires only 9 bytes of storage while a decimal(38,20) value requires 17 bytes. That's eight unnecessary bytes for every value, which adds up to a LOT of wasted disk space and memory!
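For example, just to trim the display, something like this would do it (decimal(16,2) here is purely illustrative - size it to your real data):
SELECT YEAR(PostingDate) AS PostingYear
      ,CAST(SUM(SalesLCY) AS decimal(16, 2)) AS Amount1   -- 18153.00 instead of 18153.00000000000000000000
FROM [dbo].[tempCustomerLedgerEntry]
GROUP BY YEAR(PostingDate)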
Jason Wolfkill
November 16, 2013 at 11:01 pm
Thanks for posting WolfKillJ!
Your code worked really well. Correct me if I'm wrong about the logic behind it.
You create the CTE to start. In each of the original queries you add a column named sourceTable to act as a placeholder that is used later in the final aggregate functions. UNION ALL was chosen because we want all values, not just distinct values. Then you close the CTE and select from it. In the select list you use PostingYear, which is grouped on later. The Amount column from each individual query is fed into the MAX aggregate function, and the CASE expressions use the placeholder column to pick out the right amount for each source, aliased to differentiate it in the result set. The last column is the calculation, where MAX and CASE are combined to apply the business logic. Finally, GROUP BY PostingYear and ORDER BY PostingYear handle the grouping and sorting.
I am a little confused by the case statement. Why couldn't a normal select be used and then have the columns in the final group by?
To answer your questions about the DDL: the decimal precision was chosen over 10 years ago and I don't know why. This is a database I now manage, and I can only attribute it to convenience for the dev team or a lack of thought about the design from a storage and performance perspective.
I have not come across any values with 20 significant digits to the right of the decimal. We do work with currencies other than USD, but I've yet to see a real-life need for it. Would it be a good idea to truncate 17 or 18 of the digits?
There are many other tables and columns that have this issue. Our ERP is Navision, and its history goes back to its development in Denmark before Microsoft bought it.
It is heavily customizable, but developers may not have control over the data types they create for new columns. One day I will try to convert some of them on the back end of our test database just to see if the front end still works. All of the points you make about the data type hit home. I have so many issues with performance, and with moving tables around to different filegroups just to accommodate the space issues. Problems with inefficient design are endemic to this database in almost every aspect you can imagine. Now I'm part of a team that is building a data warehouse and I'm discovering a lot more of these issues. --End Rant!
Thanks again for your post! It was great to see a CTE in action with some real data.
November 18, 2013 at 8:45 am
kwoznica (11/16/2013)
Thanks for posting WolfKillJ! Your code worked really well. ... I am a little confused by the case statement. Why couldn't a normal select be used and then have the columns in the final group by?
Glad it works for you, kwoznica.
It may help you make sense of the logic in my query if you replace the SELECT . . . FROM cte GROUP BY PostingYear ORDER BY PostingYear with a simple SELECT * FROM cte and examine the results.
You'll see that you get a row for each unique PostingYear/sourceTable combination with the Amount column consisting of the SUM total for that PostingYear/sourceTable:
PostingYear   Amount                          sourceTable
2012          18153.00000000000000000000      1
2013          204001.75000000000000000000     1
2012          515.12000000000000000000        2
2013          111.51000000000000000000        2
2012          18992.89000000000000000000      3
2013          204001.75000000000000000000     3
Since you need a single row per postingYear, you'll need to pivot these rows to columns. You could use the PIVOT relational operator, but I find the PIVOT syntax to be confusing and the concept itself limiting, which is why I prefer the MAX()-based cross-tab construct. This technique relies on the fact that T-SQL aggregate functions ignore NULLs. To see why the CASE expression is necessary, run this SELECT from the CTE:
SELECT PostingYear
,CASE WHEN sourceTable = 1 THEN Amount ELSE NULL END AS Amount1
,CASE WHEN sourceTable = 2 THEN Amount ELSE NULL END AS Amount2
,CASE WHEN sourceTable = 3 THEN Amount ELSE NULL END AS Amount3
FROM cte
and examine the results:
PostingYear   Amount1                         Amount2                       Amount3
2012          18153.00000000000000000000      NULL                          NULL
2013          204001.75000000000000000000     NULL                          NULL
2012          NULL                            515.12000000000000000000      NULL
2013          NULL                            111.51000000000000000000      NULL
2012          NULL                            NULL                          18992.89000000000000000000
2013          NULL                            NULL                          204001.75000000000000000000
You'll see that there is only one non-NULL value for each Amount column per PostingYear. Grouping the results by PostingYear will give you a single row per PostingYear and taking the MAX() of the Amount columns will give you the one non-NULL value for each column, and of course, the ProfitCalc column just does the math on the same values. Et voila:
PostingYear   Amount1                         Amount2                       Amount3                        ProfitCalc
2013          204001.75000000000000000000     111.51000000000000000000      204001.75000000000000000000    111.51000000000000000000
2012          18153.00000000000000000000      515.12000000000000000000      18992.89000000000000000000     -324.77000000000000000000
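For completeness, here's roughly what the PIVOT version I mentioned could look like (a sketch only, reusing the same cte defined above - I haven't run this one here):
-- Same WITH cte AS ( ... ) definition as before, then:
SELECT PostingYear
      ,[1] AS Amount1
      ,[2] AS Amount2
      ,[3] AS Amount3
      ,[1] - ([3] - [2]) AS ProfitCalc
FROM cte
PIVOT (MAX(Amount) FOR sourceTable IN ([1], [2], [3])) AS p
ORDER BY PostingYear DESC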
Let me know if you still have questions.
As an aside, I often break down complex T-SQL in this way - "unwrapping" the code in steps to examine intermediate result sets and the results of functions and expressions that are fed into other functions and expressions - when troubleshooting or just trying to understand the logic.
Jason Wolfkill
November 18, 2013 at 8:46 am
kwoznica (11/16/2013)
Would it be a good idea to truncate 17 or 18 of the digits? ... Now I'm part of a team that is building a data warehouse and I'm discovering a lot more of these issues.
One of the great things about building a data warehouse is that you can design it to use appropriate data types even where the underlying OLTP/ERP systems don't and manage the conversions in the ETL process. I'd encourage you to push as hard as you can to keep bad data architecture choices from your ERP system from creeping into the data warehouse. After all, the idea of a data warehouse is to improve reporting and analysis - replicating poor design from the underlying system frustrates this goal. I would also suggest that you push for some requirements analysis that might allow you to implement some even better design choices - for example, in many cases, dropping the fractional part of currency values makes no appreciable difference in aggregations, so unless someone really needs the fractional part, storing those values as integers can save even more storage and memory.
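As a rough sketch of what that conversion might look like during the load (the warehouse table dw.FactSales is just a hypothetical name for illustration):
-- Hypothetical warehouse fact table; the CAST narrows the ERP's decimal(38,20)
-- down to something sane as part of the ETL.
INSERT INTO dw.FactSales (PostingDate, OrderNum, SalesAmount)
SELECT PostingDate
      ,OrderNum
      ,CAST(SalesLCY AS decimal(16, 2))         -- 9 bytes per value instead of 17
      -- or, if nobody needs the fractional part downstream:
      -- CAST(ROUND(SalesLCY, 0) AS bigint)     -- whole currency units as an integer
FROM [dbo].[tempCustomerLedgerEntry]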
Jason Wolfkill