April 19, 2013 at 9:02 am
Hi,
My question is about calculating a computed column in SSIS. Unfortunately, the column is based on another column in the same query which, at its turn, is a subquery.
I'd prefer a generic answer, but I know some people can't answer unlesss they see the script, so here they are.
Table X1 (scores of volunteer per week and class):
CREATE TABLE [dbo].[tblPAScores](
[ID] [int] IDENTITY(1,1) NOT NULL,
[VolunteerID] [varchar](50) NOT NULL,
[NoWeek] [int] NULL,
[PA] [numeric](5, 2) NULL,
[Class] [varchar](50) NULL,
CONSTRAINT [PK_tblPAScores] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The view on previous table:
SELECT TOP (100) PERCENT VolunteerID, Class, NoWeek, PA,
(SELECT COUNT(*) AS Expr1
FROM dbo.tblPAScores
WHERE (PA > X1.PA) AND (Class = X1.Class) AND (NoWeek = X1.NoWeek)) + 1 AS PA1R,
(SELECT COUNT(VolunteerID) AS Expr1
FROM dbo.tblPAScores AS tblPAScores_1
WHERE (Class = X1.Class) AND (NoWeek = X1.NoWeek)) AS N
FROM dbo.tblPAScores AS X1
ORDER BY NoWeek, VolunteerID
• N: calculates the number of people for a week and class (column 'N')
• PA1R: calculates how many people have higher PA score than you in that week and class
In the next column, I'd need to divide N by PA1R, it's that simple. Why? The target is, in terms of PA, calculate what percentage of people does better than yourself (say, 30%), same as you (40%) and worse (say 30%) fior a given week and class.
To get this, I'd need dividing N by PA1R (1/N, (PA1R-1/N), etc).
Questions:
1) Is it possible to have a computed column based on previous columns which are subqueries?
2) If not, any alternative solution (pivot tables or the like)? Do I have to run a megasubquery (sort of the subquery1 / subquery 2)
Thanks in advance, a.
May 2, 2013 at 9:00 am
a_ud (4/19/2013)
Hi,Questions:
1) Is it possible to have a computed column based on previous columns which are subqueries?
Hi,
It is possible to have computed columns in a view. If you tried directly 1/N; PA1R/N then you probably had an error message.
create view vwPAScores
as
SELECT TOP (100) PERCENT VolunteerID, Class, NoWeek, PA,
(SELECT COUNT(*) AS Expr1
FROM dbo.tblPAScores
WHERE (PA > X1.PA) AND (Class = X1.Class) AND (NoWeek = X1.NoWeek)) + 1 AS PA1R,
(SELECT COUNT(VolunteerID) AS Expr1
FROM dbo.tblPAScores AS tblPAScores_1
WHERE (Class = X1.Class) AND (NoWeek = X1.NoWeek)) AS N,
1/(SELECT COUNT(VolunteerID) AS Expr1 FROM dbo.tblPAScores AS tblPAScores_1 WHERE (Class = X1.Class) AND (NoWeek = X1.NoWeek)) as [1_divide_N],
(((SELECT COUNT(*) AS Expr1 FROM dbo.tblPAScores WHERE (PA > X1.PA) AND (Class = X1.Class) AND (NoWeek = X1.NoWeek)) + 1)-1)/(SELECT COUNT(VolunteerID) AS Expr1 FROM dbo.tblPAScores AS tblPAScores_1 WHERE (Class = X1.Class) AND (NoWeek = X1.NoWeek)) as [PA1R-1_divide_N]
FROM dbo.tblPAScores AS X1
ORDER BY NoWeek, VolunteerID
Igor Micev,My blog: www.igormicev.com
May 3, 2013 at 9:07 am
a_ud (4/19/2013)
Hi,My question is about calculating a computed column in SSIS. Unfortunately, the column is based on another column in the same query which, at its turn, is a subquery.
I'd prefer a generic answer, but I know some people can't answer unlesss they see the script, so here they are.
Table X1 (scores of volunteer per week and class):
CREATE TABLE [dbo].[tblPAScores](
[ID] [int] IDENTITY(1,1) NOT NULL,
[VolunteerID] [varchar](50) NOT NULL,
[NoWeek] [int] NULL,
[PA] [numeric](5, 2) NULL,
[Class] [varchar](50) NULL,
CONSTRAINT [PK_tblPAScores] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The view on previous table:
SELECT TOP (100) PERCENT VolunteerID, Class, NoWeek, PA,
(SELECT COUNT(*) AS Expr1
FROM dbo.tblPAScores
WHERE (PA > X1.PA) AND (Class = X1.Class) AND (NoWeek = X1.NoWeek)) + 1 AS PA1R,
(SELECT COUNT(VolunteerID) AS Expr1
FROM dbo.tblPAScores AS tblPAScores_1
WHERE (Class = X1.Class) AND (NoWeek = X1.NoWeek)) AS N
FROM dbo.tblPAScores AS X1
ORDER BY NoWeek, VolunteerID
• N: calculates the number of people for a week and class (column 'N')
• PA1R: calculates how many people have higher PA score than you in that week and class
In the next column, I'd need to divide N by PA1R, it's that simple. Why? The target is, in terms of PA, calculate what percentage of people does better than yourself (say, 30%), same as you (40%) and worse (say 30%) fior a given week and class.
To get this, I'd need dividing N by PA1R (1/N, (PA1R-1/N), etc).
Questions:
1) Is it possible to have a computed column based on previous columns which are subqueries?
2) If not, any alternative solution (pivot tables or the like)? Do I have to run a megasubquery (sort of the subquery1 / subquery 2)
Thanks in advance, a.
Hi again, @a_ud,
By posting your code rather than asking for a generic answer, you've allowed me to give you what should be a better solution. Your view definition could be
SELECT VolunteerID, Class, NoWeek, PA,
Y.PA1R,
Z.N,
((Z.N-Y.PA1R)/(Z.N * 1.0)) * 100 AS pct
FROM dbo.tblPAScores AS X1
OUTER APPLY (SELECT COUNT(*) + 1 AS PA1R
FROM dbo.tblPAScores X2
WHERE (X2.PA > X1.PA) AND (X2.Class = X1.Class) AND (X2.NoWeek = X1.NoWeek)) Y
OUTER APPLY (SELECT COUNT(X3.VolunteerID) AS N
FROM dbo.tblPAScores X2
WHERE (X2.Class = X1.Class) AND (X2.NoWeek = X1.NoWeek)) Z
The correlated subqueries in the OUTER APPLYs generate columns for PA1R and N for each row, and you can use those columns as many different times and ways as you'd like in the SELECT columns.
One note - you'll see that I set up the calculation of the percent with better scores with a seemingly superfluous "* 1.0". Since COUNT() returns an int, N and PA1R will both be ints. Therefore, (N-PA1R)/N will return an int, which will always be 0. Adding the "* 1.0" introduces a decimal datatype into the mix so that all the int values will be implicitly converted to decimal values before the calculation is performed, allowing you to get values in the domain 0.0 - 1.0, as expected.
Also, you'll see that I removed the TOP(100) PERCENT . . . ORDER BY from my code. It's generally better practice to order the results in the SELECT statement that references your view. You can run into some unexpected performance problems when you write views to return ordered result sets. When the query optimizer expands the view definition into the query, the ORDER BY in the view may confound the optimizer and lead to an undesirable execution plan.
Jason Wolfkill
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply