August 18, 2009 at 1:22 pm
Hi Pals,
Need urgent help.
I have below source table from Excl fle.
drop table test
create table test
(dist varchar(10),
school varchar(10),
history varchar(100),
science varchar(100),
commerce varchar(100)
)
DELETE from test
GO
insert test(dist,school,history,science,commerce) values('D1','S1','100','400',null)
insert test(dist,school,history,science,commerce) values('D1','S1','300','504','305')
insert test(dist,school,history,science,commerce) values('D1','S1','358','650',null)
GO
update statistics test
GO
select * from test
dist schoolhistorysciencecommerce
D1 S1100400
D1 S1300504305
D1 S1358650
clearly from the above output it is clear the 3 students have written "History" exam
3 students written science exam but only 1 student has written commerce exam.
Note : Looking at the source data, i can say total 3 students have written the exams
and if iMart_Graduation take the averages of each subject i will get the below results.
After pivoting also i need to able to get back the total no of students.
and if iMart_Graduation calculate the averages then i should get back the original values.
distschoolhistorysciencecommerce
D1S1100400
D1S1300504305
D1S1358650
---- --- ---
252.666518 305
----- --- ----
-- After unpivoting on 3 subjects, am storing data into table "stg"
Hi Pals,
Need help!
create table stg
(dist varchar(10),
school varchar(10),
subjct varchar(50),
student_cnt int,
score varchar(10)
)
insert into stg
select dist,
school,
subjct,
0 as student_cnt,
score
from (select dist,school,case when history IS null then 0 else history end as history ,case when science IS null then 0 else science end as science, case when commerce is null then 0 else commerce end as commerce from test) p
unpivot
(score for subjct in (history,science,commerce)
) AS Unpvt
select * from stg
distschoolsubjct student_cntscore
D1S1history0100
D1S1science0400
D1S1commerce00
D1S1history0300
D1S1science0504
D1S1commerce0305
D1S1history0358
D1S1science0650
D1S1commerce00
now from the staging am pulling data to dataware house as follows.
drop table trg
create table trg
(dist varchar(10),
school varchar(10),
subjct varchar(20),
student_cnt int,
avg_score float
)
truncate table trg
insert into trg
select dist,
school,
subjct,
COUNT(*) as student_cnt,
AVG(convert(decimal(9,2),score)) avg_score
from stg
group by dist,school,subjct
select * from trg
distschoolsubjctstudent_cntavg_score
D1S1commerce3101.666666666667
D1S1history3252.666666666667
D1S1science3518
Original Values from Source
distschoolhistorysciencecommerce
D1S1100400
D1S1300504305
D1S1358650
---- --- ---
252.666518 305
----- --- ----
It is very much clear above 2 result sets that , avg for history , science is correct but the average of commerce is wrong.It has to be 305/1 which is 305. instead it is performing 305/3 =>101.6666
But using this data i can get back by no of students taken the exams
i.e
select SUM(student_cnt)/3 from trg
--3
but my average is wrong!!!!!!!!!!
For this purpose, what i have done is, am ignoring the students not given the respective exam so that my averages will always be corrct.
This is what is have done
truncate table stg
insert into stg
select * from
(
select dist,
school,
subjct,
0 as student_cnt,
score
from (select dist,school,history,science,commerce from test) p
unpivot
(score for subjct in (history,science,commerce)
) AS Unpvt
) a where len(a.score)>0
-- 9 recs
truncate table trg
select * from stg
/*
distschoolsubjctstudent_cntscore
D1S1history0100
D1S1science0400
D1S1history0300
D1S1science0504
D1S1commerce0305
D1S1history0358
D1S1science0650
*/
-- only 7 recs
insert into trg
select dist,
school,
subjct,
COUNT(*) as student_cnt,
AVG(convert(decimal(9,2),score)) avg_score
from stg
group by dist,school,subjct
select * from trg
distschoolsubjctstudent_cntavg_score
D1S1commerce1305
D1S1history3252.666666
D1S1science3518
This time am getting the exact values
Original Values from Source
distschoolhistorysciencecommerce
D1S1100400
D1S1300504305
D1S1358650
---- --- ---
252.666518 305
----- --- ----
But now again problem, using this logic am able to get the Averages correctly but now
suppose i want to display the no of students attended for the Exam i.e. i need to get "3" as total.
How to get it???????????
Please suggest me the correct way so that i can get no of students from trg as well as averages correct.
Thanks in advance!
August 18, 2009 at 2:58 pm
mahesh.vsp (8/18/2009)
Hi Pals,
First, we don't need a history of the universe to help you solve your problem. You could have easily started with the unpivoted table rather than from the very beginning.
select dist,
school,
subjct,
0 as student_cnt,
score
from (select dist,school,case when history IS null then 0 else history end as history ,case when science IS null then 0 else science end as science, case when commerce is null then 0 else commerce end as commerce from test) p
unpivot
(score for subjct in (history,science,commerce)
) AS Unpvt
There is a shorthand way of implementing your CASE statements.
SELECT dist, school, IsNull(history, 0) AS history
-- etc., etc.
HOWEVER, you are much better off if you simply forgo the Null treatment, because the UNPIVOT will automatically remove rows with a Null Score, which is what you want.
I prefer having some sort of student ID available, because you can do a distinct count if necessary. It's not necessary here.
select dist,
school,
subjct,
COUNT(*) as student_cnt,
AVG(convert(decimal(9,2),score)) avg_score
from stg
group by dist,school,subjct
With the above corrections made, you can simply do the following.
SELECT dist, school, subjct, Count(studentID), Avg(score)
FROM stg
GROUP BY dist, school, subjct
And just to put everything together. NOTE: I used a CTE instead of a temp table for the staging table and I changed the datatype of the "score" columns to integer.
DROP TABLE #test
create table #test
(dist varchar(10),
school varchar(10),
history int,
science int,
commerce int
)
GO
insert #test(dist,school,history,science,commerce) values('D1','S1','100','400',null)
insert #test(dist,school,history,science,commerce) values('D1','S1','300','504','305')
insert #test(dist,school,history,science,commerce) values('D1','S1','358','650',null)
GO
SELECT *
FROM #test
;
WITH stg AS (
select dist,
school,
subjct,
studentID,
0 as student_cnt,
score
from (
select
dist
, school
, history
, science
, commerce
, Row_Number() OVER ( ORDER BY Dist ) AS StudentID
from #test) p
unpivot
(score for subjct in (history,science,commerce)
) AS Unpvt
)
SELECT dist, school, subjct, Count(studentID), Avg(score)
FROM stg
GROUP BY dist, school, subjct
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 18, 2009 at 6:50 pm
Hi Drew,
You are right on. using this method we can get the correct averages.
Now we are placed with final output as below. Right?
Using the below output, how can i get the total count of students who has the given the exam for which i need to get the values as 3.
distschoolsubjct(No column name)(No column name)
D1S1 commerce1 305
D1S1 history3 252
D1S1 science3 518
August 18, 2009 at 6:53 pm
i.e . irrespective of district and school. i need the overall students attended for the exam. From source file,it is clear we are having 3 records. Right? that i need to get it from the trg table.
August 18, 2009 at 7:07 pm
Why i am asking because suppose if i have multiple years data and i want to show the report as follows then,
how can i do that.
Assuming my final data in the trg table is as follows:
1991 D1 S1 commerce 1 305
1991 D1 S1 history 3 252
1991 D1 S1 science 3 518
1992 D1 S1 commerce 1 305
1992 D1 S1 history 3 252
1992 D1 S1 science 3 518
1993 D1 S1 commerce 1 305
1993 D1 S1 history 3 252
1993 D1 S1 science 3 518
1994 D1 S1 commerce 1 305
1994 D1 S1 history 3 252
1994 D1 S1 science 3 518
Final Report
------------
Year NofStudents History Science Commerce
1991 3 252 518 305
1992 3 252 518 305
1993 3 252 518 305
1994 3 252 518 305
Note : the values in history , science & commerce are averages.
August 18, 2009 at 7:18 pm
Now my ddl / dml will be as follows
DROP TABLE #test
create table #test
( yr int,
dist varchar(10),
school varchar(10),
history int,
science int,
commerce int
)
GO
insert #test(yr,dist,school,history,science,commerce) values(1991,'D1','S1','100','400',null)
insert #test(yr,dist,school,history,science,commerce) values(1991,'D1','S1','300','504','305')
insert #test(yr,dist,school,history,science,commerce) values(1991,'D1','S1','358','650',null)
insert #test(yr,dist,school,history,science,commerce) values(1992,'D1','S1','100','400',null)
insert #test(yr,dist,school,history,science,commerce) values(1992,'D1','S1','300','504','305')
insert #test(yr,dist,school,history,science,commerce) values(1992,'D1','S1','358','650',null)
insert #test(yr,dist,school,history,science,commerce) values(1993,'D1','S1','100','400',null)
insert #test(yr,dist,school,history,science,commerce) values(1993,'D1','S1','300','504','305')
insert #test(yr,dist,school,history,science,commerce) values(1993,'D1','S1','358','650',null)
insert #test(yr,dist,school,history,science,commerce) values(1994,'D1','S1','100','400',null)
insert #test(yr,dist,school,history,science,commerce) values(1994,'D1','S1','300','504','305')
insert #test(yr,dist,school,history,science,commerce) values(1994,'D1','S1','358','650',null)
GO
SELECT *
FROM #test
;
WITH stg AS (
select
yr,
dist,
school,
subjct,
studentID,
0 as student_cnt,
score
from (
select yr,
dist
, school
, history
, science
, commerce
, Row_Number() OVER ( ORDER BY Dist ) AS StudentID
from #test) p
unpivot
(score for subjct in (history,science,commerce)
) AS Unpvt
)
SELECT yr,dist, school, subjct, Count(studentID) as studentcnt, Avg(score) avgscore
FROM stg
GROUP BY yr,dist, school, subjct
August 18, 2009 at 8:34 pm
mahesh.vsp (8/18/2009)
Using the below output, how can i get the total count of students who has the given the exam for which i need to get the values as 3.
This is where that student ID that I mentioned comes in handy. You just do a separate query to return a distinct count of the student IDs. If you need to return these all in one query, then I would just use a subquery in the select statement to return the total count of students (by district and school if necessary).
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 18, 2009 at 8:53 pm
I modified the code to show you how this would be accomplished. Since your test data only has one school and one district, the numbers by school and by district and for all students are the same.
DROP TABLE #test
create table #test
(dist varchar(10),
school varchar(10),
history decimal(9,2),
science decimal(9,2),
commerce decimal(9,2)
)
GO
insert #test(dist,school,history,science,commerce) values('D1','S1','100','400',null)
insert #test(dist,school,history,science,commerce) values('D1','S1','300','504','305')
insert #test(dist,school,history,science,commerce) values('D1','S1','358','650',null)
GO
WITH stg AS (
select dist,
school,
subjct,
studentID,
score
from (
select
dist
, school
, history
, science
, commerce
, Row_Number() OVER ( ORDER BY Dist ) AS StudentID
from #test) p
unpivot
(score for subjct in (history,science,commerce)
) AS Unpvt
)
SELECT dist, school, subjct, Count(studentID), Avg(score)
, ( SELECT Count(DISTINCT studentID) AS BySchool FROM stg AS sub WHERE sub.dist = stg.dist and sub.school = stg.school ) AS BySchool
, ( SELECT Count(DISTINCT studentID) AS ByDistrict FROM stg AS sub WHERE sub.dist = stg.dist ) AS ByDistrict
, ( SELECT Count(DISTINCT studentID) AS TotalStudents FROM stg ) AS TotalStudents
FROM stg
GROUP BY dist, school, subjct
Unfortunately, the WITH ROLLUP and WITH CUBE can't handle distinct counts or this would be much easier. If using an Analysis Services cube was an option, it would also be easier to pull these numbers.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 18, 2009 at 9:24 pm
Drew,
Here is what i actual want!!!!
Am not getting how to achieve this dynamically.
I need your help!
The reason why i am stressing to get the src record count is,i have a requirement in which usng my "trg" table average values i must able to calculate the Avg of 3 subjects.
select sum(avgscore*student)/3
from trg
where subject in ('history','science','commerce')
I dnt want to hard the value 3 in my query, i want it to be dynamic
(which is nothing but the number of record count in my src tbl/flat
file)
Hope you understood my problem!!!!!
DROP TABLE #test
create table #test
(dist varchar(10),
school varchar(10),
history int,
science int,
commerce int
)
GO
insert #test(dist,school,history,science,commerce) values('D1','S1','100','400',null)
insert #test(dist,school,history,science,commerce) values('D1','S1','300','504','305')
insert #test(dist,school,history,science,commerce) values('D1','S1','358','650',null)
GO
select * from #test
This is what is there in my source, using this i want calculate Averages of HISTORY,SCIENCE,COMMERCE, HIST_SCIENCE_COMMERCE
distschoolhistorysciencecommerce
D1S1100400NULL
D1S1300504305
D1S1358650NULL
Something like this
distschoolhistorysciencecommerce HIST_SCIENCE_COMMERCE
D1S1100400NULL 100+400
D1S1300504305 300+504+305
D1S1358650NULL 358+650
select AVG(history),
AVG(science),
AVG(commerce),
AVG(isnull(history,0)+isnull(science,0)+isnull(commerce,0))
from #test
/* output values */
--252518305872
The average value of i.e (avg(history+scince+commerce)) i should get from the "trg" table.
IMP NOTE: the values in my trg table is already aggregated. represents AVERAGES of each subject.
August 18, 2009 at 9:39 pm
Drew,
I need this help from you.
It is urgent!!!
Thanks in advance.
August 18, 2009 at 9:58 pm
In the above scenario, if u can hold the district count, then it would solve the problem.
Please correct me if am wrong.
August 18, 2009 at 11:09 pm
I've given you more than enough information to finish this on your own. I suggest you look at what I've already written and try to come up with the results on your own.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 19, 2009 at 12:07 am
Drew,
I understand.
One last question from my side.
Do you want me to add new columns to my trg?
Am asking this becuase we dont have any permissions to add those new columns or else can we manage through query.
Again the problem is the staging table will be deleted for every fresh year!!!!!!!!
August 19, 2009 at 7:09 am
Drew,
Thank You so much.
I resolved it by adding new columns.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply