August 13, 2009 at 10:15 pm
Hi Friends,
I have an issue while calculating the avg on the below result.
Pl find the attached INSERT STMT script.
Below are the set of sql statements to cross check the values.
-- cross check queries after executing "insert.zip" sql's.
select AVG(convert(decimal(9,2),mrks)) from test
/* i will get some value 504.27 */
-- when i perform the GROUP BY and again if i find out the Avg then i shud get back above value i.e 504.27 but am getting 500.some value.
select AVG(convert(decimal(9,2),mrks)) as c1
from TEST
group by
yr
c1,
c2,
c3,
c4,
c5,
c6,
c7,
c8,
c9
-- The above output am placing into temp table "t001" and if i calculate the Avg() on the above record i sud get 504.27 which am not getting.
select AVG(convert(decimal(9,2),mrks)) as c1 into t001 from TEST
group by
yr
c1,
c2,
c3,
c4,
c5,
c6,
c7,
c8,
c9
select AVG(convert(decimal(9,2),c1)) from t001
-- am getting 500.
I should get 504.27
After performing group by below columns and calculating the avg, am
getting less value.Dont know whether am missing some decimals or whether some records are missing while performing the calculation.
Is any record is getting skipped due to spaces or empty strings??
I tried tried tried but am unable to find this. Can we check using any
cross join or any way is some records been missing in calculaton or is
it the problem with decimals?
Can anyone help me out in figuring it out why it is so?
Thanks in Advance.
August 13, 2009 at 10:47 pm
From the codes i am getting the following results
select AVG(cast(mrks as int)) AVER from TEST
AVER
504
select AVG(cast(mrks as decimal(9,2))) AVER from TEST
AVER
504.271448
August 13, 2009 at 11:06 pm
Arun,
After performing the below steps , i should get back 504.27. That is my requirement.
-- The above output am placing into temp table "t001" and if i calculate the Avg() on the above record i sud get 504.27 which am not getting.
select AVG(convert(decimal(9,2),mrks)) as c1 into t001 from TEST
group by
yr
c1,
c2,
c3,
c4,
c5,
c6,
c7,
c8,
c9
select AVG(convert(decimal(9,2),c1)) from t001
August 14, 2009 at 1:31 am
The TEST table having some the duplicate records
So that the variance occurs
create table #main
(
slno int,
col1 varchar(10),
col2 varchar(10)
)
insert into #main
select 1,'10','10'
union all
select 2,'20','20'
union all
select 2,'30','30'
union all
select 2,'30','30'
select avg(cast(col2 as decimal(9,2))) from #main
--22.500000
select avg(AVR) from (
select avg(cast(col2 as decimal(9,2)))AVR from #main
group by slno)as X
--18.333333
select avg(AVR) from (
select avg(cast(col2 as decimal(9,2)))AVR from #main
group by slno,col1)as X
--20.000000
August 14, 2009 at 7:23 pm
Thanks You very much Arun!
August 16, 2009 at 12:33 am
Hi Arun,
I have put up some meaningful data for you.
CREATE TABLE ExcelData
(yr int,
CollegeName varchar(20),
race varchar(20),
gender varchar(20),
history varchar(10),
science varchar(10),
commerce varchar(10)
)
insert into ExcelData
select 2003,'0156211','B','M',230,390,NULL
union all
select 2003,'0156211','B','M',230,390,NULL
union all
select 2003,'0176111','B','M',710,640,NULL
union all
select 2003,'0176111','B','M',710,640,NULL
union all
select 2003,'2222222','B','M',500,500,NULL
select * from ExcelData
yrCollegeNameracegenderhistorysciencecommerce
20030156211BM230390NULL
20030156211BM230390NULL -- duplicate rec
20030176111BM710640NULL
20030176111BM710640NULL -- duplicate rec
20032222222BM500500NULL
select COUNT(*) from ExcelData
--5
-- Calculate avg in ExcelData
select AVG(convert(decimal(9,2),history)) as history_avg,
AVG(convert(decimal(9,2),science)) as science_avg,
AVG(convert(decimal(9,2),commerce)) as commerce_avg
from ExcelData
history_avgscience_avgcommerce_avg
476.000000512.000000NULL
drop table [staging_tbl]
CREATE TABLE [dbo].[staging_tbl](
[currentSchoolYear] [varchar](10) NULL,
[Dist] [varchar](10) NULL,
school [varchar](10) NULL,
block_id [varchar](10) NULL,
race varchar(10),
gender varchar(10),
[Subject] [varchar](10) NULL,
[STUDENT_COUNT] [varchar](10) NULL,
marks [varchar](10) NULL
)
SELECT * FROM staging_tbl
-- INSERT DATA
INSERT INTO staging_tbl
select yr as currentSchoolYear,
substring(CollegeName,1,3) as Dist,
substring(CollegeName,4,2) as SCHOOL_ID,
substring(CollegeName,6,2) as block_id,
race,
gender,
[subject],
0 as STUDENT_COUNT, /* initially zero */
marks as marks
from (
select CollegeName,yr,race,gender,case when history IS NULL then '0' else history end as history,case when science IS null then '0' else science end as science,case when [commerce] IS null then '0' else commerce end as commerce from ExcelData) p
unpivot
(marks for [subject] in (history,science,commerce)
) AS Unpvt
SELECT * FROM staging_tbl
-- Extract data
-- HISTORY Average marks
SELECT * FROM staging_tbl WHERE Subject = 'HISTORY'
20030156211BMhistory0230
20030156211BMhistory0230
20030176111BMhistory0710
20030176111BMhistory0710
20032222222BMhistory0500
----
476.000000
-----
SELECT AVG(CONVERT(decimal(9,2),marks)) FROM staging_tbl where [subject] = 'history'
476.000000
-- Science Average marks
SELECT * FROM staging_tbl WHERE Subject = 'SCIENCE'
20030156211BMscience0390
20030156211BMscience0390
20030176111BMscience0640
20030176111BMscience0640
20032222222BMscience0500
SELECT AVG(CONVERT(decimal(9,2),marks)) FROM staging_tbl where [subject] = 'SCIENCE'
512.000000 -- compare with excel Yes! Match
-- COMMERCE marks
select * from staging_tbl where Subject='COMMERCE'
20030156211BMcommerce00
20030156211BMcommerce00
20030176111BMcommerce00
20030176111BMcommerce00
20032222222BMcommerce00
SELECT AVG(CONVERT(decimal(9,2),marks)) FROM staging_tbl where [subject] = 'COMMERCE'
0.000000
SELECT * FROM staging_tbl
select currentSchoolYear,
Dist,
school,
race,
gender,
[Subject],
COUNT(*) as student_cnt,
AVG(convert(decimal(9,2),marks)) as avg_score into TargetTbl /*fct tbl*/
from staging_tbl
group by currentSchoolYear,
Dist,
school,
race,
gender,
[Subject]
select * from TargetTbl order by [subject]
currentSchoolYearDistschoolracegenderSubjectstudent_cntavg_score
200301562BMcommerce20.000000
200301761BMcommerce20.000000
200322222BMcommerce10.000000
200322222BMhistory1500.000000
200301761BMhistory2710.000000
200301562BMhistory2230.000000
200301562BMscience2390.000000
200301761BMscience2640.000000
200322222BMscience1500.000000
--Now again Calculate Average subject wise and i need to get back the original values which i have caluclated in "Excel" and "Staging table"
select AVG(convert(decimal(9,2),avg_score)) from TargetTbl
where [subject] = 'history'
480.000000 Actaul Value: 476.000000
select AVG(convert(decimal(9,2),avg_score)) from TargetTbl
where [subject]= 'science'
510.000000 Actaul Value: 512.000000
select AVG(convert(decimal(9,2),avg_score)) from TargetTbl
where [subject]= 'commerce'
0.000000 Actaul Value: 0.00000
By the above 3 results it is clear that due to duplicate records there is a mismatch
in Actual average.
Is there any solution or work around so that i can get back my Averages as Actual ones.
Can we introduce any key or logically rownumber to fix this??????
Without changing the structure is there are good way to fix this???
Current the Key for the Target table is (currentSchoolYear,Dist,school,race,gender,subject).
Any help would be greatly appreciated!!!
Thanks in Advance.
August 16, 2009 at 8:43 am
Casting the number to a decimal with 2 digits after the decimal point FIRST is going to instroduce a lot of error. You really should be calculating the average first using floats, and then cast the ending result back to a decimal if need be.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
August 16, 2009 at 8:50 am
If the duplicates are not appropriate - why not remove them prior to averaging? You're in 2005 so a CTE preceding your average should also help drop the duplicates out. That said - it IS possible for 2 students to get the same grade, so would the "duplicates" not be appropriate?
;with ExcelDataCTE as (
select *, row_number over (Partition by yr, CollegeName, race, gender, history, science, commerce) RN
from ExcelData )
Select yr, CollegeName, race, gender, average(history), average(science), average(commerce)
from ExceldataCTE
where rn=1
Your average looks to be off because you are comparing a "weighted" average to a straight average. Both are valid measurements under the right circumstances - you just have to decide which one IS valid.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
August 16, 2009 at 11:24 am
Hi Miller,
Thanks for the input. Thanks for the CTE. Am still working for the work around.
As you said, both the averages are correct in differenct circumstances but the thing is, we need to get the exact average value while showing in Reports.
August 17, 2009 at 5:30 pm
Hi,
Statistically speaking, what you're doing doesn't make a lot of sense - when you introduce the group by variables you get an average value for each distinct set of the group by variables that exist in the table. Taking the average value of those values is basically meaningless. What you need to do to calculate your overall average from these individual averages is also calculate/store the counts of results as well, like:
SELECT A, B, C, AVG(Score) AS Aver, COUNT(Score) as Cnt
INTO #TempTable
FROM SomeTable
GROUP BY A, B, C
Then, to reproduce your overall average value of 504.27 (or whatever it is), you'll need to calculate it thus:
SELECT SUM(Aver*Cnt)/SUM(Cnt) AS OverallAver
FROM #TempTable
I think you'll find that this will match your original average (SELECT AVG(SCORE) AS Aver FROM SomeTable) within the margin of error that storing them with 2 decimal places will give you.
The only time taking the average of averages will give you the same result is if there are equal numbers of observations being used to calculate the individual averages when including the group by (i.e. if the all the Cnts calculated for each combination of A, B, and C above are equal).
Hope that helps,
Frank
August 17, 2009 at 7:51 pm
Hi Frank,
You are right on!
I have done the same thing yesterday!
SUM(T.STUDENT_COUNT*CONVERT(Decimal(9,2),SCORE))/SUM(T.STUDENT_COUNT) As AvgScore
The thing is, the nature of the client data is like that.
Thank You All.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply