Problem with Averages

  • Hi Friends,

    I have an issue while calculating the avg on the below result.

    Pl find the attached INSERT STMT script.

    Below are the set of sql statements to cross check the values.

    -- cross check queries after executing "insert.zip" sql's.

    select AVG(convert(decimal(9,2),mrks)) from test

    /* i will get some value 504.27 */

    -- when i perform the GROUP BY and again if i find out the Avg then i shud get back above value i.e 504.27 but am getting 500.some value.

    select AVG(convert(decimal(9,2),mrks)) as c1

    from TEST

    group by

    yr

    c1,

    c2,

    c3,

    c4,

    c5,

    c6,

    c7,

    c8,

    c9

    -- The above output am placing into temp table "t001" and if i calculate the Avg() on the above record i sud get 504.27 which am not getting.

    select AVG(convert(decimal(9,2),mrks)) as c1 into t001 from TEST

    group by

    yr

    c1,

    c2,

    c3,

    c4,

    c5,

    c6,

    c7,

    c8,

    c9

    select AVG(convert(decimal(9,2),c1)) from t001

    -- am getting 500.

    I should get 504.27

    After performing group by below columns and calculating the avg, am

    getting less value.Dont know whether am missing some decimals or whether some records are missing while performing the calculation.

    Is any record is getting skipped due to spaces or empty strings??

    I tried tried tried but am unable to find this. Can we check using any

    cross join or any way is some records been missing in calculaton or is

    it the problem with decimals?

    Can anyone help me out in figuring it out why it is so?

    Thanks in Advance.

  • From the codes i am getting the following results

    select AVG(cast(mrks as int)) AVER from TEST

    AVER

    504

    select AVG(cast(mrks as decimal(9,2))) AVER from TEST

    AVER

    504.271448

  • Arun,

    After performing the below steps , i should get back 504.27. That is my requirement.

    -- The above output am placing into temp table "t001" and if i calculate the Avg() on the above record i sud get 504.27 which am not getting.

    select AVG(convert(decimal(9,2),mrks)) as c1 into t001 from TEST

    group by

    yr

    c1,

    c2,

    c3,

    c4,

    c5,

    c6,

    c7,

    c8,

    c9

    select AVG(convert(decimal(9,2),c1)) from t001

  • The TEST table having some the duplicate records

    So that the variance occurs

    create table #main

    (

    slno int,

    col1 varchar(10),

    col2 varchar(10)

    )

    insert into #main

    select 1,'10','10'

    union all

    select 2,'20','20'

    union all

    select 2,'30','30'

    union all

    select 2,'30','30'

    select avg(cast(col2 as decimal(9,2))) from #main

    --22.500000

    select avg(AVR) from (

    select avg(cast(col2 as decimal(9,2)))AVR from #main

    group by slno)as X

    --18.333333

    select avg(AVR) from (

    select avg(cast(col2 as decimal(9,2)))AVR from #main

    group by slno,col1)as X

    --20.000000

  • Thanks You very much Arun!

  • Hi Arun,

    I have put up some meaningful data for you.

    CREATE TABLE ExcelData

    (yr int,

    CollegeName varchar(20),

    race varchar(20),

    gender varchar(20),

    history varchar(10),

    science varchar(10),

    commerce varchar(10)

    )

    insert into ExcelData

    select 2003,'0156211','B','M',230,390,NULL

    union all

    select 2003,'0156211','B','M',230,390,NULL

    union all

    select 2003,'0176111','B','M',710,640,NULL

    union all

    select 2003,'0176111','B','M',710,640,NULL

    union all

    select 2003,'2222222','B','M',500,500,NULL

    select * from ExcelData

    yrCollegeNameracegenderhistorysciencecommerce

    20030156211BM230390NULL

    20030156211BM230390NULL -- duplicate rec

    20030176111BM710640NULL

    20030176111BM710640NULL -- duplicate rec

    20032222222BM500500NULL

    select COUNT(*) from ExcelData

    --5

    -- Calculate avg in ExcelData

    select AVG(convert(decimal(9,2),history)) as history_avg,

    AVG(convert(decimal(9,2),science)) as science_avg,

    AVG(convert(decimal(9,2),commerce)) as commerce_avg

    from ExcelData

    history_avgscience_avgcommerce_avg

    476.000000512.000000NULL

    drop table [staging_tbl]

    CREATE TABLE [dbo].[staging_tbl](

    [currentSchoolYear] [varchar](10) NULL,

    [Dist] [varchar](10) NULL,

    school [varchar](10) NULL,

    block_id [varchar](10) NULL,

    race varchar(10),

    gender varchar(10),

    [Subject] [varchar](10) NULL,

    [STUDENT_COUNT] [varchar](10) NULL,

    marks [varchar](10) NULL

    )

    SELECT * FROM staging_tbl

    -- INSERT DATA

    INSERT INTO staging_tbl

    select yr as currentSchoolYear,

    substring(CollegeName,1,3) as Dist,

    substring(CollegeName,4,2) as SCHOOL_ID,

    substring(CollegeName,6,2) as block_id,

    race,

    gender,

    [subject],

    0 as STUDENT_COUNT, /* initially zero */

    marks as marks

    from (

    select CollegeName,yr,race,gender,case when history IS NULL then '0' else history end as history,case when science IS null then '0' else science end as science,case when [commerce] IS null then '0' else commerce end as commerce from ExcelData) p

    unpivot

    (marks for [subject] in (history,science,commerce)

    ) AS Unpvt

    SELECT * FROM staging_tbl

    -- Extract data

    -- HISTORY Average marks

    SELECT * FROM staging_tbl WHERE Subject = 'HISTORY'

    20030156211BMhistory0230

    20030156211BMhistory0230

    20030176111BMhistory0710

    20030176111BMhistory0710

    20032222222BMhistory0500

    ----

    476.000000

    -----

    SELECT AVG(CONVERT(decimal(9,2),marks)) FROM staging_tbl where [subject] = 'history'

    476.000000

    -- Science Average marks

    SELECT * FROM staging_tbl WHERE Subject = 'SCIENCE'

    20030156211BMscience0390

    20030156211BMscience0390

    20030176111BMscience0640

    20030176111BMscience0640

    20032222222BMscience0500

    SELECT AVG(CONVERT(decimal(9,2),marks)) FROM staging_tbl where [subject] = 'SCIENCE'

    512.000000 -- compare with excel Yes! Match

    -- COMMERCE marks

    select * from staging_tbl where Subject='COMMERCE'

    20030156211BMcommerce00

    20030156211BMcommerce00

    20030176111BMcommerce00

    20030176111BMcommerce00

    20032222222BMcommerce00

    SELECT AVG(CONVERT(decimal(9,2),marks)) FROM staging_tbl where [subject] = 'COMMERCE'

    0.000000

    SELECT * FROM staging_tbl

    select currentSchoolYear,

    Dist,

    school,

    race,

    gender,

    [Subject],

    COUNT(*) as student_cnt,

    AVG(convert(decimal(9,2),marks)) as avg_score into TargetTbl /*fct tbl*/

    from staging_tbl

    group by currentSchoolYear,

    Dist,

    school,

    race,

    gender,

    [Subject]

    select * from TargetTbl order by [subject]

    currentSchoolYearDistschoolracegenderSubjectstudent_cntavg_score

    200301562BMcommerce20.000000

    200301761BMcommerce20.000000

    200322222BMcommerce10.000000

    200322222BMhistory1500.000000

    200301761BMhistory2710.000000

    200301562BMhistory2230.000000

    200301562BMscience2390.000000

    200301761BMscience2640.000000

    200322222BMscience1500.000000

    --Now again Calculate Average subject wise and i need to get back the original values which i have caluclated in "Excel" and "Staging table"

    select AVG(convert(decimal(9,2),avg_score)) from TargetTbl

    where [subject] = 'history'

    480.000000 Actaul Value: 476.000000

    select AVG(convert(decimal(9,2),avg_score)) from TargetTbl

    where [subject]= 'science'

    510.000000 Actaul Value: 512.000000

    select AVG(convert(decimal(9,2),avg_score)) from TargetTbl

    where [subject]= 'commerce'

    0.000000 Actaul Value: 0.00000

    By the above 3 results it is clear that due to duplicate records there is a mismatch

    in Actual average.

    Is there any solution or work around so that i can get back my Averages as Actual ones.

    Can we introduce any key or logically rownumber to fix this??????

    Without changing the structure is there are good way to fix this???

    Current the Key for the Target table is (currentSchoolYear,Dist,school,race,gender,subject).

    Any help would be greatly appreciated!!!

    Thanks in Advance.

  • Casting the number to a decimal with 2 digits after the decimal point FIRST is going to instroduce a lot of error. You really should be calculating the average first using floats, and then cast the ending result back to a decimal if need be.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • If the duplicates are not appropriate - why not remove them prior to averaging? You're in 2005 so a CTE preceding your average should also help drop the duplicates out. That said - it IS possible for 2 students to get the same grade, so would the "duplicates" not be appropriate?

    ;with ExcelDataCTE as (

    select *, row_number over (Partition by yr, CollegeName, race, gender, history, science, commerce) RN

    from ExcelData )

    Select yr, CollegeName, race, gender, average(history), average(science), average(commerce)

    from ExceldataCTE

    where rn=1

    Your average looks to be off because you are comparing a "weighted" average to a straight average. Both are valid measurements under the right circumstances - you just have to decide which one IS valid.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Hi Miller,

    Thanks for the input. Thanks for the CTE. Am still working for the work around.

    As you said, both the averages are correct in differenct circumstances but the thing is, we need to get the exact average value while showing in Reports.

  • Hi,

    Statistically speaking, what you're doing doesn't make a lot of sense - when you introduce the group by variables you get an average value for each distinct set of the group by variables that exist in the table. Taking the average value of those values is basically meaningless. What you need to do to calculate your overall average from these individual averages is also calculate/store the counts of results as well, like:

    SELECT A, B, C, AVG(Score) AS Aver, COUNT(Score) as Cnt

    INTO #TempTable

    FROM SomeTable

    GROUP BY A, B, C

    Then, to reproduce your overall average value of 504.27 (or whatever it is), you'll need to calculate it thus:

    SELECT SUM(Aver*Cnt)/SUM(Cnt) AS OverallAver

    FROM #TempTable

    I think you'll find that this will match your original average (SELECT AVG(SCORE) AS Aver FROM SomeTable) within the margin of error that storing them with 2 decimal places will give you.

    The only time taking the average of averages will give you the same result is if there are equal numbers of observations being used to calculate the individual averages when including the group by (i.e. if the all the Cnts calculated for each combination of A, B, and C above are equal).

    Hope that helps,

    Frank

  • Hi Frank,

    You are right on!

    I have done the same thing yesterday!

    SUM(T.STUDENT_COUNT*CONVERT(Decimal(9,2),SCORE))/SUM(T.STUDENT_COUNT) As AvgScore

    The thing is, the nature of the client data is like that.

    Thank You All.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply