Help required!!

  • Hi Pals,

    Need urgent help.

    I have below source table from Excl fle.

    drop table test

    create table test

    (dist varchar(10),

    school varchar(10),

    history varchar(100),

    science varchar(100),

    commerce varchar(100)

    )

    DELETE from test

    GO

    insert test(dist,school,history,science,commerce) values('D1','S1','100','400',null)

    insert test(dist,school,history,science,commerce) values('D1','S1','300','504','305')

    insert test(dist,school,history,science,commerce) values('D1','S1','358','650',null)

    GO

    update statistics test

    GO

    select * from test

    dist schoolhistorysciencecommerce

    D1 S1100400

    D1 S1300504305

    D1 S1358650

    clearly from the above output it is clear the 3 students have written "History" exam

    3 students written science exam but only 1 student has written commerce exam.

    Note : Looking at the source data, i can say total 3 students have written the exams

    and if iMart_Graduation take the averages of each subject i will get the below results.

    After pivoting also i need to able to get back the total no of students.

    and if iMart_Graduation calculate the averages then i should get back the original values.

    distschoolhistorysciencecommerce

    D1S1100400

    D1S1300504305

    D1S1358650

    ---- --- ---

    252.666518 305

    ----- --- ----

    -- After unpivoting on 3 subjects, am storing data into table "stg"

    Hi Pals,

    Need help!

    create table stg

    (dist varchar(10),

    school varchar(10),

    subjct varchar(50),

    student_cnt int,

    score varchar(10)

    )

    insert into stg

    select dist,

    school,

    subjct,

    0 as student_cnt,

    score

    from (select dist,school,case when history IS null then 0 else history end as history ,case when science IS null then 0 else science end as science, case when commerce is null then 0 else commerce end as commerce from test) p

    unpivot

    (score for subjct in (history,science,commerce)

    ) AS Unpvt

    select * from stg

    distschoolsubjct student_cntscore

    D1S1history0100

    D1S1science0400

    D1S1commerce00

    D1S1history0300

    D1S1science0504

    D1S1commerce0305

    D1S1history0358

    D1S1science0650

    D1S1commerce00

    now from the staging am pulling data to dataware house as follows.

    drop table trg

    create table trg

    (dist varchar(10),

    school varchar(10),

    subjct varchar(20),

    student_cnt int,

    avg_score float

    )

    truncate table trg

    insert into trg

    select dist,

    school,

    subjct,

    COUNT(*) as student_cnt,

    AVG(convert(decimal(9,2),score)) avg_score

    from stg

    group by dist,school,subjct

    select * from trg

    distschoolsubjctstudent_cntavg_score

    D1S1commerce3101.666666666667

    D1S1history3252.666666666667

    D1S1science3518

    Original Values from Source

    distschoolhistorysciencecommerce

    D1S1100400

    D1S1300504305

    D1S1358650

    ---- --- ---

    252.666518 305

    ----- --- ----

    It is very much clear above 2 result sets that , avg for history , science is correct but the average of commerce is wrong.It has to be 305/1 which is 305. instead it is performing 305/3 =>101.6666

    But using this data i can get back by no of students taken the exams

    i.e

    select SUM(student_cnt)/3 from trg

    --3

    but my average is wrong!!!!!!!!!!

    For this purpose, what i have done is, am ignoring the students not given the respective exam so that my averages will always be corrct.

    This is what is have done

    truncate table stg

    insert into stg

    select * from

    (

    select dist,

    school,

    subjct,

    0 as student_cnt,

    score

    from (select dist,school,history,science,commerce from test) p

    unpivot

    (score for subjct in (history,science,commerce)

    ) AS Unpvt

    ) a where len(a.score)>0

    -- 9 recs

    truncate table trg

    select * from stg

    /*

    distschoolsubjctstudent_cntscore

    D1S1history0100

    D1S1science0400

    D1S1history0300

    D1S1science0504

    D1S1commerce0305

    D1S1history0358

    D1S1science0650

    */

    -- only 7 recs

    insert into trg

    select dist,

    school,

    subjct,

    COUNT(*) as student_cnt,

    AVG(convert(decimal(9,2),score)) avg_score

    from stg

    group by dist,school,subjct

    select * from trg

    distschoolsubjctstudent_cntavg_score

    D1S1commerce1305

    D1S1history3252.666666

    D1S1science3518

    This time am getting the exact values

    Original Values from Source

    distschoolhistorysciencecommerce

    D1S1100400

    D1S1300504305

    D1S1358650

    ---- --- ---

    252.666518 305

    ----- --- ----

    But now again problem, using this logic am able to get the Averages correctly but now

    suppose i want to display the no of students attended for the Exam i.e. i need to get "3" as total.

    How to get it???????????

    Please suggest me the correct way so that i can get no of students from trg as well as averages correct.

    Thanks in advance!

  • mahesh.vsp (8/18/2009)


    Hi Pals,

    First, we don't need a history of the universe to help you solve your problem. You could have easily started with the unpivoted table rather than from the very beginning.

    select dist,

    school,

    subjct,

    0 as student_cnt,

    score

    from (select dist,school,case when history IS null then 0 else history end as history ,case when science IS null then 0 else science end as science, case when commerce is null then 0 else commerce end as commerce from test) p

    unpivot

    (score for subjct in (history,science,commerce)

    ) AS Unpvt

    There is a shorthand way of implementing your CASE statements.

    SELECT dist, school, IsNull(history, 0) AS history

    -- etc., etc.

    HOWEVER, you are much better off if you simply forgo the Null treatment, because the UNPIVOT will automatically remove rows with a Null Score, which is what you want.

    I prefer having some sort of student ID available, because you can do a distinct count if necessary. It's not necessary here.

    select dist,

    school,

    subjct,

    COUNT(*) as student_cnt,

    AVG(convert(decimal(9,2),score)) avg_score

    from stg

    group by dist,school,subjct

    With the above corrections made, you can simply do the following.

    SELECT dist, school, subjct, Count(studentID), Avg(score)

    FROM stg

    GROUP BY dist, school, subjct

    And just to put everything together. NOTE: I used a CTE instead of a temp table for the staging table and I changed the datatype of the "score" columns to integer.

    DROP TABLE #test

    create table #test

    (dist varchar(10),

    school varchar(10),

    history int,

    science int,

    commerce int

    )

    GO

    insert #test(dist,school,history,science,commerce) values('D1','S1','100','400',null)

    insert #test(dist,school,history,science,commerce) values('D1','S1','300','504','305')

    insert #test(dist,school,history,science,commerce) values('D1','S1','358','650',null)

    GO

    SELECT *

    FROM #test

    ;

    WITH stg AS (

    select dist,

    school,

    subjct,

    studentID,

    0 as student_cnt,

    score

    from (

    select

    dist

    , school

    , history

    , science

    , commerce

    , Row_Number() OVER ( ORDER BY Dist ) AS StudentID

    from #test) p

    unpivot

    (score for subjct in (history,science,commerce)

    ) AS Unpvt

    )

    SELECT dist, school, subjct, Count(studentID), Avg(score)

    FROM stg

    GROUP BY dist, school, subjct

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Hi Drew,

    You are right on. using this method we can get the correct averages.

    Now we are placed with final output as below. Right?

    Using the below output, how can i get the total count of students who has the given the exam for which i need to get the values as 3.

    distschoolsubjct(No column name)(No column name)

    D1S1 commerce1 305

    D1S1 history3 252

    D1S1 science3 518

  • i.e . irrespective of district and school. i need the overall students attended for the exam. From source file,it is clear we are having 3 records. Right? that i need to get it from the trg table.

  • Why i am asking because suppose if i have multiple years data and i want to show the report as follows then,

    how can i do that.

    Assuming my final data in the trg table is as follows:

    1991 D1 S1 commerce 1 305

    1991 D1 S1 history 3 252

    1991 D1 S1 science 3 518

    1992 D1 S1 commerce 1 305

    1992 D1 S1 history 3 252

    1992 D1 S1 science 3 518

    1993 D1 S1 commerce 1 305

    1993 D1 S1 history 3 252

    1993 D1 S1 science 3 518

    1994 D1 S1 commerce 1 305

    1994 D1 S1 history 3 252

    1994 D1 S1 science 3 518

    Final Report

    ------------

    Year NofStudents History Science Commerce

    1991 3 252 518 305

    1992 3 252 518 305

    1993 3 252 518 305

    1994 3 252 518 305

    Note : the values in history , science & commerce are averages.

  • Now my ddl / dml will be as follows

    DROP TABLE #test

    create table #test

    ( yr int,

    dist varchar(10),

    school varchar(10),

    history int,

    science int,

    commerce int

    )

    GO

    insert #test(yr,dist,school,history,science,commerce) values(1991,'D1','S1','100','400',null)

    insert #test(yr,dist,school,history,science,commerce) values(1991,'D1','S1','300','504','305')

    insert #test(yr,dist,school,history,science,commerce) values(1991,'D1','S1','358','650',null)

    insert #test(yr,dist,school,history,science,commerce) values(1992,'D1','S1','100','400',null)

    insert #test(yr,dist,school,history,science,commerce) values(1992,'D1','S1','300','504','305')

    insert #test(yr,dist,school,history,science,commerce) values(1992,'D1','S1','358','650',null)

    insert #test(yr,dist,school,history,science,commerce) values(1993,'D1','S1','100','400',null)

    insert #test(yr,dist,school,history,science,commerce) values(1993,'D1','S1','300','504','305')

    insert #test(yr,dist,school,history,science,commerce) values(1993,'D1','S1','358','650',null)

    insert #test(yr,dist,school,history,science,commerce) values(1994,'D1','S1','100','400',null)

    insert #test(yr,dist,school,history,science,commerce) values(1994,'D1','S1','300','504','305')

    insert #test(yr,dist,school,history,science,commerce) values(1994,'D1','S1','358','650',null)

    GO

    SELECT *

    FROM #test

    ;

    WITH stg AS (

    select

    yr,

    dist,

    school,

    subjct,

    studentID,

    0 as student_cnt,

    score

    from (

    select yr,

    dist

    , school

    , history

    , science

    , commerce

    , Row_Number() OVER ( ORDER BY Dist ) AS StudentID

    from #test) p

    unpivot

    (score for subjct in (history,science,commerce)

    ) AS Unpvt

    )

    SELECT yr,dist, school, subjct, Count(studentID) as studentcnt, Avg(score) avgscore

    FROM stg

    GROUP BY yr,dist, school, subjct

  • mahesh.vsp (8/18/2009)


    Using the below output, how can i get the total count of students who has the given the exam for which i need to get the values as 3.

    This is where that student ID that I mentioned comes in handy. You just do a separate query to return a distinct count of the student IDs. If you need to return these all in one query, then I would just use a subquery in the select statement to return the total count of students (by district and school if necessary).

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I modified the code to show you how this would be accomplished. Since your test data only has one school and one district, the numbers by school and by district and for all students are the same.

    DROP TABLE #test

    create table #test

    (dist varchar(10),

    school varchar(10),

    history decimal(9,2),

    science decimal(9,2),

    commerce decimal(9,2)

    )

    GO

    insert #test(dist,school,history,science,commerce) values('D1','S1','100','400',null)

    insert #test(dist,school,history,science,commerce) values('D1','S1','300','504','305')

    insert #test(dist,school,history,science,commerce) values('D1','S1','358','650',null)

    GO

    WITH stg AS (

    select dist,

    school,

    subjct,

    studentID,

    score

    from (

    select

    dist

    , school

    , history

    , science

    , commerce

    , Row_Number() OVER ( ORDER BY Dist ) AS StudentID

    from #test) p

    unpivot

    (score for subjct in (history,science,commerce)

    ) AS Unpvt

    )

    SELECT dist, school, subjct, Count(studentID), Avg(score)

    , ( SELECT Count(DISTINCT studentID) AS BySchool FROM stg AS sub WHERE sub.dist = stg.dist and sub.school = stg.school ) AS BySchool

    , ( SELECT Count(DISTINCT studentID) AS ByDistrict FROM stg AS sub WHERE sub.dist = stg.dist ) AS ByDistrict

    , ( SELECT Count(DISTINCT studentID) AS TotalStudents FROM stg ) AS TotalStudents

    FROM stg

    GROUP BY dist, school, subjct

    Unfortunately, the WITH ROLLUP and WITH CUBE can't handle distinct counts or this would be much easier. If using an Analysis Services cube was an option, it would also be easier to pull these numbers.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Drew,

    Here is what i actual want!!!!

    Am not getting how to achieve this dynamically.

    I need your help!

    The reason why i am stressing to get the src record count is,i have a requirement in which usng my "trg" table average values i must able to calculate the Avg of 3 subjects.

    select sum(avgscore*student)/3

    from trg

    where subject in ('history','science','commerce')

    I dnt want to hard the value 3 in my query, i want it to be dynamic

    (which is nothing but the number of record count in my src tbl/flat

    file)

    Hope you understood my problem!!!!!

    DROP TABLE #test

    create table #test

    (dist varchar(10),

    school varchar(10),

    history int,

    science int,

    commerce int

    )

    GO

    insert #test(dist,school,history,science,commerce) values('D1','S1','100','400',null)

    insert #test(dist,school,history,science,commerce) values('D1','S1','300','504','305')

    insert #test(dist,school,history,science,commerce) values('D1','S1','358','650',null)

    GO

    select * from #test

    This is what is there in my source, using this i want calculate Averages of HISTORY,SCIENCE,COMMERCE, HIST_SCIENCE_COMMERCE

    distschoolhistorysciencecommerce

    D1S1100400NULL

    D1S1300504305

    D1S1358650NULL

    Something like this

    distschoolhistorysciencecommerce HIST_SCIENCE_COMMERCE

    D1S1100400NULL 100+400

    D1S1300504305 300+504+305

    D1S1358650NULL 358+650

    select AVG(history),

    AVG(science),

    AVG(commerce),

    AVG(isnull(history,0)+isnull(science,0)+isnull(commerce,0))

    from #test

    /* output values */

    --252518305872

    The average value of i.e (avg(history+scince+commerce)) i should get from the "trg" table.

    IMP NOTE: the values in my trg table is already aggregated. represents AVERAGES of each subject.

  • Drew,

    I need this help from you.

    It is urgent!!!

    Thanks in advance.

  • In the above scenario, if u can hold the district count, then it would solve the problem.

    Please correct me if am wrong.

  • I've given you more than enough information to finish this on your own. I suggest you look at what I've already written and try to come up with the results on your own.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Drew,

    I understand.

    One last question from my side.

    Do you want me to add new columns to my trg?

    Am asking this becuase we dont have any permissions to add those new columns or else can we manage through query.

    Again the problem is the staging table will be deleted for every fresh year!!!!!!!!

  • Drew,

    Thank You so much.

    I resolved it by adding new columns.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply