How to create a rolling average?

  • Hello There,

    I am sure this question isn't as hard as I am making it but this newbie still needs some assistance.

    How would I go about create a rolling / moving average?  I found out that it is called "windowing" and it will be available in SQLServer 2005.  However, I need a work around for now.

    For example, I have 7 exam grades, each exam is taken on a different day. For the last 5 tests find the average. If the average is below the average of ALL previous groups of 5 Grades return the set of: student_name, class_name , grade , exam_date , teacher_name for the last 5 exams.

    If I have seven grades, then my first group of 5 grades would be grades 7 to 3, the second group would be grades 6 to 2, and the third group would be grades 5 to 1 and in this case there would not be a fourth group of five. I have to determine the number of groups and their averages then compare them to the lastest average and then report out the required info.

  • Sorry to dissapoint you, but the question seems to me much harder than you think. It is possible that I misunderstood the question, but I think that even in this case, you could learn something from my post: the use of subqueries. In some cases, you might replace them with temporary tables (they are bad, but not too bad), but please don't start using cursors in SQL, because row-based processing is much slower than set-based processing.

    First, let's create a table and add some sample data:

    CREATE TABLE Exams (
     student_name varchar(10),
     class_name varchar(20),
     exam_date smalldatetime,
     teacher_name varchar(15),
     grade numeric(5,2),
     unique (student_name, class_name),
     unique (exam_date, student_name)
    )
    INSERT INTO Exams VALUES ('John', 'Mathematics', '20040601', 'Smith', 7)
    INSERT INTO Exams VALUES ('John', 'Database systems', '20040602', 'Smith', 9)
    INSERT INTO Exams VALUES ('John', 'Algorithms', '20040603', 'Smith', 8)
    INSERT INTO Exams VALUES ('John', 'English', '20040604', 'Baker', 5)
    INSERT INTO Exams VALUES ('John', 'Geography', '20040605', 'Miller', 6)
    INSERT INTO Exams VALUES ('John', 'History', '20040606', 'Johnson', 5)
    INSERT INTO Exams VALUES ('John', 'Biology', '20040607', 'Harrison', 5)
    INSERT INTO Exams VALUES ('Mary', 'Mathematics', '20040601', 'Smith', 5)
    INSERT INTO Exams VALUES ('Mary', 'French literature', '20040602', 'Laurent', 9)
    INSERT INTO Exams VALUES ('Mary', 'English', '20040604', 'Baker', 10)
    INSERT INTO Exams VALUES ('Mary', 'Geography', '20040605', 'Miller', 8)
    INSERT INTO Exams VALUES ('Mary', 'History', '20040606', 'Johnson', 8)
    INSERT INTO Exams VALUES ('Mary', 'Biology', '20040607', 'Harrison', 6)
    INSERT INTO Exams VALUES ('Mary', 'Chemistry', '20040610', 'Harrison', 8)

    Now, for each exam, let's compute the average of the previous 5 exams (including this one):

    SELECT *, (
     SELECT AVG(Grade)
     FROM (
      SELECT TOP 5 * FROM Exams B
      WHERE A.student_name=B.student_name AND A.exam_date>=B.exam_date
      ORDER BY exam_date DESC
     &nbsp X HAVING COUNT(*)=5
    ) AS average_of_last_5_exams
    FROM Exams A 
    student_name class_name           exam_date                         teacher_name    grade   average_of_last_5_exams                  
    ------------ -------------------- --------------------------------- --------------- ------- ----------------------- 
    John         Mathematics          2004-06-01 00:00:00               Smith           7.00    NULL
    John         Database systems     2004-06-02 00:00:00               Smith           9.00    NULL
    John         Algorithms           2004-06-03 00:00:00               Smith           8.00    NULL
    John         English              2004-06-04 00:00:00               Baker           5.00    NULL
    John         Geography            2004-06-05 00:00:00               Miller          6.00    7.000000
    John         History              2004-06-06 00:00:00               Johnson         5.00    6.600000
    John         Biology              2004-06-07 00:00:00               Harrison        5.00    5.800000
    Mary         Mathematics          2004-06-01 00:00:00               Smith           5.00    NULL
    Mary         French literature    2004-06-02 00:00:00               Laurent         9.00    NULL
    Mary         English              2004-06-04 00:00:00               Baker           10.00   NULL
    Mary         Geography            2004-06-05 00:00:00               Miller          8.00    NULL
    Mary         History              2004-06-06 00:00:00               Johnson         8.00    8.000000
    Mary         Biology              2004-06-07 00:00:00               Harrison        6.00    8.200000
    Mary         Chemistry            2004-06-10 00:00:00               Harrison        8.00    8.000000
    (14 row(s) affected)

    Let's select the worst 5-exams average:

    SELECT student_name, MIN(average_of_last_5_exams) as worst_average
    FROM (
     SELECT *, (
      SELECT AVG(Grade)
      FROM (
       SELECT TOP 5 * FROM Exams B
       WHERE A.student_name=B.student_name AND A.exam_date>=B.exam_date
       ORDER BY exam_date DESC
      &nbsp X HAVING COUNT(*)=5
     &nbsp AS average_of_last_5_exams
     FROM Exams A 
    ) Y WHERE average_of_last_5_exams IS NOT NULL GROUP BY student_name 
    student_name worst_average       
    ------------ ------------- 
    John         5.800000
    Mary         8.000000
    (2 row(s) affected)

    And find out the day when this occurred (if there is more than one day with the worst average, we select the first one):

    SELECT Z.student_name, MIN(exam_date) as worst_day FROM (
     SELECT *, (
      SELECT AVG(Grade)
      FROM (
       SELECT TOP 5 * FROM Exams B
       WHERE A.student_name=B.student_name AND A.exam_date>=B.exam_date
       ORDER BY exam_date DESC
      &nbsp X HAVING COUNT(*)=5
     &nbsp AS average_of_last_5_exams
     FROM Exams A 
    ) Y2 INNER JOIN (
     SELECT student_name, MIN(average_of_last_5_exams) as worst_average
     FROM (
      SELECT *, (
       SELECT AVG(Grade)
       FROM (
        SELECT TOP 5 * FROM Exams B
        WHERE A.student_name=B.student_name AND A.exam_date>=B.exam_date
        ORDER BY exam_date DESC
       &nbsp X HAVING COUNT(*)=5
      &nbsp AS average_of_last_5_exams
      FROM Exams A 
     &nbsp Y WHERE average_of_last_5_exams IS NOT NULL GROUP BY student_name 
    ) Z ON Y2.student_name=Z.student_name AND Y2.average_of_last_5_exams=Z.worst_average
    GROUP BY Z.student_name
    student_name worst_day    
    ------------ ------------------- 
    John         2004-06-07 00:00:00
    Mary         2004-06-06 00:00:00
    (2 row(s) affected)

    And finally, select the last 5 exams before the worst day:

    SELECT D.* FROM Exams D INNER JOIN (
     SELECT Z.student_name, MIN(exam_date) as worst_day FROM (
      SELECT *, (
       SELECT AVG(Grade)
       FROM (
        SELECT TOP 5 * FROM Exams B
        WHERE A.student_name=B.student_name AND A.exam_date>=B.exam_date
        ORDER BY exam_date DESC
       &nbsp X HAVING COUNT(*)=5
      &nbsp AS average_of_last_5_exams
      FROM Exams A 
     &nbsp Y2 INNER JOIN (
      SELECT student_name, MIN(average_of_last_5_exams) as worst_average
      FROM (
       SELECT *, (
        SELECT AVG(Grade)
        FROM (
         SELECT TOP 5 * FROM Exams B
         WHERE A.student_name=B.student_name AND A.exam_date>=B.exam_date
         ORDER BY exam_date DESC
        &nbsp X HAVING COUNT(*)=5
       &nbsp AS average_of_last_5_exams
       FROM Exams A 
      &nbsp Y WHERE average_of_last_5_exams IS NOT NULL GROUP BY student_name 
     &nbsp Z ON Y2.student_name=Z.student_name AND Y2.average_of_last_5_exams=Z.worst_average
     GROUP BY Z.student_name
    ) W ON D.student_name=W.student_name AND D.exam_date<=W.worst_day
    WHERE (
     SELECT COUNT(*) FROM Exams E WHERE E.student_name=D.student_name
     AND E.exam_date BETWEEN D.exam_date AND W.worst_day
    )<=5
    student_name class_name           exam_date                         teacher_name    grade   
    ------------ -------------------- --------------------------------- --------------- ------- 
    John         Algorithms           2004-06-03 00:00:00               Smith           8.00
    John         English              2004-06-04 00:00:00               Baker           5.00
    John         Geography            2004-06-05 00:00:00               Miller          6.00
    John         History              2004-06-06 00:00:00               Johnson         5.00
    John         Biology              2004-06-07 00:00:00               Harrison        5.00
    Mary         Mathematics          2004-06-01 00:00:00               Smith           5.00
    Mary         French literature    2004-06-02 00:00:00               Laurent         9.00
    Mary         English              2004-06-04 00:00:00               Baker           10.00
    Mary         Geography            2004-06-05 00:00:00               Miller          8.00
    Mary         History              2004-06-06 00:00:00               Johnson         8.00
    (10 row(s) affected)

    Did I scare you completely with these queries ? I hope not.

    If I misunderstood your problem, please provide more explanation and the expected results based on this sample data (or other sample data with the same structure).

    Razvan

    PS. Where you see a it is really just a ) but the forum software seems to mess it up somehow...

     

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply