group by tally vs triggers

Question

Post reply

group by tally vs triggers

bogusemail28

SSC Journeyman

Points: 80
More actions
August 11, 2014 at 4:14 pm

#297877

I am writing a program that requires a few fields to be calculated and tallied and stored in a results column. I have created views that do this. Works great. What bothers me about this approach is that over a lifetime it will require lots of calculations over and over when the views are accessed. I could set up a result field in a table and set up triggers to tally and update the result field every time key component data is inserted or updated. What bothers me about that approach is that over a lifetime it will require maintenance and checking to see if the result column has not become corrupted. It seems to be a no-win question of which is the lesser evil? Is there an accepted practice for which approach is best to take?
group by in view - less maintenance but lots of repeat work
trigger - more maintenance but less repeat work

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

Not enough information.

What do you mean by "tallied"?

Any sample data (table def, sample data and expected result) available to demonstrate what you're looking for?

Based on your -rather vague- description I might vote for a computed column...

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

bogusemail28 SSC Journeyman Points: 80 More actions · Answer 2

For example let's say you have several test questions and student answers and a student score.

1. You create a trigger and a score field. Every time a student updates an answer in the test the trigger tallies up the sum of all the answers and updates the field named score accordingly.

2. You create a view that shows student results. The view has a group by clause and a sum that is displayed as score.

Both of these approaches produce the same result: student score on a test

Which is better software development practice?

mickyT SSChampion Points: 10360 More actions · Answer 3

Hi

You may also want to look at indexed views.

Edit: added link

bogusemail28 SSC Journeyman Points: 80 More actions · Answer 4

I can do that. Right now I need to know if my design decision to go with views over triggers was the right one.

mickyT SSChampion Points: 10360 More actions · Answer 5

I suppose it comes down to where you want the performance hit to happen.

If you have a table that has a lot of changes happening to it but is infrequently queried, a straight view is probably the way to go.

If the table only has a few changes but is frequently queried, a trigger or indexed view is probably the best bet.

I brought up the indexed views as I'm not a big fan of triggers and they may suit your situation.

My gut feel given the student example was an indexed view.

bogusemail28 SSC Journeyman Points: 80 More actions · Answer 6

Reliability issues bother me more than performance hit issues. Search this forum and you will find lots of complications that occur with triggers. Some of the calculations I need to do are fairly complex. The nice thing about a view is select statements do not crash or raise a fuss if something is wrong.

Regarding the performance hit. It would help to know what goes on under the hood of MS SQL. If a view has 100,000 records and one of them changes does that mean the entire view needs to be recalculated as it is used? Is calculating group by values in a view limited only to the confines of the where clause or is it done for all values in the component tables every time the view is accessed? Where does MS SQL store the cache? RAM memory or in some sort of disk swap space area? Is a view cache or the index that you mentioned exist after a reboot or does MS SQL have to work to recreate it every time? These are the types of questions that I would need to make a decision about my application. I use my views with very limited confined bounds. The data within the confines does not change often. However data in the whole view might change on a near constant basis. For example the telephone directory does not change often on your block but changes occasionally in your neighborhood and constantly in the city. What scope of change causes views to be recalculated?

mickyT SSChampion Points: 10360 More actions · Answer 7

Where does MS SQL store the cache? Is a view cache or the index that you mentioned exist after a reboot or does MS SQL have to work to recreate it every time?

An indexed view has a clustered index. This is physically stored and will remain without having to be recreated every time. When a query is done on the view the clustered index is used for the results rather than the base table.

If a view has 100,000 records and one of them changes does that mean the entire view needs to be recalculated as it is used?

My understanding is that when you update the base table, only the items in view's clustered index that require a update will be done.

Is calculating group by values in a view limited only to the confines of the where clause or is it done for all values in the component tables every time the view is accessed? RAM memory or in some sort of disk swap space area? These are the types of questions that I would need to make a decision about my application. I use my views with very limited confined bounds. The data within the confines does not change often. However data in the whole view might change on a near constant basis. For example the telephone directory does not change often on your block but changes occasionally in your neighborhood and constantly in the city. What scope of change causes views to be recalculated?

I've mocked up the student result example you mentioned. It has 1,000,000 rows in the results table and 10,000 rows returned by the summary view.

The tables clustered index takes up 3114 pages and the views clustered index takes up 37 pages.

-- Create result table

CREATE TABLE result (

StudentID INT NOT NULL,

ExamID INT NOT NULL,

QuestionID INT NOT NULL,

Score INT NOT NULL,

CONSTRAINT result_pk PRIMARY KEY (StudentID, ExamID, QuestionID)

);

-- Add some data

WITH cteTally AS (

SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N

FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E1(N),

(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E2(N),

(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E3(N)

),

ExamQuestion AS (

SELECT e.N ExamID,

ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) QuestionID

FROM (SELECT TOP 10 N FROM cteTally) e,

(SELECT TOP 100 N FROM cteTally) q

)

INSERT INTO result

SELECT s.N StudentID,

ExamID,

QuestionID,

CAST(RAND(CAST(NEWID() AS VARBINARY(30))) * 15 AS INT) Score

FROM (SELECT TOP 1000 N FROM cteTally) s,

ExamQuestion eq

;

GO

-- Create view

CREATE VIEW resultSummary

WITH SCHEMABINDING

AS

SELECT StudentID,

ExamID,

SUM(Score) AS FinalMark,

COUNT_BIG(*) AS TotalQuestions

FROM dbo.result

GROUP BY StudentID, ExamID;

GO

-- Index the view

CREATE UNIQUE CLUSTERED INDEX resultSummary_pk ON resultSummary(StudentID, ExamID);

GO

Doing the following simple update on it

UPDATE result

SET score = score * 2

WHERE StudentID = 1 and ExamID = 1

Gives the following IO stats

Table 'resultSummary'. Scan count 0, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 3, logical reads 410, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'result'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

And the attached execution plan showing the actual number of rows going through the update.

Hope this helps

bogusemail28 SSC Journeyman Points: 80 More actions · Answer 8

bogusemail28

SSC Journeyman

Points: 80

August 12, 2014 at 2:38 pm

#1737360

Wow! Thanx.

Michael L John One Orange Chip Points: 27079 More actions · Answer 9

Regarding the performance hit. It would help to know what goes on under the hood of MS SQL. If a view has 100,000 records and one of them changes does that mean the entire view needs to be recalculated as it is used?

A view does not get recalculated, unless it is an indexed view. And then, only the index gets changed.

Good starting point:

https://www.simple-talk.com/sql/learn-sql-server/sql-server-indexed-views-the-basics/

Is calculating group by values in a view limited only to the confines of the where clause or is it done for all values in the component tables every time the view is accessed?

SQL server does not store the values contained in a view. Think of a view as a virtual table. The values are changed only at the base table level. If a view is access frequently, it may be cached, in the same manner as a query is cached.

Where does MS SQL store the cache? RAM memory or in some sort of disk swap space area?

Both, depending upon the nature of the activity.

Is a view cache or the index that you mentioned exist after a reboot or does MS SQL have to work to recreate it every time?

You really need to do some basic learning about a database, structures, and "how things work"

Start on this site with the Stairway series.

But to answer your question, indexes are persisted. They do not change when a re-boot occurs. They are hardened to disk as transactions occur. Unless you create objects in tempdb. Then they disappear!

Again, think of a view as a set of code that is packaged in a nice convenient location. It's reusable, the code does not have to be re-written over and over. There is overhead to a view, but not in any way that you are thinking

These are the types of questions that I would need to make a decision about my application. I use my views with very limited confined bounds. The data within the confines does not change often. However data in the whole view might change on a near constant basis. For example the telephone directory does not change often on your block but changes occasionally in your neighborhood and constantly in the city. What scope of change causes views to be recalculated?

Again, there is no "recalculation". The underlying tables contain the data, the view packages them nice and neat.

Based upon what you describe, the view, or a computed column, is probably the way to go.

Put together a few tests. Create your tables, add your triggers, and insert a few million rows into the table. Then, run a few hundred selects on the tables.

Then, remove the trigger, create your view, and repeat.

Compare the results!

One more question. Do these test results need to be real-time? Can the calculation of the test results be pushed off to a separate process?

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

bogusemail28 SSC Journeyman Points: 80 More actions · Answer 10

real-time

it is not as critical as banking data but when something changes the tally needs to be updated right away