August 23, 2023 at 12:41 pm
I want to ask if it is a good idea to store monthly total values into SQL database for my application.
I'm sorry but I have to ask again for the same project I asked some months ago, but for a different question.
let me first explain the context :
I have a legacy application for employees timesheet. I have readonly access to some views that I use to import data into the database application I'm developing. More specifically, I import data related to the worked hours, hours of absence, etc. that are read from a view with fields like these:
EmployeeID - Date - Value - Desc
1 - 02 jan 2023 - 480 - worked hours
1 - 02 jan 2023 - 45 - overtime
1 - 03 jan 2023 - 440 - worked hours
1 - 03 jan 2023 - 40 - leave hours
Value represent time in minute (value 480 = 8 hours).
The application I' developing must collect some other information on monthly base and for each employee, such as, for example, the number of authorised overtime hours, the number of overtime hours paid, and so on. For these data I use a table MonthlyTimesheet.
Each month I need to calculated the monthly hours balance but also the annual progressive total.
To keep things easy, assume that the current month balance is given by the overtime hrs - leave hrs - paid hrs. In our organization leave hours are subtracted from overtime and some overtime hours may not be paid but set aside for time off.
For example:
Employee : John Doe
Month: January 2023
Overtime Hours : 20 (field computed from the value imported from the view)
Leave hrs : 1 (field computed from the value imported from the view)
Paid hrs : 15 (value insert in the MonthlyTimesheet table)
Month balance : 20-1-15 = 4hrs
In order to calculate the annual progressive balance I need the December 2022 balance!
So, here is the point, even if I can calculate monthly hours balance every time is needed, in order to calculate the annual total I need to store it somehow.
I must add that sometimes the user can update data in the legacy application (for example to correct some data input error) and this change cannot be reflected automatically into my applciation (I have no control about that). When some changes happens, the user simply re-import data from the view. This means that for that month the hour balance must be re-calculated\updated.
I think I have to add a monthly total field to the MonthlyTimesheet table. Should I add also a field for the annual progressive total ?
August 23, 2023 at 4:05 pm
Why bother storing something that you can calculate out when you query the data?
What I mean is your table should have only the required data and then build views to get the rest of the data. You have the ID, the Date, the Minutes worked, and a description. My opinion, I would normalize that a bit more and have the description changed to "work code" or something like that and have it be an integer value and map that over to a work code table. That way you can limit the number of options for employees. What if one person puts in "overtime" another puts in "over time" and another puts in "OT"? Reporting on that data becomes problematic.
But I am getting a little off topic now. So what I'd do is keep the table as is (with my suggestion for the work code thing added), and then have a view or stored procedure (SP is nice for the app to call rather than calling the view/table directly) that shows the data per month (sum(value) over (partition by month(date))) and that'll give you the total per month and do a similar thing for the quarter and year if needed. No reason to persist it to disk, especially since you'd need a trigger on the table to keep things up to date as values change.
My opinion - if the value can be calculated easily, don't bother persisting to disk. You could also use a calculated column if you don't want to put it into a view.
Another thing though, I would definitely look at your table design decisions. Concerns I see are:
1 - column named using keywords (date for example)
2 - key columns <-- which column is your PK?
3 - indexes <-- do you have a CI on the table?
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
August 24, 2023 at 1:01 am
Why bother storing something that you can calculate out when you query the data?
To get sheer, blinding, speed. It's the basic method that Pre-Aggregation through Indexed Views work. Of course, it also implies that that you're going to use the result more than once.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 24, 2023 at 6:32 am
If the data is huge then go for storing in permanent table, update the table with any legacy data changes.
If it is not huge and I hope the legacy data changes not going back beyond a year, in this case a YTD view would be better, this view should also bring all the latest changes that were done to legacy data.
=======================================================================
August 24, 2023 at 6:50 am
Most likely I wasn't clear.
Sample data I shown are part of data read from a view belonging to a third-part legacy dabase . I have no control overt that but only reading access for the data I need to import.
The process should be this :
I can calculate the monthly hours balance every time, but I'm wondering which is the best way to get also the running annual total.
This value depends, obviously, on the balance of the month before and so on backwards.
I need also a way to insert the first initial values, but for that probably I will manually insert in the database a new entry with initial values (hours balance at that time) in the MonthlyTimesheet table in my database.
This table, at the moment, looks like:
If I don't enter the MonthlyHrsBalance column, how could I save the initial data from which to start all subsequent calculations?
August 24, 2023 at 10:31 am
@Emperor100
How many employees are you trying to process?
we have about 60 employees.
If it is not huge and I hope the legacy data changes not going back beyond a year, in this case a YTD view would be better, this view should also bring all the latest changes that were done to legacy data.
Legacy data changes due to input errors could refer to the last month or 2 in the worst case.
I'll take a look to what a YTD view is.
Do you think I need a monthly total column in the table?
Suppose I will start the application in september, i need a way to save the initial situation : total hours at the end of august.
Starting from that I can make sum.
Am I wrong?
Or the YTD view should calculate the initial values? I don't know now how it works.
August 24, 2023 at 3:14 pm
I mean, in the end the design is up to you, but for me, I don't see the point of storing the monthly and yearly values when you have the daily. You can use windowing functions to do the sums like I suggested originally and you wouldn't need to persist to disk. Same thing applies to any calculations you need to do. Or, depending on how you create the view/SP, you may not even need windowing functions if you can do it with "group by"'s. Looking at the columns you are trying to use though, I think that windowing functions are going to be your friend here. Alternately, you could use nested selects with self-joins, but that will not be as fast as a windowing function would be.
One big reason I like the above approach is there is no code changes required on the 3rd party app and the database side just needs a new stored procedure or 2 for calculating out the monthly and yearly values. Then if end users need to report on this, they can do it from Excel (use Excel to call the stored procedure and pull the data in) or SSRS or whatever reporting tools you have in your toolkit. No need for a custom app or extra work by the end users. The data exists and you are just doing calculations on it.
And while I agree with Jeff that it would be "blazing fast" to persist to disk, summing up 28-31 days of data for 60 users should already be "blazing fast"... same thing with 365/366 days of data. Especially with only 60 users. SQL can add up 21,900 (60*365) numbers nearly instantly if there is a good index on the data. If you lack good indexes or are not able to make good indexes, then persisting to disk may be required.
Now if persisting to disk is an absolute must, I would still not bother with an app as that is just extra overhead for end users. I would create a job on the SQL server to take the data from the view and do the calculations on it and persist it to disk. For presentation of the data to end users, I'd use a reporting solution (Excel, SSRS, PowerBI, Tableau, etc.) and you are good to go. The ONLY reason I'd use an app is if the end users need to enter some data before it gets pushed back to SQL. Plus if you use the app approach to persist it to disk AND each of the 60 employees needs to click the button, you are going to have a lot of blocking happening on that table. If you use a job to do it, a much smaller chance of blocking.
And what I meant by a YTD view was just to have a view that takes the raw data and sums it up by year.
Like I said, I'd have either 1 stored procedure or 2 views. I personally prefer the stored procedure approach as you could have the parameters set up to give results per employee, per month, or per year based on the parameter provided. 3 INT parameters, all default null. First one being the employee ID, the second being the month (3 for March for example), and the last being the year. Then your query just needs to have a properly formed WHERE clause to get the result for you and that is easy to do with an ISNULL. May not be the most efficient query (functions in WHERE clauses hurt performance), but with only 60 employees I suspect that the performance would be "good enough".
My first step in a situation like this would be to take a sample of the data, remove any PII, and put it into a test system and put it into Excel. In Excel, I would create the monthly and yearly totals so I can see what values I should be expecting. Then I'd move over to my test SQL instance and see if I can reproduce those numbers. The 4 row sample you provided isn't really enough to go in in my opinion. I would want at a minimum 2 employees and time range spanning over 2 months. Now this COULD be done with 4 rows, but I'd want something a bit more interesting when I was building my sample set and would likely want at least 100 rows. Excel can calculate it pretty easily with a SUMIFS() function and SQL can handle it with a SUM and GROUP BY or with a SUM OVER(PARTITION BY) approach.
In the end, the approach you use is up to you, but if I was doing it I only persist to disk things that are hard to calculate or are slow to calculate OR I need point in time data. By hard to calculate, I mean if I need to use a CTE or nested selects or a lot of joins or are data points that have no method to calculate them such as I cannot calculate a persons name. By slow to calculate I mean if I am doing a sum of a column in a table with a few hundred TB of data stored on a HDD, that will not be quick. By needing point in time data I mean in your scenario if you looked at the data on September 15th if you NEED September to be 0 hours and August to be the value it was on September 1st. You indicate that values can be changed going back 2 months historically so if I need the data from a specific point in time (September 1st), then I'd need to persist to disk. If point in time is not required, one big advantage to the stored procedure/view approach is that you get real-time data. You look on September 15th for the number of hours that employee 1 worked in September, it will show you that data and it will be accurate up to the second that you ran the report.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
August 24, 2023 at 8:53 pm
I think keeping monthly totals in a separate table is a good idea. It's done all the time in the insurance industry. If you keep the monthly totals it should be faster as Jeff noted. You may have a small set of data right now but you don't know what the future will hold.
If you keep monthly totals then you have a snapshot in time where 'history' doesn't change. financial departments don't like to get different results today than they got last month when rerunning for a prior month. Let's say your January 2023 is closed and you have your totals for that month, but you aren't keeping those totals in a separate table. If some thing happens and data in July is accidentally loaded with a January date. Then when you sum up your daily records for January in August you would get different results than when you ran after January and every month until the July data was loaded. If you have the monthly totals stored then History doesn't change when you rerun for January if you are using the monthly table. This may even help you find any that are accidentally added like I mentioned above. I know we all think it won't happen, but a wrong date card change could cause you a huge headache. We also have YTD totals stored on those monthly files. I'm not a huge fan of it but it was decided long before me.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
August 24, 2023 at 11:37 pm
As a bit of a sidebar, if you don't want historical data to ever change, partition the tables (file and filegroup) and make the older files "Read Only". It also has the benefit of no longer having to do index maintenance on those months and they can also be excluded from backups because they haven't changed since you set them to read only.
And while I agree with Jeff that it would be "blazing fast" to persist to disk, summing up 28-31 days of data for 60 users should already be "blazing fast"
True dat! And, to be sure, that's not what I was referring to (my demonstration example starts at 100 million rows for a year with more aggregates than Carter has pills with a return time for everything that takes something less than 16 milliseconds and is always up to date without ETL.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 25, 2023 at 5:01 pm
@Emperor100 How many employees are you trying to process?
we have about 60 employees.
This should not be a big deal.
If it is not huge and I hope the legacy data changes not going back beyond a year, in this case a YTD view would be better, this view should also bring all the latest changes that were done to legacy data.
Legacy data changes due to input errors could refer to the last month or 2 in the worst case.
I'll take a look to what a YTD view is.
That is again not a huge window...good.
Do you think I need a monthly total column in the table?
Suppose I will start the application in september, i need a way to save the initial situation : total hours at the end of august.
Starting from that I can make sum. Am I wrong?
Or the YTD view should calculate the initial values? I don't know now how it works.
Yes, you can add a total column for the month of August, something similar to a bank balance of the month-end.
YearMonth - OpeningBalance - Col1 - Col2 - Col3............ ClosingBalance
Closing balance of August will become the opening balance for the month of September... and add whatever columns that you would want to look for end users or should be displayed in reports.
This will repeat for each month for future needs.
=======================================================================
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply