August 14, 2009 at 1:27 pm
I have a program that lets the user build a chart based on criteria that they select. Then, they can choose to save that criteria to run later, and they can save the current results to be able to compare data at a later time. In it's most simplified form, the criteria consists of users, equipment that the user has, and skills.
Each user can have multiple pieces of equipment, each equipment can have multiple users, and each user-equipment combination has ratings (0 - 9) for each skill (about 500 of them). The user generating the chart selects which users, equipment, and skills they want to see, and if they want to see the chart display the data grouped by the user, equipment, or skill. I have a stored procedure that pulls up all of the individual ratings and then groups and averages based on what the user selects, and then returns the averages.
Currently, when a user saves a chart, the stored procedure is ran, and the averages are stored in a table. What I'm wondering is, should I be storing the aggregate data, or is it a better practice to store all of the selected data as a snapshot?
The pros I see of storing the aggregate data are:
1) Space - Criteria that might require calculating thousands and thousands of ratings might only generate 10 averages that needed to be stored.
2) Retreival Speed - No need to recalculate when pulling up a saved set of data
The cons are:
1) Flexibiltiy - If a user saves a chart displaying averages for each user, they couldn't change the view to see what the equipment averages were.
2) Saving Speed - Looking at some of the saved criteria currently being used, some of it takes 15 to 20 seconds to gather and calculate all the data, and it would obviously take that long to save it every time.
3) No Drill-Down - Along with flexibility, there's no way to look back at a saved chart to see why a certain average is lower or higher than expected.
The pros and cons I see of storing the raw data would just be the opposite of the aggregate data.
Are there any recommended practices to use when working with saved data? Are there any other options than just saving raw or aggregate data?
Thanks for any suggestions.
August 14, 2009 at 4:42 pm
To answer the question of storing aggregates vs. storing all the data, you simply have to answer another question: Is the ability to drill down or get more information than the aggregate something you want to provide? If the answer is yes, then you need to store more than the aggregate data.
To decide best how to store the data, you need to figure out exactly what you want to allow them to do, or why they need to do it. The better you understand what they're using something for, the more you can refine your design. For instance, if they are only interested in when things change, you may be able to do it with change tacking rather than bulk data storage.
Also, it's a little unclear where all this data is coming from. Is it stored in a database you have access to, are they creating this data each time they use the report?
August 18, 2009 at 1:26 pm
Garadin (8/14/2009)
Also, it's a little unclear where all this data is coming from. Is it stored in a database you have access to, are they creating this data each time they use the report?
Yes. It's stored in the main database. It is information that employees enter about themselves, and then managers run the reports on that data. There's almost always at least one person updating their ratings at any given time, so the data is changing thousands of time per day.
I'm starting to get a better feel for what the managers are asking for, but I think they're still confused on what it is they need in the first place. :rolleyes:
August 19, 2009 at 9:21 am
Having the data gives you a lot more options. Just throwing out a couple ideas of things you could do with it in that case:
Set up an a scheduled job to copy the current data into a logging table at regular times (like every n months).
Set up auditing so you can track changes to the data. (Something like UserID, Skill Changed, Prev Value, New Value, Date Changed, Changed By, etc.)
The way you handle it depends on the amount of time you want to devote to it, the size of the data, the frequency of changes, the kind of reports/information you want from it, etc.
In my opinion, unless
A. They are making a very large number of changes to the data, and
B. You don't care about things like when ratings were changed, history of changes, trending of changes, progression etc
Auditing is the way to go. This gives you the most options for writing reports/historical analysis of your data, and likely takes up the least room. That said, some of the reports would likely be *harder* to write than if you just did a data dump every so often, but you do have the data to write them(where as with the dump, you just don't have the data for many of them). You also may incur a bit of overhead doing the extra writes to a logging table.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply