Storage Modes in analysis Services 2005
Overview
In SSAS 2005, a cube, a partition within a cubeor a dimension can store data and aggregations in numerous ways, each with its
own pros and cons. While Microsoft has retained the storage modes available in
AS 2000 (MOLAP, HOLAP and ROLAP), it has added several new storage modes, which
make an SSAS implementation a more robust and flexible solution in a production
environment. A new concept: proactive caching has been introduced, and as the
name implies, using this new feature enables the administrator to better control
the frequency and the way cube updates are being performed.
What exactly do we store?
SSAS 2005, being an OLAPengine, stores cumulative data of the measures defined in the cube. For example,
if we have a cube with a customer dimension and a time dimension, we might have
a measure like: sales amount which will indicate the dollar figure of the
sales per customer in a certain date and time. This data is stored as part of
the cube. On top of the measures data, each cube can have its own
aggregations. Aggregations are pre-calculated summaries of data for various
combinations of dimension slicing. SSAS 2005 needs to store both the measures
data and the aggregation data on disk. This enables the OLAP engine to retrieve
answers to users queries faster. In the remainder of this article, Ill try to
explain what types of storage modes SSAS 2005 offers to store this data.
1. MOLAP (Multi dimensional OLAP)
This is the most commonstorage mode. It implies that the cubes data and aggregations will be stored in
a multidimensional format. This is the most efficient way to store data and will
provide the user with optimal query performance. The downside with using this
storage mode is that it creates a completely offline data set. Once the cube
retrieved the data from the underlying relational database and processed it,
there is no further connection to the relational database. Any subsequent
updates in the database will not be reflected in the cube, unless it is
re-processed. Now, for most data warehouse solutions, a daily or even a weekly
re-process of the cubes is more then enough. For environments like these, it is
highly recommended to use the MOLAP storage mode.
2. Scheduled MOLAP
This is somewhat similar to the MOLAPstorage mode, in that it stores its data in a multidimensional format. However,
the content of the cube/partition cache will be automatically updated every 24
hours. In a way, using this storage mode already utilize the proactive caching
feature. It is important to note that theres no need to create a SQL Agent job
or an IS package to process the cube. The cubes cache will be refreshed
automatically every 24 hours.
3. Automatic MOLAP
Again, data is stored in amultidimensional format. There is a basic change, though, when compared with the
previous 2 storage modes. Using an automatic MOLAP storage mode means that
Analysis Services needs to listen to notifications from the underlying
relational database. Once an update has occurred in the database, an event is
raised and sent to Analysis services. This type of messaging system can be
achieved by using Notification Services, for example. When the Analysis Services
Server has received the notification, it will automatically refresh the cubes
cache, to reflect the new changes in the database. Users will still be able to
access the cube while the cache is being refreshed, but they will be using the
old cache. Once the cache is refreshed, users will be redirected to the new
data. This type of storage provides the users with an almost up-to-date cube
(the target latency is 2 hours from update to process). This obviously depends
on the cube size and the magnitude of changes in the underlying database.
4. Medium Latency MOLAP
Things are starting to getinteresting as we approach the latency variance (which we touched briefly in
the previous section). Initially data is stored in a multidimensional format.
The Analysis Services server is listening to notifications from the underlying
relational database. When an update is performed in the database, the cube
switches to a Real Time ROLAP storage mode. We will touch on this storage mode
shortly, but I will briefly note that this storage mode means: no
multidimensional storage at all and the cubes data is being retrieved directly
from the underlying relational database. The timing of the switch to ROLAP is
determined by the latency variable. By default, cache will be updated after 4
hours from the time the relational database was updated. This switch should not
last for long though, as in parallel, the cubes MOLAP cache gets processed to
reflect the latest changes. Once the cache is processed, the cube switches back
to the MOLAP storage mode. You would want to use this type of storage when you
realize that the underlying database goes through updates occasionally, and
users do want to have an up-to-date cube. However, they still require
performance to be reasonable. Having the cube reflecting data which is accurate
up to the last 4 hours is reasonable for the users and provides the correct
trade-off between performance and real-time data. If the users agree to these
terms, then this is the preferred storage mode to use.
5. Low Latency MOLAP
This mode is similar in its behaviorto the Medium Latency MOLAP mode, the only change is (yes, you may have guessed
it by now) the latency. Instead of allowing a latency of 4 hours, this mode will
allow only up to 30 minutes of latency. This implies a more frequent switch to
the ROLAP mode and even poorer query performance, as the cube is being processed
more often. The users do get to see a more updated version of the data though.
6. Real Time HOLAP
HOLAP stands for hybrid OLAP. So far wehave discussed having the data and aggregations stored in a multidimensional
format. In the hybrid storage mode, data is maintained in its relational format
(i.e. data resides in the relational database), while aggregations are stored in
a multidimensional format. What this means is that the data is always
real-time data. The Analysis Services server still listens to notifications,
and when changes in the relational database occur, the aggregations (which are
still stored in a multidimensional format) are refreshed. During this process,
the cube is switched to the infamous ROLAP mode, until aggregation processing is
complete. It is easy to see that users are enjoying real-time data with the
added benefit of MOLAP aggregations to improve query performance. Still,
performance is not as good as it used to be when compared to data stored in a
multidimensional format, as well as the aggregations.
7. Real Time ROLAP
This is the last storage mode availableand we are actually quite familiar with it already. Data and aggregations are
stored in a relational format. This means zero latency for users, who will
always be able to access the cube and retrieve real-time data. Query performance
is the poorest here though as no MOLAP objects are available at all.
Just a bit more before finishing
You can apply thevarious storage modes on cubes, partitions and dimensions. Some of the features
will not be available on some of the objects. When using proactive caching,
there are several settings which you can manually tune, like the latency, the
listening intervals and the notification types. You can also fine-tune the
processing options and instruct the SSAS engine whether to fully process a
partition or to incrementally process it.
Conclusion
The MOLAP, HOLAP and ROLAP storage modes wereavailable in the Analysis Services 2000 version. The important addition to the
2005 version is the proactive caching feature, which enables the user to have a
real-time version of the data while still enjoying the query performance
benefits of the MOLAP storage mode.