June 15, 2012 at 8:20 am
I am normally on the OLTP side of things, but I want to learn the new BI features of SQL Server 2012. What are the typical recommendations for setting up a DW in 2012? For instance, are you planning to use AlwaysOn to create a read-only copy for OLTP reporting and then also create a DW in SSAS? Or if you build a DW, do you skip the AlwaysOn OLTP reporting and use only SSAS?
When using the new Tabular data model in SSAS, can you report directly off of OLTP tables? Or is the recommendation to get the data into a DW format similar to how AdventureWorks is translated to AdventureWorksDW? If you need to get into a DW format, are you using CDC via SSIS?
I have been playing with the new Tabular model / Power View functionality (great live demos are here: http://www.microsoft.com/sqlserver/en/us/learning-center/virtual-labs.aspx ) and am rather impressed.
June 15, 2012 at 2:39 pm
Shoot, can someone move this to the Strategies and Ideas BI section?
June 17, 2012 at 1:43 pm
Here is a good place to start for an idea of what Microsoft has set forth as documented best practices for setting up a DW. It covers CDC, SSIS, PowerPivot and other topics you raised, so it may answer a lot of your questions, or at least give you some background to begin proof-of-concept work.
The document was written for SQL Server 2008 R2. I don't see that one has been produced specifically for SQL Server 2012; however, the majority of the concepts will still pertain to the newer version:
Best Practices for Data Warehousing with SQL Server 2008 R2
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 18, 2012 at 6:42 pm
JJ-469859 (6/15/2012)
... are you planning on using both AlwaysOn to create a read-only copy for OLTP reporting and then also creating a DW in SSAS? Or if you do a DW, do you skip the AlwaysOn OLTP reporting and only use SSAS? When using the new Tabular data model in SSAS, can you report directly off of OLTP tables? Or is the recommendation to get the data into a DW format similar to how AdventureWorks is translated to AdventureWorksDW? If you need to get into a DW format, are you using CDC via SSIS?
Thanks for the link to that doc. It's an interesting read, but be careful given that it comes from the vendor.
Unfortunately there are no simple answers to your questions. Ultimately it comes down to how many users are on your system, how hard they thrash the OLTP database, and what expectations (and budget) the organisation has for the BI system.
The main purposes of a BI system are to provide very fast reporting, data analysis and ultimately prediction. There is a hierarchy that goes something like this:
data -> information -> knowledge -> wisdom -> prediction
Obviously, running reports against a production OLTP system is not ideal. Using a read-only copy will help, but the queries that drive the reports will still have to join a lot of tables, and any analysis queries will run very slowly. Still, it might be enough to stop there.
A star/snowflake schema in a data warehouse is enough to provide faster reporting, and it's a more accessible model for end users. You could use SSIS to pump data from the production OLTP systems into the DW, usually via a staging area/landing zone; the idea is to keep production DB and DW downtime to a minimum. You could stop there.
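To make the idea concrete, here is a minimal sketch (in Python, with invented table and column names) of what "denormalising into a star schema" means: order lines become one wide fact table, and descriptive attributes live in dimension lookups, so a report needs only one join per dimension rather than a long chain of OLTP joins:

```python
# Hypothetical example: flattening normalized OLTP rows into a star-schema
# shape. All table and column names are invented for illustration.

customers = {1: {"name": "Acme", "region": "West"}}       # customer dimension
products = {10: {"name": "Widget", "category": "Hardware"}}  # product dimension
orders = [
    {"order_id": 100, "customer_id": 1, "product_id": 10, "qty": 3, "amount": 29.97},
]

def build_fact_rows(orders):
    """One fact row per order line; dimensions referenced by key only."""
    return [
        {
            "customer_key": o["customer_id"],  # a real DW would use surrogate keys
            "product_key": o["product_id"],
            "qty": o["qty"],
            "amount": o["amount"],
        }
        for o in orders
    ]

fact_sales = build_fact_rows(orders)

# A report is then a single lookup per dimension:
report = [
    (customers[f["customer_key"]]["region"],
     products[f["product_key"]]["category"],
     f["amount"])
    for f in fact_sales
]
print(report)  # [('West', 'Hardware', 29.97)]
```

In the real thing the dimensions would carry surrogate keys and the fact table would live in SQL Server, but the shape of the model is the same.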
The next step beyond that would be to create an OLAP cube from the DW. Technically you could create an OLAP cube directly from the OLTP production database but it's not advised.
When using the new Tabular data model in SSAS, can you report directly off of OLTP tables?
Technically no. SSAS is an OLAP cube solution: it creates another structure (the cube) separate from the source database (ideally the DW). It's a semantic model of the data, designed to be very quick, with friendly names and obvious data relationships. It also has roll-ups and other aggregations pre-calculated, so data analysis is fast. When you use SSAS you are never directly querying the source tables, whether they come from the OLTP DB or the DW. The question of Tabular vs Multidimensional models in SSAS has been well covered by others, especially James Serra:
http://www.jamesserra.com/archive/2012/04/sql-server-2012-multidimensional-vs-tabular/
To get data into a DW format, you have many choices. You could, for example, write Java modules that orchestrate SQL scripts. SSIS is probably the best choice because it creates all the procedural flow for you and makes deployment and source control easy. CDC is one option, yes: you could use the CDC change tables as a data source for SSIS.
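As a rough illustration of what consuming a CDC change table looks like, here is a Python sketch. The operation codes mimic SQL Server CDC's `__$operation` column (1 = delete, 2 = insert, 3 = before-update image, 4 = after-update image); everything else (row shapes, names) is invented:

```python
# Hypothetical sketch: replaying net changes from a CDC-style change table.
# Operation codes follow SQL Server CDC's __$operation convention:
#   1 = delete, 2 = insert, 3 = update (before image), 4 = update (after image)

changes = [
    {"op": 2, "id": 1, "name": "Widget"},     # insert
    {"op": 4, "id": 1, "name": "Widget v2"},  # after-update image
    {"op": 1, "id": 2, "name": "Gadget"},     # delete
]

def apply_changes(target, changes):
    """Apply change rows in order to a target keyed by id."""
    for c in changes:
        if c["op"] in (2, 4):       # insert, or the post-update row image
            target[c["id"]] = c["name"]
        elif c["op"] == 1:          # delete
            target.pop(c["id"], None)
        # op 3 (pre-update image) is not needed for a net load
    return target

target = apply_changes({2: "Gadget"}, changes)
print(target)  # {1: 'Widget v2'}
```

In an SSIS package this replay logic would typically be a conditional split on the operation column feeding insert/update/delete paths.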
Another option I've seen is to use log shipping to the staging area and then populate the DW from there.
The ETL process is probably the trickiest part, because you are not only doing an incremental (delta) load into the DW, you are also taking the opportunity to clean, merge, filter, sort and reject data.
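The delta-plus-cleansing idea can be sketched in a few lines of Python. Everything here (the staging rows, the `lsn` high-water mark, the email validity rule) is an invented example of the pattern, not any particular tool's API:

```python
# Hypothetical sketch of an incremental (delta) load with basic cleansing.
# 'lsn' stands in for whatever change-ordering column your staging area has.

staging = [
    {"id": 1, "lsn": 101, "email": "a@example.com", "amount": 10.0},
    {"id": 2, "lsn": 102, "email": "", "amount": 5.0},              # bad row: no email
    {"id": 3, "lsn": 99,  "email": "b@example.com", "amount": 7.5}, # loaded last run
]

last_loaded_lsn = 100  # high-water mark saved at the end of the previous run

def delta_load(rows, watermark):
    """Split new rows into clean loads and rejects; skip already-loaded rows."""
    loaded, rejected = [], []
    for r in rows:
        if r["lsn"] <= watermark:
            continue                  # not part of this delta
        if not r["email"]:
            rejected.append(r)        # route to a reject table for review
        else:
            loaded.append(r)
    return loaded, rejected

loaded, rejected = delta_load(staging, last_loaded_lsn)
print(len(loaded), len(rejected))  # 1 1
```

The reject path matters as much as the load path: rows that fail validation should land somewhere auditable rather than silently disappearing.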
My advice to you starting out is to take small steps. A complete BI solution requires an enterprise DW (combining data from all systems: HRIS, CRM, ERP, etc.) and can take a very long time to complete and fine-tune.
1. Basic: report directly from the OLTP production DB
2. Easy: report from a mirror, AlwaysOn replica or other clone
3. Interim: use a reporting database, i.e. a set of special-purpose reporting tables. I use one of these to report from an Oracle prod DB; it's similar to a DW but not as structured.
4. Advanced: use a DW
5. Pro: use a cube
Be careful implementing a cube as a reporting solution. The wrong way to do it is to store every possible level of granularity and every possible dimension. The strength of cubes is in analysis: they provide aggregations and the ability to slice and dice through accumulated data to create predictive models. Using a cube to run detailed operational reports loses the performance benefits, and you may as well report directly from the DW or source DB. By all means create SSRS reports against a cube (I do it all the time); I just find that MDX queries tend to run very slowly once they get complex.
June 19, 2012 at 7:21 am
Wow thanks for the thorough replies and links! The only question is that in the link provided (http://www.jamesserra.com/archive/2012/04/sql-server-2012-multidimensional-vs-tabular/) it says that Tabular uses the existing data model and does not require creating a star schema. Can you expand on why a star schema is basically recommended instead of going directly off of OLTP?
June 19, 2012 at 5:58 pm
I'm probably not the expert to speak on the normalised vs star schema debate; it's a difference in philosophy. Bill Inmon is a proponent of the normalised data warehouse, and Ralph Kimball is essentially the inventor of the denormalised star schema. Both have written books on it, as have others.
Most people in practice these days somewhat follow the Kimball methodology.
From my point of view, when you create a multidimensional cube you are untangling the relational database into defined, purposeful segments. These "dimensions" are things like product, geography or date/time information, against which you can analyse measures such as sales figures. You have to sort those things out for the cube to work effectively, so it makes sense to have that extra step in the middle where you denormalise the data into a star schema.
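The dimensions-and-measures idea above can be sketched very simply: a cube pre-aggregates a measure along each dimension attribute so that "sales by region" or "sales by year" is a lookup rather than a scan. A toy Python version of that roll-up (invented data and names):

```python
# Illustrative only: rolling a fact table up along one dimension attribute,
# the way a cube pre-aggregates measures. Names and data are invented.
from collections import defaultdict

fact_sales = [
    {"region": "West", "year": 2011, "amount": 100.0},
    {"region": "West", "year": 2012, "amount": 150.0},
    {"region": "East", "year": 2012, "amount": 80.0},
]

def rollup(facts, dim):
    """Aggregate the 'amount' measure over one dimension attribute."""
    totals = defaultdict(float)
    for f in facts:
        totals[f[dim]] += f["amount"]
    return dict(totals)

print(rollup(fact_sales, "region"))  # {'West': 250.0, 'East': 80.0}
print(rollup(fact_sales, "year"))    # {2011: 100.0, 2012: 230.0}
```

A real cube computes and stores these roll-ups (and combinations of them) ahead of time, which is exactly why analysis queries against it are fast.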
By using a staging area, you are also reducing impact on the production database. ETL generally involves a lot of slow queries so you have to think about the impact to production.
You'll also find that operational databases generally accumulate a lot of junk and bad data. That can mostly be prevented by a good front-end application; however, as DBAs we know the reality: garbage in, garbage out. The ETL process of moving data from the operational database(s) to a data warehouse (or data mart, etc.) is a good point at which to select/reject, clean, filter and sort this bad data, and to be selective about what data you keep and how you keep it.
You also have the option to decide how you will store historical data. There are complications arising from that such as the slowly changing dimension problem. http://en.wikipedia.org/wiki/Slowly_changing_dimension
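To show what the slowly changing dimension problem looks like in practice, here is a minimal Type 2 sketch in Python (one common strategy from the Wikipedia article above): when a tracked attribute changes, the current dimension row is closed off with an end date and a new current row is inserted, preserving history. All names and the row layout are invented:

```python
# Minimal Type 2 slowly-changing-dimension sketch. Invented names throughout.
import datetime

dim_customer = [
    {"key": 1, "customer_id": 7, "region": "West",
     "valid_from": datetime.date(2011, 1, 1), "valid_to": None, "current": True},
]

def scd2_update(dim, customer_id, new_region, today):
    """Close the current row and insert a new version if 'region' changed."""
    for row in dim:
        if row["customer_id"] == customer_id and row["current"]:
            if row["region"] == new_region:
                return                   # nothing changed; no new version
            row["valid_to"] = today      # close out the old version
            row["current"] = False
    dim.append({"key": len(dim) + 1, "customer_id": customer_id,
                "region": new_region, "valid_from": today,
                "valid_to": None, "current": True})

scd2_update(dim_customer, 7, "East", datetime.date(2012, 6, 1))
print([r["region"] for r in dim_customer])  # ['West', 'East']
```

Facts loaded before June 2012 keep pointing at the "West" version of the customer, which is the whole point: history is preserved rather than overwritten.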
Operational databases generally don't store historical data; they usually hold only current information.
SQL Server 2012 is still quite new and I haven't had a chance to play with Tabular-model SSAS just yet, so I won't comment on that.
Although, in that link to James Serra's blog there is a very interesting comment from Cathy Dumas about the star schema and the Tabular SSAS model. Basically, it depends 😉
June 20, 2012 at 12:11 am
JJ-469859 (6/19/2012)
Wow thanks for the thorough replies and links! The only question is that in the link provided (http://www.jamesserra.com/archive/2012/04/sql-server-2012-multidimensional-vs-tabular/) it says that Tabular uses the existing data model and does not require creating a star schema. Can you expand on why a star schema is basically recommended instead of going directly off of OLTP?
A star schema is denormalised and thus allows for faster reads, as fewer joins are needed to fetch the data. The performance gain usually outweighs the extra storage consumed by the duplicated data.
A Tabular model can be built on a normalised schema, but some SSAS experts suggest you still build it on top of a denormalised data warehouse, as that is much easier for end users to understand.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 20, 2012 at 7:14 am
Thanks everyone! This is perfect!
June 20, 2012 at 7:57 pm
You need to apply the appropriate tools and techniques for the problem at hand. Some major considerations: are you trying to integrate data from many disparate sources, does some of the data come from "vendor controlled" databases, how much historical versus "real time" information is needed, etc.
You would do well to check into Data Vault Modeling. You can start at .
In the Kimball vs Inmon debate, one thing that is often overlooked is that Kimball is concerned with events (aka observations), while Inmon looks at entities, relationships and attributes (ER modelling). If you keep this in mind, it is easier to use either or both techniques to advantage.
The Microsoft BI tools are slanted toward Kimball (especially SSAS), but can work for Inmon.
Finally, scalability in a complex environment is the reason to go with separate OLTP, DW(s) and data marts. You are best off building in small, targeted projects, while remembering that you're going to want to build out without creating a rat's nest of plumbing or monster databases.
Good luck.
David Lathrop
DBA
WA Dept of Health