June 17, 2011 at 8:37 am
I am tasked with setting up a small data mart on a SQL instance apart from the operational database. The intent is to provide a small group of business power users a place where they can run ad hoc queries in SSMS and create their own SSRS reports.
They will need tables containing denormalized versions of the data in the operational database and I am evaluating my options for building the data mart.
Has anyone ever replicated VIEWs to send data to a data mart? How does that method stack up against creating an SSIS package to move the data? I am just looking for comments about these and any other strategies on how to populate the data mart in the least painful way possible.
Thanks.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 17, 2011 at 9:08 am
opc.three (6/17/2011)
I am tasked with setting up a small data mart on a SQL instance apart from the operational database. The intent is to provide a small group of business power users a place where they can run ad hoc queries in SSMS and create their own SSRS reports.They will need tables containing denormalized versions of the data in the operational database and I am evaluating my options for building the data mart.
Has anyone ever replicated VIEWs to send data to a data mart? How does that method stack up against creating an SSIS package to move the data? I am just looking for comments about these and any other strategies on how to populate the data mart in the least painful way possible.
I would set a proper ETL process, denormalization process may involve the population of staging tables before loading FACT and DIM tables on the Datamart side. I'm not really sure how "views" will play a role on that, a SSIS packages based solution usually does the trick.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.June 17, 2011 at 9:30 am
PaulB-TheOneAndOnly (6/17/2011)
I would set a proper ETL process...
Define proper? In what way is a replicated VIEW improper?
...SSIS packages based solution usually does the trick.
This is an option. The SSIS package could look like this:
1) Open connection to operational DB and issue the same query that would be in the VIEW to get data.
2) Open a connection to the data mart and load the data into a staging table.
3) Open a connection to the data mart and issue a MERGE statement to sync the source data with the destination data.
There are a number of ways to do an SSIS package. Please convince me replication is no better than an SSIS-based ETL process...and if you think SSIS is better what would the SSIS process look like?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 17, 2011 at 11:52 am
Still not sure how a view would add any value to an ETL process.
In regards to a proper ETL process, here is how I see it.
--- Database Architecture
1-- Source Database - Transactional, Operational database.
2-- Data Warehouse Database.
2.1 Staging - Staging truncate/load tables.
2.2 Datamart - FACTual and DIMensional tables.
--- ETL Architecture (for each ETLable table)
(E)xtract
- Identify needed rows on source table
- Truncate Staging table.
- Copy identified rows into Stating table.
(T)ransform
- Massage Staging data as needed.
Note: Depending on complexity this can either be done during Extract phase, after Extract is done on top of same Staging table or after Extract is done on top of a second set of Staging tables with the same structure FACT and DIM tables have.
(L)oad
- Merge Staging - already transformed data - into Datamart's FACT and DIM tables.
Hope this helps.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.June 17, 2011 at 12:10 pm
I think we're having a terminology disconnect here...
I am talking about using SQL Server transaction replication to replicate the resultset of a VIEW from a publisher (the operational database) to a subscriber database (the data mart) as opposed to having an SSIS package (or some other process) copy the data (either using the view or the raw select statement that would be in the view) from the operational database to the data mart.
This article says you can replicate "Indexed Views as Tables":
http://msdn.microsoft.com/en-us/library/ms152559.aspx
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 17, 2011 at 12:45 pm
opc.three (6/17/2011)
I think we're having a terminology disconnect here...I am talking about using SQL Server transaction replication to replicate the resultset of a VIEW from a publisher (the operational database) to a subscriber database (the data mart) as opposed to having an SSIS package (or some other process) copy the data (either using the view or the raw select statement that would be in the view) from the operational database to the data mart.
oh yes, you are correct.
I would go with replication IF business specs are asking for a near-real-time Datamart, if business specs are Okay with let's say a once-a-day Datamart load then I'll lean towards a classical ETL process driven by some sort of scheduled-batch job, in this context SSIS packages are a good solution.
Moreover, most ETL processes ask for identifying selected rows on operational database and transport that data into the Datamart, like new Customers and changes to existing Customers rather than replicating a whole table - like Customers table.
The use of replication as part of the ETL process may result in adding a layer of complexity to the operational OLTP system so to allow replication to move only the data that is needed on the Datamart side. Either way, data on the Datamart side will most likely be in star-schema rather than in 3NF as most likely the operational data is.
You can call me either traditionalist or plain lazy but, if no near-real-time Datamart updates are needed I would go with the proven, easy to maintain batch ETL process approach.
Others may dissagree so, this is open for comments 🙂
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply