Massive parallel processing (MPP) is the future for data warehousing.
So what is MPP? SQL Server is a Symmetric Multiprocessing (SMP) solution, which essentially means it uses one server. MPP provides scalability and query performance by running independent servers in parallel. That is the quick definition. For more details, read What MPP means to SQL Server Parallel Data Warehouse.
Microsoft has an MPP appliance called the Analytics Platform System (APS). If you are building a new data warehouse that will be of any decent size in the next few years (i.e. 1TB or greater), it is really a “no brainer” to purchase a MPP solution over a SMP solution,
Looking at the value over time for a MPP appliance vs a SMP solution:
Comparison of Cumulative Cash Flows of Customer Experience Project using a Big Data Solution (MPP) vs. a Traditional Data Warehouse Appliance
Source: Wikibon 2011
The financial metrics of the two approaches were overwhelmingly in favor of the Big Data Solution (MPP) approach:
- Big Data Approach:
- Cumulative 3-year Cash Flow – $152M,
- Net Present Value – $138M,
- Internal Rate of Return (IRR) – 524%,
- Breakeven – 4 months.
- Traditional DW Appliance Approach:
- Cumulative 3-year Cash Flow – $53M,
- Net Present Value – $46M,
- Internal Rate of Return (IRR) – 74%,
- Breakeven – 26 months.
The bottom line is that for big data projects, the traditional data warehouse approach is more expensive in IT resources, takes much longer to do, and provides a less attractive return-on-investment.
Getting into the specific reasons to choose MPP (i.e. APS) over SMP when building a data warehouse:
- Be proactive instead of reactive (solve future performance problems now and not when they rear their ugly head)
- The hardware cost to upgrade to powerful servers that are clustered (for high availability) can be more than a quarter rack of APS (which has high availability built-in)
- You can use the same SQL Server EE licenses for APS
- You can use your current premium support – just add hours for APS
- So you may need to just need to purchase hardware and there are three vendors to choose from: HP, Dell, and Quanta. Each solution uses commodity hardware than can be de-racked and repurposed if for whatever reason the MPP solution did not work out
- It makes data warehouse design easier as you don’t need the typical “band aids” and work-arounds with a SMP solution to get the required performance: star schema’s, aggregate tables, cubes, data marts, multiple data warehouse servers, etc. This also reduces the complexity of ETL and therefore makes it easier to maintain
- Data warehouse development is quicker because of APS speed: http://www.jamesserra.com/archive/2014/10/non-obvious-apspdw-benefits/
- Knocks down any potential hardware barriers down the road as it is long-term solution where you can easily scale your hardware by sliding in a new rack instead of fork-lifting to a new server (i.e. purchase and build and tune, backup and restore to the new hardware, move over security, repoint users to the new server)
- You get 30x-100x performance improvement over SMP and when scaling your hardware you get linear performance benefits (i.e. double your hardware you double your performance) as opposed to the 20-30% performance improvement when fork-lifting to a new server
- Use can use your existing SQL Server skillset as APS is very much like SQL Server so little new training is needed
- You don’t have to say “no” anymore to end-users when they ask for more data because of the reasons: we don’t have room on the SAN, we can’t give you the query performance, we are bumping up into our maintenance window
- You need non-relational data down the road. Use can use Hadoop which is a platform designed and optimized for new forms of data, and then use PolyBase for easy access. You can create a data lake in Hadoop for the end-user to mine data and let you know what is useful
- If your data warehouse has been around a while this may be a good time to re-engineer and development will be so much quicker with MPP
- Even if you don’t have performance problems now but see the value in big data and analytics and want to be ready for it, especially IoT
- Tons of additional benefits that you don’t get with SMP: See Parallel Data Warehouse (PDW) benefits made simple
If you are currently housing a data warehouse on SMP, it will almost always be worth the migration effort to switch to a MPP solution. Remember that old saying: “pay me now or pay me later”!
But there are some reasons where MPP may not be a good fit or should be supplemented with a SMP solution and/or a SSAS cube:
- High volume transactional workloads (OLTP). MPP is for data warehousing (heavy reads and batch writes)
- Small company with small budget
- Thousands of concurrent users
- Super-fast dashboard query performance
- Chatty workload
- 24/7 SLA
- Need replication (true real-time updates)
Another benefit is APS is an appliance solution, meaning it is prebuilt with software, hardware, and networking components already installed. Think of it as “Big data in a box”. Some of the appliance benefits:
- Deploy in hours, not weeks
- Save time by implementing a turnkey solution complete with hardware and software
- Gain confidence and piece-of-mind by deploying a pre-tested and tuned data warehouse optimized for your specific workload
- Reduce operational costs and simplify management
- Reduce energy costs and environmental impact through balanced infrastructure and performance engineering
- Maximize reliability through the use of industry standard infrastructure and software components
- React quickly with unprecedented database agility
- No individualized patching of servers
- Much lower overall TCO
More info:
Financial Comparison of Big Data MPP Solution and Data Warehouse Appliance