What are we talking about?
Microsoft recently released a new edition of SQL Server 2008 R2 called Parallel Data Warehouse Edition. There has been a lot of buzz about this new architecture because it is Microsoft’s entry into the massively parallel processing (MPP) scale-out data warehousing arena. Until now, Microsoft has offered SQL Server in a symmetric multiprocessing (SMP) architecture, where all the CPUs, memory, and storage sit in one physical system and database operations take place entirely within one instance of SQL Server.
Parallel Data Warehouse, or PDW, is an appliance-based architecture that provides significant scale-out capabilities, built on technology Microsoft acquired from DATAllegro in 2008. This MPP architecture provides more scalable and predictable performance for significantly greater workloads, up into the hundreds of terabytes. Microsoft’s implementation is particularly exciting because PDW offers a much lower cost per terabyte, since you can implement it on commodity hardware instead of a proprietary system like Teradata or Netezza.
PDW works by controlling several different physical servers, each running its own instance of SQL Server 2008 R2. The database and its tables are spread across these physical servers but appear as one database and one set of tables to the end user. The brain of the appliance manages query execution and the metadata that tracks what is stored and processed on each portion of the PDW.
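To make the idea of "many servers, one table" a little more concrete, here is a toy Python sketch. It is not PDW code and it does not use any PDW API. The class names (`ComputeNode`, `ControlNode`), the `customer_id` distribution key, and the hash-based routing are all assumptions made up for illustration; the real appliance's internals are covered later in the series.

```python
import zlib


class ComputeNode:
    """Stands in for one physical server running its own SQL Server instance."""

    def __init__(self, name):
        self.name = name
        self.rows = []  # the slice of the table stored on this server

    def partial_sum(self, column):
        # Each node aggregates only the rows it holds.
        return sum(row[column] for row in self.rows)


class ControlNode:
    """Hypothetical 'brain': routes rows to nodes and merges query results."""

    def __init__(self, node_count, distribution_key):
        self.nodes = [ComputeNode(f"node{i}") for i in range(node_count)]
        self.distribution_key = distribution_key

    def _node_for(self, key_value):
        # Stable hash (assumed scheme), so a given key always maps to the same node.
        digest = zlib.crc32(str(key_value).encode("utf-8"))
        return self.nodes[digest % len(self.nodes)]

    def insert(self, row):
        self._node_for(row[self.distribution_key]).rows.append(row)

    def total(self, column):
        # Fan the query out to every node, then combine the partial answers.
        return sum(node.partial_sum(column) for node in self.nodes)

    def row_counts(self):
        # A metadata-style view: how many rows live on each server.
        return {node.name: len(node.rows) for node in self.nodes}


appliance = ControlNode(node_count=4, distribution_key="customer_id")
for customer_id in range(1000):
    appliance.insert({"customer_id": customer_id, "amount": 10})

print(appliance.total("amount"))  # 10000
print(appliance.row_counts())
```

The caller only ever talks to `appliance`, which is the point: the rows live on four separate "servers", yet the total comes back as if there were a single table.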
See an overview of this in the diagram below.
Why do we need this?
PDW is important because it opens up the possibilities of large-scale data processing in a much more economical package. How economical, you ask? Well, let’s say I won’t have one in my garage. The price tag is still around US$1 million to get going, but that is a lot of hardware, licensing, and processing power for the money. The entry-level package is two racks of gear including storage, network, its own domain controller, etc. We’ll talk more about the architecture in the next article; for now we want to focus on what PDW is and why it’s so exciting!
Businesses these days are processing large volumes of data, and the definition of “large volume of data” grows every day. With PDW, the SQL Server community now has an architecture comparable to that of Teradata or Netezza. The major difference is that instead of adding a node to Teradata for approximately $850K, PDW uses off-the-shelf HP and Dell hardware, which makes expansion much more cost effective and improves ROI.
This introduction is the kick-start to a series I’m writing covering all aspects of the new PDW. If there are areas you would like to know more about, please email me or post comments so I can follow up with the product teams to get you more information. We are excited to be one of the few partners already working with PDW, so we want to help you understand and see how this package could be beneficial for your enterprise.
Upcoming articles in the series
1. Architecture overview of PDW
2. Intro to new PDW Objects and Schema Features
3. Working with PDW Database Objects
4. Partitioning with PDW and Querying your PDW
5. Working with PDW Databases
6. How to get PDW in your environment
7. Fast Track Architecture vs. PDW
Thanks for checking out this introduction to the series. Please post comments and feel free to email me with questions. In the meantime, check out Microsoft’s PDW site for SQL Server at:
http://www.microsoft.com/sqlserver/2008/en/us/parallel-data-warehouse.aspx
Thanks for stopping in. Keep making your business intelligent!
Adam