Introduction
I was asked to add a database with PolyBase External Tables to Always On Availability Group. I thought it would be a straightforward approach, but it got tricky when I learned that one of the prerequisites of PolyBase is that a master key is needed. This article describes how to add a database with PolyBase External Objects to an Availability Group.
Explanation
Our application databases were hosted on SQL Server 2016, and the application team used ETL jobs to import data from ODBC data sources, like Vertica, to SQL Server 2016. When the application team learned about the SQL Server 2019 features, like PolyBase, I was asked to upgrade the production 3-node cluster from SQL Server 2016 to 2019.
First, let me explain what Polybase is. Polybase is a feature that allows one to query external data sources using T-SQL. Starting in SQL Server 2019, Polybase can be used to query external data from ODBC compatible sources through the ODBC connector. This becomes handy when application joins data across multiple technologies using T-SQL.
Additionally, I was asked to add the database with PolyBase External Tables to an Always On Availability Group so that the external tables could be accessed from all 3 nodes.
Scenario
Firstly, I installed SQL Server 2019 on all three nodes with the required features, including PolyBase. After bringing up all the database services, I used the below query to see if PolyBase was successfully installed on all three nodes.
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
This sp_configure query returned run_value "1" which meant that PolyBase was enabled.
exec sp_configure @configname = 'PolyBase enabled'
I then created a database named 'testDB' that would host external tables.
CREATE DATABASE testDB GO
PolyBase requires a database scoped credential to create and access external objects. Before creating a database scoped credential, a master key must be created. The query shown here will create a master key on 'testDB' database with password 'password@123'.
USE testDB GO CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password@123';
A database scoped credential contains the username and password required to connect to the external data source. This username and password will be used to authenticate to the external data source. The query below creates a database scoped credential named 'testcredential' with user 'testuser' and password 'password@123'
CREATE DATABASE SCOPED CREDENTIAL testcredential WITH IDENTITY = 'testuser', Secret ='password@123';
Now, let us look at the creation of external data source. The query shown below connects to Vertica server named 'testserver1' and database named 'testDB' using the username 'testuser' and password 'password@123'. This query also specifies that a database credential named 'testcredential' is used for CREDENTIAL option.
Note: Please change these values (server name, username, database and password) according to your environment.
CREATE EXTERNAL DATA SOURCE Sample_ds WITH ( LOCATION = 'odbc://testserver1:5432' ,CONNECTION_OPTIONS = 'Driver={Vertica};UID=testuser;Database=testDB;Servername=testserver1;PWD=password@123; PreferredAddressFamily=none; ' ,PUSHDOWN = ON ,CREDENTIAL = testcredential );
After the successful creation of master key, database scoped credential and external data source, it's time to create external tables. Let us look at the below query which creates a sample external table named 'Table1' to fetch the data from 'Dim_Key' table in 'testDB' database on Vertica server.
CREATE EXTERNAL TABLE [Table1] ( [Key] [bigint] NULL, [Source] [nvarchar](50) NULL, [Id] [bigint] NULL ) WITH (DATA_SOURCE = [Sample_ds],LOCATION = N'[testDB].dbo.Dim_Key') GO
This query shows that the data is queryable from the external table created.
Here is the execution plan of the above select query. The first image below shows that the query is using a Remote Query and the second image shows its details. We can see that the Physical and Logical Operations are set to "Remote Query" and execution mode is "Row" based. The bottom highlighted portion of the second image indicates that the data is being queried from remote source directly.
Adding the Database to Always On Availability Group
Now that the database is ready to be added to Always On Availability Group, let us add the database 'testDB' to an existing Availability Group 'Test_AG'. When adding the database to Always On Availability Group, we can notice that the checkbox associated with 'testDB' is grayed out with “Password required” status. The password is required here because the database 'testDB' is encrypted by the Database Master Key.
After filling the master key password in the Password field, the status changed to “Meets prerequisites”. I then selected the database 'testDB' and clicked 'Next'.
To add the database to an Availability Group, we can either use the Automatic Seeding or Join only options. Let us proceed with Automatic Seeding option.
The below screenshot shows that the database 'TestDB' is now a part of AlwaysOn Availability Group.
After running a select on the external table 'Table1' on the primary replica, below was the output.
However, the same query on secondary replicas resulted with the below error message.
Msg 15581, Level 16, State 7, Line 1 Please create a master key in the database or open the master key in the session before performing this operation.
This error message means that the database master key is either not present or is not opened. In one of the earlier steps, we already created the database master key on the primary server and were able to access the data. However, a database master key is specific to a server and cannot be replicated across secondary nodes.
In order to add the database master key with same credentials as primary server on secondary servers, we need to execute the stored procedure, sp_control_dbmasterkey_password. This stored procedure creates the database master key credentials on secondary servers and we don't need to explicitly open master key as long as this master key value is present.
Exec sp_control_dbmasterkey_password @dbname='TestDB', @password='password@123', @action=N'add'
Here is the result on all secondaries after executing the stored procedure 'sp_control_dbmasterkey_password' .
Additionally, adding a password enabled database to an Availability Group can be done using the wizard or T-SQL. The only difference is that the wizard prompts for a password, and we cannot proceed without inputting one.
USE [master] GO ALTER AVAILABILITY GROUP [Test_AG] ADD DATABASE [testDB];
Conclusion
PolyBase is definitely a good feature to explore, and I am sure it saves a lot of time from writing ETL jobs. The ability to add this database to an Always On Availability Group saves the additional effort of maintaining external tables on multiple servers. I thought this was worth sharing with others as it would be helpful for those who try to add PolyBase External Objects to Availability Groups.