Power BI and Excel options for Hadoop

Below I have attempted to list the various options for reporting off of Hadoop (HDInsight, HDP, Cloudera) using Power BI Desktop and Excel. Some of the data sources prompt you to choose the Data Connectivity mode of either Import or DirectQuery. For those that don’t you are limited to Import mode:

- Power BI via Microsoft Azure HDInsight Spark (Import or DirectQuery) This is in Beta. This works if HDInsight is using Blob storage or Azure Data Lake Storage (ADLS)
- Power BI via Microsoft Azure HDInsight (Import). This works if HDInsight is connected to Blob storage. If HDInsight is connected to ADLS, use the “Power BI via HDFS” connector instead
- Power BI via Spark (Import or DirectQuery) This is in Beta. This is for HDP/Cloudera and uses a different authentication level then the HDInsight Spark connector. NOTE: Cloudera recommends against using SparkSQL with CDH. SparkSQL uses a component called thrift server which is not supported by Cloudera (see here). For interactive applications they recommend Impala. There strategy is described here.
- Power BI via Impala (Import or DirectQuery). This is in Beta
- Power BI via SQL DW using PolyBase (Import). Note that DirectQuery is an option, but not supported with SQL DW so will import the data instead, which could greatly slow query performance
- Power BI via SQL Server 2016 using PolyBase (Import for HDInsight, DirectQuery creates MapReduce job in cluster for Cloudera/HDP, other sources will import the data instead)
- Power BI via HDFS (Import). This uses WebHDFS.
- Power BI via ODBC (if Hive ODBC, creates Hive job in cluster, so can be slow. For very fast querying, use Hive LLAP ODBC)

- Excel via Data Connection Wizard -> ODBC DSN -> Hive ODBC (Import)
- Excel via SQL Server 2016 using PolyBase (Import)
- Excel via PowerQuery/Hadoop File (HDFS) (Import). This uses WebHDFS
- Excel via PowerQuery/ODBC (Import)
- Excel via PowerQuery/SQL Server 2016 using PolyBase (Import)
- Excel via PowerQuery/Microsoft Azure HDInsight (Import)

“Import” mode requires all data from Hadoop to be copied into Power BI/Excel and then filtered. So import will first load your data (“snapshot”) into a local Power BI model and then run reports against that data, which means that interactive visualizations are not live but rather using data from the last snapshot.

With “DirectQuery” mode, the query is passed down into the data source and only the results are returned back (see Use DirectQuery in Power BI Desktop). So when using DirectQuery you are using live data. Note that Power BI Q&A and Quick Insights is not available for DirectQuery datasets (see here for other limitations).

EXCEPTION: If in Power BI you use the “Edit Queries” option, it will actually try to push down the query if you choose “Import” mode. So the difference in this case from DirectQuery mode is that Import will fallback to running locally if it can’t execute part of the query on the server, while DirectQuery will tell you the query isn’t supported. So an Import can take hours instead of failing fast like DirectQuery will do.

By using PolyBase, you use a CREATE EXTERNAL TABLE to point to the data stored in a Hadoop cluster. Thereafter, the Hadoop data is seen as a regular table, hiding complexity from the end user.

I also wanted to mention that Power BI can connect to Azure Blob Storage and Azure Data Lake Store (beta), and Excel can connect to Azure Blob Storage.

There are many more data sources that Power BI Desktop can connect to.

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to intert a number of substrings from a table into a string, and explains why the technique might speed up your code...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-13

360 reads

Power BI and Excel options for Hadoop

Rate

Share

Share

Rate

Power BI and Excel options for Hadoop

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts