It’s been a while since I typed up a blog post, but life, even in the midst of a pandemic, has been busy and crazy. As you can imagine, the customers I support in the education industry have been forced to pivot on a dime in order to continue to operate. My customers, in school districts and higher education institutions, are proving to be nimble and adaptive by adopting Azure cloud technologies such as Power BI and the Azure data platform in a time of high stress.
But I wanted to share one of the neat things I’ve recently worked on with one of my academic medical customers. My customer came to me with a specific challenge related to their adoption of Power BI and I wanted to take a few minutes to walk through a summary of the challenge and how we solved the problem.
The Challenge
The customer is using Power BI to report on data from ServiceNow via its APIs, so they were able to connect Power BI to ServiceNow data and begin reporting on relevant datasets very quickly. The challenge, however, is that querying multiple years of data via the API was less than desirable for a variety of reasons.
The customer was interested in exploring the incremental refresh capabilities of Power BI, but was worried about using Power BI’s native incremental refresh capability since query folding (if you’re new to query folding, read this here) is not supported by Power BI’s web connector. So the customer team reached out to me with the challenge, and I worked up an example that I think will help them meet their requirement.
The Solution
The solution I came up with relies on the fact that the ServiceNow API accepts a date as a parameter. But since I don’t have access to ServiceNow, I used WeatherAPI.com for demonstration purposes. There are a few different APIs you can use, but in my example I’m using their history API. If you want to play along, you can register for a free account at WeatherAPI.com, which gives you access to 7 days’ worth of weather history (the docs say 30 days, but I could only query 7 days’ worth).
Calling the Data Source API
The first step was to create a function in Power BI that queries the weather API and accepts a date as a parameter. To query the API data source, we’ll use the Web.Contents function with the Query argument, and this is really important (Chris Webb has a very nice blog post on this that I found helpful).
Take a look at my query below and notice the Web.Contents function. Because I need to dynamically construct the URL, including the query string parameters, using the Query argument allows me to add the parameters dynamically while still maintaining a valid base URL. That’s important if you want Power BI to be able to validate the connection prior to the data refresh.
// Function that accepts a date (as text) and returns that day's hourly weather data
(dtquery as text) =>
let
    // Call the history API. The query string parameters are passed via the Query argument of
    // Web.Contents so the base URL stays static and Power BI can validate the connection.
    // "APIKey" is the Power Query parameter holding my WeatherAPI.com key (created later in the post);
    // the q value shown here is just an example zip code.
    Source = Xml.Tables(
        Web.Contents(
            "http://api.weatherapi.com/v1/history.xml",
            [Query = [key = APIKey, q = "30309", dt = dtquery]])),
    #"Pivoted Column" = Table.Pivot(Source, List.Distinct(Source[Name]), "Name", "Table"),
    #"Expanded location" = Table.ExpandTableColumn(#"Pivoted Column", "location", {"name", "region", "country", "lat", "lon", "tz_id", "localtime_epoch", "localtime"}, {"location.name", "location.region", "location.country", "location.lat", "location.lon", "location.tz_id", "location.localtime_epoch", "location.localtime"}),
    #"Expanded forecast" = Table.ExpandTableColumn(#"Expanded location", "forecast", {"Table"}, {"Table"}),
    #"Expanded Table" = Table.ExpandTableColumn(#"Expanded forecast", "Table", {"hour"}, {"hour"}),
    #"Expanded hour" = Table.ExpandTableColumn(#"Expanded Table", "hour", {"time_epoch", "time", "temp_c", "temp_f", "is_day", "condition", "wind_mph", "wind_kph", "wind_degree", "wind_dir", "pressure_mb", "pressure_in", "precip_mm", "precip_in", "humidity", "cloud", "feelslike_c", "feelslike_f", "windchill_c", "windchill_f", "heatindex_c", "heatindex_f", "dewpoint_c", "dewpoint_f", "will_it_rain", "chance_of_rain", "will_it_snow", "chance_of_snow", "vis_km", "vis_miles", "gust_mph", "gust_kph"}, {"time_epoch", "time", "temp_c", "temp_f", "is_day", "condition", "wind_mph", "wind_kph", "wind_degree", "wind_dir", "pressure_mb", "pressure_in", "precip_mm", "precip_in", "humidity", "cloud", "feelslike_c", "feelslike_f", "windchill_c", "windchill_f", "heatindex_c", "heatindex_f", "dewpoint_c", "dewpoint_f", "will_it_rain", "chance_of_rain", "will_it_snow", "chance_of_snow", "vis_km", "vis_miles", "gust_mph", "gust_kph"}),
    #"Expanded condition" = Table.ExpandTableColumn(#"Expanded hour", "condition", {"text"}, {"text"})
in
    #"Expanded condition"
When the Power BI dataset is refreshed, the query string parameters are added to the base URL. I can see this if I take a look at the query diagnostics for the step that invokes the function. For the weather API, I need to include the following parameters:
1. key = API key
2. q = location zip code
3. dt = date
Notice the parameters “key”, “q”, and “dt” being added to the base URL with corresponding values.
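For illustration, the effective request captured in the query diagnostics looks roughly like the following; the key, zip code, and date shown here are placeholder values:
http://api.weatherapi.com/v1/history.xml?key=&lt;your-api-key&gt;&q=30309&dt=2020-05-28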
With the function created, I’m ready to build the query that calls the function and loads a table into my Power BI data model, which will then be configured for incremental refresh.
Building the Query & Calling the Function
First, I connected to a table in an Azure SQL Database that contains a list of dates and the corresponding year, month, and day values. Part of the reason I’m using Azure SQL as a data source is that it supports query folding.
Next, I created the RangeStart and RangeEnd parameters, which are required for using Power BI’s incremental refresh capabilities. Be aware that these parameters should be Date/Time type, and that the data type of the parameters and of the column you’re filtering must match. Then I applied a Custom filter to the Date column based on the RangeStart and RangeEnd parameters.
When applying the custom filter, make sure you do not use “is after or equal to” and “is before or equal to” at the same time, as this can leave you with duplicated data in adjacent partitions, as noted in the docs.
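To make the shape of this query concrete, here’s a minimal sketch of those first few steps. The server, database, and table names are placeholders for my environment, and I’m assuming the date column is simply named Date:
let
    // Connect to the date table in Azure SQL Database (a source that supports query folding)
    Source = Sql.Database("myserver.database.windows.net", "MyDatabase"),
    DimDate = Source{[Schema = "dbo", Item = "DimDate"]}[Data],
    // Filter the Date column on the RangeStart/RangeEnd parameters (both Date/Time).
    // Only one side of the filter uses "or equal to" so adjacent partitions don't overlap.
    #"Filtered Rows" = Table.SelectRows(DimDate, each [Date] >= RangeStart and [Date] < RangeEnd)
in
    #"Filtered Rows"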
Also, as a side note, I created a parameter to hold my API key, which is referenced in the function we looked at earlier.
Next, I perform a few transformations to construct the value that will be passed to the function’s “dt” parameter. Then I’m ready to invoke the custom function I created in the first step. After I invoke the function and expand the contents, I end up with a view of the relevant weather data.
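Continuing the sketch from above, the remaining steps look something like the following. I’m calling the function fnGetWeatherHistory here purely for illustration, the dt value is formatted as yyyy-MM-dd, and the final expand step is trimmed to a handful of columns for readability:
let
    // Same starting steps as the earlier sketch (placeholder names)
    Source = Sql.Database("myserver.database.windows.net", "MyDatabase"),
    DimDate = Source{[Schema = "dbo", Item = "DimDate"]}[Data],
    #"Filtered Rows" = Table.SelectRows(DimDate, each [Date] >= RangeStart and [Date] < RangeEnd),
    // Construct the text value that will be passed to the function's dt parameter
    #"Added dt" = Table.AddColumn(#"Filtered Rows", "dt", each Date.ToText(Date.From([Date]), "yyyy-MM-dd"), type text),
    // Invoke the custom function once per date, then expand a few of the returned columns
    #"Invoked Function" = Table.AddColumn(#"Added dt", "Weather", each fnGetWeatherHistory([dt])),
    #"Expanded Weather" = Table.ExpandTableColumn(#"Invoked Function", "Weather", {"time", "temp_f", "humidity", "text"})
in
    #"Expanded Weather"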
I clicked Close and Apply, and the data was loaded into the model.
Configure the Incremental Refresh Policy
Now I’m ready to configure incremental refresh. I won’t go through the steps of configuring incremental refresh because it’s clearly documented here, but in the below screenshot you’ll see how I configured my incremental refresh policy. I want to keep 7 days’ worth of history and refresh only the 3 days prior to the current day. This means that when I first publish the dataset, Power BI will refresh all 7 days’ worth of data to load the dataset. On subsequent refreshes, only the past 3 days will be refreshed, and days older than 7 days will be dropped from the dataset.
Notice the warning above, which states “Unable to confirm if the M query can be folded. It is not recommended to use incremental refresh with non-foldable queries.” I suspect this warning is appearing because of the function we’re using to call the API. In these scenarios, the documentation states that we should validate that data is being incrementally refreshed as expected. In a few moments, I’ll show you how I validated that only the past 3 days were being refreshed.
Publishing and Refreshing the Data
Finally, I published the data model and report to the Power BI service. In this example, I’m publishing to a workspace on a Power BI Premium capacity so I can connect to the workspace with SQL Server Management Studio and view the partitions that Power BI will create.
Now that the data model and report are published, I can see the report. At this point I only have one day’s worth of data.
To load the previous 7 days’ worth of data, I need to update the data source credentials. To do that, go to the settings for the dataset and click Edit credentials next to each data source in the Data source credentials menu.
For the SQL database, I entered my SQL credentials. For the Web data source, I used anonymous authentication but checked the box to Skip test connection, because the connection will fail until the API key is provided (which happens in the function at query time).
Verify the Incremental Refresh is Working
Because this workspace is on my Premium capacity, I can enable the XMLA endpoint to allow connectivity to the workspace from other tools. So before we process the data model, let’s connect to the workspace and take a look at the partitions with SSMS.
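If you haven’t connected to a workspace over the XMLA endpoint before, the server name you enter in SSMS (as an Analysis Services connection) follows this general pattern, with your own workspace name at the end:
powerbi://api.powerbi.com/v1.0/myorg/&lt;Workspace Name&gt;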
As you can see, the WeatherData table only has the single, default partition. Now let’s go back to Power BI and kick off a refresh of the dataset. Once the refresh completes, let’s re-check the partitions in the WeatherData table in SSMS. You can see that we now have 8 partitions (7 days of history plus the current day) and that each partition has 24 rows. Power BI created these partitions based on the incremental refresh policy we defined in the previous steps. Cool stuff!
What do you think will happen if I refresh the data again? Let’s find out! After the data refresh completes, I check the Last Processed timestamp for the partitions. Notice that only the last 3 partitions have been processed, based on the timestamps! This tells me that my incremental refresh policy is working correctly, and I’m ready to schedule the data refresh in Power BI based on my requirements.
I can also check the Analytics on my API calls against WeatherAPI.com and see the additional API calls show up.
The Summary
In this blog post, I shared some steps for incrementally refreshing data in Power BI with a web API as the data source. If you’d like to take a look at my Power BI template that includes the logic, you can download the .pbit file here.
Here are some of the resources I shared inline in the blog post.
Chris Webb’s blog post on Web.Contents: https://blog.crossjoin.co.uk/2016/08/16/using-the-relativepath-and-query-options-with-web-contents-in-power-query-and-power-bi-m-code/
Miguel Escobar’s blog post on incremental refresh with SharePoint Online data source, which I found helpful because it follows a similar approach: https://www.poweredsolutions.co/2020/01/19/incremental-refresh-for-files-in-a-folder-or-sharepoint-power-bi/
What is Power BI Premium? https://docs.microsoft.com/en-us/power-bi/admin/service-premium-what-is
Incremental refresh with Power BI: https://docs.microsoft.com/en-us/power-bi/admin/service-premium-incremental-refresh
Dataset connectivity with the XMLA endpoint: https://docs.microsoft.com/en-us/power-bi/admin/service-premium-connect-tools
Recording Query Diagnostics in Power BI: https://docs.microsoft.com/en-us/power-query/recordingquerydiagnostics
Download SQL Server Management Studio: https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-ver15
Configure schedule refresh with Power BI: https://docs.microsoft.com/en-us/power-bi/connect-data/refresh-scheduled-refresh
WeatherAPI.com: https://www.weatherapi.com/
I hope you found this blog post useful! If you did, leave a comment and let me know and share it with your friends and teammates!