Power Query makes it easy to pull data from a web page. As easy as deflating a football. Poof. Gone. However, you’d need a few additional steps to iterate through and get data from multiple pages. Let’s take a look.
A use case of this case can be seen when I retrieved financials for restaurant stocks from a website.
Reading from a web page
First have the URL that contains the data you need handy. Then select “Get Data” and select “Web” from the “Other” category as shown below.
In the “Enter a Web Page URL” window, enter your URL. For example, I use “http://www.marketwatch.com/investing/stock/cmg/financials”. This URL shows financial data for the past five years for CMG, stock symbol for Chipotle Mexican Grill.
If the web page has multiple tables, they’ll be shown in the “Navigator” as below. Select the table you want – you can select more than one – and click “Load” at the bottom.
The table should appear as shown below.
You can add additional steps/transformations to the query to clean this table, if you want. For example, I’ll introduce the following steps:
- Remove the last column “5-year trend” as that’s unwanted
- Filter to include only “Sales/Revenue” row since this is the only row I want
- Rename the first column to “Metric”
After applying the above steps, my table looks like the following
Iterating over multiple pages
So far, we saw how to read and transform data from a web page. As you’ve seen earlier, the URL used here contains financial data for a stock. Now imagine you’ve to get similar data for multiple stocks. You can get this by supplying the correct stock symbol in the URL. In other words, replace “cmg” with the corresponding symbol.
Wrong way
One way to get data for multiple stocks is to create multiple queries – one query for each stock – and append all of them. Append works similar to Union All in T-SQL. But as you can probably guess, this method is inefficient and difficult to maintain because you’d have multiple queries basically created by copy-pasting and changing the symbol.
Correct way
An efficient alternative is to use a function and custom column. We need to convert the query we crated earlier to pull data from one web page into a function. To do this, add the following first line and replace “cmg” in the URL with the input argument.
Then, create another text file with the list of stocks that you need to pull data for. If you’re Power Query in Excel, you can create a new table instead of a text file. I’m working with Power BI Designer, hence I prefer a new text file. The contents of the text file are shown below:
Importing the text file using “Get Data” –> “CSV” will create a table as shown below.
Next, add a custom column by selecting “Add Column” –> “Add Custom Column”. And invoke the function we created earlier by passing the input argument “Ticker” as shown below.
Since we’re attempting to access external data, you’ll be prompted to configure privacy levels. Select the appropriate options and click continue.
After adding the custom column and invoking the function, your table should appear as shown below.
Finally, you can expand the Custom column to display data for all stocks.
Scripts
The following are the M scripts if you want to try this out.
Function: ReadMarketWatch
let
ReadMarketWatch = (ticker as text) as table =>
let
Source = Web.Page(Web.Contents(“http://www.marketwatch.com/investing/stock/”&(ticker)&”/financials”)),
Data0 = Source{0}[Data],
#”Changed Type” = Table.TransformColumnTypes(Data0,{{“Fiscal year is January-December. All values USD millions.”, type text}
, {“2010”, type text}, {“2011”, type text}, {“2012”, type text}, {“2013”, type text}, {“2014”, type text}
, {“5-year trend”, type text}}),
#”Removed Columns” = Table.RemoveColumns(#”Changed Type”,{“5-year trend”}),
#”Filtered Rows” = Table.SelectRows(#”Removed Columns”, each [#”Fiscal year is January-December. All values USD millions.”] = “Sales/Revenue”),
#”Renamed Columns” = Table.RenameColumns(#”Filtered Rows”,{{“Fiscal year is January-December. All values USD millions.”, “Metric”}})
in
#”Renamed Columns”
in
ReadMarketWatch
Full Query:
let
Source = Csv.Document(File.Contents(“C:\Users\sam\Downloads\rest\Tickers.txt”),[Delimiter=”,”,Encoding=1252]),
#”Changed Type” = Table.TransformColumnTypes(Source,{{“Column1”, type text}, {“Column2″, type text}}),
#”Promoted Headers” = Table.PromoteHeaders(#”Changed Type”),
#”Added Custom” = Table.AddColumn(#”Promoted Headers”, “Custom”, each ReadMarketWatch([Ticker])),
#”Expanded Custom” = Table.ExpandTableColumn(#”Added Custom”, “Custom”, {“Metric”, “2010”, “2011”, “2012”, “2013”, “2014”}, {“Custom.Metric”, “Custom.2010”, “Custom.2011”, “Custom.2012”, “Custom.2013”, “Custom.2014″})
in
#”Expanded Custom”
Summary
In this post, you learned how to iterate over multiple web pages when the data you need is spread across multiple pages. This example can also be used when you need to iterate over web pages using page numbers.