About 80% of the world’s population lives in about 13% of the countries. Okay, this statement is not as knee-jerking as I made it sound. Pareto Principle (also known as 80-20 rule, the law of the vital-few) states that, for many events, roughly 80% of effects come from 20% of the causes. Source: Wikipedia.
Some examples of Pareto Principle in businesses can be, 80% of profits come from 20% of customers, or 80% of sales are booked by 20% of employees, or 80% of revenue comes from 20% of products. Identifying these patterns helps businesses focus on the 20%, the powerful few, to maximize performance.
Let’s see how to create a Pareto chart using Power BI Desktop. You can download PBI Desktop for free if you want to play along.
Pareto Chart
There’re at least two versions of Pareto Chart.
The first one is a line and column chart. Line represents a cumulative total of the measure, and columns represent the different members. If you add up the value for each of the column, it’ll equal the end value of the line. In a typical Pareto chart, you should see a few members with very high values and several members with low value, as is the case in the below chart. This version of the chart is helpful as it tells the few powerful members that contribute the majority of the value, but you’d need to do calculations in your head to determine where 80% and 20% fall.
The second one is a simple line chart with % of total value on Y axis and % of members on X axis. This is visually easy to identify if the 80-20 rule holds true. As you can see, this doesn’t show the member names on columns and that’s fine because most of the time this version of Pareto chart is used during analysis to understand data distribution rather than on an actual dashboard.
Prepare Sample Data
To demonstrate creating these Pareto charts, I use population by country that’s available on Wikipedia. Do the following steps to get some useful data from a web page into Power BI model.
- In Power BI Desktop, click Get Data
- Select Web and enter the following URL: https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population
- In the Navigator, select Countries and dependencies by population and click Ok
- You’ll see that the imported table contains headers, and same information is repeated in the first row. So, filter that row out by right-clicking on any column in the first row and select Text Filters and Does Not Equal
- Select Date, % of world, Source and select Remove Columns to remove these unwanted columns
- Double-click on column name Country (or dependent territory), type Country, and press enter to rename the column
- Select Rank and Population columns at the same time, and change Data Type to Whole Number in the Transform tab
Your table should look like the one below. Only a portion of the table is shown here.
Create Line and Column Pareto
For this chart, we need a running or cumulative total of population in the table. The following steps will add running total as a new column.
- Click Add Column in the Transform tab and select Add Custom Column
- Enter RunningPop as the column name and =Table.Range(#”Changed Type1″,0,[Rank]) as the formula. Table.Range function accepts table (result of #”Changed Type1″) as a parameter and returns a subset of rows (a number determined by Rank) starting at offset (0). For each row, this function returns all rows that have a Rank less than or equal to the Rank of the current row
- You should see a new column RunningPop created and the cell contains a Table described in the previous point
- Click on the Expand icon next to RunningPop, check the Aggregate radio button, and select only the Sum of Population column
- Double-click on the column name and type RunningPop to rename the column
- Select RunningPop and change the Data Type to Whole Number in the Transform tab
- Click Close & Load in the File menu
With this we’re ready to create the chart. If you remember from my last post, I prefer to first select the type of visualization I want and add fields to it. Select Line and Stacked Chart in the visualizations pane and Country to Axis and Population on Value, then sort on descending order of Population as shown below.
After the chart is sorted on Population, add cumulative total, RunningPop, we calculated earlier to Line Values. The fields pane should appear as shown below.
This completes creating the first version of the Pareto chart and it looks exactly like first image you saw above.
Create Line Pareto
We need two additional fields for the second chart: Cumulative percent share of total population and cumulative percent share of count of countries. What is Cumulative percent share? Say there’re rows like A = 40, B = 30, and C = 30. Percent share would be 40%, 30%, and 30% for A, B, and C respectively (value/total which is 100 time 100). Cumulative percent share would be 40% for A, 70% for B (A’s 40 + B’s 30), and 100% for C(A’s 40 + B’s 30 + C’s 30). In other words, we have to calculate the percent share of total and do a running total calculation on this.
Calculate Percent Share of Total
To calculate percent share of the total in the query editor using Power Query, we compute the grand total (total population) and divide population of each country by the grand total.
- Click Group By in the Transform tab. Don’t include any column under Group By. Select Sum for Operation and Population for Column
- You should the total population, TotalPop as a single cell
- Right-click on TotalPop and select Drill Down. This creates a single value that can be used in other steps. With this, we have the grant total and the full table. Now we can divide the value (each row in the table) with the grand total to get percent share. In my query, the names of the full table and grand total are #”Changed Type2″ and TotalPop
- Right-click the query, and in the Advanced Editor, add AddedPctShare = Table.AddColumn(#”Changed Type2″, “PctShare”, each [Population]/TotalPop) in the last line
- Change the Data Type of PctShare to Decimal Number
Your table should like this including PctShare in the right.
Now you can obtain cumulative percent share of PctShare by calculating running total for this field using the approach shown above.
The next step is to calculate the cumulative percent share of count of countries. You can repeat the same steps we used for cumulative percent share of population or you can cheat. I’ll cheat this time since you already know how to do these calculations. We know the total number of countries is 256 and Rank automatically gives me the running total for count of countries. So Add a Custom Column with % of Countries as the name, and =[Rank]/256 as the formula. When you add this field on the Y axis on a chart, It’ll automatically be summed, so we don’t need to calculate the running total for this.
This completes creating the two additional fields we needed and the table appears as shown below.
Click Close & Load. In the report view, ensure that these two fields are formatted as Percentage. Select Line Chart in the visualization pane. Add % of Countries on Axis and PctShare on Values as shown below.
We’ve completed creating the second version of the Pareto chart and it looks exactly like the second image you saw above.
Summary
Pareto Charts help identify the few powerful factors that contribute to majority of events in any business. In this post, you saw how to create Pareto Charts in Power BI. I know you paid close attention and also learned calculating running totals and percentage share of total using Power Query.
Check out my other posts of Power BI: http://samuelvanga.com/category/power-bi/. I think you’ll really like Baltimore Crime Report and Tracking S&P 500 using Power BI.