Data Visualisation for Beginners: How to create a Waterfall Chart in Python

A waterfall chart is a useful visual representation of changes in a value over time or through a series of events. In Python, we can create a waterfall chart using the matplotlib library. In this tutorial, we will walk you through the steps to create a waterfall chart in Python.
Step 1: Install Matplotlib
Before we can create a waterfall chart in Python, we need to install the matplotlib library. To do this, open your terminal or command prompt and enter the following command:
pip install matplotlib
Step 2: Import Libraries
Once we have installed matplotlib, we need to import it along with other libraries that we will use. In this example, we will use pandas and numpy to create our data. To import the libraries, use the following code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
Step 3: Create Data
We need data to create a waterfall chart. In this example, we will create a simple data frame with five values. We will use the pandas library to create this data frame.
data = {'Category': ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5'],
'Value': [30, -20, 10, -5, 25]}
df = pd.DataFrame(data)
This code creates a data frame with two columns, ‘Category’ and ‘Value’, and five rows.
Step 4: Calculate Cumulative Sum
To create a waterfall chart, we need to calculate the cumulative sum of the values. We will use the numpy library to calculate the cumulative sum.
cumulative_sum = np.cumsum(df['Value'])
This code calculates the cumulative sum of the ‘Value’ column in our data frame.
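To make the running total concrete, here is a small sketch using the same five values. It rebuilds the values as a standalone pandas Series (an assumption, so the snippet can run by itself) and prints the result.

```python
import numpy as np
import pandas as pd

# The same five values used in the tutorial's data frame
values = pd.Series([30, -20, 10, -5, 25])

# Each entry is the sum of all values up to and including that position
cumulative_sum = np.cumsum(values)

print(cumulative_sum.tolist())  # [30, 10, 20, 15, 40]
```

The last entry, 40, is the final total after all five changes have been applied.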
Step 5: Create a Bar Chart
Next, we will create a bar chart using matplotlib. This will be the basis for our waterfall chart.
fig, ax = plt.subplots()
ax.bar(df['Category'], df['Value'], color='b', align='center')
ax.set_xlabel('Category')
ax.set_ylabel('Value')
ax.set_title('Waterfall Chart')
This code creates a bar chart with the ‘Category’ column on the x-axis and the ‘Value’ column on the y-axis. Each bar starts at zero, so positive values point up and negative values point down.
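Many waterfall charts draw "floating" bars, where each bar starts at the running total left by the previous one. The sketch below is one possible variation on the bar chart above, not the tutorial's own method: it assumes the `bottom` argument of `ax.bar` is used to lift each bar, and the colour choice is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Category': ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5'],
    'Value': [30, -20, 10, -5, 25],
})
cumulative_sum = np.cumsum(df['Value'])

# Each bar starts at the running total *before* its own change is applied
bottoms = cumulative_sum - df['Value']  # 0, 30, 10, 20, 15

# Green for increases, red for decreases (an arbitrary choice for this sketch)
colors = ['g' if v >= 0 else 'r' for v in df['Value']]

fig, ax = plt.subplots()
bars = ax.bar(df['Category'], df['Value'], bottom=bottoms, color=colors)
ax.set_xlabel('Category')
ax.set_ylabel('Value')
ax.set_title('Waterfall Chart (floating bars)')
```

Calling `plt.show()` afterwards displays the figure. A bar with a negative value is drawn downwards from its starting point, so the top of the chart traces the running total.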
Step 6: Add Waterfall Lines
To suggest the waterfall effect, we will mark each running total with a dashed horizontal line. We will use the axhline function, which draws a line across the full width of the axes, and the annotate function to add labels.
prev = 0
for i, val in enumerate(df['Value']):
    if val < 0:
        ax.axhline(y=cumulative_sum[i], color='r', linestyle='--')
        ax.annotate(str(val), xy=(i, cumulative_sum[i]), xytext=(i+0.2, cumulative_sum[i]))
    else:
        ax.axhline(y=cumulative_sum[i], color='g', linestyle='--')
        ax.annotate('+' + str(val), xy=(i, prev), xytext=(i+0.2, prev))
    prev = cumulative_sum[i]
This code draws a red dashed line for each negative value and a green dashed line for each non-negative value, at the height of the running total, and labels each bar with its change. The prev variable holds the previous cumulative sum, which is where the label for a non-negative value is placed.
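Because axhline spans the whole plot, the lines can become hard to read as the number of bars grows. A possible alternative, sketched here under the assumption that the bars are drawn floating with the `bottom` argument (a departure from the Step 5 code), is to draw one short connector per gap with `ax.hlines`, which takes start and end positions in data coordinates.

```python
import matplotlib.pyplot as plt
import numpy as np

values = np.array([30, -20, 10, -5, 25])
cumulative_sum = np.cumsum(values)   # 30, 10, 20, 15, 40
x = np.arange(len(values))           # bar centres: 0, 1, 2, 3, 4

fig, ax = plt.subplots()
ax.bar(x, values, bottom=cumulative_sum - values, color='b')

# One dashed segment per gap, from bar i to bar i + 1,
# drawn at the running total reached after bar i
connectors = ax.hlines(y=cumulative_sum[:-1], xmin=x[:-1], xmax=x[1:],
                       colors='gray', linestyles='--')
```

With five bars this produces four connectors, each linking the end of one bar to the start of the next.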
Step 7: Show the Chart
Finally, we can display the waterfall chart using the show function.
plt.show()
In a script, this typically opens the chart in a new window; in a Jupyter notebook, the chart is usually rendered inline below the cell.
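In environments without a display, such as a server or a scheduled job, it can be more practical to write the chart to a file. The following is a minimal sketch, assuming the non-interactive Agg backend and a hypothetical file name, waterfall.png, in the system temp directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(['Value 1', 'Value 2'], [30, -20])  # stand-in chart for the sketch

# Hypothetical output location; change it to suit your project
out_path = os.path.join(tempfile.gettempdir(), 'waterfall.png')
fig.savefig(out_path, dpi=150, bbox_inches='tight')
plt.close(fig)  # release the figure once it has been written
```

The file format is inferred from the extension, so a name ending in .pdf or .svg would produce those formats instead.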
Final Code
Here’s the complete code to create a waterfall chart in Python:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = {'Category': ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5'],
'Value': [30, -20, 10, -5, 25]}
df = pd.DataFrame(data)
cumulative_sum = np.cumsum(df['Value'])
fig, ax = plt.subplots()
ax.bar(df['Category'], df['Value'], color='b', align='center')
ax.set_xlabel('Category')
ax.set_ylabel('Value')
ax.set_title('Waterfall Chart')
prev = 0
for i, val in enumerate(df['Value']):
    if val < 0:
        ax.axhline(y=cumulative_sum[i], color='r', linestyle='--')
        ax.annotate(str(val), xy=(i, cumulative_sum[i]), xytext=(i+0.2, cumulative_sum[i]))
    else:
        ax.axhline(y=cumulative_sum[i], color='g', linestyle='--')
        ax.annotate('+' + str(val), xy=(i, prev), xytext=(i+0.2, prev))
    prev = cumulative_sum[i]
plt.show()

This code creates a simple waterfall chart with five values. You can customise the chart by changing the data or modifying the chart properties. With this tutorial, you should be able to create a waterfall chart in Python and add it to your data analysis toolkit.
Another Example:
Here’s another example of creating a waterfall chart in Python using a different data set:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Create data
data = {'Category': ['Revenue', 'Cost of goods sold', 'Gross profit', 'Operating expenses', 'Net profit'],
'Value': [100000, -50000, '', -30000, '']}
df = pd.DataFrame(data)
# Treat the blank subtotal rows as 0 and convert to integers
values = df['Value'].replace('', 0).astype(int)
# Calculate cumulative sum
cumulative_sum = np.cumsum(values)
# Create bar chart
fig, ax = plt.subplots()
ax.bar(df['Category'], values, color='b', align='center')
ax.set_xlabel('Category')
ax.set_ylabel('Value')
ax.set_title('Waterfall Chart')
# Add waterfall lines
prev = 0
for i, val in enumerate(values):
    if val < 0:
        ax.axhline(y=cumulative_sum[i], color='r', linestyle='--')
        ax.annotate('${:,.0f}'.format(val), xy=(i, cumulative_sum[i]), xytext=(i+0.2, cumulative_sum[i]))
    else:
        ax.axhline(y=cumulative_sum[i], color='g', linestyle='--')
        ax.annotate('${:,.0f}'.format(val), xy=(i, prev), xytext=(i+0.2, prev))
    prev = cumulative_sum[i]
plt.show()

In this example, we create a waterfall chart of a company's financial performance. The data frame has five rows: revenue, cost of goods sold, gross profit, operating expenses, and net profit. Gross profit and net profit are left blank in the data and replaced with 0 before plotting, so they appear as zero-height bars. As before, we use numpy to calculate the cumulative sum and the axhline and annotate functions to draw the lines and labels.
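If you would rather show gross profit and net profit as real bars, the subtotal rows can be filled in from the running total. This is a rough sketch of one way to do it, not part of the original example: it assumes subtotal rows are marked with None rather than an empty string, and the 'Height' and 'Bottom' column names are made up for illustration.

```python
import pandas as pd

# None marks a subtotal row (an assumption of this sketch)
data = {
    'Category': ['Revenue', 'Cost of goods sold', 'Gross profit',
                 'Operating expenses', 'Net profit'],
    'Value': [100000, -50000, None, -30000, None],
}
df = pd.DataFrame(data)

is_subtotal = df['Value'].isna()
changes = df['Value'].fillna(0)      # subtotals contribute no change
running_total = changes.cumsum()     # 100000, 50000, 50000, 20000, 20000

# A change bar floats from the previous total; a subtotal bar rises from zero
df['Height'] = changes.where(~is_subtotal, running_total)
df['Bottom'] = (running_total - changes).where(~is_subtotal, 0)

print(df[['Category', 'Bottom', 'Height']])
```

The two new columns can then be passed to `ax.bar` as the bar heights and the `bottom` argument, so that gross profit is drawn as a bar of 50,000 and net profit as a bar of 20,000.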
The financial example also formats its labels as dollar values with comma thousands separators. You can use it as a starting point and customise the chart to suit your data and visualisation needs.
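To see what the '${:,.0f}' format string produces, here is a short sketch. The format_dollars helper is hypothetical, added only to show one way of placing the minus sign before the dollar sign.

```python
def format_dollars(value):
    """Format a number as whole dollars, with any minus sign before the '$'."""
    sign = '-' if value < 0 else ''
    return '{}${:,.0f}'.format(sign, abs(value))


print('${:,.0f}'.format(100000))  # $100,000
print('${:,.0f}'.format(-50000))  # $-50,000
print(format_dollars(-50000))     # -$50,000
```

The comma in the format specification adds the thousands separators, and .0f rounds to whole dollars.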