# Data Visualisation for Beginners: How to create a Waterfall Chart in Python

A waterfall chart is a useful visual representation of changes in a value over time or through a series of events. In Python, we can create a waterfall chart using the matplotlib library. In this tutorial, we will walk you through the steps to create a waterfall chart in Python.

## Step 1: Install Matplotlib

Before we can create a waterfall chart in Python, we need to install the matplotlib library. This tutorial also uses pandas and numpy, so we install them at the same time. Open your terminal or command prompt and enter the following command:

``pip install matplotlib pandas numpy``

## Step 2: Import Libraries

Once we have installed matplotlib, we need to import it along with other libraries that we will use. In this example, we will use pandas and numpy to create our data. To import the libraries, use the following code:

```python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
```

## Step 3: Create Data

We need data to create a waterfall chart. In this example, we will create a simple data frame with five values. We will use the pandas library to create this data frame.

```python
data = {'Category': ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5'],
        'Value': [30, -20, 10, -5, 25]}
df = pd.DataFrame(data)
```

This code creates a data frame with two columns, ‘Category’ and ‘Value’, and five rows.

## Step 4: Calculate Cumulative Sum

To create a waterfall chart, we need to calculate the cumulative sum of the values. We will use the numpy library to calculate the cumulative sum.

``cumulative_sum = np.cumsum(df['Value'])``

This code calculates the cumulative sum of the ‘Value’ column in our data frame.
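To make the running totals concrete, here is a minimal sketch that computes them for the same five values with the standard library's `itertools.accumulate`, which for a plain list of numbers yields the same sequence as `np.cumsum`. The variable name `running_totals` is only illustrative.

```python
from itertools import accumulate

values = [30, -20, 10, -5, 25]

# Running total after each step; element by element this matches np.cumsum(values)
running_totals = list(accumulate(values))

# Print each change next to the total it produces
for step, (change, total) in enumerate(zip(values, running_totals), start=1):
    print(f"Step {step}: change {change:+d}, running total {total}")
```

The last running total, 40, is simply the sum of all five values.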

## Step 5: Create a Bar Chart

Next, we will create a bar chart using matplotlib. This will be the basis for our waterfall chart.

```python
fig, ax = plt.subplots()
ax.bar(df['Category'], df['Value'], color='b', align='center')
ax.set_xlabel('Category')
ax.set_ylabel('Value')
ax.set_title('Waterfall Chart')
```

This code creates a bar chart with the ‘Category’ column on the x-axis and the ‘Value’ column on the y-axis.

## Step 6: Add Waterfall Lines

To create the waterfall effect, we will mark the running total after each step. We will use the axhline function to draw a horizontal line across the chart at each cumulative sum and the annotate function to label each change.

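```python
prev = 0  # cumulative sum before the current step
for i, val in enumerate(df['Value']):
    if val < 0:
        # Decrease: red dashed line at the new running total, label at that height
        ax.axhline(y=cumulative_sum[i], color='r', linestyle='--')
        ax.annotate(str(val), xy=(i, cumulative_sum[i]), xytext=(i+0.2, cumulative_sum[i]))
    else:
        # Increase: green dashed line at the new running total, label at the previous total
        ax.axhline(y=cumulative_sum[i], color='g', linestyle='--')
        ax.annotate('+' + str(val), xy=(i, prev), xytext=(i+0.2, prev))
    prev = cumulative_sum[i]
```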

This code draws a red dashed line and a label for each negative value and a green dashed line and a label for each positive value. Negative labels sit at the new cumulative sum, while positive labels sit at the previous cumulative sum, which the prev variable keeps track of.

## Step 7: Show the Chart

Finally, we can display the waterfall chart using the show function.

``plt.show()``

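When run as a script, this code displays the waterfall chart in a new window. In a Jupyter notebook, the chart typically appears inline below the cell instead.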

## Final Code

Here’s the complete code to create a waterfall chart in Python:

```python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create data
data = {'Category': ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5'],
        'Value': [30, -20, 10, -5, 25]}
df = pd.DataFrame(data)

# Calculate cumulative sum
cumulative_sum = np.cumsum(df['Value'])

# Create bar chart
fig, ax = plt.subplots()
ax.bar(df['Category'], df['Value'], color='b', align='center')
ax.set_xlabel('Category')
ax.set_ylabel('Value')
ax.set_title('Waterfall Chart')

# Add waterfall lines and labels
prev = 0  # cumulative sum before the current step
for i, val in enumerate(df['Value']):
    if val < 0:
        ax.axhline(y=cumulative_sum[i], color='r', linestyle='--')
        ax.annotate(str(val), xy=(i, cumulative_sum[i]), xytext=(i+0.2, cumulative_sum[i]))
    else:
        ax.axhline(y=cumulative_sum[i], color='g', linestyle='--')
        ax.annotate('+' + str(val), xy=(i, prev), xytext=(i+0.2, prev))
    prev = cumulative_sum[i]

plt.show()
```

This code creates a simple waterfall chart with five values. You can customise the chart by changing the data or modifying the chart properties. With this tutorial, you should be able to create a waterfall chart in Python and add it to your data analysis toolkit.
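The chart above draws every bar from zero. A common variant of the waterfall chart floats each bar so that it starts where the previous running total ended, which matplotlib supports through the `bottom` argument of `ax.bar`. The sketch below is one way to do that, not the only one. The helper names `bar_bottoms` and `plot_floating_waterfall` are hypothetical, and the green/red colouring by sign is an assumption carried over from the line colours used earlier.

```python
from itertools import accumulate


def bar_bottoms(values):
    """Return where each bar starts: the running total before that step."""
    # Running total after the step, minus the step itself, is the total before it
    return [total - value for total, value in zip(accumulate(values), values)]


def plot_floating_waterfall(categories, values):
    """Draw bars that start at the previous running total (hypothetical helper)."""
    # Imported here so bar_bottoms can be used without matplotlib installed
    import matplotlib.pyplot as plt

    bottoms = bar_bottoms(values)
    # Assumption: green for increases, red for decreases
    colors = ['g' if v >= 0 else 'r' for v in values]

    fig, ax = plt.subplots()
    # A negative height makes the bar extend downwards from its bottom
    ax.bar(categories, values, bottom=bottoms, color=colors)
    ax.set_xlabel('Category')
    ax.set_ylabel('Value')
    ax.set_title('Waterfall Chart (floating bars)')
    return fig, ax
```

With the data frame from this tutorial, calling `plot_floating_waterfall(df['Category'], df['Value'])` followed by `plt.show()` should draw the five bars as a staircase, with the last bar ending at the final cumulative sum.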

## Another Example

Here’s another example of creating a waterfall chart in Python using a different data set:

```python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create data; blank strings mark rows that have no value of their own
data = {'Category': ['Revenue', 'Cost of goods sold', 'Gross profit', 'Operating expenses', 'Net profit'],
        'Value': [100000, -50000, '', -30000, '']}
df = pd.DataFrame(data)

# Treat blanks as 0 and convert to integers once, so the result can be reused
values = df['Value'].replace('', 0).astype(int)

# Calculate cumulative sum
cumulative_sum = np.cumsum(values)

# Create bar chart
fig, ax = plt.subplots()
ax.bar(df['Category'], values, color='b', align='center')
ax.set_xlabel('Category')
ax.set_ylabel('Value')
ax.set_title('Waterfall Chart')

# Add waterfall lines and dollar labels.
# The raw string passes the backslash through to matplotlib, which renders \$ as a literal dollar sign.
prev = 0  # cumulative sum before the current step
for i, val in enumerate(values):
    if val < 0:
        ax.axhline(y=cumulative_sum[i], color='r', linestyle='--')
        ax.annotate(r'\${:,.0f}'.format(val), xy=(i, cumulative_sum[i]), xytext=(i+0.2, cumulative_sum[i]))
    else:
        ax.axhline(y=cumulative_sum[i], color='g', linestyle='--')
        ax.annotate(r'\${:,.0f}'.format(val), xy=(i, prev), xytext=(i+0.2, prev))
    prev = cumulative_sum[i]

plt.show()
```

In this example, we create a waterfall chart to show the financial performance of a company. The data frame has five rows, one for each category of financial data: revenue, cost of goods sold, gross profit, operating expenses, and net profit. The gross profit and net profit rows are left blank and treated as 0, so they contribute nothing to the running total and are drawn as zero-height bars labelled $0. As before, we use numpy to calculate the cumulative sum of the values and the axhline and annotate functions to create the waterfall lines and labels.
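Because the blank rows in this data set represent subtotals, one possible refinement is to fill them with the running total at that point instead of 0. The sketch below assumes that reading of the data. The helper `fill_subtotals` is hypothetical: it returns bar heights together with the bottoms needed to float the change bars, while subtotal bars start at zero.

```python
def fill_subtotals(values):
    """Replace blank entries ('' or None) with the running total so far.

    Returns (heights, bottoms). Subtotal bars start at zero; change bars
    start at the running total before the change is applied.
    """
    heights, bottoms = [], []
    running_total = 0
    for value in values:
        if value is None or value == '':
            # Subtotal row: show the running total as a full bar from zero
            heights.append(running_total)
            bottoms.append(0)
        else:
            # Change row: bar floats from the previous running total
            heights.append(value)
            bottoms.append(running_total)
            running_total += value
    return heights, bottoms


values = [100000, -50000, '', -30000, '']
heights, bottoms = fill_subtotals(values)
print(heights)
print(bottoms)
```

These lists can then be passed to matplotlib as `ax.bar(df['Category'], heights, bottom=bottoms)`, so that gross profit and net profit appear as bars of 50,000 and 20,000 rather than as empty slots.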

This example also formats the labels as dollar values with commas as thousands separators. You can use this example as a starting point to customise the chart to suit your data and visualisation needs.
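The dollar formatting can also be pulled into a small helper and reused for the axis ticks. The sketch below is one way to do it. The names `format_dollars` and `dollar_tick` are hypothetical, and placing the minus sign before the dollar sign is a stylistic assumption rather than something the example above does.

```python
def format_dollars(value):
    """Format a number as whole dollars with thousands separators, sign first."""
    sign = '-' if value < 0 else ''
    return f"{sign}${abs(value):,.0f}"


def dollar_tick(x, pos):
    """Tick callback with the (value, position) signature that FuncFormatter expects."""
    return format_dollars(x)


print(format_dollars(100000))
print(format_dollars(-50000))
```

To apply it to the y-axis, import `FuncFormatter` from `matplotlib.ticker` and call `ax.yaxis.set_major_formatter(FuncFormatter(dollar_tick))`. Each label contains a single dollar sign, which matplotlib should draw literally, since it only treats text between a pair of dollar signs as math.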
