Data Visualisation for Beginners: How to create a Waterfall Chart in R

A waterfall chart is a useful tool for visualizing how an initial value is affected by a series of intermediate values that can be either positive or negative. In this tutorial, we will explore how to create a waterfall chart in R using the ggplot2 library.
Step 1: Load Required Libraries
Before we start creating our waterfall chart, we need to load the required libraries. In this tutorial, we will use the following libraries:
ggplot2
dplyr
You can install these libraries using the following commands:
install.packages("ggplot2")
install.packages("dplyr")
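If you rerun the script often, you may prefer to check which packages are missing before installing anything. The following is a minimal sketch, not part of the original steps; the helper name missing_packages is a hypothetical choice for illustration.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "dplyr"))
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```

The install call is left commented out so that the check itself never triggers a download.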
Step 2: Load and Prepare Data
We will use a sample dataset to create our waterfall chart. Here is an example dataset:
Month,Revenue,Cost,Profit
January,1000,400,600
February,1200,500,700
March,800,300,500
April,900,400,500
May,1500,700,800
June,1300,600,700
July,1100,500,600
August,1000,400,600
September,1200,500,700
October,800,300,500
November,900,400,500
December,1500,700,800
R code for the data frame:
df <- data.frame(
  Month = c("January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December"),
  Revenue = c(1000, 1200, 800, 900, 1500, 1300, 1100, 1000, 1200, 800, 900, 1500),
  Cost = c(400, 500, 300, 400, 700, 600, 500, 400, 500, 300, 400, 700),
  Profit = c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)
)
The dataset has four columns: Month, Revenue, Cost, and Profit. We will use the Profit column to create our waterfall chart.
Next, we need to prepare the data for the waterfall chart. We will add new columns to the data frame that hold the change in profit from one month to the next. Here is the code to do that:
library(dplyr)
df <- df %>%
  mutate(change = c(Profit[1], diff(Profit)),
         cumulative = cumsum(change),
         start = cumulative - change)
In this code, we used the mutate function from the dplyr library to create three new columns: change, cumulative, and start. The change column starts with the profit for the first month and then uses the diff function to calculate the difference in profit between each pair of consecutive months. We use the cumsum function to create the cumulative column, which contains the running total of the changes in profit. Finally, we calculate the starting value of each bar by subtracting the change in profit from the cumulative total.
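To see what these three columns hold, the same arithmetic can be reproduced in base R on the Profit values. This is only an illustrative check, assuming the twelve sample values listed earlier; the variable names are chosen for the sketch.

```r
# Monthly profit values from the sample dataset
profit <- c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)

# First bar is the opening profit; later bars are month-over-month differences
change <- c(profit[1], diff(profit))
cumulative <- cumsum(change)   # running total of the changes
start <- cumulative - change   # level each bar starts from

head(data.frame(change, cumulative, start), 4)
#   change cumulative start
# 1    600        600     0
# 2    100        700   600
# 3   -200        500   700
# 4      0        500   500

# Sanity checks: the running total reproduces Profit, and each bar
# starts where the previous month ended
stopifnot(all(cumulative == profit))
stopifnot(all(start == c(0, head(profit, -1))))
```

Because the first change is the first profit value, the running total of the changes is simply the Profit column again, and start is Profit shifted down by one month.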
Step 3: Create the Waterfall Chart
Now that we have prepared the data, we can create the waterfall chart. We will use the ggplot2 library to create the chart. Here is the code to create the chart:
library(ggplot2)
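ggplot(df, aes(x = Month, y = change, fill = factor(change > 0))) +
  geom_bar(stat = "identity", position = "identity", color = "black") +
  # Keep the months in calendar order rather than alphabetical order
  scale_x_discrete(limits = df$Month) +
  # group = 1 joins all months into one line; without it each month is
  # its own group on a discrete x-axis and no line is drawn
  geom_line(aes(x = Month, y = cumulative, group = 1), color = "blue") +
  scale_fill_manual(values = c("#FF7F7F", "#7FBFFF")) +
  theme_bw() +
  labs(x = "Month", y = "Profit", title = "Waterfall Chart")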

Let’s break down this code:
- We used ggplot to create the plot and specify the x and y aesthetics, as well as the fill aesthetic for the bars.
- We used geom_bar to create the bars and set the stat parameter to “identity” so that the heights of the bars correspond to the values in the change column. We also set the position parameter to “identity” so that each bar is drawn from zero at its own value, without stacking. We set the color parameter to “black” to add a border to the bars.
- We used scale_x_discrete with limits set to df$Month to keep the months in calendar order; otherwise ggplot2 would sort them alphabetically.
- We used geom_line to create the line for the cumulative values, and mapped the x and y aesthetics to the Month and cumulative columns of the data frame, respectively. We set the color parameter to “blue” to color the line.
- We used scale_fill_manual to set the colors of the bars based on whether they represent a positive or negative change in profit. The first color applies to FALSE (no increase) and the second to TRUE (an increase).
- We used theme_bw to set the background to white and used labs to add labels for the x-axis, y-axis, and chart title.
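The chart above draws each monthly change from zero and overlays the running total as a line. Many waterfall charts instead float each bar between the previous level and the new one. The following is a minimal sketch of that variant, not the method used in this tutorial: it swaps geom_bar for geom_rect, rebuilds the start and cumulative columns so it runs on its own, and introduces two hypothetical helper columns, idx and direction, purely for illustration.

```r
library(ggplot2)

# Rebuild the prepared columns so this block is self-contained
months <- c("January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December")
profit <- c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)

wf <- data.frame(Month = months, change = c(profit[1], diff(profit)))
wf$cumulative <- cumsum(wf$change)
wf$start <- wf$cumulative - wf$change
wf$idx <- seq_len(nrow(wf))  # assumed helper: numeric x position of each bar
# Assumed helper: months with no change are labelled "Increase" here
wf$direction <- ifelse(wf$change >= 0, "Increase", "Decrease")

p <- ggplot(wf) +
  # Each bar spans from the previous level (start) to the new level (cumulative)
  geom_rect(aes(xmin = idx - 0.4, xmax = idx + 0.4,
                ymin = start, ymax = cumulative, fill = direction),
            color = "black") +
  scale_x_continuous(breaks = wf$idx, labels = months) +
  scale_fill_manual(values = c(Increase = "#7FBFFF", Decrease = "#FF7F7F")) +
  theme_bw() +
  labs(x = "Month", y = "Profit", title = "Waterfall Chart (floating bars)")

p
```

Months with no change in profit produce a zero-height rectangle, which shows up as a horizontal line at the current level.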
Step 4: Customize the Waterfall Chart
You can customize the waterfall chart by adjusting the colors, font sizes, and other properties. Here is example code that changes the colors of the bars and the line, thickens the line, and increases the font size of the labels:
ggplot(df, aes(x = Month, y = change, fill = factor(change > 0))) +
  geom_bar(stat = "identity", position = "identity", color = "black") +
  scale_x_discrete(limits = df$Month) +
  # group = 1 is needed so the line connects the months on a discrete x-axis
  geom_line(aes(x = Month, y = cumulative, group = 1), color = "#3F51B5", size = 1.5) +
  scale_fill_manual(values = c("#FF7043", "#4CAF50")) +
  theme_bw() +
  theme(text = element_text(size = 16)) +
  labs(x = "Month", y = "Profit", title = "Waterfall Chart: Profit by Month")

In this example, we have:
- Changed the colors of the bars and the line using scale_fill_manual and the color parameter of the geom_line function, respectively.
- Increased the thickness of the line using the size parameter of the geom_line function. In ggplot2 3.4.0 and later, linewidth is the preferred name for this parameter, and size triggers a deprecation warning.
- Changed the font size of the labels using the theme function and the element_text function with the size parameter.
You can further customize the chart by adjusting other properties such as the width of the bars, the placement of the axis ticks, and the font family.
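As a rough illustration of those three adjustments, the sketch below narrows the bars, sets explicit y-axis ticks, rotates the month labels, and switches the font family. The specific values (a width of 0.6, ticks every 200, the "serif" family, a 45-degree angle) are arbitrary assumptions for the example, and the data frame is rebuilt so the block runs on its own.

```r
library(ggplot2)

months <- c("January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December")
profit <- c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)
df2 <- data.frame(Month = months, change = c(profit[1], diff(profit)))

p <- ggplot(df2, aes(x = Month, y = change, fill = factor(change > 0))) +
  # width controls how much of each slot the bar fills
  geom_col(width = 0.6, position = "identity", color = "black") +
  scale_x_discrete(limits = months) +
  # breaks sets where the y-axis ticks are placed
  scale_y_continuous(breaks = seq(-200, 600, by = 200)) +
  scale_fill_manual(values = c("#FF7043", "#4CAF50")) +
  # base_family sets the font family for all text in the theme
  theme_bw(base_family = "serif") +
  # rotate the month labels so they do not overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Month", y = "Profit", title = "Waterfall Chart: Profit by Month")

p
```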
And that’s it! You now know how to create a waterfall chart in R using the ggplot2 library.
Another Example of Waterfall Chart in R
Here’s another example of creating a waterfall chart in R, this time using a different dataset:
library(dplyr)
library(ggplot2)
# Create sample data
data <- data.frame(Category = c("A", "B", "C", "D", "E"),
                   Start = c(0, 50, 80, 100, 120),
                   Increase = c(40, 30, 20, 10, 50),
                   Decrease = c(10, 20, 10, 20, 5))
# Prepare data for waterfall chart
data <- data %>%
  mutate(End = Start + Increase - Decrease) %>%
  select(Category, Start, End)
# Create waterfall chart
ggplot(data, aes(x = Category, y = End - Start, fill = End - Start > 0)) +
  geom_col(position = "identity", color = "black") +
  geom_text(aes(label = End), vjust = ifelse(data$End - data$Start > 0, -0.5, 1.5)) +
  geom_text(aes(label = Start), vjust = ifelse(data$End - data$Start > 0, 1.5, -0.5)) +
  coord_flip() +
  scale_fill_manual(values = c("#7FBFFF", "#FF7F7F")) +
  theme_bw() +
  labs(x = "", y = "Value", title = "Waterfall Chart")

In this example, we are using a sample dataset that contains data for five categories, with the starting value, increase, and decrease for each category. The process for creating the waterfall chart is as follows:
- We prepare the data for the waterfall chart by calculating the ending value for each category based on the starting value, increase, and decrease. We then select the columns we need for the chart, which are Category, Start, and End.
- We create the waterfall chart using the ggplot2 library. We specify the x and y aesthetics, and use geom_col to create the bars. We set the position parameter to “identity” so that each bar is drawn from zero at its own value, without stacking, and set the color parameter to “black” to add a border to the bars.
- We use geom_text to add labels for the starting and ending values of each category. We use ifelse to determine the placement of the labels, based on whether the change in value is positive or negative.
- We use coord_flip to rotate the chart by 90 degrees so that the categories are shown on the y-axis and the values are shown on the x-axis.
- We use scale_fill_manual to set the colors of the bars based on whether they represent a positive or negative change in value. The first color applies to FALSE and the second to TRUE.
- We use theme_bw to set the background to white and use labs to add labels for the y-axis and chart title. We set the x-axis label to an empty string, since we don’t need it for this chart.
Note that the formatting options such as the font size, axis ticks, and font family can be customized to suit your needs.
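To check what the bars in this second example actually represent, the net change per category can be worked out in base R. This is a small illustrative sketch using the same sample values; the cat_df, Net, and Positive names are assumptions introduced here, not part of the example above.

```r
cat_df <- data.frame(Category = c("A", "B", "C", "D", "E"),
                     Start = c(0, 50, 80, 100, 120),
                     Increase = c(40, 30, 20, 10, 50),
                     Decrease = c(10, 20, 10, 20, 5))

# Ending value and net change (bar length) for each category
cat_df$End <- cat_df$Start + cat_df$Increase - cat_df$Decrease
cat_df$Net <- cat_df$End - cat_df$Start
cat_df$Positive <- cat_df$Net > 0   # same condition as the fill aesthetic

cat_df$End       # 30 60 90 90 165
cat_df$Net       # 30 10 10 -10 45
cat_df$Positive  # TRUE TRUE TRUE FALSE TRUE
```

Only category D has a negative net change, so it is the only bar that extends below zero and takes the FALSE fill color.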