Data Visualisation for Beginners: How to create a Waterfall Chart in R

A waterfall chart is a useful tool for visualizing how an initial value is affected by a series of intermediate values that can be either positive or negative. In this tutorial, we will explore how to create a waterfall chart in R using the ggplot2 library.

Step 1: Load Required Libraries

Before we start creating our waterfall chart, we need to load the required libraries. In this tutorial, we will use the following libraries:

  • ggplot2
  • dplyr

If these libraries are not yet installed, you can install them using the following commands:

install.packages("ggplot2")
install.packages("dplyr")
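
The install commands only need to run once per machine, while the libraries must be loaded in every session. As a minimal sketch (the variable names `required`, `installed`, and `missing_pkgs` are illustrative and not part of the original tutorial), the following installs only the packages that are missing and then loads both:

```r
# Packages used in this tutorial
required <- c("ggplot2", "dplyr")

# requireNamespace() returns TRUE when a package is installed and loadable
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

# Install only what is missing (assumes a CRAN mirror is configured)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Attach both packages for the current session
library(ggplot2)
library(dplyr)
```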

Step 2: Load and Prepare Data

We will use a sample dataset to create our waterfall chart. Here is an example dataset:

Month,Revenue,Cost,Profit
January,1000,400,600
February,1200,500,700
March,800,300,500
April,900,400,500
May,1500,700,800
June,1300,600,700
July,1100,500,600
August,1000,400,600
September,1200,500,700
October,800,300,500
November,900,400,500
December,1500,700,800

R code for the data frame:

df <- data.frame(
  Month   = c("January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December"),
  Revenue = c(1000, 1200, 800, 900, 1500, 1300, 1100, 1000, 1200, 800, 900, 1500),
  Cost    = c(400, 500, 300, 400, 700, 600, 500, 400, 500, 300, 400, 700),
  Profit  = c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)
)

The dataset has four columns: Month, Revenue, Cost, and Profit. We will use the Profit column to create our waterfall chart.
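
Before building on the Profit column, it can be worth confirming that the sample data is internally consistent. The sketch below is an optional check that is not part of the original tutorial; it rebuilds the same data frame (using R's built-in `month.name` vector as a shortcut for the month names) and verifies that Profit equals Revenue minus Cost in every row:

```r
# Rebuild the sample data; month.name is R's built-in vector of month names
df <- data.frame(
  Month   = month.name,
  Revenue = c(1000, 1200, 800, 900, 1500, 1300, 1100, 1000, 1200, 800, 900, 1500),
  Cost    = c(400, 500, 300, 400, 700, 600, 500, 400, 500, 300, 400, 700),
  Profit  = c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)
)

# Profit should equal Revenue minus Cost in every row
profit_ok <- all(df$Profit == df$Revenue - df$Cost)
stopifnot(profit_ok)

# Inspect the column types and the first few rows
str(df)
head(df, 3)
```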

Next, we need to prepare the data for the waterfall chart. We will add three columns to the data frame that describe how profit changes from one month to the next. Here is the code to do that:

library(dplyr)

df <- df %>%
  mutate(change = c(Profit[1], diff(Profit)),
         cumulative = cumsum(change),
         start = cumulative - change)

In this code, we used the mutate function from the dplyr library to create three new columns: change, cumulative, and start. The change column starts with the first month's profit, followed by the month-to-month differences calculated with the diff function, so it has one value per month. We use the cumsum function to create the cumulative column, which contains the running total of the changes in profit (and therefore reproduces the Profit column). Finally, we calculate the starting value of each bar by subtracting the change in profit from the cumulative total.
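
To make the arithmetic concrete, here is the same calculation in base R on the Profit values alone. This is only an illustrative sketch; the standalone vector names (`profit`, `change`, `cumulative`, `start`) mirror the columns above but are otherwise an assumption of this example:

```r
profit <- c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)

# First month's profit, then month-to-month differences
change <- c(profit[1], diff(profit))
change
# values: 600 100 -200 0 300 -100 -100 0 100 -200 0 300

# Running total of the changes; this reproduces the profit vector
cumulative <- cumsum(change)
all(cumulative == profit)
# TRUE

# Level at which each month's bar begins (the previous month's total)
start <- cumulative - change
start
# values: 0 600 700 500 500 800 700 600 600 700 500 500
```

Each month's start value equals the previous month's cumulative value, which is what lets the bars of a waterfall chart connect end to end.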

Step 3: Create the Waterfall Chart

Now that we have prepared the data, we can create the waterfall chart. We will use the ggplot2 library to create the chart. Here is the code to create the chart:

library(ggplot2)

ggplot(df, aes(x = Month, y = change, fill = factor(change > 0))) +
  geom_bar(stat = "identity", position = "identity", color = "black") +
  scale_x_discrete(limits = df$Month) +
  # group = 1 joins all months into a single line; without it, each month
  # on the discrete x-axis is its own group and no line is drawn
  geom_line(aes(x = Month, y = cumulative, group = 1), color = "blue") +
  scale_fill_manual(values = c("#FF7F7F", "#7FBFFF")) +
  theme_bw() +
  labs(x = "Month", y = "Profit", title = "Waterfall Chart")

Let’s break down this code:

  • We used ggplot to create the plot and specify the x and y aesthetics, as well as the fill aesthetic for the bars.
  • We used geom_bar to create the bars and set the stat parameter to “identity” so that the heights of the bars correspond to the values in the change column. We also set the position parameter to “identity” so that each bar is drawn at its own value without stacking, which means every bar starts at zero. We set the color parameter to “black” to add a border to the bars.
  • We used scale_x_discrete with limits = df$Month to keep the months in calendar order. Without it, the month names would be sorted alphabetically.
  • We used geom_line to create the line for the cumulative values, and mapped the x and y aesthetics to the Month and cumulative columns of the data frame, respectively. We set group = 1 so that all points are connected as one line, and set the color parameter to “blue” to color the line.
  • We used scale_fill_manual to set the colors of the bars based on whether they represent a positive or negative change in profit. The first color applies to FALSE (a zero or negative change) and the second to TRUE (a positive change).
  • We used theme_bw to set the background to white and used labs to add labels for the x-axis, y-axis, and chart title.
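
The start column computed in Step 2 is not used by the chart code, so its bars all begin at zero. As a hedged sketch of one alternative (not the original author's method), the bars can be drawn floating from start to cumulative with geom_rect. The `wf` data frame and its `idx` and `direction` columns are names introduced here for illustration:

```r
library(ggplot2)

profit <- c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)

# Same preparation as in Step 2, in base R
wf <- data.frame(change = c(profit[1], diff(profit)))
wf$cumulative <- cumsum(wf$change)
wf$start <- wf$cumulative - wf$change

# Numeric x position for each month and a readable fill label
wf$idx <- seq_len(nrow(wf))
wf$direction <- ifelse(wf$change >= 0, "Increase", "Decrease")

# Each rectangle spans from the previous total (start) to the new total
p <- ggplot(wf) +
  geom_rect(aes(xmin = idx - 0.4, xmax = idx + 0.4,
                ymin = start, ymax = cumulative, fill = direction),
            color = "black") +
  scale_x_continuous(breaks = wf$idx, labels = month.abb) +
  scale_fill_manual(values = c(Increase = "#7FBFFF", Decrease = "#FF7F7F")) +
  theme_bw() +
  labs(x = "Month", y = "Profit", fill = NULL,
       title = "Waterfall Chart (floating bars)")

print(p)
```

Months with no change in profit (April, August, and November in this data) have equal start and cumulative values, so they appear as a flat line rather than a bar.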

Step 4: Customize the Waterfall Chart

You can customize the waterfall chart by adjusting the colors, font sizes, and other properties. Here is example code that changes the colors of the bars and the line, and increases the font size of the labels:

ggplot(df, aes(x = Month, y = change, fill = factor(change > 0))) +
  geom_bar(stat = "identity", position = "identity", color = "black") +
  scale_x_discrete(limits = df$Month) +
  # group = 1 connects all months into one line; linewidth sets the line
  # thickness (it replaces size for lines in ggplot2 3.4.0 and later)
  geom_line(aes(x = Month, y = cumulative, group = 1), color = "#3F51B5", linewidth = 1.5) +
  scale_fill_manual(values = c("#FF7043", "#4CAF50")) +
  theme_bw() +
  theme(text = element_text(size = 16)) +
  labs(x = "Month", y = "Profit", title = "Waterfall Chart: Profit by Month")

In this example, we have:

  • Changed the colors of the bars using scale_fill_manual and the color of the line using the color parameter of the geom_line function.
  • Increased the thickness of the line using the linewidth parameter of the geom_line function (on ggplot2 versions before 3.4.0, use size instead).
  • Changed the font size of the labels using the theme function and the element_text function with the size parameter.

You can further customize the chart by adjusting other properties such as the width of the bars, the placement of the axis ticks, and the font family.
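
As a rough sketch of those further options, the example below narrows the bars, sets explicit y-axis ticks, rotates the month labels, and switches to a serif font. The specific values (a width of 0.6, ticks every 200, the generic "serif" family) and the `direction` column are arbitrary choices made for illustration:

```r
library(ggplot2)

profit <- c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)

# A factor with explicit levels keeps the months in calendar order
df_custom <- data.frame(
  Month  = factor(month.abb, levels = month.abb),
  change = c(profit[1], diff(profit))
)
df_custom$direction <- ifelse(df_custom$change >= 0, "Increase", "Decrease")

p <- ggplot(df_custom, aes(x = Month, y = change, fill = direction)) +
  # width controls how much of each month's slot the bar fills
  geom_col(width = 0.6, color = "black") +
  # Explicit tick positions on the y-axis
  scale_y_continuous(breaks = seq(-200, 600, by = 200)) +
  scale_fill_manual(values = c(Increase = "#4CAF50", Decrease = "#FF7043")) +
  # base_size and base_family set the font size and family for the whole plot
  theme_bw(base_size = 16, base_family = "serif") +
  # Rotate the month labels so they do not overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Month", y = "Change in profit", fill = NULL)

print(p)
```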

And that’s it! You now know how to create a waterfall chart in R using the ggplot2 library.

Another Example of Waterfall Chart in R

Here’s another example of creating a waterfall chart in R, this time using a different dataset:

library(dplyr)
library(ggplot2)

# Create sample data
data <- data.frame(Category = c("A", "B", "C", "D", "E"),
                   Start = c(0, 50, 80, 100, 120),
                   Increase = c(40, 30, 20, 10, 50),
                   Decrease = c(10, 20, 10, 20, 5))

# Prepare data for waterfall chart
data <- data %>%
  mutate(End = Start + Increase - Decrease) %>%
  select(Category, Start, End)

# Create waterfall chart
ggplot(data, aes(x = Category, y = End - Start, fill = End - Start > 0)) +
  geom_col(position = "identity", color = "black") +
  # After coord_flip() the bars run horizontally, so hjust (not vjust)
  # shifts the labels along the direction of the bar
  geom_text(aes(label = End), hjust = ifelse(data$End - data$Start > 0, -0.5, 1.5)) +
  geom_text(aes(label = Start), hjust = ifelse(data$End - data$Start > 0, 1.5, -0.5)) +
  coord_flip() +
  # Colors follow level order: FALSE (decrease) first, then TRUE (increase),
  # matching the color scheme of the first example
  scale_fill_manual(values = c("#FF7F7F", "#7FBFFF")) +
  theme_bw() +
  labs(x = "", y = "Value", title = "Waterfall Chart")

In this example, we are using a sample dataset that contains data for five categories, with the starting value, increase, and decrease for each category. The process for creating the waterfall chart is as follows:

  • We prepare the data for the waterfall chart by calculating the ending value for each category based on the starting value, increase, and decrease. We then select the columns we need for the chart, which are Category, Start, and End.
  • We create the waterfall chart using the ggplot2 library. We specify the x and y aesthetics, and use geom_col to create the bars, with the length of each bar equal to the change End - Start. We set the position parameter to “identity” so that each bar is drawn at its own value without stacking, and set the color parameter to “black” to add a border to the bars.
  • We use geom_text to add labels for the starting and ending values of each category. We use ifelse to determine the placement of the labels, based on whether the change in value is positive or negative.
  • We use coord_flip to rotate the chart by 90 degrees so that the categories are shown on the y-axis and the values are shown on the x-axis.
  • We use scale_fill_manual to set the colors of the bars based on whether they represent a positive or negative change in value.
  • We use theme_bw to set the background to white and use labs to add a label for the value axis and a chart title. We set the x label (the Category axis) to an empty string, since we don’t need it for this chart.
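
To see what the chart is plotting, the data preparation can be worked through by hand. The sketch below repeats the calculation in base R; the `Change` column and the `example` name are additions made here for illustration, since the original code computes End - Start inline:

```r
example <- data.frame(Category = c("A", "B", "C", "D", "E"),
                      Start = c(0, 50, 80, 100, 120),
                      Increase = c(40, 30, 20, 10, 50),
                      Decrease = c(10, 20, 10, 20, 5))

# Ending value for each category
example$End <- example$Start + example$Increase - example$Decrease
example$End
# values: 30 60 90 90 165

# Net change, which is the bar length in the chart
example$Change <- example$End - example$Start
example$Change
# values: 30 10 10 -10 45
```

Only category D has a negative net change, so it is the only bar that extends in the negative direction and takes the "decrease" color.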

Note that the formatting options such as the font size, axis ticks, and font family can be customized to suit your needs.
