# Data Visualisation for Beginners: How to create a Waterfall Chart in R

A waterfall chart is a useful tool for visualizing how an initial value is affected by a series of intermediate values that can be either positive or negative. In this tutorial, we will explore how to create a waterfall chart in R using the `ggplot2` library.

# Step 1: Load Required Libraries

Before we start creating our waterfall chart, we need to load the required libraries. In this tutorial, we will use the following libraries:

• `ggplot2`
• `dplyr`

You can install these libraries using the following commands:

```r
install.packages("ggplot2")
install.packages("dplyr")
```
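Installation only needs to happen once, so it can help to check which packages are already present before installing. The following is a minimal sketch; the helper name `missing_packages` is hypothetical and not part of any package, and the install call is left commented out so that nothing is downloaded by accident.

```r
# Hypothetical helper: return the names of packages that are not yet installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "dplyr"))
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Once the packages are installed, `library(ggplot2)` and `library(dplyr)` load them into the session, as the later steps do.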

# Step 2: Load and Prepare Data

We will use a sample dataset to create our waterfall chart. Here is an example dataset:

```
Month,Revenue,Cost,Profit
January,1000,400,600
February,1200,500,700
March,800,300,500
April,900,400,500
May,1500,700,800
June,1300,600,700
July,1100,500,600
August,1000,400,600
September,1200,500,700
October,800,300,500
November,900,400,500
December,1500,700,800
```
R code for the data frame:

```r
df <- data.frame(Month = c("January", "February", "March", "April", "May", "June",
                           "July", "August", "September", "October", "November", "December"),
                 Revenue = c(1000, 1200, 800, 900, 1500, 1300, 1100, 1000, 1200, 800, 900, 1500),
                 Cost = c(400, 500, 300, 400, 700, 600, 500, 400, 500, 300, 400, 700),
                 Profit = c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800))
```

The dataset has four columns: `Month`, `Revenue`, `Cost`, and `Profit`. We will use the `Profit` column to create our waterfall chart.
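As an alternative to typing the vectors by hand, the CSV text can be parsed directly with base R's `read.csv`. This is a sketch only; the names `csv_text` and `df_csv` are illustrative and not from the original tutorial. It also checks that `Profit` equals `Revenue - Cost` in every row, which holds for this sample data.

```r
# The sample data as CSV text (illustrative variable name)
csv_text <- "Month,Revenue,Cost,Profit
January,1000,400,600
February,1200,500,700
March,800,300,500
April,900,400,500
May,1500,700,800
June,1300,600,700
July,1100,500,600
August,1000,400,600
September,1200,500,700
October,800,300,500
November,900,400,500
December,1500,700,800"

# read.csv accepts literal text through its `text` argument
df_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Sanity check: profit should be revenue minus cost for every month
print(all(df_csv$Profit == df_csv$Revenue - df_csv$Cost))
```

For data stored in a file, the same call with a file path in place of `text = csv_text` should work equally well.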

Next, we need to prepare the data for the waterfall chart. We will add columns to the data frame that hold the change in profit from one month to the next. Here is the code to do that:

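```r
library(dplyr)

df <- df %>%
  mutate(change = c(Profit[1], diff(Profit)),  # first month's profit, then month-to-month differences
         cumulative = cumsum(change),          # running total after each change
         start = cumulative - change)          # running total before each change
```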

In this code, we used the `mutate` function from the `dplyr` package to add three new columns: `change`, `cumulative`, and `start`. The `change` column holds the first month's profit followed by the month-to-month differences returned by `diff`; the first value is supplied explicitly because `diff` returns one fewer element than its input. The `cumulative` column is the running total of `change`, computed with `cumsum`, which here reproduces the `Profit` column. Finally, `start` is the running total before each month's change, obtained by subtracting `change` from `cumulative`.
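To see what these columns contain, the same calculation can be reproduced in base R on the sample profits. This is only an illustrative sketch; it uses plain vectors instead of the `df` data frame, and the comments show the first four values of each vector.

```r
profit <- c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)

change     <- c(profit[1], diff(profit))  # 600, 100, -200, 0, ...
cumulative <- cumsum(change)              # 600, 700, 500, 500, ... (same as profit)
start      <- cumulative - change         # 0, 600, 700, 500, ... (previous month's total)

# Show the first four months side by side
print(head(data.frame(Month = month.name, change, cumulative, start), 4))
```

Each month's `start` equals the previous month's `cumulative`, which is the property that lets the bars of a waterfall chart connect from one step to the next.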

# Step 3: Create the Waterfall Chart

Now that we have prepared the data, we can create the waterfall chart. We will use the `ggplot2` library to create the chart. Here is the code to create the chart:

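```r
library(ggplot2)

ggplot(df, aes(x = Month, y = change, fill = factor(change > 0))) +
  geom_bar(stat = "identity", position = "identity", color = "black") +
  # Keep the months in data order rather than alphabetical order
  scale_x_discrete(limits = df$Month) +
  # group = 1 joins all months into one line; without it, each month on the
  # discrete x-axis forms its own group and no line is drawn
  geom_line(aes(x = Month, y = cumulative, group = 1), color = "blue") +
  scale_fill_manual(values = c("#FF7F7F", "#7FBFFF")) +
  theme_bw() +
  labs(x = "Month", y = "Profit", title = "Waterfall Chart")
```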

Let’s break down this code:

• We used `ggplot` to create the plot and map the `x` aesthetic to `Month`, the `y` aesthetic to `change`, and the `fill` aesthetic to whether the change is positive.
• We used `geom_bar` with `stat = "identity"` so that the bar heights come directly from the `change` column rather than from counts. Setting `position = "identity"` draws every bar from zero without stacking or dodging, and `color = "black"` adds a border to the bars.
• We used `scale_x_discrete` with `limits` to keep the months in the order they appear in the data frame; otherwise they would be sorted alphabetically.
• We used `geom_line` to draw the cumulative values, mapping the `x` and `y` aesthetics to the `Month` and `cumulative` columns of the data frame, respectively. Setting `color = "blue"` colors the line.
• We used `scale_fill_manual` to set the bar colors: the first color applies where `change > 0` is `FALSE` (zero or negative changes) and the second where it is `TRUE` (positive changes).
• We used `theme_bw` to switch to a white background and `labs` to add labels for the x-axis, y-axis, and chart title.
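The chart above draws every bar from zero and shows the running total as a line. A conventional waterfall chart instead floats each bar between the previous running total and the new one, which is what the `start` and `cumulative` columns from Step 2 describe. The following is a minimal sketch of that variant using `geom_rect`. It is not the original tutorial's method, and the names `wf`, `idx`, `direction`, and `p_floating` are assumptions introduced for illustration.

```r
library(ggplot2)

# Same monthly profit figures as the sample data
wf <- data.frame(
  Month  = month.name,
  Profit = c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)
)
wf$change     <- c(wf$Profit[1], diff(wf$Profit))
wf$cumulative <- cumsum(wf$change)
wf$start      <- wf$cumulative - wf$change
wf$idx        <- seq_len(nrow(wf))  # numeric x position, needed by geom_rect
wf$direction  <- ifelse(wf$change >= 0, "Increase", "Decrease")

p_floating <- ggplot(wf) +
  # Each rectangle spans from the running total before the change to the total after it
  geom_rect(aes(xmin = idx - 0.4, xmax = idx + 0.4,
                ymin = start, ymax = cumulative, fill = direction),
            color = "black") +
  scale_x_continuous(breaks = wf$idx, labels = wf$Month) +
  scale_fill_manual(values = c(Decrease = "#FF7F7F", Increase = "#7FBFFF")) +
  theme_bw() +
  labs(x = "Month", y = "Profit", title = "Waterfall Chart (floating bars)")

print(p_floating)
```

Months with no change (April, for example) appear as a flat line at the current total, because their rectangle has zero height.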

# Step 4: Customize the Waterfall Chart

You can customize the waterfall chart by adjusting the colors, font sizes, and other properties. Here is example code that changes the colors of the bars and the line and increases the font size of the labels:

```r
ggplot(df, aes(x = Month, y = change, fill = factor(change > 0))) +
  geom_bar(stat = "identity", position = "identity", color = "black") +
  scale_x_discrete(limits = df$Month) +
  # group = 1 is needed so the line connects the months on the discrete x-axis
  geom_line(aes(x = Month, y = cumulative, group = 1), color = "#3F51B5", size = 1.5) +
  scale_fill_manual(values = c("#FF7043", "#4CAF50")) +
  theme_bw() +
  theme(text = element_text(size = 16)) +
  labs(x = "Month", y = "Profit", title = "Waterfall Chart: Profit by Month")
```

In this example, we have:

• Changed the colors of the bars with `scale_fill_manual` and the color of the line with the `color` parameter of the `geom_line` function.
• Increased the thickness of the line using the `size` parameter of the `geom_line` function. In ggplot2 3.4.0 and later, `linewidth` is the preferred name for this parameter, and `size` still works for lines but gives a deprecation warning.
• Changed the font size of the labels using the `theme` function and the `element_text` function with the `size` parameter.

You can further customize the chart by adjusting other properties such as the width of the bars, the placement of the axis ticks, and the font family.
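As a rough illustration of those three options, the sketch below sets the bar width, the y-axis tick positions, and the font family. The specific choices (`width = 0.6`, ticks every 200, the `"serif"` family, abbreviated month names, and the names `changes` and `p_custom`) are assumptions made for this example, not recommendations from the original tutorial.

```r
library(ggplot2)

profit <- c(600, 700, 500, 500, 800, 700, 600, 600, 700, 500, 500, 800)
changes <- data.frame(
  Month  = factor(month.abb, levels = month.abb),  # factor levels fix the month order
  change = c(profit[1], diff(profit))
)

p_custom <- ggplot(changes, aes(x = Month, y = change, fill = change > 0)) +
  geom_col(width = 0.6, color = "black") +                 # narrower bars (default width is 0.9)
  scale_y_continuous(breaks = seq(-200, 600, by = 200)) +  # explicit tick positions
  scale_fill_manual(values = c("FALSE" = "#FF7043", "TRUE" = "#4CAF50")) +
  theme_bw() +
  theme(text = element_text(family = "serif", size = 16)) +  # font family and size
  labs(x = "Month", y = "Change in profit")

print(p_custom)
```

Which font families are available depends on the graphics device, so `"serif"` may render differently from one system to another.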

And that’s it! You now know how to create a waterfall chart in R using the `ggplot2` library.

# Another Example of Waterfall Chart in R

Here’s another example of creating a waterfall chart in R, this time using a different dataset:

```r
library(dplyr)
library(ggplot2)

# Create sample data
data <- data.frame(Category = c("A", "B", "C", "D", "E"),
                   Start = c(0, 50, 80, 100, 120),
                   Increase = c(40, 30, 20, 10, 50),
                   Decrease = c(10, 20, 10, 20, 5))

# Prepare data for waterfall chart
data <- data %>%
  mutate(End = Start + Increase - Decrease) %>%
  select(Category, Start, End)

# Create waterfall chart
ggplot(data, aes(x = Category, y = End - Start, fill = End - Start > 0)) +
  geom_col(position = "identity", color = "black") +
  geom_text(aes(label = End), vjust = ifelse(data$End - data$Start > 0, -0.5, 1.5)) +
  geom_text(aes(label = Start), vjust = ifelse(data$End - data$Start > 0, 1.5, -0.5)) +
  coord_flip() +
  scale_fill_manual(values = c("#7FBFFF", "#FF7F7F")) +
  theme_bw() +
  labs(x = "", y = "Value", title = "Waterfall Chart")
```

In this example, we are using a sample dataset that contains data for five categories, with the starting value, increase, and decrease for each category. The process for creating the waterfall chart is as follows:

• We prepare the data for the waterfall chart by calculating the ending value for each category from the starting value, increase, and decrease. We then select the columns we need for the chart, which are `Category`, `Start`, and `End`.
• We create the chart using the `ggplot2` package. We map `x` to `Category` and `y` to the net change `End - Start`, and use `geom_col` to create the bars. Setting `position = "identity"` draws each bar from zero, and `color = "black"` adds a border to the bars.
• We use `geom_text` to add labels for the starting and ending values of each category. Both labels are anchored at the end of each bar, and `ifelse` picks the `vjust` offsets so that the two labels do not overlap, depending on whether the change in value is positive or negative.
• We use `coord_flip` to flip the axes so that the categories are shown on the vertical axis and the values on the horizontal axis.
• We use `scale_fill_manual` to set the colors of the bars based on whether they represent a positive or negative change in value. The first color (`#7FBFFF`) applies where `End - Start > 0` is `FALSE` and the second (`#FF7F7F`) where it is `TRUE`, which is the reverse of the color order in the first example.
• We use `theme_bw` to set the background to white and `labs` to add the value-axis label and the chart title. The `x` label is set to an empty string; because of `coord_flip`, `x` refers to the category axis, which is drawn vertically and does not need a label here.
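The arithmetic behind the `End` column and the bar lengths can be checked with a few lines of base R. This is a sketch for verification only; the names `end_value`, `net_change`, and `result` are illustrative.

```r
categories <- c("A", "B", "C", "D", "E")
start    <- c(0, 50, 80, 100, 120)
increase <- c(40, 30, 20, 10, 50)
decrease <- c(10, 20, 10, 20, 5)

end_value  <- start + increase - decrease  # 30, 60, 90, 90, 165
net_change <- end_value - start            # 30, 10, 10, -10, 45 (the bar lengths)

result <- data.frame(Category = categories, Start = start,
                     End = end_value, Net = net_change)
print(result)
```

Only category D has a negative net change, so it is the only bar that receives the `FALSE` color in the chart.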

Note that the formatting options such as the font size, axis ticks, and font family can be customized to suit your needs.
