R Data Visualisation Example: Box plot by group using ggplot2

Import library

library(ggplot2)

Generate sample data for plotting purposes

# Sample data
set.seed(007)
y <- round(rnorm(500), 1)

df <- data.frame(y = y,
                group = sample(c("G1", "G2", "G3"), size = 500, replace = TRUE))

head(df)
##      y group
## 1  2.3    G3
## 2 -1.2    G2
## 3 -0.7    G1
## 4 -0.4    G3
## 5 -1.0    G3
## 6 -0.9    G2

Basic box plot by group using ggplot2

Box plot by group with geom_boxplot

To create a basic grouped box plot in R, pass the variables to aes and add the geom_boxplot geom, as in the following example.

# Box plot by group
ggplot(df, aes(x = group, y = y)) + 
  geom_boxplot() 
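To see what each box actually represents, the statistics computed by the layer can be inspected. The following is a minimal sketch, assuming the same sample data as above; the names p and box_stats are illustrative. It uses layer_data from ggplot2 to pull the per-group values behind the plot.

```r
library(ggplot2)

# Same sample data as in the tutorial
set.seed(7)
y <- round(rnorm(500), 1)
df <- data.frame(y = y,
                 group = sample(c("G1", "G2", "G3"), size = 500, replace = TRUE))

p <- ggplot(df, aes(x = group, y = y)) +
  geom_boxplot()

# Statistics computed for the first (and only) layer, one row per group:
# whisker ends (ymin, ymax), hinges (lower, upper) and median (middle)
box_stats <- layer_data(p, 1)
box_stats[, c("ymin", "lower", "middle", "upper", "ymax")]

# The middle line of each box is the per-group median
tapply(df$y, df$group, median)
```

Comparing the middle column with the output of tapply is a quick way to confirm that the line inside each box is the group median.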

Adding error bars with stat_boxplot

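The default box plot in ggplot2 doesn't add error bars (horizontal caps at the ends of the whiskers). To add them, use the stat_boxplot stat with geom = "errorbar", placing it before geom_boxplot so that the boxes are drawn on top of the bars. The width of the bars can be customized with the width argument.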

# Box plot by group with error bars
ggplot(df, aes(x = group, y = y)) + 
  stat_boxplot(geom = "errorbar", # Error bars
               width = 0.25) +    # Bars width
  geom_boxplot() 
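As an alternative to the extra stat_boxplot layer, recent ggplot2 releases can draw the whisker caps from geom_boxplot itself. The sketch below assumes ggplot2 3.5.0 or later, where geom_boxplot gained a staplewidth argument; the name p_staple is illustrative. Note that staplewidth is relative to the box width, so 0.25 here does not give exactly the same cap width as width = 0.25 in the error bar layer.

```r
library(ggplot2)

# Same sample data as in the tutorial
set.seed(7)
y <- round(rnorm(500), 1)
df <- data.frame(y = y,
                 group = sample(c("G1", "G2", "G3"), size = 500, replace = TRUE))

# staplewidth is only available in ggplot2 >= 3.5.0 (assumption about the
# installed version), so the plot is guarded by a version check
if (packageVersion("ggplot2") >= "3.5.0") {
  p_staple <- ggplot(df, aes(x = group, y = y)) +
    geom_boxplot(staplewidth = 0.25)  # Cap width relative to the box width
  print(p_staple)
}
```

On older versions the stat_boxplot approach shown above remains the way to get the caps.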

Horizontal box plot by group

Box plots can also be displayed horizontally. To do so, either swap the variables inside aes or add coord_flip, as shown below.

# Option 1: change the order of the variables

# Horizontal box plot in ggplot2
ggplot(df, aes(x = y, y = group)) + 
  stat_boxplot(geom = "errorbar",
               width = 0.25) + 
  geom_boxplot() 

# Option 2: use coord_flip

# Horizontal box plot
ggplot(df, aes(x = group, y = y)) + 
  stat_boxplot(geom = "errorbar",
               width = 0.25) + 
  geom_boxplot() +
  coord_flip() 

Color customization

If you pass the categorical variable to the fill argument of aes, each box plot will be filled with a color and a legend will be displayed.

ggplot(df, aes(x = group, y = y, fill = group)) + 
  stat_boxplot(geom = "errorbar",
               width = 0.25) + 
  geom_boxplot() 

ggplot(df, aes(x = group, y = y, fill = group)) + 
  stat_boxplot(geom = "errorbar",
               width = 0.25) + 
  geom_boxplot() + coord_flip() 

The colors of the box plots are fully customizable. The following example sets a fill color for each group, changes the border color of the boxes, and sets the color of the outliers to black.

# Fill colors
cols <- c("#CFD8DC", "#90A4AE", "#455A64")

ggplot(df, aes(x = group, y = y, fill = group)) + 
  stat_boxplot(geom = "errorbar",
               width = 0.25) + 
  geom_boxplot(alpha = 0.8,          # Fill transparency
               colour = "#474747",   # Border color
               outlier.colour = 1) + # Outlier color
  scale_fill_manual(values = cols)   # Fill colors 
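An unnamed vector of colors is matched to the groups by position. If the colors should stay attached to specific groups regardless of their order, a named vector can be passed to scale_fill_manual instead. This is a minimal sketch using the same data and colors as above; the name p_named is illustrative.

```r
library(ggplot2)

# Same sample data as in the tutorial
set.seed(7)
y <- round(rnorm(500), 1)
df <- data.frame(y = y,
                 group = sample(c("G1", "G2", "G3"), size = 500, replace = TRUE))

# Named vector: each color is tied to a group by name, not by position
cols <- c(G1 = "#CFD8DC", G2 = "#90A4AE", G3 = "#455A64")

p_named <- ggplot(df, aes(x = group, y = y, fill = group)) +
  stat_boxplot(geom = "errorbar", width = 0.25) +
  geom_boxplot(alpha = 0.8,                  # Fill transparency
               colour = "#474747",           # Border color
               outlier.colour = "black") +   # Outlier color
  scale_fill_manual(values = cols)           # Fill colors matched by name
p_named
```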

Legend customization

Change the title

ggplot(df, aes(x = group, y = y, fill = group)) + 
  stat_boxplot(geom = "errorbar", width = 0.25) + 
  geom_boxplot() +
  guides(fill = guide_legend(title = "Group Name")) 
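The legend title can also be set with labs, which is a shorter route when only the title needs to change. A minimal sketch, assuming the same sample data as above; the name p_title is illustrative.

```r
library(ggplot2)

# Same sample data as in the tutorial
set.seed(7)
y <- round(rnorm(500), 1)
df <- data.frame(y = y,
                 group = sample(c("G1", "G2", "G3"), size = 500, replace = TRUE))

p_title <- ggplot(df, aes(x = group, y = y, fill = group)) +
  stat_boxplot(geom = "errorbar", width = 0.25) +
  geom_boxplot() +
  labs(fill = "Group Name")  # Title of the legend for the fill aesthetic
p_title
```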

Change the labels

ggplot(df, aes(x = group, y = y, fill = group)) + 
  stat_boxplot(geom = "errorbar", width = 0.25) + 
  geom_boxplot() +
  scale_fill_hue(labels = c("A", "B", "C")) +
  guides(fill = guide_legend(title = "Group Name"))
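With an unnamed labels vector, the labels are assigned to the groups in order. A named vector makes the mapping explicit, and the name argument of the scale can set the legend title in the same call. The sketch below assumes the same sample data as above; the name p_labels is illustrative.

```r
library(ggplot2)

# Same sample data as in the tutorial
set.seed(7)
y <- round(rnorm(500), 1)
df <- data.frame(y = y,
                 group = sample(c("G1", "G2", "G3"), size = 500, replace = TRUE))

p_labels <- ggplot(df, aes(x = group, y = y, fill = group)) +
  stat_boxplot(geom = "errorbar", width = 0.25) +
  geom_boxplot() +
  scale_fill_hue(name = "Group Name",                        # Legend title
                 labels = c(G1 = "A", G2 = "B", G3 = "C"))   # Labels matched by group name
p_labels
```

Only the legend labels change here; the axis tick labels still show G1, G2 and G3.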

Remove the legend

ggplot(df, aes(x = group, y = y, fill = group)) + 
  stat_boxplot(geom = "errorbar", width = 0.25) + 
  geom_boxplot() +
  theme(legend.position = "none")
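Setting legend.position to "none" hides every legend in the plot. To drop only the legend of one aesthetic, guides can be used instead. A minimal sketch, assuming the same sample data as above and a ggplot2 version that accepts "none" in guides; the name p_no_fill is illustrative.

```r
library(ggplot2)

# Same sample data as in the tutorial
set.seed(7)
y <- round(rnorm(500), 1)
df <- data.frame(y = y,
                 group = sample(c("G1", "G2", "G3"), size = 500, replace = TRUE))

p_no_fill <- ggplot(df, aes(x = group, y = y, fill = group)) +
  stat_boxplot(geom = "errorbar", width = 0.25) +
  geom_boxplot() +
  guides(fill = "none")  # Remove only the legend of the fill aesthetic
p_no_fill
```

In this plot fill is the only aesthetic with a legend, so the result looks the same as with theme(legend.position = "none"). The difference only shows when other aesthetics have legends too.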