R Data Visualisation Example: Histogram by group using ggplot2

Import library

library(ggplot2)

Generate Sample Data for Plotting purposes

# Sample data
set.seed(3)
x1 <- rnorm(500)
x2 <- rnorm(500, mean = 3)
x <- c(x1, x2)
group <- c(rep("G1", 500), rep("G2", 500))

df <- data.frame(x, group = group)

head(df)
##             x group
## 1 -0.96193342    G1
## 2 -0.29252572    G1
## 3  0.25878822    G1
## 4 -1.15213189    G1
## 5  0.19578283    G1
## 6  0.03012394    G1

Basic Histogram bins and binwidth using ggplot2

Grouped histogram with geom_histogram

Fill

To create a histogram by group in ggplot2, map the numerical variable to x and the categorical variable to fill inside aes, then add geom_histogram as follows.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group)) + 
  geom_histogram(binwidth = 0.15) 
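
When no position is given, geom_histogram stacks the groups (position = "stack"), so the height of each bar is the combined count of both groups. The sketch below makes that default explicit and, as an alternative to binwidth, sets the number of bins with the bins argument. The value 40 and the object name p_stack are arbitrary choices made for illustration.

```r
library(ggplot2)

# Same simulated data as in the sample data section
set.seed(3)
df <- data.frame(
  x = c(rnorm(500), rnorm(500, mean = 3)),
  group = rep(c("G1", "G2"), each = 500)
)

# position = "stack" is the default; bins fixes the number of bins
# instead of their width (40 is an arbitrary illustrative value)
p_stack <- ggplot(df, aes(x = x, fill = group)) +
  geom_histogram(position = "stack", bins = 40)

p_stack
```

Stacked bars make the total count per bin easy to read, but they hide the shape of the upper group, which is why the positions described in the following sections are often preferred.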

Colour

You can also map the categorical variable to the colour aesthetic, so the bar borders of each group are drawn in a different color.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, colour = group)) + 
  geom_histogram(binwidth = 0.15) 

identity position

Setting position = "identity" draws each histogram from the baseline so that the groups overlap. This is the most common choice, but remember to set a level of transparency with alpha so that both histograms remain visible.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group, colour = group)) + 
  geom_histogram(alpha = 0.5, position = "identity", binwidth = 0.15) 

dodge position

Another option is position = "dodge", which places the bars of each group side by side within every bin, so you will be able to see both histograms.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group, colour = group)) + 
  geom_histogram(position = "dodge", binwidth = 0.15) 

Histogram by group with custom colors

Borders color

If you set fill inside aes but not colour, you can change the border color of all histograms, as well as the line width and line type, with geom_histogram arguments.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group)) + 
  geom_histogram(colour = "blue",
                 lwd = 0.75,
                 linetype = 1,
                 position = "identity") 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Fill color

If you set colour inside aes but not fill, you can change the fill color of all histograms with the fill argument of geom_histogram.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, colour = group)) + 
  geom_histogram(fill  = "white", binwidth = 0.15,
                 position = "identity") 

Custom border colors for each group

The border colors can be customized for each group with scale_color_manual. If you prefer a predefined palette, you can use scale_color_brewer, for instance.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, colour = group)) + 
  geom_histogram(fill  = "white",
                 position = "identity", binwidth = 0.15) +
  scale_color_manual(values = c("blue", "orange")) 
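
The palette alternative mentioned above can be sketched as follows. The choice of the "Set1" palette and the object name p_brewer are assumptions made for illustration; any qualitative ColorBrewer palette with at least two colors should work the same way.

```r
library(ggplot2)

# Same simulated data as in the sample data section
set.seed(3)
df <- data.frame(
  x = c(rnorm(500), rnorm(500, mean = 3)),
  group = rep(c("G1", "G2"), each = 500)
)

# Border colors taken from a ColorBrewer palette instead of
# being listed by hand ("Set1" is an illustrative choice)
p_brewer <- ggplot(df, aes(x = x, colour = group)) +
  geom_histogram(fill = "white",
                 position = "identity", binwidth = 0.15) +
  scale_color_brewer(palette = "Set1")

p_brewer
```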

Custom fill colors for each group

As with the border colors, the fill colors can be set with scale_fill_manual or any other fill scale function.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group)) + 
  geom_histogram(color = 1, alpha = 0.75,
                 position = "identity", binwidth = 0.2) +
  scale_fill_manual(values = c("#8795E8", "#FE9AD5")) 
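
When the colors are passed as an unnamed vector, they are assigned to the groups in level order. A named vector ties each color to a specific group instead, which can be safer if the levels are later reordered. The sketch below deliberately lists the groups in reverse order to show that the names, not the positions, decide the match. The object name p_named is an illustrative choice.

```r
library(ggplot2)

# Same simulated data as in the sample data section
set.seed(3)
df <- data.frame(
  x = c(rnorm(500), rnorm(500, mean = 3)),
  group = rep(c("G1", "G2"), each = 500)
)

# Names in `values` are matched against the levels of `group`,
# so the order in which the colors are listed does not matter
p_named <- ggplot(df, aes(x = x, fill = group)) +
  geom_histogram(colour = "black", alpha = 0.75,
                 position = "identity", binwidth = 0.2) +
  scale_fill_manual(values = c(G2 = "#FE9AD5", G1 = "#8795E8"))

p_named
```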

Legend customization

Custom legend title

By default, the legend title is the name of the categorical column in the data set. You can change it with the fill and/or colour arguments of the guides function. Because both fill and colour are mapped inside aes here, the same title must be set for both; otherwise two separate legends will be displayed.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group, colour = group)) + 
  geom_histogram(alpha = 0.5, position = "identity", binwidth = 0.2) + 
  guides(fill = guide_legend(title = "Title"),
         colour = guide_legend(title = "Title")) 
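
A shorter way to get the same result is the labs function, which sets the title of each aesthetic by name. This is a minimal sketch; the title text and the object name p_labs are placeholders.

```r
library(ggplot2)

# Same simulated data as in the sample data section
set.seed(3)
df <- data.frame(
  x = c(rnorm(500), rnorm(500, mean = 3)),
  group = rep(c("G1", "G2"), each = 500)
)

# Giving fill and colour the same title keeps them merged
# into a single legend
p_labs <- ggplot(df, aes(x = x, fill = group, colour = group)) +
  geom_histogram(alpha = 0.5, position = "identity", binwidth = 0.2) +
  labs(fill = "Title", colour = "Title")

p_labs
```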

Custom legend labels

By default, the legend displays the levels of the categorical variable, but you can change these labels with scale_color_discrete and/or scale_fill_discrete, depending on which aesthetics you mapped inside aes.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group, colour = group)) + 
  geom_histogram(alpha = 0.5, position = "identity", binwidth = 0.2) + 
  scale_color_discrete(labels = c("A", "B")) +
  scale_fill_discrete(labels = c("A", "B")) 

Legend position

The position of the legend defaults to the right, but can be changed with the legend.position component of the theme function as in the example below.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group, colour = group)) + 
  geom_histogram(alpha = 0.5, position = "identity", binwidth = 0.2) + 
  theme(legend.position = "left") 
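
Besides "left", legend.position also accepts "right", "top" and "bottom". As a small sketch, the legend can be moved below the plot as follows; the object name p_bottom is an illustrative choice.

```r
library(ggplot2)

# Same simulated data as in the sample data section
set.seed(3)
df <- data.frame(
  x = c(rnorm(500), rnorm(500, mean = 3)),
  group = rep(c("G1", "G2"), each = 500)
)

# Place the legend under the panel instead of at the side
p_bottom <- ggplot(df, aes(x = x, fill = group, colour = group)) +
  geom_histogram(alpha = 0.5, position = "identity", binwidth = 0.2) +
  theme(legend.position = "bottom")

p_bottom
```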

Remove the legend

Setting legend.position = "none" removes the legend completely.

# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group, colour = group)) + 
  geom_histogram(alpha = 0.5, position = "identity", binwidth = 0.2) + 
  theme(legend.position = "none")